Understanding Peer-to-Peer Transactional Replication, Part 1

  • RickNPHX

    Old Hand

    Points: 381

Sounds like a nightmare headache to me. I don't get what the advantage would be, or what kind of application would need multiple subscriptions on multiple servers. I imagine you could just make your PK unique across the network and mimic a unique record number. In our system we have a simple setup where one server acts as publisher, and the replicated server keeps the subscription constantly updated in conformance with the OLTP publisher. I must be TREMENDOUSLY lucky in my shop.

  • SQLBlimp

    SSCertifiable

    Points: 6325

    RickNPHX (11/5/2015)


Sounds like a nightmare headache to me. I don't get what the advantage would be, or what kind of application would need multiple subscriptions on multiple servers. I imagine you could just make your PK unique across the network and mimic a unique record number. In our system we have a simple setup where one server acts as publisher, and the replicated server keeps the subscription constantly updated in conformance with the OLTP publisher. I must be TREMENDOUSLY lucky in my shop.

Hi. The advantages come in a scale-out situation. With classic transactional replication as you describe (which I also have in my shop in numerous places), the subscriber is in essence read-only; all update transactions must originate in OLTP. If you modify data in the subscriber, you can break the publication. Classic transactional is good for populating read-only subscribers for ad-hoc queries or ETL. When you have one canonical data set and need the bandwidth of a scale-out situation, Peer-to-Peer gives you flexibility because you can update at all nodes. I've seen that need in more than one shop, particularly where the app is not sturdy enough to handle the extra GUID column that merge replication imposes on every participating table.
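For reference, that extra column merge replication imposes looks roughly like this. This is a sketch (the table name dbo.Orders is made up); merge replication adds a uniqueidentifier ROWGUIDCOL to any published table that doesn't already have one:

```sql
-- Sketch of the column merge replication adds to every published table
-- (table and constraint names here are hypothetical):
ALTER TABLE dbo.Orders
    ADD rowguid uniqueidentifier ROWGUIDCOL NOT NULL
        CONSTRAINT DF_Orders_rowguid DEFAULT NEWSEQUENTIALID();
```

A fragile packaged app that does SELECT * or INSERT without column lists can break when that column appears, which is exactly the situation where Peer-to-Peer is attractive.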

    I hope that I've explained the applicable situation.

    Thanks.

    John.

  • RickNPHX

    Old Hand

    Points: 381

May I ask, what kind of application does this support in your case? I can understand easily enough that the GUID involved in the merge technique causes undesirable overhead that will drag on the system. I understand that many clients are updating the database(s), but I'm fuzzy on why the clients need to also act as servers. Sorry I didn't quite get that.

  • SQLBlimp

    SSCertifiable

    Points: 6325

    RickNPHX (11/5/2015)


May I ask, what kind of application does this support in your case? I can understand easily enough that the GUID involved in the merge technique causes undesirable overhead that will drag on the system. I understand that many clients are updating the database(s), but I'm fuzzy on why the clients need to also act as servers. Sorry I didn't quite get that.

    Hi -

First, let's clarify a little terminology. By server, I assume you mean the replication publisher, the source of the data; by client, I assume you mean the replication subscriber, the target of the data.

The thing with Peer-to-Peer is that all of the nodes are both subscriber and publisher, meaning that an update to any one will propagate to all.

It is difficult to insert images into these boards... but here is an idea of classic transactional:

    +-----------+
    | Publisher |   [R/W]
    +-----------+
          |
          v
    +------------+
    | Subscriber |  [R/O - at least for replicated articles]
    +------------+

Now here is an idea of the operation of peer-to-peer:

    +-------+         +-------+
    | Peer1 |<------->| Peer2 |
    +-------+         +-------+
         ^               ^
          \             /
           \           /      All connections are R/W
            v         v
           +-------+
           | Peer3 |
           +-------+

Does this help make sense? If you are scaling out, then Peer-to-Peer offers you a trigger-free, extra-column-free solution (really important for many packaged products) plus the ability to load-balance data updates.
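On the T-SQL side, setting this up looks mostly like a classic transactional publication with the peer-to-peer flag set. A rough sketch, run on each node (the publication name here is made up):

```sql
-- Sketch: create a publication marked for peer-to-peer
-- (run in the publication database on each node; name is hypothetical)
EXEC sp_addpublication
    @publication = N'P2P_Orders',
    @enabled_for_p2p = N'true',            -- marks it peer-to-peer
    @allow_initialize_from_backup = N'true'; -- P2P nodes initialize from backup,
                                             -- not from a snapshot
```

The articles and subscriptions are then added between every pair of nodes, which is what makes each node both publisher and subscriber.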

    Thanks

    John

  • samot-dwarf

    SSC Eights!

    Points: 984

I agree with the article except for the part about the clustering key and fill factor. The author argues that he sets the fill factor to 90 to prevent page splits. That would only make sense if he were using GUIDs, which are used in merge replication but not in peer-to-peer.

Let's assume we have Server 1, which uses IDs 1 to 1,000,000, and Server 2, which uses IDs 1,000,001 to 2,000,000. Both servers have used half of their range and you rebuild the index. Now you have a single page with ID 500,000 (from Server 1) followed by ID 1,000,001 (from Server 2).

When Server 1 now inserts a new record, it needs a page split, which would (ideally) place the 1,000,001 on a new page, so that 500,001 could be inserted into the now-free space on the half-empty page. When you insert more records on Server 1, they would all land on this page too, and when it is full, a new page would be created. Inserts on Server 2 would never result in a page split, since they always occur after the latest record.
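Per-server ID ranges like the ones above are typically carved out with the identity seed, combined with NOT FOR REPLICATION so that replicated inserts don't disturb the local seed. A minimal sketch (table and column names hypothetical):

```sql
-- Server 1: local inserts get IDs starting at 1
CREATE TABLE dbo.Orders (
    OrderID   int IDENTITY(1, 1) NOT FOR REPLICATION,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID)
);

-- Server 2: same table, but seeded at the start of its own range:
--     OrderID int IDENTITY(1000001, 1) NOT FOR REPLICATION
```

NOT FOR REPLICATION means rows arriving via the replication agent keep their originating ID without advancing this node's identity counter.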

And if you used partitioning (on the ID column, with 1,000,000 as the split value), you could prevent the whole page-split issue by design.
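That partitioning idea can be sketched like this, aligning the partition boundary with the two servers' ID ranges (all names hypothetical):

```sql
-- Boundary at 1,000,000: Server 1's rows fall in partition 1,
-- Server 2's rows in partition 2, so each server only ever
-- appends within its own partition - no cross-range page splits.
CREATE PARTITION FUNCTION pf_ServerRange (int)
    AS RANGE LEFT FOR VALUES (1000000);

CREATE PARTITION SCHEME ps_ServerRange
    AS PARTITION pf_ServerRange ALL TO ([PRIMARY]);

-- The table is then created on the scheme, e.g.:
--     CREATE TABLE dbo.Orders ( ... ) ON ps_ServerRange (OrderID);
```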

Another point: how often do you insert records compared to reads? In most tables the inserts are single-row inserts at the end of a page and mostly "user driven", where a single minor page split does not make any noticeable difference to the response time.

On the other hand, a fill factor of 90% means that 10% of your table / database is empty - on pages where you will NEVER insert a record (when you are using a sequential integer as the clustering key). So you have to read / buffer 10% more pages when you do a select / update / delete (and even some inserts, because of FKs), plus 10% more backup time / restore time / space on disk, etc.

If the table is read-only (such as logs) or the columns are all fixed size (non-nullable int / datetime etc.), you can use 100%. When the size of a record can grow because of user changes (e.g. changing a name to a longer one, adding a comment, or setting a NULL column to a value), I tend to use 98 or 99 as the fill factor (since most data are static, particularly once they are no longer "hot").

  • Jeff Moden

    SSC Guru

    Points: 995652

    samot-dwarf - Sunday, December 23, 2018 10:37 AM

I agree with the article except for the part about the clustering key and fill factor. The author argues that he sets the fill factor to 90 to prevent page splits. That would only make sense if he were using GUIDs, which are used in merge replication but not in peer-to-peer.

Let's assume we have Server 1, which uses IDs 1 to 1,000,000, and Server 2, which uses IDs 1,000,001 to 2,000,000. Both servers have used half of their range and you rebuild the index. Now you have a single page with ID 500,000 (from Server 1) followed by ID 1,000,001 (from Server 2).

When Server 1 now inserts a new record, it needs a page split, which would (ideally) place the 1,000,001 on a new page, so that 500,001 could be inserted into the now-free space on the half-empty page. When you insert more records on Server 1, they would all land on this page too, and when it is full, a new page would be created. Inserts on Server 2 would never result in a page split, since they always occur after the latest record.

And if you used partitioning (on the ID column, with 1,000,000 as the split value), you could prevent the whole page-split issue by design.

Another point: how often do you insert records compared to reads? In most tables the inserts are single-row inserts at the end of a page and mostly "user driven", where a single minor page split does not make any noticeable difference to the response time.

On the other hand, a fill factor of 90% means that 10% of your table / database is empty - on pages where you will NEVER insert a record (when you are using a sequential integer as the clustering key). So you have to read / buffer 10% more pages when you do a select / update / delete (and even some inserts, because of FKs), plus 10% more backup time / restore time / space on disk, etc.

If the table is read-only (such as logs) or the columns are all fixed size (non-nullable int / datetime etc.), you can use 100%. When the size of a record can grow because of user changes (e.g. changing a name to a longer one, adding a comment, or setting a NULL column to a value), I tend to use 98 or 99 as the fill factor (since most data are static, particularly once they are no longer "hot").

Do you do any "ExpAnsive" updates (a VARCHAR() or NVARCHAR() goes from NULL to something or becomes wider) or "out of order" inserts on the "hot" section of data in your 98/99% Fill Factor indexes? If so, Fill Factor isn't going to matter much (except for evenly distributed indexes such as random GUIDs, like you said). You will have some pretty nasty logical and physical (page density) problems. If the index is truly an "Append Only" index, then Fill Factor also won't matter, because INSERTs don't take Fill Factor into consideration and will simply fill the pages to their max depending on the size of the rows you're inserting.
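One way to check whether that's happening is sys.dm_db_index_physical_stats: high logical fragmentation plus low page density on a supposedly append-only index is the giveaway. A sketch (the object name is hypothetical):

```sql
-- SAMPLED (or DETAILED) mode is needed to get page density;
-- LIMITED returns NULL for avg_page_space_used_in_percent
SELECT index_id,
       avg_fragmentation_in_percent,   -- logical fragmentation from splits
       avg_page_space_used_in_percent  -- page density
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.Orders'), NULL, NULL, N'SAMPLED');
```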

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think it's expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
