Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Scaling Out the Distribution Database


Scaling Out the Distribution Database

Author
Message
Kendal Van Dyke
Kendal Van Dyke
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 983
RML51, the transaction retention period is used by the distribution cleanup agent to mark subscriptions as inactive if they are further behind than the retention period and to remove transactions that are older than the specified period. How long commands and transactions are retained after they've been delivered to all subscribers depends on the immediate_sync setting for the publication. If TRUE, transactions will be retained until they are older than the retention period; if FALSE they will be removed by the distribution cleanup agent the next time it runs after the transaction has been delivered to all subscribers.

Toby, I agree with you - David did a great job explaining the how but not the why. The way I see it, you're doing the same amount of IO whether or not you've got one distribution DB or ten. By setting up multiple distribution DBs you may even be adding IO overhead by making your disk heads jump around from one DB to another to read\write transactions and commands. Maybe I can see the logic in creating multiple distribution DBs if you are putting them on separate physical drives...but if you've got those drives available I'd just keep on distribution DB and spread it across as many spindles as you can to begin with.

When it comes to replication the idea is to keep the distribution database lean and mean. If you've got distribution DB growth issues I'd look at identifying the root cause first. Make sure immedite_sync is set to FALSE and that your distribution cleanup agent is running.

Chris Skorlinski from the Microsoft SQL Server Replication Support Team has two great articles about distribution DB growth that I recommend reading before considering the more invasive approach of creating multiple distribution DBs:
How to resolve when Distribution Database is growing huge (+25gig)
How Replication setting Immediate_sync may cause Transactional Replication Distribution database growth

FWIW I asked Chris about the author's suggestion and he indicated that he's never had a case where he found it necessary to create multiple distribution DBs to alleviate performance issues.

Kendal Van Dyke
http://kendalvandyke.blogspot.com/
Kendal Van Dyke
Kendal Van Dyke
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 983
David.Poole (3/30/2010)
On one particular publication we changed the retention period to 3 hours because if something goes wrong with the subscriber then the transactions start building up in the distributor and if that comes under stress the publisher log files start to bloat until disk space on the production box is threatened.


David,
I'm curious - for those publications is immediate_sync set to TRUE? Or is it set to FALSE and you're just writing off any subscriber that falls further than 3 hours behind?

Kendal Van Dyke
http://kendalvandyke.blogspot.com/
John.Sansom
John.Sansom
SSC-Addicted
SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)SSC-Addicted (431 reputation)

Group: General Forum Members
Points: 431 Visits: 1558

FWIW I asked Chris about the author's suggestion and he indicated that he's never had a case where he found it necessary to create multiple distribution DBs to alleviate performance issues.


Great discussion!

For what it's worth, I have seen scenarios where multiple Distribution databases provide a performance improvement.

For example, let’s say we have two Publications that form part of an application platform. One Publication is sourced from a very intensive OLTP database and another Publication has relatively moderate activity. Given that both Publications share the same Distributor database, the OLTP Publication could be responsible for 80% of the overall activity to the Distributor, thereby adversely affecting/constraining the performance of any other Publications that share the same Distributor.

By providing a resource intensive Publication its own dedicated Distributor database, you can isolate the performance from other Publications/Applications.

Who knew Replication could be such fun :-)


John Sansom (@sqlBrit) | www.johnsansom.com
Kendal Van Dyke
Kendal Van Dyke
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 983
For example, let’s say we have two Publications that form part of an application platform. One Publication is sourced from a very intensive OLTP database and another Publication has relatively moderate activity. Given that both Publications share the same Distributor database, the OLTP Publication could be responsible for 80% of the overall activity to the Distributor, thereby adversely affecting/constraining the performance of any other Publications that share the same Distributor.


Were the distribution databases on separate physical drives? I could see why that would help, but if they're on the same set of spindles I'm still not convinced that if your indexes and statistics are in good shape that having multiple databases offers a significant performance improvement. (I'll buy that it may offer a marginal improvement in some very specific circumstances)

Kendal Van Dyke
http://kendalvandyke.blogspot.com/
SQLRNNR
SQLRNNR
SSC-Insane
SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)SSC-Insane (21K reputation)

Group: General Forum Members
Points: 21075 Visits: 18259
Great article David.



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

alen teplitsky
alen teplitsky
SSCommitted
SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)SSCommitted (1.6K reputation)

Group: General Forum Members
Points: 1595 Visits: 4621
Kendal Van Dyke (3/30/2010)
David,
Can you elaborate on the following statement you made:
the number of records in the MSRepl_transactions table is likely to be very high and once it has gone beyond a certain size

and provide guidance on what that "certain size" is and how someone knows if they've exceeded it?

Also, do you have any suggestions for other steps that someone should take to optimize their existing distribution database before deciding that it's time to create a new one?

Kendal Van Dyke
@SQLDBA


we're having this problem. first thing is we're looking to upgrade to 32GB of RAM soon on the distributor. we're at 14GB now and seeing memory pressure sometimes.

otherwise best thing is to buy a lot of disk and split everything up on it's own RAID5 volumes. with HP servers, the Proliant G6 supports up to 16 drives in a server
David.Poole
David.Poole
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3674 Visits: 3114
If you have everything hammering through one database you are talking about one transaction table having to deal with all transactions. Ditto the commands table.

Try reindexing a very large table. TEMPDB will be thrashed as well, if nothing else. I did find an article (which I can't find now I need it) that showed that indexes are less effective when they get more than 'n' levels deep.

Yes you can have the different distribution databases on different drives with obvious benefits but separate databases have at least got separate files. It's a myth that separate files get separate threads but you are dealing with far smaller files and each distribution database has fewer records to deal with.

We implemented the policy of distribution database per publisher policy and most of our replication woes just went away.

If you reindex a table then the table is locked briefly. What do you want to do, reindex a huge table locking it up for a relatively long length of time or a small table locking it briefly?

LinkedIn Profile

Newbie on www.simple-talk.com
M&M
M&M
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2595 Visits: 3898
David,

Thanks for this wonderful article. Please share your comments.
1) Could we do this on SQL server 2000
2) Is there a way by which we can track at which point the distribution database is stuck and is unable to send the transactions to subscriber?

M&M
Kendal Van Dyke
Kendal Van Dyke
Old Hand
Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)Old Hand (370 reputation)

Group: General Forum Members
Points: 370 Visits: 983
David.Poole (3/30/2010)
If you have everything hammering through one database you are talking about one transaction table having to deal with all transactions. Ditto the commands table.


But if you have two, three, four, etc. distribution databases that doesn't change the number of commands and transactions that have to be written. The IO is still the same.

Yes you can have the different distribution databases on different drives with obvious benefits but separate databases have at least got separate files. It's a myth that separate files get separate threads but you are dealing with far smaller files and each distribution database has fewer records to deal with.


Yes, that is a myth. The CSS team has a great blog post about it here.

I think the discussion has more merit if we put some metrics out for comparison. My busiest distributor is a PowerEdge 2950 running SQL 2008 standard edition (x86) on Windows 2003 (x86) with 6 local drives (2 in RAID 1 for OS and transaction logs and the other 4 in RAID 10 (with 64KB partition offset, 64 KB block size, and 64 BK stripe size) for data) and 4 GB of RAM. It's responsible for 82 publications, 240 subscriptions, and based on distribution agent history for the last 48 hours it's pushing through just shy of 80 million commands per day. All distribution agents run continuously and there are rarely any latency issues that need to be dealt with.

How do those numbers and hardware compare to your distributor? (Please don't take this the wrong way - it's not a "prove it" tone I'm taking here...I'm just trying to understand how our environments compare to each other so I can better understand your reasoning for using multiple distribution DBs)

Kendal Van Dyke
http://kendalvandyke.blogspot.com/
David.Poole
David.Poole
Hall of Fame
Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)Hall of Fame (3.7K reputation)

Group: General Forum Members
Points: 3674 Visits: 3114
No offence taken.

We have four production distributors two of which are virtual servers.

We put roughly 24 billion transactions through them each day. We have spiked up to 370 billion transactions but this was exceptional.

On the subject of whether this works with SQL2000 distributors the short answer is "I don't know". Some of of publishers are SQL2000 but as the distributors have to be the highest version of SQL that are acting as publishers none of our distributors are SQL2000.

LinkedIn Profile

Newbie on www.simple-talk.com
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search