Hugo Kornelis (1/26/2015)
Raghavendra Mudugal (1/26/2015)
Thank you for the post, Steve, very interesting one.
Row filters in transactional publications can add significant overhead because the article filter clause is evaluated for each log row written for a published table, to determine whether the row should be replicated. Row filters in transactional publications should be avoided if each replication node can support the full data load, and the overall data set is reasonably small.
Any takers on the bold letters? (Small as in hundreds?) Like replicating master data (from the MSDN example under static row filter): we add 100 products today and replicate them, and maybe the next day add another 50 or 75 to the master and replicate again...? If Sales has millions of records added each day and we then filter on a specific region, then, like it says, it "can add significant overhead". So, performance-wise, is a parameterized row filter better?
I'd say that the answer to your question is in the sentence before the bolded part: "if each replication node can support the full data load" (and, though not explicitly mentioned, I would add the network to that).
If your network is saturated, or a subscriber does not have the storage space or processing power to handle the full load, then a row filter is a great way to reduce the load by filtering rows before they are published. But if your network and subscribers can handle the full load without issues, then the recommendation here is not to use a row filter, and instead just push all rows and then, optionally, delete rows or add indexes at the subscribers to prevent the unneeded extra rows from impacting their performance.
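The two approaches described above could be sketched roughly like this. Note this is only an illustrative sketch: the publication, table, and column names (SalesPub, dbo.Sales, Region) are made up, and only sp_addarticle and its @filter_clause parameter come from the documented replication procedures:

```sql
-- Option 1 (sketch): static row filter at the publisher, so only matching
-- rows are ever replicated. The filter clause is evaluated for each log row
-- written for the published table, which is the overhead the article warns
-- about.
EXEC sp_addarticle
    @publication   = N'SalesPub',          -- hypothetical publication name
    @article       = N'Sales',
    @source_owner  = N'dbo',
    @source_object = N'Sales',
    @filter_clause = N'Region = ''US''';   -- static row filter

-- Option 2 (sketch): publish the table unfiltered, then optionally prune or
-- index at the subscriber so the extra rows don't hurt local queries.
DELETE FROM dbo.Sales WHERE Region <> N'US';          -- periodic cleanup job
CREATE INDEX IX_Sales_Region ON dbo.Sales (Region);   -- or just index it
```

One caveat on option 2: deleting replicated rows at a transactional subscriber can cause distribution agent errors if the publisher later updates or deletes those same rows, so pruning is usually only safe for rows that won't be touched again upstream.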
I actually understood that; when there are publisher and subscriber roles, the network definitely comes into consideration (apart from a scenario where the publisher itself acts as a subscriber, which I can't think of a use for). My question still remains unanswered.
".. can support the full data load" along with "and the overall data set is reasonably small" - given this, even though a subscriber has enough storage space and processing power*, CAN I take all the US-region data from the Sales table by applying a static row filter? (Say the subscriber is scheduled, or manually executed, to pull the data every 48 hours, and there are 300,000 new rows in Sales, of which 125,000 belong to the US.) Provided sufficient disk space and processing power, can I forget about the performance slowness caused by the filter's overhead?
* processing power: CPU, RAM, internet bandwidth, alternate power supply
The first and hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.