Hugo Kornelis (1/26/2015)
Raghavendra Mudugal (1/26/2015)
Thank you for the post, Steve, very interesting one.
Row filters in transactional publications can add significant overhead because the article filter clause is evaluated for each log row written for a published table, to determine whether the row should be replicated. Row filters in transactional publications should be avoided if each replication node can support the full data load, and the overall data set is reasonably small.
Any takers on the bold letters? (Small as in hundreds?) Like replicating master data (from the MSDN example under static row filter): we add 100 products today and replicate them, and maybe the next day add another 50 or 75 to the master and replicate again...? If Sales has millions of records added each day and we then filter on a specific region, then, like it says, it "can add significant overhead". So, performance-wise, is a parameterized row filter better?
I'd say that the answer to your question is in the sentence before the bolded part: "if each replication node can support the full data load" (and, though not explicitly mentioned, I would add the network to that).
If your network is saturated, or a subscriber does not have the storage space or processing power to handle the full load, then a row filter is a great way to reduce the load by filtering rows before they are published. But if your network and subscribers can handle the full load without issues, then the recommendation here is not to use a row filter, and instead just push all rows and then, optionally, delete rows or add indexes at the subscribers to prevent the unneeded extra rows from impacting their performance.
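The two approaches described above could be sketched roughly like this. Note this is only an illustrative sketch: the publication, table, and column names (SalesPub, dbo.Sales, Region) are made up, and only sp_addarticle and its @filter_clause parameter come from the documented replication procedures:

```sql
-- Option 1 (sketch): static row filter at the publisher, so only matching
-- rows are ever replicated. The filter clause is evaluated for each log row
-- written for the published table, which is the overhead the article warns
-- about.
EXEC sp_addarticle
    @publication   = N'SalesPub',          -- hypothetical publication name
    @article       = N'Sales',
    @source_owner  = N'dbo',
    @source_object = N'Sales',
    @filter_clause = N'Region = ''US''';   -- static row filter

-- Option 2 (sketch): publish the table unfiltered, then optionally prune or
-- index at the subscriber so the extra rows don't hurt local queries.
DELETE FROM dbo.Sales WHERE Region <> N'US';          -- periodic cleanup job
CREATE INDEX IX_Sales_Region ON dbo.Sales (Region);   -- or just index it
```

One caveat on option 2: deleting replicated rows at a transactional subscriber can cause distribution agent errors if the publisher later updates or deletes those same rows, so pruning is usually only safe for rows that won't be touched again upstream.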
I actually understood that; when there are publisher and subscriber roles, the network definitely comes into consideration (apart from a scenario where the publisher itself acts as a subscriber, which I can't think of a use for). My question still remains unanswered.
".. can support the full data load" along with "and the overall data set is reasonably small" - given this, even though a subscriber has enough storage space and processing power*, CAN I take all the US-region data from the Sales table by applying a static row filter? (Say the subscriber is scheduled, or manually executed, to pull the data every 48 hours, and there are 300,000 new rows in Sales, of which 125,000 belong to the US.) Provided sufficient disk space and processing power, can I forget about the performance slowness caused by the filter's overhead?
* processing power: CPU, RAM, internet bandwidth, alternate power supply
The first and hardest SQL statement I ever wrote - "select * from customers" - and I was happy and felt smart.