Data Distribution - Part 2

  • Leo Peysakhovich

    Hall of Fame

    Points: 3880

    Comments posted to this topic are about the item Data Distribution - Part 2

  • clementhuge

    SSC-Addicted

    Points: 445

    Hello,

    The article is interesting, but I believe we have to avoid triggers at all costs for performance and maintainability reasons. I usually put a timestamp or datetime column, such as ModificationDate, on each table I want to extract data from, and avoid deletions at all costs as well.

    No matter what, even if you have triggers, you will not have a continuous flow; it is not possible.

    Your job, if already running, will not be triggered even though you send an alert to trigger it.

    Overall you stay with an asynchronous mechanism (replication is somewhat asynchronous as well, since it writes to a log that is read by the distribution agent, which then applies the changes).

    In my previous work, I would rather set up a datetime column and extract the data, storing the extract date at the local server to filter what has changed. This way you can manage the frequency and you do not touch the OLTP performance of your most important server, the one up front.

    If you want to improve it further, you can add a tinyint column indicating the operation (insert, update, or delete). This way you know what to do when you extract.

    Clement
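
    Clement's approach could be sketched roughly as follows; the table, column, and watermark names (dbo.Customer, dbo.ExtractControl) are hypothetical, not from the article:

```sql
-- Hypothetical source table: add a change timestamp and an operation code.
ALTER TABLE dbo.Customer ADD
    ModificationDate datetime NOT NULL
        CONSTRAINT DF_Customer_ModificationDate DEFAULT (GETDATE()),
    OperationCode tinyint NOT NULL
        CONSTRAINT DF_Customer_OperationCode DEFAULT (1);  -- 1=insert, 2=update, 3=delete (soft)

-- Extract job: pull only rows changed since the last extract,
-- using a watermark stored on the extracting side.
DECLARE @LastExtract datetime;
SELECT @LastExtract = LastExtractDate
FROM dbo.ExtractControl
WHERE TableName = 'Customer';

SELECT CustomerID, OperationCode, ModificationDate
FROM dbo.Customer
WHERE ModificationDate > @LastExtract;

-- After a successful extract, advance the watermark.
UPDATE dbo.ExtractControl
SET LastExtractDate = GETDATE()
WHERE TableName = 'Customer';
```

    The point of the pattern is that the OLTP server only pays for one indexed datetime column; all scheduling and filtering happens on the extracting side.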

  • LP-181697

    SSC Eights!

    Points: 966

    clementhuge (10/14/2010)


    Hello,

    The article is interesting but I believe we have to avoid triggers at all price for performance and maintainability reason. I usually put a timestamp or datetime column like ModificationDate on each table I like to extract data and avoid deletion at all price as well.

    No matter what and even if you have triggers, you will not have continuous flow, it is not possible.

    Your job if running will not be triggered even though you send an alert to trigger it.

    Overall you stay with an asynchroneous mecanism (replication is somewhat asynchroneous as well as it triggers a log to read by the disztribution and apply the changes anyway).

    With my previous work, I would rather set up a datetime column and extract the data storing the extract date at the local server to filter what has to be changed. This way you can manage the frequency and you do not touch the OLTP performance of your most important server, the one upfront.

    If you even want to improve it, you can add a tinyint column referring the operation (insert, update or delete). This way you know what to do when you extract.

    I think it is a misconception that "we have to avoid triggers at all costs for performance and maintainability reasons". We should avoid lengthy triggers and avoid placing business logic in triggers. This article talks about very simple triggers which help the processes and do not pose performance issues. We have many databases and tables with millions of rows, and this mechanism is very robust. For 3 years it has been synchronizing data from one server to 10-12 servers, and those servers run 10,000 clients at peak time. We have not experienced any performance hits or any other issues in those 3 years.

  • alen teplitsky

    SSC-Dedicated

    Points: 30014

    I think that if you're adding triggers to try to keep track of every single change, you have problems: security problems. It's a huge performance hit to have triggers on every table, and for those of us with databases on SANs it's a lot of money for the extra storage.

    If you have people making changes they shouldn't be making, then you need to go to management to get security changes. We used to be all SQL logins, and people would always share passwords, so there was no control. We've changed to mostly Windows authentication now, and it's a lot easier to control things.

  • LP-181697

    SSC Eights!

    Points: 966

    alen teplitsky (10/14/2010)


    I think that if you're adding triggers to try to keep track of every single change, you have problems: security problems. It's a huge performance hit to have triggers on every table, and for those of us with databases on SANs it's a lot of money for the extra storage.

    If you have people making changes they shouldn't be making, then you need to go to management to get security changes. We used to be all SQL logins, and people would always share passwords, so there was no control. We've changed to mostly Windows authentication now, and it's a lot easier to control things.

    It all depends on the level of the changes and the data you need to have in real time. For example, if you need only demographic data like address, phone, and email to be real time, then it is reasonable to set up such a process. If you need many tables, then this approach may not work. If the rate of changes is low, it may work even for big databases; if the rate of changes is extensive, it may not work at all. This article shows one of the methodologies, and if it is used the wrong way it will create harm, like anything else. With improperly defined indexes, performance can be killed, but no one complains that indexes are a bad idea or will not work. As always, you need to do impact analysis and understand what to use and how to use it.

  • Mike Dougherty-384281

    SSCrazy

    Points: 2764

    ...And if it is used the wrong way it will create harm, like anything else. With improperly defined indexes, performance can be killed, but no one complains that indexes are a bad idea or will not work. As always, you need to do impact analysis and understand what to use and how to use it.

    Seconded.

    Self-referential best practice suggestion: Unconditional application of a rule is always wrong.

  • alen teplitsky

    SSC-Dedicated

    Points: 30014

    I have a db that I designed to hold data from security logs from servers. Security was a big concern for me. It's not the most glamorous solution, but what I did was create staging tables to temporarily hold the data coming in via ETL. There is a SQL login that has write access to these tables, with the password in plain text on 30-some servers if not more. Every hour or so a process moves the data from the staging tables to the real tables from which the reports run. These real tables are locked down to the point where only a few logins even have read access. Most of the data is presented via email.

    It doesn't work for everyone, but for ETL, instead of keeping track of the data changes, what about staging the data first and then inserting it into tables with more security on them, to negate the need for auditing?
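
    A minimal sketch of the staging pattern described above, with hypothetical schema and object names (stage.SecurityLogEvent, report.SecurityLogEvent, etl_login); the ETL login is granted INSERT only on the staging table, so a leaked password cannot touch the reporting data:

```sql
-- Staging table: the only object the ETL login can write to.
CREATE TABLE stage.SecurityLogEvent (
    ServerName sysname       NOT NULL,
    EventTime  datetime      NOT NULL,
    EventID    int           NOT NULL,
    EventData  nvarchar(max) NULL
);
GRANT INSERT ON stage.SecurityLogEvent TO etl_login;

-- Hourly job step: move staged rows into the locked-down reporting table.
BEGIN TRANSACTION;

INSERT INTO report.SecurityLogEvent (ServerName, EventTime, EventID, EventData)
SELECT ServerName, EventTime, EventID, EventData
FROM stage.SecurityLogEvent;

DELETE FROM stage.SecurityLogEvent;  -- staged rows are now in the real table

COMMIT TRANSACTION;
```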

  • LP-181697

    SSC Eights!

    Points: 966

    There is always a load zone for the data. The article does not suggest entering data directly from server to server inside the trigger. It says there is a job on each server that is started by the real-time notification. How you organize the job and the level of protection is defined by the implementation. It can be a very simple or a very complex validation layer before the data is synchronized. But remember, it is synchronized in real time (1-3 seconds).

  • clementhuge

    SSC-Addicted

    Points: 445

    I do understand that, when everything is handled properly, performance will not suffer too much, but I still do not think the use of triggers is a good practice. I prefer ETL/asynchronous processes that handle the extract via a datetime column on the table you want to extract.

    I disagree with architecting around triggers, that is all.

    Clement

  • Misha_SQL

    SSCertifiable

    Points: 5397

    Thank you for the article, Leo! I found it interesting. A couple of questions please. Could you clarify what is behind this line:

    raiserror (51080, 10, 1); -- Alert that will start job

    I am just curious about possible implementations. I have been starting jobs from T-SQL code using sp_start_job procedure, but is there a way to actually trigger it from the alert?

    Secondly, I would like to better understand the mechanism for transferring changed records. Here is my understanding: we have a control table on the Destination server and, based on this table, we know the IDs of the records we need to bring in (where Processed_FLG is 'N'). We then wrap these IDs into XML and pass the XML to the stored procedures on the Source server. When these procedures run, they grab the necessary records and somehow push them to the Destination (via a linked server, or how?). Please let me know if this is correct.

    Thank you!

  • LP-181697

    SSC Eights!

    Points: 966

    mishaluba (10/15/2010)


    Thank you for the article, Leo! I found it interesting. A couple of questions please. Could you clarify what is behind this line:

    raiserror (51080, 10, 1); -- Alert that will start job

    I am just curious about possible implementations. I have been starting jobs from T-SQL code using sp_start_job procedure, but is there a way to actually trigger it from the alert?

    Secondly, I would like to better understand the mechanism for transferring changed records. Here is my understanding: we have a control table on the Destination server and, based on this table, we know the IDs of the records we need to bring in (where Processed_FLG is 'N'). We then wrap these IDs into XML and pass the XML to the stored procedures on the Source server. When these procedures run, they grab the necessary records and somehow push them to the Destination (via a linked server, or how?). Please let me know if this is correct.

    Thank you!

    raiserror (51080, 10, 1); - when you register an alert, you associate it with an error message. I created custom message 51080. It means that when this error is raised, the alert is fired by SQL Server. The alert can then be associated with a job, i.e. when the alert is raised, it fires the job. But the job then runs completely independently of the transaction that fired the raiserror command.
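
    The alert-to-job wiring described here could look roughly like this; the message number 51080 comes from the discussion, while the alert and job names are hypothetical, and the job itself is assumed to exist:

```sql
-- One-time setup: a custom message, logged so that an alert can fire on it.
EXEC sp_addmessage @msgnum = 51080, @severity = 10,
     @msgtext = N'Data distribution: changes pending',
     @with_log = 'TRUE', @replace = 'replace';

-- Bind an alert to the message and attach the job to start.
EXEC msdb.dbo.sp_add_alert
     @name = N'DataDistributionAlert',
     @message_id = 51080,
     @job_name = N'TransferChangesJob';  -- SQL Agent job started when the alert fires

-- In the trigger (or any T-SQL): raising the error fires the alert,
-- which starts the job independently of the calling transaction.
RAISERROR (51080, 10, 1) WITH LOG;
```

    Note that SQL Server Agent alerts only fire on messages that reach the event log, hence @with_log / WITH LOG.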

    The idea of the process is simple: the control table that keeps the changes is on the source. When the job fires, it transfers knowledge of the changes to the destination server, where they are written to a similar control table with status "not processed", and then the changes are processed. There are many variations on how to organize the process: it can be done through a centralized hub if you have to distribute changes to many servers, or directly server to server if you need it for only one server.

  • Patrick2525

    Mr or Mrs. 500

    Points: 547

    Sorry for joining this conversation late...

    Is there something missing from the code where the XML variable is set?

    declare @xmlvar1 xml;

    declare @emp table (emp_id int primary key);

    set @xmlvar1 = '

    1

    2

    '

    insert into @emp(emp_id)

    select T1.lst.value('.','int') as emp_id

    FROM @xmlvar1.nodes('/emp/id') T1(lst);

    select * from @emp;

    It doesn't look like XML, and when I try to run it I'm not getting the expected results.

    Thanks!
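
    The angle-bracket tags were most likely stripped by the forum's HTML rendering. Judging from the .nodes('/emp/id') path in the snippet, the original literal was presumably an emp root element with id children; with that assumption restored, the snippet runs:

```sql
DECLARE @xmlvar1 xml;
DECLARE @emp TABLE (emp_id int PRIMARY KEY);

-- The XML literal as it presumably appeared before the tags were stripped.
SET @xmlvar1 = '<emp><id>1</id><id>2</id></emp>';

-- Shred each /emp/id node into one row of the table variable.
INSERT INTO @emp (emp_id)
SELECT T1.lst.value('.', 'int') AS emp_id
FROM @xmlvar1.nodes('/emp/id') T1(lst);

SELECT * FROM @emp;  -- returns emp_id values 1 and 2
```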

