March 30, 2017 at 8:47 am
I have change tracking enabled on DB1 and need to update DB2 with the detected changes in the order that they were recorded by SQL Server (2012) change tracking.
I have change tracking enabled on all the DB1 tables necessary, and am familiar with getting the necessary data using CHANGETABLE.
Now I need to design a process to apply these changes to DB2 in the order in which they occurred.
So, for instance, on DB1 the data returned by querying CHANGETABLE for the Employee table might look like this:
Version -Operation-EmpId
1 - Insert - 1
2 - Delete - 2
3 - Delete - 1
So I would need to Insert EmpId1, then Delete EmpId2, then DeleteEmpId1 in version number order.
In a dataflow task, I can return an ordered result set. But after that, I don't see how to process the rows in order. Since SSIS uses parallel processing, a conditional split based upon the Operation wouldn't follow the version order.
So would I have to use a stored procedure to put loop through the rows, and do the I/U/D based upon the Operation value of each row?
Thanks!
March 30, 2017 at 9:28 am
If you must process the change tracking results in order by version ID, you would have to use a row-by-row solution such as the OLE DB Command or a while loop in T-SQL. By default, SSIS will process data a buffer at a time (not a row at a time), so you can't be certain the rows would be processed in the order you describe without row-by-row processing.
I'm curious - what business case requires that you process change tracking results in chronological order?
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
March 30, 2017 at 10:00 am
Hi Tim,
Thanks for the reply.
This solution is meant to be a near-real-time update to our reporting database from our transaction database. Our current process runs overnight, so the reporting db is always one day old. This will help to keep our reporting database fresher.
So if this process runs, say, every 10 minutes, it's important that transaction order is maintained. Does that make sense? I could provide more details if you're interested.
(Big fan of yours, BTW. Keep up the great work!)
Steve
March 30, 2017 at 10:22 am
Hi Steve,
Thanks for the kind words!
It makes sense to use change tracking for what you describe, in capturing incremental changes for frequencies less than a day. However, since change tracking doesn't log individual changes to a row (it only tracks the delta), it would be difficult to make change tracking reliable in keeping the individual inserts/updates/deletes in order. It would be possible for a row to have been updated multiple times (or perhaps inserted and subsequently updated) in between load cycles, but change tracking would only show the most recent operation.
What I typically do with change tracking is to set a fixed beginning and ending version number to work with for a given load, and only process changes between those version numbers. I store the version ID in a separate table, which is used as a sort of bookmark to track the version number last loaded. I describe how I do this here. Although that article demonstrates using SSIS, the same principles would apply in a plain T-SQL load as well. Because this pattern is based on the last CT version processed by the ETL, it would work for any load frequency.
As an aside, I'm a big fan of using change tracking for change detection. In loads where the source is SQL Server, and the source can be modified to use change tracking, with few exceptions I'll go with CT over the other tools for change detection.
Tim
Tim Mitchell
TimMitchell.net | @tmitch.net | Tyleris.com
ETL Best Practices
March 30, 2017 at 10:39 am
That's exactly the process I employ: processing changes within a range. I modeled my implementation upon many of your suggestions in that article.
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply