Change Tracking: SSIS vs stored procedures - discussion

  • We have a nightly process that drops and re-creates reporting tables. I'm looking to SQL Server (2012) change tracking (not Change Data Capture) to improve performance and maintainability.

    I'm looking at two solutions: Stored procedures, or stored procedures and SSIS. Given that there are no native SSIS tasks for change tracking (other than the CDC task), I'm not sure what advantages SSIS offers over pure stored procedures.

    Any SSIS solution will necessarily have SQL to perform version checks, UPDATEs, etc. And using SSIS Conditional Split to separate DELETEs from INSERTs/UPDATEs is easily handled in t-SQL.

    So I'm just raising the question and asking how others might be implementing change tracking, and what you have experienced.

    Thanks and regards,

    Steve

  • Hi Steve,

    I'm a big fan of change tracking in SQL Server for change detection. When I integrate change tracking into ETL, I usually do it within SSIS. Using SSIS to run the load, even if all you are doing is calling stored procedures, makes easier the process of logging, error handling, and precedence management.

    Last year I wrote a post about using change tracking with SSIS here: https://www.timmitchell.net/post/2016/01/22/using-change-tracking-in-ssis/[/url]

    hth,

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

  • Thank you, Tim. I'm reading your posts, and they are very helpful. I am building a proof of concept now.

  • Hello Tim,

    Can you tell me why you use a start and an end version ID? Since the version ID is database-wide, it seems like we can get the delta changes by getting all the changes in a table that are >= current version ID, and we can get that current version ID with CHANGE_TRACKING_CURRENT_VERSION(). Any changes to individual tables would have a higher value than that returned by CHANGE_TRACKING_CURRENT_VERSION(), so the start version ID for that table isn't required.

    Am I missing something?

    Thanks and regards,

    Steve

  • Hi Steve,

    I use the start and end point so there are clearly defined boundaries for each load. Without that end point ID, it is possible that a row in the tracked table could be inserted or updated while the ETL operation is in progress that could be inadvertently skipped the next time the ETL is triggered. Using those defined start and end points adds a layer of insurance to make sure we don't miss any data.

    Tim

    Tim Mitchell, Microsoft Data Platform MVP
    Data Warehouse and ETL Consultant
    TimMitchell.net | @Tim_Mitchell | Tyleris.com
    ETL Best Practices

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply