Home Forums SQL Server 2005 Business Intelligence Best Practice for loading DW tables from live source with timestamp/Rowversion column to identify changes RE: Best Practice for loading DW tables from live source with timestamp/Rowversion column to identify changes

  • sotn (5/2/2013)


    I want to build a DW and I need to identify records that have changed, and luckily each table in our ERP system has a timestamp/rowversion column, but I cannot see how to easily and efficiently compare the current rowversion value to one I could save to a staging table.

    I was planning a staging table to contain the unique ID of the table and the rowversion value which will allow me using the unique ID to find records not in staging but it is the edits I am having trouble with, as the rwovdersion needs to handled differently.

    What is the normal best practice for identifying changed records via a rowversion column rather than checking the values of say the 20 fields I am interested in, or it is best practice to not refer to rowversion at all and instead have something like this.

    insert into staging(f1,f2,f3,f4,f5)

    select live.f1, live.f2, live.f3, live.f4, live.5

    from live

    inner join staging on live.uniqueid = staging.uniqueID

    AND (live.f1 <> staging.f1 OR live.f2 <> staging.f2 OR live.f3 <> staging.f3 OR live.f4 <> staging.f4 OR live.f5 <> staging.f5)

    There must be an efficient way to copy the rowversion value to the staging table and do an efficient join query, I would have thought.

    insert into staging(f1,f2,f3,f4,f5)

    select live.f1, live.f2, live.f3, live.f4, live.5

    from live

    inner join staging on live.uniqueid = staging.uniqueID AND live.timestamp <> staging.timestamp

    Thanks in advance.

    I've always thought using a rowversion column to manage data extracts for a DW is a real pain. For the 8 bytes of storage required by the rowversion type, you can use a datetime or datetime2 column to record the date a row is modified. This requires code (either within every procedure that updates the table or a trigger) to handle the update of the datetime/datetime2 column, but an AFTER UPDATE trigger that just sets a single column = getdate() creates only a very minimal amount of overhead. Extracts to the DW become much easier - create a table to log the starting datetime of each extract that completes successfully, and for the next extract, look for modifications that occurred after the most recent starting datetime in the log table. An index on the modification datetime column makes this process run very efficiently.

    Jason Wolfkill