Change Detection with (44) Columns!

  • I am mirroring about (23) tables in another DW. I have built my first incremental loads package, for the first table, and it successfully checks for new rows, updated rows and deletes missing rows. I was very excited. 🙂

    But now I have moved on to the next table and it has (44) columns!!! Which means that I have to check for changes across 44 columns in 2 million rows! I know that this is going to be a big strain on the server resources and probably take some time, but even worse, I have to build the most ridiculously long expression in my Conditional Split task to check every single source row's values against every single destination row's values (e.g. column1 != LkUp_column1 || column2 != LkUp_column2 ... column44 != LkUp_column44).

    There has to be a better way to do this, right? I read about using a hash. Is this relatively easy to implement, or do I really have to write the War-and-Peace expression in the Conditional Split?
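
    For reference, the hash idea is fairly easy to try on the T-SQL side. A minimal sketch, assuming a business key and made-up column names (the delimiter, NULL handling and conversions have to be applied the same way every time the hash is computed):

        -- Compute one hash per row so the Conditional Split only has to compare a single column.
        -- HASHBYTES with SHA2_256 needs SQL Server 2012+; before SQL Server 2016 its input is capped at 8000 bytes.
        SELECT  src.BusinessKey,
                HASHBYTES('SHA2_256',
                          CONCAT(ISNULL(CAST(src.Column1  AS nvarchar(100)), N''), N'|',
                                 ISNULL(CAST(src.Column2  AS nvarchar(100)), N''), N'|',
                                 -- ... repeat for the remaining columns ...
                                 ISNULL(CAST(src.Column44 AS nvarchar(100)), N''))) AS RowHash
        FROM    dbo.SourceTable AS src;

    Store RowHash in the destination alongside the row, and the Conditional Split expression shrinks to a single test such as RowHash != LkUp_RowHash.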

  • Jerid421 (11/21/2016)


    I am mirroring about (23) tables in another DW. I have built my first incremental loads package, for the first table, and it successfully checks for new rows, updated rows and deletes missing rows. I was very excited. 🙂

    But now I have moved on to the next table and it has (44) columns!!! Which means that I have to check for changes across 44 columns in 2 million rows! I know that this is going to be a big strain on the server resources and probably take some time, but even worse, I have to build the most ridiculously long expression in my Conditional Split task to check every single source row's values against every single destination row's values (e.g. column1 != LkUp_column1 || column2 != LkUp_column2 ... column44 != LkUp_column44).

    There has to be a better way to do this, right? I read about using a hash. Is this relatively easy to implement, or do I really have to write the War-and-Peace expression in the Conditional Split?

    A better way to do this is to add and maintain a 'date modified' column in your source data. Use this to drive INSERTs and UPDATEs (some additional work would be needed to handle deletes).
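
    In practice that usually means keeping a watermark of the last value you loaded and only pulling newer rows. A rough sketch, with made-up table names:

        -- @LastLoadDate comes from a watermark/control table that the package maintains.
        DECLARE @LastLoadDate datetime2 =
               (SELECT MAX(LastModifiedLoaded)
                FROM   dbo.ETL_Watermark
                WHERE  TableName = N'BigTable');

        -- Pull only the rows touched since the last successful load.
        SELECT  *
        FROM    dbo.BigTable AS src
        WHERE   src.DateModified > @LastLoadDate;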

    Or you could consider implementing Change Tracking.
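
    Change Tracking (SQL Server 2008 and later) pushes the change detection onto the engine. Roughly, with placeholder names (it has to be enabled on the source database, and the table needs a primary key):

        -- Enable at the database level, then per table.
        ALTER DATABASE SourceDW
            SET CHANGE_TRACKING = ON
            (CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

        ALTER TABLE dbo.BigTable
            ENABLE CHANGE_TRACKING
            WITH (TRACK_COLUMNS_UPDATED = OFF);

        -- Later, pull only what changed since the version you stored after the last load.
        DECLARE @last_sync_version bigint = 0;

        SELECT  ct.SYS_CHANGE_OPERATION,   -- I / U / D
                ct.BigTableID,
                src.*
        FROM    CHANGETABLE(CHANGES dbo.BigTable, @last_sync_version) AS ct
        LEFT JOIN dbo.BigTable AS src      -- LEFT JOIN because deleted rows no longer exist in the source
               ON src.BigTableID = ct.BigTableID;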

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Why not just use EXCEPT?
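
    That is, let the engine do the all-44-columns comparison in one go. A rough sketch with made-up names (EXCEPT treats NULLs as equal for this comparison, so there is no 44-term expression to write):

        -- Rows that are new or changed in the source (in src but with no identical row in dest).
        SELECT  src.*
        FROM    dbo.SourceTable AS src
        EXCEPT
        SELECT  dst.*
        FROM    dbo.DestTable   AS dst;

    Join that result back to the destination on the business key to decide INSERT vs UPDATE.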

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (11/21/2016)


    Why not just use EXCEPT?

    EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Why not just use EXCEPT?

    EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.

    Now you have me curious... Why not recommended? What's wrong with it compared to some other method?

    --Jeff Moden

    I would consider adding a date modified column, but the whole point of copying these tables is that I don't have admin rights over them ... read-only. So I am sucking them into my own DW so that I can integrate them into my own data model with some other tables.

    I downloaded a Check Sum Task (third-party) today and used it in my package to detect the changes. I think it might have worked. I just need the source data to "change" tonight so that I can run the package and see what the results are.

    I've never heard of EXCEPT. I'll have to look into it tomorrow.

  • Jeff Moden (11/21/2016)


    Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Why not just use EXCEPT?

    EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.

    Now you have me curious... Why not recommended? What's wrong with it compared to some other method?

    It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • If you are prepared to accept the occasional false positive, the checksum method is not so bad.
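
    For illustration, the checksum route boils down to something like this (made-up names; a checksum is cheap to compute, but two different rows are not guaranteed to produce different values, so the odd real change can slip through undetected):

        -- Checksum-style comparison: fast, but not guaranteed to catch every change.
        SELECT  src.BusinessKey
        FROM    dbo.SourceTable AS src
        JOIN    dbo.DestTable   AS dst
          ON    dst.BusinessKey = src.BusinessKey
        WHERE   BINARY_CHECKSUM(src.Column1, src.Column2 /* , ... */)
             <> BINARY_CHECKSUM(dst.Column1, dst.Column2 /* , ... */);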

    The EXCEPT method is, however, more robust, since it compares the actual column values.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.

  • Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Why not just use EXCEPT?

    EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.

    Now you have me curious... Why not recommended? What's wrong with it compared to some other method?

    It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.

    Ah. Understood. Time for me to do some experiments with larger tables in this area. The largest table I ever did this with was something like 40 or 50 columns wide and only a million or so rows. Thanks for bringing it up, Phil.

    --Jeff Moden

  • Jeff Moden (11/22/2016)


    Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Phil Parkin (11/21/2016)


    Jeff Moden (11/21/2016)


    Why not just use EXCEPT?

    EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.

    Now you have me curious... Why not recommended? What's wrong with it compared to some other method?

    It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.

    Ah. Understood. Time for me to do some experiments with larger tables in this area. The largest table I ever did this with was something like 40 or 50 columns wide and only a million or so rows. Thanks for bringing it up, Phil.

    I'd be interested in seeing your results, Jeff. And thank you, Phil, for the link to Change Tracking. Since we are still on an older version of SQL Server here, I had not come across it.

