• huishi.ca (3/19/2012)


    Awesome article! However, I have a different scenario: what if the source and target tables are really large, close to 1 billion records? I'm not sure it's realistic to go through every record with a script component to calculate the HASH value. Also, in the lookup component, a full cache on even two columns of the target table would pretty much suck up most of the memory, since it contains over 1 billion records.

    Any comments on large tables?

    Thanks,

    Hui

    For extremely large tables it might make more sense to implement change capturing in the source system to limit the number of records you have to work with in the package. If that's not an option, this method will only really work on an SSIS box with a ton of memory; you can also modify the caching mode on the lookup to take some pressure off the memory.
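    As a rough illustration of that source-side change capture, assuming the source is SQL Server and the table carries a trustworthy last-modified column (the ETL_LoadControl table, BigSourceTable, and ModifiedDate names below are just placeholders, not anything from the article), you could keep a watermark from the previous successful load and filter the source query on it:

        -- Sketch of a watermark-based incremental extract; table and column
        -- names are hypothetical.
        DECLARE @LastLoad DATETIME;

        -- High-water mark recorded by the previous successful package run.
        SELECT @LastLoad = LastLoadDate
        FROM dbo.ETL_LoadControl
        WHERE TableName = 'dbo.BigSourceTable';

        -- Pull only the rows that changed since then, so the data flow never
        -- has to touch the full billion-row table.
        SELECT *
        FROM dbo.BigSourceTable
        WHERE ModifiedDate > @LastLoad;

    If you can't rely on a modified-date column, SQL Server's built-in Change Data Capture or Change Tracking features accomplish the same thing at the engine level.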

    Also, I'd highly recommend using SSIS Multiple Hash instead of this method; it's much easier to incorporate into your ETL infrastructure than a custom script.