• Hi,

    I've found the SHA-256 algorithm meets my needs so far in ETL loads.

    I use a C# script task within SSIS to compare hash values in the Dimension tables, which obviously by their nature tend to be smaller than Fact tables.

    Using the cryptography namespace in C# gets around some of the limitations of the T-SQL Hashbytes function specifically with data types and NULL handling.

    The trick for Fact tables is to implement Change Tracking at source (less overhead than CDC) which can then be used for net data extraction based on primary keys detected by the "changetable" function. So the incremental loads are tiny (provided the gap between data pulls is reasonably small) compared to "initial" load. Annoyingly, in the source system I work with, we have Updates, Inserts and Deletes on production database transactional tables so these have to be handled by ETL process. Nevertheless, with the appropriate SSIS tasks we can achieve good performance and main thing is the update process on Fact table is still a batch process rather than a row-based process. The T-SQL Merge statement is really powerful for comparing rows as an alternative to hashing algorithms. So far I'm using hashbytes and provided there is no loss of synchronicity via the Change Tracking mechanism, this can also be used for row comparisons Fact as well as Dimension.