• Mike C (4/13/2010)


    Nice article! I implemented something very similar for loading data warehouses and datamarts a while back. One thing to keep in mind is that you can actually use a BINARY(16) column to hold MD5 hashes or BINARY(20) for SHA-1 since they're guaranteed to return a binary hash with exact length of 128-bits or 160-bits respectively. Also HASHBYTES is restricted in what it can hash -- there's an 8,000 byte restriction on the input, so you can't hash LOB data or columns wider than 8,000 bytes with it effectively. You can work around that limitation by chopping up your input or by generating the hashes in your ETL tool of choice (SSIS script components allow direct access to the System.Security.Cryptography namespace, for example).

    Thanks for the positive feedback!

    The tricky thing about keeping the output as binary makes it hard on the eyes when pasted into one's article text for illustration. No, it doesn't have to be converted in actual use. But also not all MD5 functions return binary depending on the ETL tool. I used HASHBYTES to stay on topic with SQL Server.