Someone was asking if the HASHBYTES function was a good one to use in T-SQL as far as performance goes.. I wish I had a good reference for the function, but the best one I had on SQLServerCentral was this piece on using it to load a data warehouse. I also wrote an editorial on it not working with strings beyond 8k, which seems to be a bug, or a lack of resources devoted to ensuring string functions work with varchar(max).
The HASHBYTES function returns a hash of an input string. A hash is essential a calculation based on the values of the input, and two inputs that are the same, ought to produce the same hash. One catch with this function is that you provide the algorithm used, which can be one of these:
Each of these produces different output, returning a varbinary(max) value. As an example, suppose I hash “Steve Jones”
SELECT HASHBYTES('MD2', 'Steve Jones') 'MD2' UNION SELECT HASHBYTES('MD4', 'Steve Jones') 'MD4' UNION SELECT HASHBYTES('MD5', 'Steve Jones') 'MD5' UNION SELECT HASHBYTES('SHA', 'Steve Jones') 'SHA' UNION SELECT HASHBYTES('SHA', 'Steve Jones') 'SHA1'
The results look like this:
As you can see, there’s a different hash for the same value using different algorithms. However if I were to compare the same string to itself, I can easily tell if something has changed. If the hashes aren’t the same, there’s a difference. I’m not sure this is a great use, but the more obvious use is that I can hash a password and then have the user enter their own version, hash it, and compare the results. In this way, the system never needs to know the value.
Just make sure you use the same algorithm
Filed under: Blog Tagged: security, syndicated, T-SQL