http://www.sqlservercentral.com/blogs/steve_jones/2009/05/18/sql-server-encryption-hashing-limitations/

Printed 2014/07/23 10:46AM

SQL Server Encryption - Hashing Limitations

By Steve Jones, 2009/05/18

As I'm digging more into SQL Server 2008 encryption I'm learning some interesting things about how SQL Server handles encryption. Recently I was doing some research on the HASHBYTES function, which performs a one-way hash on data. This is useful in terms of validating secure passwords or determining is some data has changed. You do this by comparing the hash values, without necessarily needing to compare the actual data.

As I was reading about this function and researching the SHA-1 algorithm (one of a couple that can be used), I was surprised to find out that while the SHA-1 algorithm can handle 2^64 - 1 bytes of data, and SQL Server's text 2^32-1 bytes, this function is limited to 8,000 bytes of data.

I could understand back when we had 2k page sizes, or even the first evolution of 8k pages that not all functions would handle more text data, but isn't time in SQL Server 2008 that all functions that can operate on text data can handle more than 8k worth of data?

It seems silly these days, with more and more data being stored in SQL Server, and larger and larger sizes, wouldn't there be the need for functions to work with more than 8k of data?

Now HASHBYTES works on varbinary data, so character data gets converted (or needs to be), but there are definitely places where you might want to use this with binary data as well. Consider someone storing audio files in SQL Server, which definitely might exceed 8kb. Imagine that the end of a  recording is changed, something beyond the 8kb point. If two hashes were compared, they would be the same since this function truncates the data at 8kb.

What if someone wanted to use hashing to audit this data for changes?

It's time that SQL Server remove the 8k limits for data throughout the product. I'm hoping that SQL 11, or SQL 12 for sure, will do this. There are ways around this now (work in 8k chunks), but that's cumbersome and unnecessary. If we can do it with a UDF and a series of calls, can't the engine do it easier?

In the meantime, if you use hashing functions, be aware of this limitation.


Copyright © 2002-2014 Simple Talk Publishing. All Rights Reserved. Privacy Policy. Terms of Use. Report Abuse.