RE: QueryHash between Instances (2008 and 2014)

SSC-Insane

Points: 23660

April 16, 2015 at 9:33 am

Grant Fritchey (4/16/2015)

patrickmcginnis59 10839 (4/16/2015)
The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text.
https://msdn.microsoft.com/en-us/library/cc293624.aspx
If thats the hash you're talking about, then it looks like something as simple as a change in the size of the table can do it. I would think that if a hash is just used as a lookup helper, its not that important that it match between database servers, because its an internal operation, it only means that the hash is the same for each hash on a value for that server so that it can quickly get to a value previously stored. A cryptographic hash like md5 and such are a different matter, they're meant for publication, portability and communication because their use case is different.
That's for the query plan hash, not the query hash. However, if the query hash does use object_id values, then clearly it would change. Although, if you restore a database, I don't think you get new object IDs. That's a relatively easy test to set up although I don't have time at the moment. I'll try to get around to it a little later.

Gotcha! The full quote (that I chopped off a bit):

Each plan cache store contains a hash table to keep track of all the plans in that particular store. Each bucket in the hash table contains zero, one, or more cached plans. When determining which bucket to use, SQL Server uses a very straightforward hash algorithm. The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text. The DMV sys.dm_os_memory_cache_hash_tables contains information about each hash table, including its size. You can query this view to retrieve the number of buckets for each of the plan cache stores using the following query:

So in that case, they do mention that the object_id isn't really the table object which makes sense. I think the full page is worth a read (linked again here https://msdn.microsoft.com/en-us/library/cc293624.aspx)

but really in my opinion its sort of a moot point, again, hashing in general is used for a few things, cryptographic hashes for instance are really designed to give us a digest of a text, and they're pretty compute intensive and designed to avoid collisions (although I know in a few systems I've seen they are actually used as keys, but when lookup speed is long enough compared to building the digest that its not a terrible thing).

In this case, the hash really needs to be much faster and in that case, its only going to point to a particular "bucket" and that means that collisions are expected, and furthermore this means that its not meant to be a comparison, its meant to point to a much smaller subset of the search space that you THEN ABSOLUTELY do comparisons on the resulting subsets to ABSOLUTELY then assure you have the match.

In other words, a hash MISMATCH only means you don't have to compare the keys, its only when you get a hash MATCH when you compare the keys, but the bottom line is YOU STILL HAVE TO COMPARE THE KEYS.

Given the simplicity of the algorithms that Microsoft published on the page, I think its a mistake to use the keys for comparisons, pull the query text and compare them instead. These hashes are only indexes to buckets.

However, this is just a hack explanation from some random uneducated joe on the internet who's only qualified to run a convenience store cash register, and I'm only really practicing my typing as the motivation of this post. The possibility that I may or may not have coded a hash lookup in the past is completely moot. By all means, use the hash if you want.