QueryHash between Instances (2008 and 2014)

  • Hi,

    I was wondering if anyone has a reasonably definitive answer as to whether the QueryHash generated on one instance will be the same as the QueryHash generated on a different instance for the same query? I thought that the QueryHash is generated purely on the SQL text, so I would have thought that they would be the same, but I don’t think that they are.

    My scenario is this:

    We have a farm of servers for a given task and we are about to deploy another server to this farm. The aim is to migrate to 2014, so the plan is to deploy the new server as 2014 (all the others are 2008).

    During the test phase of the deployment, the new server will hold the same data and run queries in parallel to the existing servers. The gives us the opportunity to review the differences between 2008 and 20014 for the same queries being run.

    One part of the review was to gather execution plans from the plan cache on both servers and compare the plans for the same query run on both servers – I planned to use the QueryHash as a quick way to identify the same queries from both servers for the comparison.

    I managed to do a quick test before the server is deployed:

    I have a test database installed on two instances (one 2008 and one 2014). The database on the 2014 instance is a restore of the one on 2008.

    I ran a simple query on both instances and captured the plan (ie “SELECT * FROM TABLE_A”).

    I grabbed the QueryHash from both plans (ie one from each instance).

    I thought the QueryHash was based purely on the SQL text, so I would have thought that the two QueryHash values would be the same, but they aren’t.

    Does anyone know if the QueryHash values for the same query would be the same between 2008 and 2014, or if there are other factors that could affect the QueryHash that I haven’t taken into account (ie environmental options that are not explicit to the SQL text). Or is it simply that the way of generating the QueryHash has changed between versions of SQLServer.

    Incidentally, I have other ways of achieving the comparisons, so I’m only really interested in understanding the way the QueryHash value is generated and why it would be different.

    Thanks for any information that can shed some light on this.

  • I don't work for Microsoft, so I can't tell you anything about the internals behind the scenes. However, from what I can see, they have changed the hash formula. I have a generic query that runs against DMVs:

    SELECT TOP 10

    SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,

    (CASE deqs.statement_end_offset

    WHEN -1 THEN DATALENGTH(dest.text)

    ELSE deqs.statement_end_offset

    - deqs.statement_start_offset

    END) / 2 + 1) AS querystatement,

    deqp.query_plan,

    deqs.query_hash,

    deqs.execution_count,

    deqs.last_elapsed_time,

    deqs.last_logical_reads,

    deqs.last_logical_writes,

    deqs.last_worker_time,

    deqs.max_elapsed_time,

    deqs.max_logical_reads,

    deqs.max_logical_writes,

    deqs.max_worker_time,

    deqs.total_elapsed_time,

    deqs.total_logical_reads,

    deqs.total_logical_writes,

    deqs.total_worker_time

    FROMsys.dm_exec_query_stats AS deqs

    CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp

    CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest

    ORDER BY deqs.total_elapsed_time DESC;

    It's nothing special, just a demo script that I chose because it had a moderate level of complexity, but, only hit columns and tables that would be largely identical between the two servers. I spun up two servers, both Enterprise, one 2008R2 and the other 2014. Here are the query hash values:

    0x40B8F425DDC3D692 --2014

    0x958B3949E630C004 --2008R2

    It's different, so something behind the scenes is dealing with these in a different manner. Since I did defaults all the way on both these, I don't think it's anything other than difference in the internals.

    For what it's worth, the execution plans were radically different, despite this just being DMVs.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text.

    https://msdn.microsoft.com/en-us/library/cc293624.aspx

    If thats the hash you're talking about, then it looks like something as simple as a change in the size of the table can do it. I would think that if a hash is just used as a lookup helper, its not that important that it match between database servers, because its an internal operation, it only means that the hash is the same for each hash on a value for that server so that it can quickly get to a value previously stored. A cryptographic hash like md5 and such are a different matter, they're meant for publication, portability and communication because their use case is different.

  • patrickmcginnis59 10839 (4/16/2015)


    The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text.

    https://msdn.microsoft.com/en-us/library/cc293624.aspx

    If thats the hash you're talking about, then it looks like something as simple as a change in the size of the table can do it. I would think that if a hash is just used as a lookup helper, its not that important that it match between database servers, because its an internal operation, it only means that the hash is the same for each hash on a value for that server so that it can quickly get to a value previously stored. A cryptographic hash like md5 and such are a different matter, they're meant for publication, portability and communication because their use case is different.

    That's for the query plan hash, not the query hash. However, if the query hash does use object_id values, then clearly it would change. Although, if you restore a database, I don't think you get new object IDs. That's a relatively easy test to set up although I don't have time at the moment. I'll try to get around to it a little later.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the replies.

    I've just done another quick test (just on one instance - 2008):

    Created a table TEST (checked it's object_ID)

    Executed a select against the table and got the query_hash (SELECT * FROM TEST)

    Dropped an recreated the table (checked that it has a different object_ID)

    Re-executed the select against the table and got the query_hash (SELECT * FROM TEST)

    In this case the query_hash was the same, so i don't think it relates to object_ids, etc.

    I then, for completeness, executed the select but with an alias, to change the text (SELECT * FROM TEST A)

    This did, as expected, generate a different query_hash.

    So it looks like the query_hash is generated on the text.

    Pure supposition, but it l could be that the algorithm has changed between 2008 and 2014, or the algorithm uses some other criteria other than just the text (ie something instance related)?

  • And I was just coming back to check on this and set up my test. Thanks for posting that. It's useful to know. Curious, did you look at the plan hash between those two runs? Did it change? Or do I need to go set up my own tests? Ha!

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Your earlier comments nudged me in the right direction. 🙂

    Yes, checked the QueryPlanHash and they were the same.

    I’ve also managed to get access to a few more instances and I did a similar test (must spend my lunch breaks more wisely!). So, the test run was, as before:

    Create a table (QUERY_TEST)

    Run a simple select (SELECT * FROM QUERY_TEST) and get the plan and gather the QueryHash and the QueryPlanHash.

    Results:

    SERVER_A - SQLServer 2008 (SP3) - Developer Edition (same result on two databases on the same instance)

    QueryHash="0x1CB4A4D879E486A0" QueryPlanHash="0x032C551AF57C9A05"

    SERVER_B - SQLServer2008 (SP1) – Standard Edition

    QueryHash="0x1CB4A4D879E486A0" QueryPlanHash="0x032C551AF57C9A05"

    SERVER_C - SQLServer2008 R2 (SP2) – Standard Edition

    QueryHash="0x1CB4A4D879E486A0" QueryPlanHash="0x032C551AF57C9A05"

    SERVER_D - SQLServer2012 (SP1) – Enterprise Edition

    QueryHash="0x5E01B9F1C349F61F" QueryPlanHash="0x70C960B55AEE3336"

    SERVER_E - SQLServer2014 – Developer Edition

    QueryHash="0x7860F65DB6FA5A35" QueryPlanHash="0x12BBF379FB2571E2"

    From this experience it seems that the QueryHash change is related to SQL versions rather than instance or server differences. It’s also interesting that the QueryPlanHash follows the same pattern.

    So, is the QueryHash (and QueryPlanHash) algorithm changing between SQL versions (it appears that way) ?

  • My assumption, based on both our tests, yes, it's changing between versions. Not that much of a surprise. Every version has a new optimizer. Why wouldn't there be tweaks to the hash process? It is inconvenient though. Your initial thought of using it to track performance issues between versions was great. It just won't work unfortunately.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Yes, that makes sense. As you say, unfortunately it’s a bit inconvenient – but luckily I have some other options to pursue in this case. 🙂

    Thanks for your help and support on this, much appreciated.

  • Grant Fritchey (4/16/2015)


    patrickmcginnis59 10839 (4/16/2015)


    The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text.

    https://msdn.microsoft.com/en-us/library/cc293624.aspx

    If thats the hash you're talking about, then it looks like something as simple as a change in the size of the table can do it. I would think that if a hash is just used as a lookup helper, its not that important that it match between database servers, because its an internal operation, it only means that the hash is the same for each hash on a value for that server so that it can quickly get to a value previously stored. A cryptographic hash like md5 and such are a different matter, they're meant for publication, portability and communication because their use case is different.

    That's for the query plan hash, not the query hash. However, if the query hash does use object_id values, then clearly it would change. Although, if you restore a database, I don't think you get new object IDs. That's a relatively easy test to set up although I don't have time at the moment. I'll try to get around to it a little later.

    Gotcha! The full quote (that I chopped off a bit):

    Each plan cache store contains a hash table to keep track of all the plans in that particular store. Each bucket in the hash table contains zero, one, or more cached plans. When determining which bucket to use, SQL Server uses a very straightforward hash algorithm. The hash key is computed as (object_id * database_id) mod (hash table size). For plans that are associated with adhoc or prepared plans, the object_id is an internal hash of the batch text. The DMV sys.dm_os_memory_cache_hash_tables contains information about each hash table, including its size. You can query this view to retrieve the number of buckets for each of the plan cache stores using the following query:

    So in that case, they do mention that the object_id isn't really the table object which makes sense. I think the full page is worth a read (linked again here https://msdn.microsoft.com/en-us/library/cc293624.aspx)

    but really in my opinion its sort of a moot point, again, hashing in general is used for a few things, cryptographic hashes for instance are really designed to give us a digest of a text, and they're pretty compute intensive and designed to avoid collisions (although I know in a few systems I've seen they are actually used as keys, but when lookup speed is long enough compared to building the digest that its not a terrible thing).

    In this case, the hash really needs to be much faster and in that case, its only going to point to a particular "bucket" and that means that collisions are expected, and furthermore this means that its not meant to be a comparison, its meant to point to a much smaller subset of the search space that you THEN ABSOLUTELY do comparisons on the resulting subsets to ABSOLUTELY then assure you have the match.

    In other words, a hash MISMATCH only means you don't have to compare the keys, its only when you get a hash MATCH when you compare the keys, but the bottom line is YOU STILL HAVE TO COMPARE THE KEYS.

    Given the simplicity of the algorithms that Microsoft published on the page, I think its a mistake to use the keys for comparisons, pull the query text and compare them instead. These hashes are only indexes to buckets.

    However, this is just a hack explanation from some random uneducated joe on the internet who's only qualified to run a convenience store cash register, and I'm only really practicing my typing as the motivation of this post. The possibility that I may or may not have coded a hash lookup in the past is completely moot. By all means, use the hash if you want.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply