T-SQL Tuesday #26 - Second Chances - Correlate Missing Index with Related Query

T-SQL Tuesday 26 Everybody deserves a second chance in love, life, and in TSQL as well! Now that it's the beginning of a New Year, 2012, it's the perfect time to start over again, and make up for the things that you missed, should the opportunity present itself. That's why I am blogging right now, because I love the topic of this T-SQL Tuesday #26, aptly themed Second Chances! It's been a great title for a book, for a movie, a song, a rehabilitation program, and indeed a great title and theme for TSQL Tuesday! This second chance, is being brought to us this month, January 2012, by SQL Server BI Specialist, Dave Howard (blog|twitter).

Dave kicks off this T-SQL Tuesday #26, with a New Year's invitation to participate:

"Every year new year brings with it a fresh start, a second chance to do things differently. In the spirit of second chances, for this month’s T-SQL Tuesday, please pick any one of the previous 25 T-SQL Tuesday topics and write about it. You can choose one of the topics you participated in before, or you can choose one you missed for some reason… Continue reading ?"

Thank you, sir! I truly appreciate this! And, why? Because, I really, really, wanted to participate in T-SQL #25, Tips & Tricks, hosted by Allen White (blog|twitter). Unfortunately, I was quite busy and pre-occupied then, and by the time I got notice of it, was already that Tuesday. I even opened a new word document to get started, but just didn't have the time to finish! So, that is why I am choosing to bring you some wonderful Tips & Tricks for SQL here!

Often at a client, I am called in to do an analysis on indexes and make recommendations on improving performance. Used, Unused, Missing, Duplicates, Fragmented, etc. - I have all these neat little scripts. Making suggestions on what indexes should be added to a table, has become somewhat easier with the Missing Index DMVs (sys.dm_db_missing_index_details , sys.dm_db_missing_index_groups, sys.dm_db_missing_index_group_stats, and sys.dm_db_missing_index_columns.

Even so, they do have their limitations - see Limitations of the Missing Index Features. For example, it does not suggest filtered indexes, or even clustered indexes. I am not going to go into the virtue of using or not using the missing index feature. It definately is a good start and can be quite helpful. But, caution, you must be aware that it is not a 100% solution. Perhaps use DTA, or traces as a supplement in your analysis.

I have searched high and wide for this solution, and even tried to hack together a comprehensive t-sql statement that would somehow join the sys.dm_db_missing_index_details and the sys.dm_exec_query_plan together, so I can correlate the missing index suggestions with the queries involved in those suggestions. But, I'm still working on it, and there probably isn't a clean way (if at all) , but it does involve joining an XML element to one of the regular t-sql columns. If anyone is feeling up to the challenge, then please go ahead, and post back here. I will take a stab at it here, and post my t-sql solution that blends both the use of T-SQL and XQuery.

In the meanwhile, I will show you what I have in terms of my research, existing scripts and solutions out there, and share them right here.

In one article I found by SQL MVP Grant Fritchey, (blog|twitter) who knows his way around SQL Server Execution Plans, addressed the issue by trying to come up with a query to correlate missing index information with query statistics. He concluded that there was no way to combine the data from the missing index DMVs, to the query stats and execution plan DMVs. But, that didn't stop me from trying :-). Grant correctly stated that there was no common plan handle or plan hash column to join them. (Strike one!) Therefore, his solution was to use - {insert twilight zone theme music here} dun, dun, dun, - an XQuery. Sunglasses on please. That query, originally found in his article here: Missing Index Information and Query Stats, is listed below.

SELECT top 10 qs.execution_count,
    SUBSTRING(qt.text,qs.statement_start_offset/2 +1,
                 (CASE WHEN qs.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
                       ELSE qs.statement_end_offset end -
                            qs.statement_start_offset
                 )/2
             ) AS query_text,
     qt.dbid, dbname= DB_NAME (qt.dbid), qt.objectid
     FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
   CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS p
WHERE  p.query_plan.exist(
'declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//MissingIndexes') = 1
and qt.dbid=DB_ID() -- I added this to get the data for the current db
ORDER BY qs.total_elapsed_time DESC

Note: I added additional columns to show the dbid, database name, and objectid. I also included "and qt.dbid=DB_ID()" in the WHERE clause to filter the data for the current database context.

However, if you run the above query, although it does identify the queries that have missing indexes, it doesn't go as far as showing us the columns that would benefit from index creation, the included/equality/inequality columns, or the impact. These are the things you would expect to see from your typical Missing Index DMV query. Of course, you can return the XML query plan and examine it manually (which requires removing the sunglasses, and putting on the magnifiers)

One of the variations of Missing Index Scripts I use for collecting this information, and find most useful, as it includes a template "Create Index" statement, based on the underlying tables and columns, is the one found on the site SQLServerPlanet, posted by author Derek Dieter.

SELECT  sys.objects.object_id, sys.objects.name
, (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) AS Impact
,  'CREATE NONCLUSTERED INDEX ix_IndexName ON ' + sys.objects.name COLLATE DATABASE_DEFAULT + ' ( ' + IsNull(mid.equality_columns, '') + CASE WHEN mid.inequality_columns IS NULL
                THEN ''
    ELSE CASE WHEN mid.equality_columns IS NULL
                    THEN ''
        ELSE ',' END + mid.inequality_columns END + ' ) ' + CASE WHEN mid.included_columns IS NULL
                THEN ''
    ELSE 'INCLUDE (' + mid.included_columns + ')' END + ';' AS CreateIndexStatement
, mid.equality_columns
, mid.inequality_columns
, mid.included_columns
    FROM sys.dm_db_missing_index_group_stats AS migs
            INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
            INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle AND mid.database_id = DB_ID()
            INNER JOIN sys.objects WITH (nolock) ON mid.OBJECT_ID = sys.objects.OBJECT_ID
    WHERE     (migs.group_handle IN
        (
        SELECT     TOP (500) group_handle
            FROM          sys.dm_db_missing_index_group_stats WITH (nolock)
            ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC))
        AND OBJECTPROPERTY(sys.objects.OBJECT_ID, 'isusertable')=1
    ORDER BY 3 DESC

As my original intention, searching for a commonality, was to somehow derive and join the objectid, along my journey I discovered that the object id returned from the Query Stat DMVs was not the object id returned from the Missing Index DMVs. The former references the object which is the Stored Procedure (type='P') , from where the query text is derived, and the latter references the object of the underlying User Table (type='U') for the columns to be used to suggest how the indexes be created. (Strike two!)

Continuing down the road to nowhere, or so I thought, I discovered a magnificent script by SQL MVP/MCM, and master of the XQuery, Jonathan Kehayias (blog|twitter). In his own R&D, his quest for finding plans that included missing index information, resulted in a massive X-Query, that returned ALL the desired information, by basically shredding the ShowPlan XML, and extracting all the pertinent data by each element in the XML. This is quite a feat, and certainly worth sharing here: Digging into the SQL plan cache: Finding Missing Indexes. This script is a must have, in your DBA Performance Tuning Toolkit!

As you can see the Xquery results, it shows us the SQL Query, and all the related missing index information correlating both:

One note on this XQuery, is that it seems to take some time, because there are several subqueries. and lots of CROSS APPLYs and OUTER APPLYs here. So be careful about running this against a production database. You can also filter the results in this script for the current database by adding this line to the where clause:

AND DB_ID(REPLACE(REPLACE(n.value('(//MissingIndex/@Database)[1]', 'VARCHAR(128)'),'[',''),']',''))=DB_ID()

Finally, I decided to take the best of ALL worlds, and attempt to come up with a hybrid script, that uses regular t-sql and xquery, essentially joining all the Query/Plan Stat DMVs to the Missing Index DMVs, on the objectid from sys.dm_db_missing_index_details, and the OBJECT_ID derived from the shredded XML. (Base hit to first) Concatenating these elements; the database, schema and table, and passing it into the function Object_Id (), we are able to join all the DMVs and extract the information we're looking for.

So, here is my first draft hybrid script. (I call it a Y-Query - because if it doesn't work, I wonder why I bothered creating it 😉

WITH XMLNAMESPACES
   (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  qs.execution_count,
    SUBSTRING(qt.text,qs.statement_start_offset/2 +1,
                 (CASE WHEN qs.statement_end_offset = -1
                       THEN LEN(CONVERT(nvarchar(max), qt.text)) * 2
                       ELSE qs.statement_end_offset end -
                            qs.statement_start_offset
                 )/2
             ) AS query_text,
     qt.dbid, dbname= DB_NAME (qt.dbid), qt.objectid
     ,OBJECT_ID(n.value('(//MissingIndex/@Database)[1]', 'VARCHAR(128)') + '.' +
           n.value('(//MissingIndex/@Schema)[1]', 'VARCHAR(128)') + '.' +
           n.value('(//MissingIndex/@Table)[1]', 'VARCHAR(128)')) AS [OBJECT_ID]
           ,object_name( mid.OBJECT_ID) AS [TABLE_NAME]
           , (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) AS Impact,
           'CREATE INDEX [missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_'
 + CONVERT (varchar, mid.index_handle)
  + '_' + LEFT (PARSENAME(mid.statement, 1), 32) + ']'
  + ' ON ' + mid.statement
  + ' (' + ISNULL (mid.equality_columns,'')
    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
    + ISNULL (mid.inequality_columns, '')    + ')'
  + ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement
, mid.equality_columns
, mid.inequality_columns
, mid.included_columns
     FROM sys.dm_db_missing_index_group_stats AS migs
            INNER JOIN sys.dm_db_missing_index_groups AS mig ON migs.group_handle = mig.index_group_handle
            INNER JOIN sys.dm_db_missing_index_details AS mid ON mig.index_handle = mid.index_handle
AND mid.database_id = DB_ID()
         --   INNER JOIN sys.objects WITH (nolock) ON mid.OBJECT_ID = qt.objectid ,
    , sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
   CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS p --where qt.dbid=DB_ID()
  CROSS APPLY query_plan.nodes('//StmtSimple') AS q(n)
WHERE n.exist('QueryPlan/MissingIndexes') = 1
/*WHERE  p.query_plan.exist('declare default element namespace "http://schemas.microsoft.com/sqlserver/2004/07/showplan";
/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple/QueryPlan//MissingIndexes') = 1 */
and mid.OBJECT_ID=OBJECT_ID(n.value('(//MissingIndex/@Database)[1]', 'VARCHAR(128)') + '.' +
           n.value('(//MissingIndex/@Schema)[1]', 'VARCHAR(128)') + '.' +
           n.value('(//MissingIndex/@Table)[1]', 'VARCHAR(128)'))
and qt.dbid=DB_ID()
ORDER BY (avg_total_user_cost * avg_user_impact) * (user_seeks + user_scans) DESC

However, I did run it, and it does seem to return everything I need in short order, including the sql query, the missing index information, and the "Create Index" suggested statement. Again, all the above scripts should be used as a baseline, and not create every index suggested. You ideally want to evaluate each index and weigh the performance benefit vs the costs.

I hope the this article is useful. I know that in highlighting various solutions with respect to Missing Indexes and related Queries, I'm confident you all will benefit (Home Run!)

David Howard, thanks for hosting this edition of T-SQL Tuesday #26, and for the second chance. [Actually, this is my third chance, since I thought today's t-sql tuesday was last week - Mike Fal via twitter, set me straight :-)]

Follow me on Twitter @PearlKnows, join me on LinkedIN, or find me at http://www.pearlknows.com

T-SQL Tuesday #26 - Second Chances - Correlate Missing Index with Related Query

Rate

Share

Share

Rate

T-SQL Tuesday #26 - Second Chances - Correlate Missing Index with Related Query

Rate

Share

Share

Rate

Related content

Book Review: Big Red - Voyage of a Trident Submarine

Database Mirroring FAQ: Can a 2008 SQL instance be used as the witness for a 2005 database mirroring setup?

Inserting Markup into a String with SQL

Networking - Part 4

Speaking at Community Events - More Thoughts