A faster way to parse QueryPlan XML???

  • Long story short, I'm in the process of doing some index cleanup: getting rid of or merging redundant indexes, and getting rid of indexes whose cost/benefit doesn't justify their existence.

    As a part of this effort, I'm attempting to associate the various indexes with the query plans that use them by digging into the Query Store plans...

    So far, so good. I am able to parse the XML using XQuery and I'm finding the information I'm looking for. The problem is that it is SO SLOWWWWW! Especially on larger plans.

    The other problem is that I simply don't know XQuery well enough to know if the slowness is my fault (the way I coded it) or if it's just an inherently slow process. Given that SQL Server can create plans in milliseconds and can parse the XML into a visual plan tree just as fast, I'm thinking that it's me.

    If any of you are XML gurus and can offer a more performant solution, your help would be greatly appreciated.

    Below is my current code:

    WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
    SELECT --TOP (100)
        database_id = DB_ID(),
        proc_object_id = qsq.object_id,
        qsp.query_id,
        qsp.plan_id,
        n.NodeId,
        table_object_id = OBJECT_ID(DB_NAME() + N'.' + n.table_name),
        index_id = i.index_id,
        n.PhysicalOp
    FROM
        AlignDev.sys.query_store_plan qsp WITH (NOLOCK)     -- using NOLOCK to prevent potential concurrency issues with system processes.
        JOIN AlignDev.sys.query_store_query qsq WITH (NOLOCK)
            ON qsp.query_id = qsq.query_id
        CROSS APPLY ( VALUES (TRY_CONVERT(xml, qsp.query_plan)) ) qpx (query_plan_xml)
        CROSS APPLY qpx.query_plan_xml.nodes('//RelOp') ro (x)
        CROSS APPLY ro.x.nodes('./IndexScan/Object') o (x)
        CROSS APPLY ( VALUES (
            ro.x.value(N'@NodeId', N'int'),
            ro.x.value(N'@PhysicalOp', N'varchar(100)'),
            CONCAT(TRIM(TRANSLATE(o.x.value(N'@Schema', N'sysname'), '[]', ' ')), N'.', TRIM(TRANSLATE(o.x.value(N'@Table', N'sysname'), '[]', ' '))),
            TRIM(TRANSLATE(o.x.value(N'@Index', N'sysname'), '[]', ' '))
        ) ) n (NodeId, PhysicalOp, table_name, index_name)
        LEFT JOIN AlignDev.sys.indexes i WITH (NOLOCK)      -- use LEFT JOIN to show indexes that no longer exist in sys.indexes.
            ON n.index_name = i.name
    WHERE 1 = 1
        AND qsq.object_id > 0
        AND qsp.plan_id = 2452;
  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Yeah, I am looking for the answer too. Just want to bump this thread once again.

  • I think the faster way to parse XML is to do it outside of SQL.  In general, SQL is not "fast" at parsing XML data.  I think your interpretation of:

    Given that SQL Server can create plans in milliseconds and can parse the XML into a visual plan tree just as fast, I'm thinking that it's me.

    is a little off.  SQL Server creates the plan in the engine and stores the plan data as XML so it can be consumed by a DBA or a tool.  The visual plan tree is being processed by an application (SSMS, for example), not by the SQL Server engine, which is why it can be parsed quite quickly.

    Something that MAY (i.e. SHOULD) be faster would be to pull the data into a custom C# application and parse it as you need it from there.

    Just because it CAN be done in SQL doesn't mean it always SHOULD be done in SQL, and in my experience XML is one of those things that is MUCH faster on the application side.

    Offhand, I do not know of any out-of-the-box tools that will parse this for you, so I imagine you would need to develop something in-house for it.  But C# has some libraries that can parse XML data fairly efficiently.

     

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there are negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Working with the XML outside of SQL Server (or even using a CLR) is a little out of scope... at least for my current project.

    The fact is that I can use what I have and make it work. Each plan only needs to be parsed once and done. The reason for the post was simply to see if I was missing something or making a mistake that would be obvious to someone who was better at working with XQuery.

     

  • I don't see anything obvious (although I am not an XML guru by any means).

    Have you thought about looking at the indexes and the data SQL stores on them rather than scraping through XML?  A good blog post on this is available here:

    https://www.sqlshack.com/gathering-sql-server-indexes-statistics-and-usage-information/

    with the section I'd poke at being:

    SELECT
        OBJECT_NAME([IX].[object_id]) AS [Table_Name]
        , [IX].[name] AS [Index_Name]
        , [IX].[type_desc] AS [Index_Type]
        , SUM([PS].[used_page_count]) * 8 AS [IndexSizeKB]
        , [IXUS].[user_seeks] AS [NumOfSeeks]
        , [IXUS].[user_scans] AS [NumOfScans]
        , [IXUS].[user_lookups] AS [NumOfLookups]
        , [IXUS].[user_updates] AS [NumOfUpdates]
        , [IXUS].[last_user_seek] AS [LastSeek]
        , [IXUS].[last_user_scan] AS [LastScan]
        , [IXUS].[last_user_lookup] AS [LastLookup]
        , [IXUS].[last_user_update] AS [LastUpdate]
    FROM [sys].[indexes] AS [IX]
    INNER JOIN [sys].[dm_db_index_usage_stats] AS [IXUS]
        ON [IXUS].[index_id] = [IX].[index_id]
        AND [IXUS].[object_id] = [IX].[object_id]
    INNER JOIN [sys].[dm_db_partition_stats] AS [PS]
        ON [PS].[object_id] = [IX].[object_id]
        AND [PS].[index_id] = [IX].[index_id]   -- join on index_id as well so the size isn't summed across the whole table
    WHERE OBJECTPROPERTY([IX].[object_id], 'IsUserTable') = 1
    GROUP BY OBJECT_NAME([IX].[object_id])
        , [IX].[name]
        , [IX].[type_desc]
        , [IXUS].[user_seeks]
        , [IXUS].[user_scans]
        , [IXUS].[user_lookups]
        , [IXUS].[user_updates]
        , [IXUS].[last_user_seek]
        , [IXUS].[last_user_scan]
        , [IXUS].[last_user_lookup]
        , [IXUS].[last_user_update];

    My thought with this is that you can see when an index was last used and which ones appear to be unused.  Then you can use this query:

    SELECT
        [Tab].[name] AS [Table_Name]
        , [IX].[name] AS [Index_Name]
        , [IX].[type_desc] AS [Index_Type]
        , [Col].[name] AS [Index_Column_Name]
        , [IXC].[is_included_column] AS [Is_Included_Column]
        , [IX].[fill_factor]
        , [IX].[is_disabled]
        , [IX].[is_primary_key]
        , [IX].[is_unique]
    FROM [sys].[indexes] AS [IX]
    INNER JOIN [sys].[index_columns] AS [IXC]
        ON [IX].[object_id] = [IXC].[object_id]
        AND [IX].[index_id] = [IXC].[index_id]
    INNER JOIN [sys].[columns] AS [Col]
        ON [IX].[object_id] = [Col].[object_id]
        AND [IXC].[column_id] = [Col].[column_id]
    INNER JOIN [sys].[tables] AS [Tab]
        ON [IX].[object_id] = [Tab].[object_id];

    to find duplicate indexes.
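
    If it helps, you could roll that output up so each index's key list sits on one row, which makes duplicates and near-duplicates easier to spot side by side.  Rough sketch only - assumes SQL Server 2017+ for STRING_AGG:

    -- Rough sketch: collapse each index's key and included columns into strings
    -- so that indexes with identical (or overlapping) key lists line up next to each other.
    SELECT
        [Tab].[name] AS [Table_Name]
        , [IX].[name] AS [Index_Name]
        , [Key_Columns] = STRING_AGG(CASE WHEN [IXC].[is_included_column] = 0 THEN [Col].[name] END, ', ')
                          WITHIN GROUP (ORDER BY [IXC].[key_ordinal])
        , [Included_Columns] = STRING_AGG(CASE WHEN [IXC].[is_included_column] = 1 THEN [Col].[name] END, ', ')
    FROM [sys].[indexes] AS [IX]
    INNER JOIN [sys].[index_columns] AS [IXC]
        ON [IX].[object_id] = [IXC].[object_id]
        AND [IX].[index_id] = [IXC].[index_id]
    INNER JOIN [sys].[columns] AS [Col]
        ON [IX].[object_id] = [Col].[object_id]
        AND [IXC].[column_id] = [Col].[column_id]
    INNER JOIN [sys].[tables] AS [Tab]
        ON [IX].[object_id] = [Tab].[object_id]
    GROUP BY [Tab].[name], [IX].[name]
    ORDER BY [Tab].[name], [Key_Columns];       -- duplicate key lists end up adjacent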

    My thought is that parsing through the XML is fine as long as you aren't getting new queries and no "old" plans are being pushed off the end of Query Store.

    The other risk you may have (not sure if it applies to you or not) is if you have infrequently used queries, such as a "year end" process, that rely on specific indexes being in place.  You want to be careful when removing or merging indexes - you never know what side effects you may cause by doing that.


  • Nothing in the system views will tell me what specific plans are using which index.

    The goal is to get an idea of which stored procedures are likely to regress if/when I start disabling certain indexes.

    This is simply a small part of a larger effort. The bulk of my efforts (identifying which indexes need attention) are indeed making use of the standard system views.

  • Oh gotcha... Yep, that makes sense then.  You are more curious at the moment about which queries may break by changing the index, not about which indexes are in use and which are not.

    I think your approach is likely the best one, although time-consuming.  Since it is a run-once type of thing (well, possibly run twice - once before the changes and once after to make sure your index seeks are still index seeks and your non-clustered index scans are still non-clustered index scans), it makes sense not to bother with a C# app.

    That being said, it feels like a good tool to build for future DBAs... I may look into something like this.  Adding it to my ever-expanding "to do" list - create an XML-plan-to-Excel parsing application.  Have it so you can copy-paste a single XML plan or point it at a SQL instance, and it will query either Query Store OR the plan cache, dump the results to a data grid view with the option to export to Excel, and parse things out into different "analytics" like "indexes used by query", "tables used by query", "poorly performing queries", "queries with bad estimates", etc.  And if I open source the whole thing, then others can take the tool and add their own templates.  I have a few "to do" projects like that that I plan on working on at some point.  Might never happen, but I can see how it would be beneficial.


  • Yes sir, you have the idea.

    Yes, I'm definitely looking for unused (and underused) indexes. I'm also looking at eliminating redundant indexes, indexes whose first key column matches the clustered index key, and indexes that are similar enough that they can be combined.
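
    For the "first key column matches the clustered index key" bucket, I'm using something along these lines - just a rough sketch that only compares the single leading key column:

    -- Rough sketch: nonclustered indexes whose leading key column is the same
    -- as the leading key column of the table's clustered index.
    WITH LeadKeys AS (
        SELECT i.object_id, i.index_id, i.name, ic.column_id
        FROM sys.indexes i
        JOIN sys.index_columns ic
            ON ic.object_id = i.object_id
            AND ic.index_id = i.index_id
            AND ic.key_ordinal = 1              -- leading key column only
    )
    SELECT
        table_name = OBJECT_NAME(nc.object_id),
        nonclustered_index = nc.name,
        clustered_index = cl.name,
        leading_column = COL_NAME(nc.object_id, nc.column_id)
    FROM LeadKeys nc
    JOIN LeadKeys cl
        ON cl.object_id = nc.object_id
        AND cl.index_id = 1                     -- the clustered index
    WHERE nc.index_id > 1                       -- nonclustered indexes only
        AND nc.column_id = cl.column_id;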

    As far as this specific portion goes, it'll always be a "run once". If an index that's being used in a plan changes to the extent that it can no longer be used, SQL Server is forced to create an entirely new plan. Then it's just a matter of comparing the performance of the old vs. new plans.
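
    For that before/after comparison, I'm planning on something like this against the Query Store runtime stats - again just a rough sketch, and the plan_ids are placeholders for an old/new plan pair:

    -- Rough sketch of the before/after comparison.
    -- 2452 and 2453 are placeholder plan_ids for the "old" and "new" plans.
    SELECT
        qsp.query_id,
        rs.plan_id,
        executions        = SUM(rs.count_executions),
        avg_duration_ms   = SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) / 1000.0,
        avg_logical_reads = SUM(rs.avg_logical_io_reads * rs.count_executions) / SUM(rs.count_executions)
    FROM sys.query_store_plan qsp
    JOIN sys.query_store_runtime_stats rs
        ON rs.plan_id = qsp.plan_id
    WHERE qsp.plan_id IN (2452, 2453)           -- old plan vs. new plan
    GROUP BY qsp.query_id, rs.plan_id;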

  • Wouldn't it be easier to take a snapshot of sys.dm_db_index_usage_stats from one day till the next and do a comparison?  The added benefit here is that it includes indexes from ALL the databases.
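
    Something like this is all I'm talking about - a rough sketch, and IndexUsageSnapshot is a made-up work table you'd have to create yourself:

    -- Hypothetical work table for daily snapshots of the DMV.
    CREATE TABLE dbo.IndexUsageSnapshot (
        capture_date date   NOT NULL,
        database_id  int    NOT NULL,
        object_id    int    NOT NULL,
        index_id     int    NOT NULL,
        user_seeks   bigint NOT NULL,
        user_scans   bigint NOT NULL,
        user_lookups bigint NOT NULL,
        user_updates bigint NOT NULL
    );

    -- Run once a day: capture the current counters.
    -- Keep in mind the DMV resets when the instance restarts, so only compare snapshots taken between restarts.
    INSERT INTO dbo.IndexUsageSnapshot
    SELECT CONVERT(date, GETDATE()), database_id, object_id, index_id, user_seeks, user_scans, user_lookups, user_updates
    FROM sys.dm_db_index_usage_stats;

    -- Compare today's snapshot to yesterday's to see which indexes were actually touched in between.
    SELECT curr.database_id, curr.object_id, curr.index_id,
        seek_delta   = curr.user_seeks   - prev.user_seeks,
        scan_delta   = curr.user_scans   - prev.user_scans,
        lookup_delta = curr.user_lookups - prev.user_lookups,
        update_delta = curr.user_updates - prev.user_updates
    FROM dbo.IndexUsageSnapshot curr
    JOIN dbo.IndexUsageSnapshot prev
        ON prev.database_id = curr.database_id
        AND prev.object_id = curr.object_id
        AND prev.index_id = curr.index_id
        AND prev.capture_date = DATEADD(DAY, -1, curr.capture_date)
    WHERE curr.capture_date = CONVERT(date, GETDATE());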

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Wouldn't it be easier to take a snapshot of sys.dm_db_index_usage_stats from one day till the next and do a comparison?  The added benefit here is that it includes indexes from ALL the databases.

    Jeff, forgive me if I'm misunderstanding your suggestion. In this particular instance, I'm less interested in knowing how often an index is being used than I am in knowing which specific query plans include it.

    It is fairly easy to look at a query (or, more accurately, its query plan) and see what indexes are being used. I want to be able to come from the opposite direction... to look at an index and see what queries (query plans) are using it.

    Please let me know if I have indeed misunderstood...

     

  • Jason A. Long wrote:

    Jeff Moden wrote:

    Wouldn't it be easier to take a snapshot of sys.dm_db_index_usage_stats from one day till the next and do a comparison?  The added benefit here is that it includes indexes from ALL the databases.

    Jeff, forgive me if I'm misunderstanding your suggestion. In this particular instance, I'm less interested in knowing how often an index is being used than I am in knowing which specific query plans include it.

    It is fairly easy to look at a query (or, more accurately, its query plan) and see what indexes are being used. I want to be able to come from the opposite direction... to look at an index and see what queries (query plans) are using it.

    Please let me know if I have indeed misunderstood...

    Nope... it is I that have misunderstood.  I got it backwards from what you were looking for.  Still, you might want to look at the DMV first because the answer might be that nothing is using it.
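
    Something quick and dirty along these lines would show that - just a sketch, and it only covers the current database and only counts usage since the last restart:

    -- Sketch: nonclustered indexes in the current database with no recorded
    -- seeks, scans, or lookups since the last instance restart.
    SELECT
        OBJECT_NAME(i.object_id) AS table_name,
        i.name                   AS index_name
    FROM sys.indexes i
    LEFT JOIN sys.dm_db_index_usage_stats us
        ON us.database_id = DB_ID()
        AND us.object_id = i.object_id
        AND us.index_id = i.index_id
    WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
        AND i.index_id > 1
        AND COALESCE(us.user_seeks, 0) + COALESCE(us.user_scans, 0) + COALESCE(us.user_lookups, 0) = 0;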

    --Jeff Moden



  • Jeff Moden wrote:

    Nope... it is I that have misunderstood.  I got it backwards from what you were looking for.  Still, you might want to look at the DMV first because the answer might be that nothing is using it.

    I absolutely agree. This is not intended to be a replacement for traditional methods of identifying "undesirable" indexes or indexes that need attention.

    It is simply an attempt to find out what query executions are generating the usage stats and to get an idea of which queries could potentially be impacted by a given index change.

     

     
