ReportServer.Catalog.Content = Very Slow/Expensive XML query

  • First things first, here is the query that I'm using...

    WITH ContentToVarBin AS (
        SELECT
            c.Name,
            CAST(c.Content AS VARBINARY(MAX)) AS Content
        FROM
            ReportServer.dbo.Catalog c
        WHERE
            [Type] = 2
    ), ContentToXML AS ( --The second CTE strips off the BOM (Byte Order Mark) if it exists...
        SELECT
            cvb.Name,
            CAST(CASE
                    WHEN LEFT(cvb.Content, 3) = 0xEFBBBF
                    THEN STUFF(cvb.Content, 1, 3, '')
                    ELSE cvb.Content
                 END AS XML) AS ContentXML
        FROM
            ContentToVarBin cvb
    )
    SELECT
        ctx.Name,
        DataSet.value('(./*:DataSet/@Name)[1]', 'nvarchar(max)') AS DataSetName,
        ISNULL(Query.value('(./*:CommandType/text())[1]', 'nvarchar(1024)'), 'TextQuery') AS CommandType,
        Query.value('(./*:CommandText/text())[1]', 'nvarchar(max)') AS CommandText
    FROM
        ContentToXML ctx
        --Get all the Query elements (The "*:" ignores any xml namespaces)
        CROSS APPLY ctx.ContentXML.nodes('//*:Query') Queries(Query)
        CROSS APPLY ctx.ContentXML.nodes('//*:DataSets') DataSet(DataSet)

    The query is based on code that I found here...

    The basic idea is to pull all of the data set names and command text (proc names) out of every report in the catalog, so that we have an accurate picture of which procs are being used in which reports.

    The query above pulls the information exactly the way I want it to, but the performance leaves a lot to be desired, and the execution plan... well, let's just say it's expensive.

    The problem is that I'm not a fan of anything XML when it comes to SQL... Consequently, I'm not as proficient at writing XML-based queries as I probably could/should be. I was hoping that someone might have some performance-related advice to offer.

    The basic stats...

    - The Catalog table has 1,473 "Type = 2" rows (reports).

    - The Content column, for those rows, totals 1.2 GB.

    - Total rows returned by the query: 4,161.

    - Query execution time = 5:20 (min:sec).

    Thanks in advance,

    Jason

  • Jason A. Long (5/28/2015)


    First things first, here is the query that I'm using... [query snipped; see the original post above]

    The query above pulls the information exactly the way I want it to, but the performance leaves a lot to be desired, and the execution plan... well, let's just say it's expensive. I was hoping that someone might have some performance-related advice to offer.

    I actually did a presentation that covered this very topic at a recent BI user group but don't have the code handy at the moment.

    My experience has been that, unless you get a parallel query plan when querying ReportServer.dbo.Catalog.Content, the performance is a disaster.

    See this article for forcing a parallel plan. I bet this will help a great deal:

    http://sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-forcing.aspx

    All you have to do is create the function that Adam provides in his article, then add an additional CROSS APPLY that references the function, like so:

    CROSS APPLY make_parallel()
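    To put that in context (just a rough sketch, assuming the function was created as dbo.make_parallel in the database you run the query from), the tail end of the final SELECT would look something like this:

    FROM
        ContentToXML ctx
        CROSS APPLY ctx.ContentXML.nodes('//*:Query') Queries(Query)
        CROSS APPLY ctx.ContentXML.nodes('//*:DataSets') DataSet(DataSet)
        --The extra APPLY that encourages the optimizer to produce a parallel plan
        CROSS APPLY dbo.make_parallel()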

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (5/29/2015)


    I actually did a presentation that covered this very topic at a recent BI user group but don't have the code handy at the moment.

    My experience has been that, unless you get a parallel query plan when querying ReportServer.dbo.Catalog.Content, the performance is a disaster.

    See this article for forcing a parallel plan. I bet this will help a great deal:

    http://sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-forcing.aspx

    All you have to do is create the function that Adam provides in his article, then add an additional CROSS APPLY that references the function, like so:

    CROSS APPLY make_parallel()

    Alan - you, sir, are a gentleman and a scholar! I'll give this a shot and let you know how it turns out.

  • Well now... Turns out that the SQL Server instance that's running Reporting Services only has a single CPU core available to it...

    I suppose that's why a query with an estimated cost of 9,386,250 wasn't generating a parallel plan on its own.

    Time for a conversation with the Powers That Be to see if I can get some additional resources for that instance.
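    For what it's worth, a quick way to confirm that sort of thing (a minimal sketch using the standard DMVs and configuration options; nothing here is specific to Reporting Services):

    --How many schedulers (CPU cores) the instance can actually use
    SELECT COUNT(*) AS VisibleOnlineSchedulers
    FROM sys.dm_os_schedulers
    WHERE [status] = 'VISIBLE ONLINE';

    --The two instance settings that control whether a plan can go parallel
    SELECT [name], value_in_use
    FROM sys.configurations
    WHERE [name] IN ('max degree of parallelism', 'cost threshold for parallelism');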

  • Jason A. Long (5/29/2015)


    Well now... Turns out that the SQL Server instance that's running Reporting Services only has a single CPU core available to it...

    I suppose that's why a query with an estimated cost of 9,386,250 wasn't generating a parallel plan on its own.

    Time for a conversation with the Powers That Be to see if I can get some additional resources for that instance.

    Yeah, unfortunately make_parallel can't help you with that :-P

    I have a couple of functions for digging out data sets, parameter info, and stuff like that; I'll post them here when I find them. They may help a little. In the meantime, I did notice that you are converting some values to nvarchar(max). If you can get away with it, change those to varchar with a smaller length - that may help a little (not much).
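    For example (a rough sketch; pick lengths you know are safe for your report definitions), the final SELECT could be trimmed down to something like this:

    SELECT
        ctx.Name,
        DataSet.value('(./*:DataSet/@Name)[1]', 'varchar(256)') AS DataSetName, --was nvarchar(max)
        ISNULL(Query.value('(./*:CommandType/text())[1]', 'varchar(50)'), 'TextQuery') AS CommandType,
        Query.value('(./*:CommandText/text())[1]', 'varchar(8000)') AS CommandText --was nvarchar(max)
    FROM
        ContentToXML ctx
        CROSS APPLY ctx.ContentXML.nodes('//*:Query') Queries(Query)
        CROSS APPLY ctx.ContentXML.nodes('//*:DataSets') DataSet(DataSet)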

    One thing I have done in the past is create a SQL Agent job that runs at night, executes the query, and writes the results to a table. That way you have good info through yesterday that you can filter further. Not the most ideal solution, but it's better than listening to that one little CPU break its back for 30 minutes.
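    Something along these lines as the job step would do it (a rough sketch; the table name dbo.ReportDataSetUsage is just a placeholder, and the query itself is your original one wrapped in an INSERT):

    --One-time setup: a table to hold the nightly snapshot (name is a placeholder)
    IF OBJECT_ID('dbo.ReportDataSetUsage') IS NULL
        CREATE TABLE dbo.ReportDataSetUsage (
            CapturedAt  datetime      NOT NULL DEFAULT (GETDATE()),
            ReportName  nvarchar(512) NOT NULL,
            DataSetName varchar(256)  NULL,
            CommandType varchar(50)   NULL,
            CommandText varchar(8000) NULL
        );

    --Nightly job step: rebuild the snapshot from the catalog query
    TRUNCATE TABLE dbo.ReportDataSetUsage;

    WITH ContentToVarBin AS (
        SELECT c.Name, CAST(c.Content AS VARBINARY(MAX)) AS Content
        FROM ReportServer.dbo.Catalog c
        WHERE [Type] = 2
    ), ContentToXML AS (
        SELECT cvb.Name,
               CAST(CASE WHEN LEFT(cvb.Content, 3) = 0xEFBBBF
                         THEN STUFF(cvb.Content, 1, 3, '')
                         ELSE cvb.Content
                    END AS XML) AS ContentXML
        FROM ContentToVarBin cvb
    )
    INSERT INTO dbo.ReportDataSetUsage (ReportName, DataSetName, CommandType, CommandText)
    SELECT ctx.Name,
           DataSet.value('(./*:DataSet/@Name)[1]', 'varchar(256)'),
           ISNULL(Query.value('(./*:CommandType/text())[1]', 'varchar(50)'), 'TextQuery'),
           Query.value('(./*:CommandText/text())[1]', 'varchar(8000)')
    FROM ContentToXML ctx
    CROSS APPLY ctx.ContentXML.nodes('//*:Query') Queries(Query)
    CROSS APPLY ctx.ContentXML.nodes('//*:DataSets') DataSet(DataSet);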

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (5/29/2015)


    Yeah, unfortunately make_parallel can't help you with that :-P

    I have a couple of functions for digging out data sets, parameter info, and stuff like that; I'll post them here when I find them. They may help a little. In the meantime, I did notice that you are converting some values to nvarchar(max). If you can get away with it, change those to varchar with a smaller length - that may help a little (not much).

    Good call on nvarchar(max) data types... Considering the data, they don't need to be nvarchar and they don't need to be max. Small wins are still wins...

    If you have some sample code and/or functions that make life with XML easier that you're willing to share, that would be awesome... I despise the very idea of XML being in an RDBMS. Unfortunately, that doesn't exempt me from having to contend with it from time to time (it just keeps me from dedicating time to learning how to work with it). Having a good code reference on hand would be nice. 😀

    Alan.B (5/29/2015)


    One thing I have done in the past is create a SQL Agent job that runs at night, executes the query, and writes the results to a table. That way you have good info through yesterday that you can filter further. Not the most ideal solution, but it's better than listening to that one little CPU break its back for 30 minutes.

    That was the plan from the beginning. I just wanted to see if I could get the execution times down before I started cramming things into SSIS packages.
