Slow Query (FOR XML)

  • Hello Experts

    I am going to start by explaining my set up.

    Hardware:

    SQL Server 2012 Standard Edition

    4 Cores, 8 GB RAM

    Table in question has four columns:

    RecordID...........INT NOT NULL (Unique but not IDENTITY)

    InvoiceID...........INT NULL

    Row_Num..........SMALLINT NULL

    Text_Val............VARCHAR(48) NULL

    Data Looks like this:

    RecordID..........InvoiceID........Row_Num............Text_Val

    1.....................1001................1.......................Test Value 1

    2.....................1001................2.......................Test Value 2

    3.....................1001................3.......................Test Value 3

    4.....................1002................1.......................New Value 1

    5.....................1002................2.......................New Value 2

    As you can see from the data example, the InvoiceID repeats and there are multiple text values for each InvoiceID. My task is to concatenate the rows with similar InvoiceID into one. So the result set should look like this:

    InvoiceID..........Text_Val

    1001................Test Value 1 Test Value 2 Test Value 3

    1002................New Value 1 New Value 2

    The query I am using to achieve this:

    SELECTa.InvoiceID, STUFF((SELECT' ' + b.Text_Val

    FROMdbo.InvoiceTable b

    WHEREb.InvoiceID = a.InvoiceID

    ORDER BY b.Row_Num

    FOR XML PATH ('')), 1, 1, '') AS [Description]

    FROMdbo.InvoiceTable AS a

    GROUP BY InvoiceID;

    My issue is that the query takes a long time to execute. Table has 60 odd million rows and the execution time for the query is an hour and a half. Any suggestion on bring down the run time would be greatly appreciated. I would like the run time to be under 5 mins 🙂

    Thank you

  • As this should only be done for reporting, I'd suggest you to use only the rows needed instead of reading the 60 million rows. No one will ever need that many rows.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Still, I would like to know if it possible to optimise the query when reading entire table.

    Thanks

  • I'm not sure how much can it be improved, but to give more advice you need to share more information as described in here: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • To read the entire table... probably not. Getting rid of the FOR XML will be faster. Other than that, faster and more disks, faster and more controllers. Since you're reading everything, you need to speed up I/O.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • sandhuz24 (8/10/2015)


    Still, I would like to know if it possible to optimise the query when reading entire table.

    Thanks

    Quick thought, too much information missing, without it one is just guessing!

    😎

    Post the DDL (create table) script including any indices, constraints etc. and at least the actual execution plan of the query.

  • Further on the subject, what is killing the performance is most certainly the distinct sort and table scans for the xml subquery. Adding an index on InvoiceID, Row_Num and include the Text_Val should fix this.

    😎

    Consider this sample

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.InvoiceTable') IS NOT NULL DROP TABLE dbo.InvoiceTable;

    GO

    CREATE TABLE dbo.InvoiceTable

    (

    RecordID INT NOT NULL CONSTRAINT PK_DBO_INVOICETABLE_RECORDID PRIMARY KEY CLUSTERED

    ,InvoiceID INT NULL

    ,Row_Num SMALLINT NULL

    ,Text_Val VARCHAR(48) NULL

    );

    GO

    /* Test data generator */

    DECLARE @SAMPLE_SIZE INT = 1000000;

    DECLARE @LINE_COUNT INT = @SAMPLE_SIZE / 4;

    ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

    ,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

    , NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

    ,SAMPLE_DATA AS

    (

    SELECT

    NM.N AS RecordID

    ,(ABS(CHECKSUM(NEWID())) % @LINE_COUNT) + 1 AS InvoiceID

    ,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

    + CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

    + CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 1000,0)

    + CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10000,0) AS Text_Val

    FROM NUMS NM

    )

    INSERT INTO dbo.InvoiceTable(RecordID,InvoiceID,Row_Num,Text_Val)

    SELECT

    SD.RecordID

    ,SD.InvoiceID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY SD.InvoiceID

    ORDER BY SD.RecordID

    )AS Row_Num

    ,SD.Text_Val

    FROM SAMPLE_DATA SD;

    GO

    RAISERROR('----------------------------------------

    NO INDEX

    ----------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS TIME,IO ON;

    SELECT

    IT.InvoiceID

    ,STUFF((

    SELECT

    CHAR(32) + SI.Text_Val

    FROM dbo.InvoiceTable SI

    WHERE SI.InvoiceID = IT.InvoiceID

    ORDER BY SI.Row_Num

    FOR XML PATH('')),1,1,'') AS [Description]

    FROM dbo.InvoiceTable IT

    GROUP BY IT.InvoiceID

    SET STATISTICS TIME,IO OFF;

    GO

    CREATE NONCLUSTERED INDEX NCLIDX_DBO_INVOICETABLE_INVOICEID_ROWNUM_INCL_TEXTVAL ON dbo.InvoiceTable

    (InvoiceID ASC, Row_Num ASC) INCLUDE (Text_Val);

    GO

    GO

    RAISERROR('----------------------------------------

    WITH INDEX

    ----------------------------------------',0,0) WITH NOWAIT;

    SET STATISTICS TIME,IO ON;

    SELECT

    IT.InvoiceID

    ,STUFF((

    SELECT

    CHAR(32) + SI.Text_Val

    FROM dbo.InvoiceTable SI

    WHERE SI.InvoiceID = IT.InvoiceID

    ORDER BY SI.Row_Num

    FOR XML PATH('')),1,1,'') AS [Description]

    FROM dbo.InvoiceTable IT

    GROUP BY IT.InvoiceID;

    SET STATISTICS TIME,IO OFF;

    GO

    Statistics output

    ----------------------------------------

    NO INDEX

    ----------------------------------------

    Table 'Worktable'. Scan count 245406, logical reads 3703109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'InvoiceTable'. Scan count 2, logical reads 12044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 9797 ms, elapsed time = 10152 ms.

    ----------------------------------------

    WITH INDEX

    ----------------------------------------

    Table 'InvoiceTable'. Scan count 245407, logical reads 744781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    SQL Server Execution Times:

    CPU time = 2652 ms, elapsed time = 2883 ms.

  • Thanks Eirikur!

    Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.

    For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.

    Cheers,

    P.

  • sandhuz24 (10/7/2015)


    Thanks Eirikur!

    Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.

    For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.

    Cheers,

    P.

    You are most welcome and thank you for the feedback

    😎

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply