Slow Query (FOR XML)

Question

Slow Query (FOR XML)

sandhuz24

Old Hand

Points: 350
More actions
August 10, 2015 at 12:26 pm

#303261

Hello Experts
I am going to start by explaining my set up.
Hardware:
SQL Server 2012 Standard Edition
4 Cores, 8 GB RAM
Table in question has four columns:
RecordID...........INT NOT NULL (Unique but not IDENTITY)
InvoiceID...........INT NULL
Row_Num..........SMALLINT NULL
Text_Val............VARCHAR(48) NULL
Data Looks like this:
RecordID..........InvoiceID........Row_Num............Text_Val
1.....................1001................1.......................Test Value 1
2.....................1001................2.......................Test Value 2
3.....................1001................3.......................Test Value 3
4.....................1002................1.......................New Value 1
5.....................1002................2.......................New Value 2
As you can see from the data example, the InvoiceID repeats and there are multiple text values for each InvoiceID. My task is to concatenate the rows with similar InvoiceID into one. So the result set should look like this:
InvoiceID..........Text_Val
1001................Test Value 1 Test Value 2 Test Value 3
1002................New Value 1 New Value 2
The query I am using to achieve this:
_{SELECTa.InvoiceID, STUFF((SELECT' ' + b.Text_Val}
FROMdbo.InvoiceTable b
WHEREb.InvoiceID = a.InvoiceID
ORDER BY b.Row_Num
FOR XML PATH ('')), 1, 1, '') AS [Description]
FROMdbo.InvoiceTable AS a
GROUP BY InvoiceID;
My issue is that the query takes a long time to execute. Table has 60 odd million rows and the execution time for the query is an hour and a half. Any suggestion on bring down the run time would be greatly appreciated. I would like the run time to be under 5 mins 🙂
Thank you

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Luis Cazares SSC Guru Points: 183698 More actions · Answer 1

As this should only be done for reporting, I'd suggest you to use only the rows needed instead of reading the 60 million rows. No one will ever need that many rows.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

sandhuz24 Old Hand Points: 350 More actions · Answer 2

Still, I would like to know if it possible to optimise the query when reading entire table.

Thanks

Luis Cazares SSC Guru Points: 183698 More actions · Answer 3

I'm not sure how much can it be improved, but to give more advice you need to share more information as described in here: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Grant Fritchey SSC Guru Points: 398680 More actions · Answer 4

To read the entire table... probably not. Getting rid of the FOR XML will be faster. Other than that, faster and more disks, faster and more controllers. Since you're reading everything, you need to speed up I/O.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 5

sandhuz24 (8/10/2015)
Still, I would like to know if it possible to optimise the query when reading entire table.
Thanks

Quick thought, too much information missing, without it one is just guessing!

😎

Post the DDL (create table) script including any indices, constraints etc. and at least the actual execution plan of the query.

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 6

Further on the subject, what is killing the performance is most certainly the distinct sort and table scans for the xml subquery. Adding an index on InvoiceID, Row_Num and include the Text_Val should fix this.

😎

Consider this sample

USE tempdb;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.InvoiceTable') IS NOT NULL DROP TABLE dbo.InvoiceTable;

GO

CREATE TABLE dbo.InvoiceTable

(

RecordID INT NOT NULL CONSTRAINT PK_DBO_INVOICETABLE_RECORDID PRIMARY KEY CLUSTERED

,InvoiceID INT NULL

,Row_Num SMALLINT NULL

,Text_Val VARCHAR(48) NULL

);

GO

/* Test data generator */

DECLARE @SAMPLE_SIZE INT = 1000000;

DECLARE @LINE_COUNT INT = @SAMPLE_SIZE / 4;

;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)

,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))

, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)

,SAMPLE_DATA AS

(

SELECT

NM.N AS RecordID

,(ABS(CHECKSUM(NEWID())) % @LINE_COUNT) + 1 AS InvoiceID

,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))

+ CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 1000,0)

+ CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10000,0) AS Text_Val

FROM NUMS NM

)

INSERT INTO dbo.InvoiceTable(RecordID,InvoiceID,Row_Num,Text_Val)

SELECT

SD.RecordID

,SD.InvoiceID

,ROW_NUMBER() OVER

(

PARTITION BY SD.InvoiceID

ORDER BY SD.RecordID

)AS Row_Num

,SD.Text_Val

FROM SAMPLE_DATA SD;

GO

RAISERROR('----------------------------------------

NO INDEX

----------------------------------------',0,0) WITH NOWAIT;

SET STATISTICS TIME,IO ON;

SELECT

IT.InvoiceID

,STUFF((

SELECT

CHAR(32) + SI.Text_Val

FROM dbo.InvoiceTable SI

WHERE SI.InvoiceID = IT.InvoiceID

ORDER BY SI.Row_Num

FOR XML PATH('')),1,1,'') AS [Description]

FROM dbo.InvoiceTable IT

GROUP BY IT.InvoiceID

SET STATISTICS TIME,IO OFF;

GO

CREATE NONCLUSTERED INDEX NCLIDX_DBO_INVOICETABLE_INVOICEID_ROWNUM_INCL_TEXTVAL ON dbo.InvoiceTable

(InvoiceID ASC, Row_Num ASC) INCLUDE (Text_Val);

GO

RAISERROR('----------------------------------------

WITH INDEX

----------------------------------------',0,0) WITH NOWAIT;

SET STATISTICS TIME,IO ON;

SELECT

IT.InvoiceID

,STUFF((

SELECT

CHAR(32) + SI.Text_Val

FROM dbo.InvoiceTable SI

WHERE SI.InvoiceID = IT.InvoiceID

ORDER BY SI.Row_Num

FOR XML PATH('')),1,1,'') AS [Description]

FROM dbo.InvoiceTable IT

GROUP BY IT.InvoiceID;

SET STATISTICS TIME,IO OFF;

GO

Statistics output

----------------------------------------

NO INDEX

----------------------------------------

Table 'Worktable'. Scan count 245406, logical reads 3703109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'InvoiceTable'. Scan count 2, logical reads 12044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 9797 ms, elapsed time = 10152 ms.

----------------------------------------

WITH INDEX

----------------------------------------

Table 'InvoiceTable'. Scan count 245407, logical reads 744781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 2652 ms, elapsed time = 2883 ms.

sandhuz24 Old Hand Points: 350 More actions · Answer 7

Thanks Eirikur!

Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.

For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.

Cheers,

P.

Eirikur Eiriksson SSC Guru Points: 182882 More actions · Answer 8

sandhuz24 (10/7/2015)
Thanks Eirikur!
Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.
For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.
Cheers,
P.

You are most welcome and thank you for the feedback

😎