August 10, 2015 at 12:26 pm
Hello Experts
I am going to start by explaining my set up.
Hardware:
SQL Server 2012 Standard Edition
4 Cores, 8 GB RAM
Table in question has four columns:
RecordID...........INT NOT NULL (Unique but not IDENTITY)
InvoiceID...........INT NULL
Row_Num..........SMALLINT NULL
Text_Val............VARCHAR(48) NULL
Data Looks like this:
RecordID..........InvoiceID........Row_Num............Text_Val
1.....................1001................1.......................Test Value 1
2.....................1001................2.......................Test Value 2
3.....................1001................3.......................Test Value 3
4.....................1002................1.......................New Value 1
5.....................1002................2.......................New Value 2
As you can see from the data example, the InvoiceID repeats and there are multiple text values for each InvoiceID. My task is to concatenate the rows with similar InvoiceID into one. So the result set should look like this:
InvoiceID..........Text_Val
1001................Test Value 1 Test Value 2 Test Value 3
1002................New Value 1 New Value 2
The query I am using to achieve this:
SELECTa.InvoiceID, STUFF((SELECT' ' + b.Text_Val
FROMdbo.InvoiceTable b
WHEREb.InvoiceID = a.InvoiceID
ORDER BY b.Row_Num
FOR XML PATH ('')), 1, 1, '') AS [Description]
FROMdbo.InvoiceTable AS a
GROUP BY InvoiceID;
My issue is that the query takes a long time to execute. Table has 60 odd million rows and the execution time for the query is an hour and a half. Any suggestion on bring down the run time would be greatly appreciated. I would like the run time to be under 5 mins 🙂
Thank you
August 10, 2015 at 12:35 pm
As this should only be done for reporting, I'd suggest you to use only the rows needed instead of reading the 60 million rows. No one will ever need that many rows.
August 10, 2015 at 12:44 pm
Still, I would like to know if it possible to optimise the query when reading entire table.
Thanks
August 10, 2015 at 12:50 pm
I'm not sure how much can it be improved, but to give more advice you need to share more information as described in here: http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
August 10, 2015 at 3:41 pm
To read the entire table... probably not. Getting rid of the FOR XML will be faster. Other than that, faster and more disks, faster and more controllers. Since you're reading everything, you need to speed up I/O.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
August 10, 2015 at 8:14 pm
sandhuz24 (8/10/2015)
Still, I would like to know if it possible to optimise the query when reading entire table.Thanks
Quick thought, too much information missing, without it one is just guessing!
😎
Post the DDL (create table) script including any indices, constraints etc. and at least the actual execution plan of the query.
August 10, 2015 at 10:25 pm
Further on the subject, what is killing the performance is most certainly the distinct sort and table scans for the xml subquery. Adding an index on InvoiceID, Row_Num and include the Text_Val should fix this.
😎
Consider this sample
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.InvoiceTable') IS NOT NULL DROP TABLE dbo.InvoiceTable;
GO
CREATE TABLE dbo.InvoiceTable
(
RecordID INT NOT NULL CONSTRAINT PK_DBO_INVOICETABLE_RECORDID PRIMARY KEY CLUSTERED
,InvoiceID INT NULL
,Row_Num SMALLINT NULL
,Text_Val VARCHAR(48) NULL
);
GO
/* Test data generator */
DECLARE @SAMPLE_SIZE INT = 1000000;
DECLARE @LINE_COUNT INT = @SAMPLE_SIZE / 4;
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)
,(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP(@SAMPLE_SIZE) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7)
,SAMPLE_DATA AS
(
SELECT
NM.N AS RecordID
,(ABS(CHECKSUM(NEWID())) % @LINE_COUNT) + 1 AS InvoiceID
,CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))
+ CHAR(65 + ( ABS(CHECKSUM(NEWID())) % 26 ))
+ CHAR(45) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 1000,0)
+ CHAR(58) + CONVERT(VARCHAR(6),ABS(CHECKSUM(NEWID())) % 10000,0) AS Text_Val
FROM NUMS NM
)
INSERT INTO dbo.InvoiceTable(RecordID,InvoiceID,Row_Num,Text_Val)
SELECT
SD.RecordID
,SD.InvoiceID
,ROW_NUMBER() OVER
(
PARTITION BY SD.InvoiceID
ORDER BY SD.RecordID
)AS Row_Num
,SD.Text_Val
FROM SAMPLE_DATA SD;
GO
RAISERROR('----------------------------------------
NO INDEX
----------------------------------------',0,0) WITH NOWAIT;
SET STATISTICS TIME,IO ON;
SELECT
IT.InvoiceID
,STUFF((
SELECT
CHAR(32) + SI.Text_Val
FROM dbo.InvoiceTable SI
WHERE SI.InvoiceID = IT.InvoiceID
ORDER BY SI.Row_Num
FOR XML PATH('')),1,1,'') AS [Description]
FROM dbo.InvoiceTable IT
GROUP BY IT.InvoiceID
SET STATISTICS TIME,IO OFF;
GO
CREATE NONCLUSTERED INDEX NCLIDX_DBO_INVOICETABLE_INVOICEID_ROWNUM_INCL_TEXTVAL ON dbo.InvoiceTable
(InvoiceID ASC, Row_Num ASC) INCLUDE (Text_Val);
GO
GO
RAISERROR('----------------------------------------
WITH INDEX
----------------------------------------',0,0) WITH NOWAIT;
SET STATISTICS TIME,IO ON;
SELECT
IT.InvoiceID
,STUFF((
SELECT
CHAR(32) + SI.Text_Val
FROM dbo.InvoiceTable SI
WHERE SI.InvoiceID = IT.InvoiceID
ORDER BY SI.Row_Num
FOR XML PATH('')),1,1,'') AS [Description]
FROM dbo.InvoiceTable IT
GROUP BY IT.InvoiceID;
SET STATISTICS TIME,IO OFF;
GO
Statistics output
----------------------------------------
NO INDEX
----------------------------------------
Table 'Worktable'. Scan count 245406, logical reads 3703109, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'InvoiceTable'. Scan count 2, logical reads 12044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 9797 ms, elapsed time = 10152 ms.
----------------------------------------
WITH INDEX
----------------------------------------
Table 'InvoiceTable'. Scan count 245407, logical reads 744781, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 2652 ms, elapsed time = 2883 ms.
October 7, 2015 at 2:30 am
Thanks Eirikur!
Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.
For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.
Cheers,
P.
October 7, 2015 at 2:48 am
sandhuz24 (10/7/2015)
Thanks Eirikur!Sorry for the delay in getting back to the post. Been really busy at work. I had the indexes on the table as you suggest in your post but I was missing the index on the Row_Num column. You correctly point out that the distinct Sort is taking the longest and hence including the column in the index made the difference.
For anyone else visiting the post, I should point out that even though the query does not run in under 5 mins, the improvement was considerable - 70% reduction in run time.
Cheers,
P.
You are most welcome and thank you for the feedback
😎
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply