Query tuning with conditional aggregation

  • Hi all,

I have a stored procedure that returns a table of 1 million+ rows that I then need to pivot (I can't modify the stored procedure, as it makes a call to a proprietary .exe). I first tried using the PIVOT operator to do what I need, but that takes about 7 minutes to generate a 1440-row table with 900+ columns. I managed to reduce the execution time to a little over 2 minutes by using conditional aggregation with dynamically generated column names. I'm still looking for a way to reduce the execution time even further to be acceptable for the user, but am stumped as to any further optimizations I could do.


    INSERT INTO Exampledb.dbo.DestinationTable_EF278
    SELECT
    Timestamp,
    [NC_S\Column 1] = SUM(CASE WHEN TagName='NC_S\Column 1' THEN value END),
    [NC_S\Column 2] = SUM(CASE WHEN TagName='NC_S\Column 2' THEN value END),
    [NC_S\Column 3] = SUM(CASE WHEN TagName='NC_S\Column 3' THEN value END),
    [...]Repeat 900 times[...]
[NC_S\Column 901] = SUM(CASE WHEN TagName='NC_S\Column 901' THEN value END)
    FROM Exampledb.dbo.SourceTable_EF278
    GROUP BY Timestamp
    ORDER BY Timestamp

Any ideas how to make this faster?
Thanks!
    prefet

  • prefet - Monday, January 21, 2019 9:27 AM

    Hi all,

I have a stored procedure that returns a table of 1 million+ rows that I then need to pivot (I can't modify the stored procedure, as it makes a call to a proprietary .exe). I first tried using the PIVOT operator to do what I need, but that takes about 7 minutes to generate a 1440-row table with 900+ columns. I managed to reduce the execution time to a little over 2 minutes by using conditional aggregation with dynamically generated column names. I'm still looking for a way to reduce the execution time even further to be acceptable for the user, but am stumped as to any further optimizations I could do.


    INSERT INTO Exampledb.dbo.DestinationTable_EF278
    SELECT
    Timestamp,
    [NC_S\Column 1] = SUM(CASE WHEN TagName='NC_S\Column 1' THEN value END),
    [NC_S\Column 2] = SUM(CASE WHEN TagName='NC_S\Column 2' THEN value END),
    [NC_S\Column 3] = SUM(CASE WHEN TagName='NC_S\Column 3' THEN value END),
    [...]Repeat 900 times[...]
[NC_S\Column 901] = SUM(CASE WHEN TagName='NC_S\Column 901' THEN value END)
    FROM Exampledb.dbo.SourceTable_EF278
    GROUP BY Timestamp
    ORDER BY Timestamp

Any ideas how to make this faster?
Thanks!
    prefet

Don't return 900+ columns? I mean, seriously, how useful can a query be with that many columns? Maybe find a completely different approach to whatever you are trying to do.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • prefet - Monday, January 21, 2019 9:27 AM

    (I can't modify the stored procedure as it makes a call to a proprietary .exe)

    That statement begs for more explanation.  

    How is this proc calling a .exe, and what does the .exe do?

    Michael L John
    If you assassinate a DBA, would you pull a trigger?
    To properly post on a forum:
    http://www.sqlservercentral.com/articles/61537/

@micheal-2, the .exe is part of the historian, which is meant to handle large amounts of time-series data. The sproc is downstream for ingestion into our other tools, hence the 900 columns.

Do you have 1 row per Timestamp, TagName? If so, I think your query is probably about as fast as it will go. If you have many rows per Timestamp, TagName, then you would get better performance if you aggregate first:

    ;WITH CTE AS
    (
    SELECT Timestamp,
           TagName,
           SUM(value) value
      FROM Exampledb.dbo.SourceTable_EF278
     GROUP BY Timestamp, TagName
    )
    SELECT Timestamp,
           [NC_S\Column 1] = SUM(CASE WHEN TagName='NC_S\Column 1' THEN value END),
           [NC_S\Column 2] = SUM(CASE WHEN TagName='NC_S\Column 2' THEN value END),
           [NC_S\Column 3] = SUM(CASE WHEN TagName='NC_S\Column 3' THEN value END),
           [...]Repeat 900 times[...]
           [NC_S\Column 901] = SUM(CASE WHEN TagName='NC_S\Column 901' THEN value END)
      FROM CTE
     GROUP BY Timestamp
     ORDER BY Timestamp

  • Thanks. I tried with the additional CTE but it didn't reduce execution time.

  • You need to either:

    Cluster the table by timestamp (if that's how you (almost) always query against the table)
    Or
Create a nonclustered index on (Timestamp) INCLUDE (TagName, Value)

    Either one of those will avoid having to sort all the data every time the query runs.
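A minimal sketch of the two options, using the table and columns from the query above (the index names are made up, and this assumes there is no existing clustered index that conflicts):

```sql
-- Option 1: make Timestamp the clustering key (hypothetical index name):
CREATE CLUSTERED INDEX CIX_SourceTable_Timestamp
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp);

-- Option 2: a covering nonclustered index, leaving any existing
-- clustered index in place:
CREATE NONCLUSTERED INDEX IX_SourceTable_Timestamp
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp)
    INCLUDE (TagName, value);
```

Either index lets the engine read the rows already ordered by Timestamp, so the GROUP BY/ORDER BY no longer needs a full sort.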

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

A query without any kind of filtering, such as this one, is frequently helped only by upgrading hardware. Yes, possibly indexing differently could help... maybe (I can't see the execution plan, so I have no clue). However, that's only going to help a little. Then it's just down to hardware. When you're moving everything, you need the hardware to support that.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • ScottPletcher - Tuesday, January 22, 2019 10:42 AM

    You need to either:

    Cluster the table by timestamp (if that's how you (almost) always query against the table)
    Or
    Create a non-clus index on (Timestamp) include (TagName, Value)

    Either one of those will avoid having to sort all the data every time the query runs.

A streaming aggregate, which is often an order of magnitude faster than a hash aggregate, won't be facilitated by either of these suggestions, but by a clustered index on (Timestamp, TagName) or an ordinary index on (Timestamp, TagName) INCLUDE (Value).
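A sketch of the nonclustered variant (the index name is hypothetical):

```sql
-- Key order (Timestamp, TagName) delivers rows pre-sorted, which lets
-- the optimizer pick a stream aggregate instead of a hash aggregate;
-- INCLUDE (value) keeps the index covering for the pivot query:
CREATE NONCLUSTERED INDEX IX_SourceTable_Ts_Tag
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp, TagName)
    INCLUDE (value);
```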

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

We're on 2008, so this won't help; however, a possible Hail Mary on this would be to look at columnstore indexes.
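For anyone on a later version, a sketch of what that would look like (nonclustered columnstore indexes arrived in SQL Server 2012; note the 2012 implementation made the table read-only while the index existed, so it suits a load-then-report table):

```sql
-- Hypothetical index name; column list taken from the query above:
CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_SourceTable
    ON Exampledb.dbo.SourceTable_EF278 (Timestamp, TagName, value);
```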

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey - Thursday, January 24, 2019 7:50 AM

    We're on 2008, so this won't help, however, a possible Hail Mary on this would be to look at columnstore indexes.

No streaming aggregates in 2008?
I think columnstore indexes were introduced in 2012 with limited functionality - but that's the version we're using, and columnstore indexes have proven very useful for some of our forecasting scripts.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

I was wondering how much time is spent evaluating a 900-branch CASE expression a million times.
It would certainly contribute to the runtime of the query.

  • ChrisM@Work - Thursday, January 24, 2019 6:36 AM

    ScottPletcher - Tuesday, January 22, 2019 10:42 AM

    You need to either:

    Cluster the table by timestamp (if that's how you (almost) always query against the table)
    Or
    Create a non-clus index on (Timestamp) include (TagName, Value)

    Either one of those will avoid having to sort all the data every time the query runs.

    A streaming aggregate, which is often an order of magnitude faster than a hash aggregate, won't be facilitated by either of these suggestions, but by a clustered index on TimeStamp, TagName or an ordinary index on (Timestamp,TagName) include (Value). 

    But the overall effect on performance might be worse, possibly much worse, with TagName in the keys, depending on the order the rows are inserted.  I don't know that a hash match aggregate will be that slow here, and it will certainly be much faster than having to sort all the rows, which is the real performance killer.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Thursday, January 24, 2019 8:12 AM

    ChrisM@Work - Thursday, January 24, 2019 6:36 AM

    ScottPletcher - Tuesday, January 22, 2019 10:42 AM

    You need to either:

    Cluster the table by timestamp (if that's how you (almost) always query against the table)
    Or
    Create a non-clus index on (Timestamp) include (TagName, Value)

    Either one of those will avoid having to sort all the data every time the query runs.

    A streaming aggregate, which is often an order of magnitude faster than a hash aggregate, won't be facilitated by either of these suggestions, but by a clustered index on TimeStamp, TagName or an ordinary index on (Timestamp,TagName) include (Value). 

    But the overall effect on performance might be worse, possibly much worse, with TagName in the keys, depending on the order the rows are inserted.  I don't know that a hash match aggregate will be that slow here, and it will certainly be much faster than having to sort all the rows, which is the real performance killer.

    Timestamp and TagName are *required* as the leading keys in any index to support a streaming aggregate in this case (a third column as the first column in the index _might_ also work provided that the value is the same for all qualifying rows). If either column is missing or both columns are present but there's a third column between them, then a sort operator would almost certainly appear in the plan - if the optimizer chose a streaming aggregate over a hash aggregate. And yes, that sort would be slow.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work - Thursday, January 24, 2019 8:22 AM

    ScottPletcher - Thursday, January 24, 2019 8:12 AM

    ChrisM@Work - Thursday, January 24, 2019 6:36 AM

    ScottPletcher - Tuesday, January 22, 2019 10:42 AM

    You need to either:

    Cluster the table by timestamp (if that's how you (almost) always query against the table)
    Or
    Create a non-clus index on (Timestamp) include (TagName, Value)

    Either one of those will avoid having to sort all the data every time the query runs.

    A streaming aggregate, which is often an order of magnitude faster than a hash aggregate, won't be facilitated by either of these suggestions, but by a clustered index on TimeStamp, TagName or an ordinary index on (Timestamp,TagName) include (Value). 

    But the overall effect on performance might be worse, possibly much worse, with TagName in the keys, depending on the order the rows are inserted.  I don't know that a hash match aggregate will be that slow here, and it will certainly be much faster than having to sort all the rows, which is the real performance killer.

    Timestamp and TagName are *required* as the leading keys in any index to support a streaming aggregate in this case (a third column as the first column in the index _might_ also work provided that the value is the same for all qualifying rows). If either column is missing or both columns are present but there's a third column between them, then a sort operator would almost certainly appear in the plan - if the optimizer chose a streaming aggregate over a hash aggregate. And yes, that sort would be slow.

I'd be amazed if the SQL optimizer were sophisticated enough to analyze all the CASE expressions and determine they relied only on TagName in addition to Timestamp, and to then do a more efficient aggregation. I'm not saying SQL can't do it, but I'd like to see it actually do it before I was sure it was capable of that.

    Also, for each new value of Timestamp, SQL would clear the hash buckets, right?  As long as all the buckets for a given timestamp fit into memory, a hash agg isn't going to have perf issues, is it?

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.
