Query tuning with conditional aggregation

Question

Query tuning with conditional aggregation

Viewing 4 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

Once you've experimented enough to know how it's done, building a sandbox for testing can be quick enough that it doesn't impact on your work. So here's a sandbox for testing out some of the assumptions made on this thread.
I'm making the following assumptions:
A streaming aggregate will outperform a hash aggregate
A streaming aggregate will require an index on Timestamp and TagName
Pre-aggregation as suggested by Jonathan AC Roberts will provide a performance lift.

First, the sample data script:
-- Create a sample data set IF OBJECT_ID('tempdb..#SampleData') IS NOT NULL DROP TABLE #SampleData; WITH E1 (N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) d (n)), -- 10 rows E2 (N) AS (SELECT 1 FROM E1 a, E1 b), -- 100 rows E4 (N) AS (SELECT 1 FROM E2 a, E2 b), -- 10000 rows E8 (N) AS (SELECT 1 FROM E4 a, E4 b) -- 100000000 rows SELECT TOP(1000000) [Timestamp] = DATEADD(DAY,CHECKSUM(NEWID()) % 10,GETDATE()), [Tagname] = 'NC_S\Column ' + CAST(1 + (ABS(CHECKSUM(NEWID())) % 900) AS VARCHAR(20)), [Value] = ABS(CHECKSUM(NEWID())) % 1000 INTO #SampleData FROM E8;

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 2

Next a couple of queries, both built dynamically - one with preaggregation, one without:

-------------------------------------
-- test with 500 columns
DECLARE @stmt VARCHAR(MAX) = '';
WITH
E1 (N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) d (n)),
E2 (N) AS (SELECT 1 FROM E1 a, E1 b),
E4 (N) AS (SELECT 1 FROM E2 a, E2 b),
iTally AS (SELECT TOP(500) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E4)

SELECT @stmt = @stmt + ', [NC_S\Column ' + CAST(n AS VARCHAR(4)) + '] = SUM(CASE WHEN TagName = ''NC_S\Column ' + CAST(n AS VARCHAR(4)) + ''' THEN [value] END) ' + CHAR(10)
FROM iTally

SET @stmt = ';WITH CTE AS
(
SELECT [Timestamp],
TagName,
SUM(value) value
FROM #SampleData
GROUP BY [Timestamp], TagName
)SELECT
[Timestamp]' + @stmt + 'FROM CTE
GROUP BY [Timestamp]; ';

EXEC(@stmt)
------------------------------------
-- test with 500 columns
DECLARE @stmt VARCHAR(MAX) = '';
WITH
E1 (N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) d (n)),
E2 (N) AS (SELECT 1 FROM E1 a, E1 b),
E4 (N) AS (SELECT 1 FROM E2 a, E2 b),
iTally AS (SELECT TOP(500) n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E4)

SELECT @stmt = @stmt + ', [NC_S\Column ' + CAST(n AS VARCHAR(4)) + '] = SUM(CASE WHEN TagName = ''NC_S\Column ' + CAST(n AS VARCHAR(4)) + ''' THEN [value] END) ' + CHAR(10)
FROM iTally

SET @stmt = 'SELECT
[Timestamp]' + @stmt + 'FROM #SampleData
GROUP BY [Timestamp]; ';

EXEC(@stmt)
-------------------------------------

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

Finally, four conditions to test:

without preagg, single key clustered index: stream aggregate, 00:01:30
without preagg, dual key clustered index: stream aggregate, 00:01:26
with preagg, single key clustered index: hash aggregate x2, 00:00:05
with preagg, dual key clustered index: stream aggregate x2, 00:00:03

Conclusions:
Preaggregation with the right index is, in this test harness, 30x faster than without preaggregation.
The two-key index is irrelevant unless there's a preaggregation step.

Adding more and more output columns to the query increases the execution time arithmetically:
100 columns: 00:00:17
200 columns: 00:00:35
400 columns: 00:01:10
800 columns: 00:02:23

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jeff Moden SSC Guru Points: 1004619 More actions · Answer 4

We're just guess here. Please post the full DDL for the Exampledb.dbo.SourceTable_EF278 table including all constraints and indexes.

It would also be helpful if you if you provided some examples of what the the values in the TimeStamp and Value columns looked like.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Query tuning with conditional aggregation

Cookies on SQLServerCentral