SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Compatibility of streaming aggregate operator and partitioning.


Compatibility of streaming aggregate operator and partitioning.

Author
Message
SpringTownDBA
SpringTownDBA
UDP Broadcaster
UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)

Group: General Forum Members
Points: 1434 Visits: 1499
I've got a 350 million row partitioned table that I'm trying to aggregate. The plan I get uses a HASH Aggregate. I would like to get a query plan that uses a streaming aggregate so results can be returned to the client faster. Here is the original query (Schema is at the end):


SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID



Schema for the Table and indexes are fairly straight forward. The clustered index is on (userid, datetime). There is a Non-clustered index on (Username, sessionid). Both indexes are partitioned on (DateTime). Since the query is limited to one partition and the columns being grouped on are either in the clustered index or the NC index key, the query optimizer should be able to generate a plan using a streaming aggregate that scans the NC index.

However, I am unable to produce the desired plan. I've tried 4 variations: with/without MAXDOP 1, with/without query "ORDER GROUP" query hint. The plans for these are in the attachment.


--This is the query that should be producing a streaming agg
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID

--Still get a Hash Aggregate without parallelism
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1)

--Prevent HASH Aggregation, and we get a distinct sort instead
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (ORDER GROUP)

--Still get distinct sort
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1, ORDER GROUP)




It there something preventing the use of a streaming aggregate for this query, or is this a bug/limitation of the QO?

Create Schema for the table:

CREATE PARTITION FUNCTION [PartitionByYear](datetime) AS RANGE RIGHT FOR
VALUES (N'2005-01-01T00:00:00.000', N'2006-01-01T00:00:00.000', N'2007-01-01T00:00:00.000',
N'2008-01-01T00:00:00.000', N'2009-01-01T00:00:00.000', N'2010-01-01T00:00:00.000',
N'2011-01-01T00:00:00.000', N'2012-01-01T00:00:00.000', N'2014-01-01T00:00:00.000')
GO

/****** Object: PartitionScheme [PartYearsToArchiveAndPrimary] Script Date: 4/19/2013 11:04:53 PM ******/
CREATE PARTITION SCHEME [PartYearsToArchiveAndPrimary] AS PARTITION [PartitionByYear]
TO ([Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [PRIMARY], [PRIMARY])

GO
CREATE TABLE [dbo].[AttemptedLogin](
[ObjectId] [int] IDENTITY(1,1) NOT NULL,
[DateTime] [datetime] NOT NULL,
[UserId] [int] NOT NULL,
[UserName] [varchar](30) NULL,
[Password] [varchar](30) NULL,
[SessionID] [varchar](30) NULL,
[IPAddress] [varchar](16) NULL,
[ProxyAddress] [varchar](16) NULL,
[CountryCode] [char](2) NULL,
CONSTRAINT [PK_NewAttemptedLogin] PRIMARY KEY NONCLUSTERED
(
[ObjectId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PartYearsToArchiveAndPrimary]([DateTime])

GO
/****** Object: Index [CIX_AttemptedLogin] Script Date: 4/19/2013 11:02:19 PM ******/
CREATE CLUSTERED INDEX [CIX_AttemptedLogin] ON [dbo].[AttemptedLogin]
(
[UserId] ASC,
[DateTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO

/****** Object: Index [IX_AttemptedLogin_UserNameSessionID] Script Date: 4/19/2013 11:02:20 PM ******/
CREATE NONCLUSTERED INDEX [IX_AttemptedLogin_UserNameSessionID] ON [dbo].[AttemptedLogin]
(
[UserName] ASC,
[SessionID] ASC
)
WHERE ([SessionId] IS NOT NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO



Thanks!
Attachments
queries.sqlplan (5 views, 62.00 KB)
SpringTownDBA
SpringTownDBA
UDP Broadcaster
UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)UDP Broadcaster (1.4K reputation)

Group: General Forum Members
Points: 1434 Visits: 1499
Forgot the include information about the machine:

Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64)
Dec 28 2012 20:23:12
Copyright (c) Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)


Paul White
Paul White
SSC-Dedicated
SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)SSC-Dedicated (34K reputation)

Group: General Forum Members
Points: 34632 Visits: 11359
Scanning the nonclustered index, there is an implied first key on the partition ID, so the following can use a Stream Aggregate:

SELECT sessionid, username 
FROM dbo.AttemptedLogin
WHERE sessionid IS NOT null
GROUP BY $Partition.PartitionByYear([DateTime]), UserName, SessionID



Similarly, if a single partition is statically specified, the following also results in a Stream Aggregate, since the keys of the nonclustered index are (UserName, SessionID):


SELECT SessionID, UserName FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserName, SessionID



The problem in the original code is that it adds UserID to the mix, and fails to use a Stream Aggregate:


SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserId, UserName, SessionID



Now UserID forms part of the key of the nonclustered index since it is not declared as unique, but it comes after the nonclustered keys. Writing the GROUP BY clause to reflect the key order results in the Stream Aggregate you are looking for:

SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin 
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserName, SessionID, UserId



Should the order of the GROUP BY keys matter? No. In that sense, it is a limitation, in the same way that writing a query with several window functions might result in an unnecessary sort depending on the written order of the query.

It may be more natural (and easier for future maintenance) if you explicitly include UserID in the nonclustered index definition.

edit: In response to your other Twitter question, yes the same considerations apply to DISTINCT:


-- Sort
SELECT DISTINCT UserId, UserName, SessionID FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

-- No sort
SELECT DISTINCT UserName, SessionID, UserId FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null



Another option is to keep the GROUP BY or DISTINCT column order as it is, and add a presentation ORDER BY UserName, SessionID, UserId



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
TheSQLGuru
TheSQLGuru
SSC-Dedicated
SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)SSC-Dedicated (31K reputation)

Group: General Forum Members
Points: 31404 Visits: 8670
I am glad you saw this one Paul! I didn't think anyone else could give as clear of a description of why the OP was getting the observed behavior.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search