Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Compatibility of streaming aggregate operator and partitioning. Expand / Collapse
Author
Message
Posted Friday, April 19, 2013 10:15 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, April 1, 2014 3:26 PM
Points: 316, Visits: 1,497
I've got a 350 million row partitioned table that I'm trying to aggregate. The plan I get uses a HASH Aggregate. I would like to get a query plan that uses a streaming aggregate so results can be returned to the client faster. Here is the original query (Schema is at the end):

SELECT userid, sessionid, username FROM dbo.AttemptedLogin 
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID

Schema for the Table and indexes are fairly straight forward. The clustered index is on (userid, datetime). There is a Non-clustered index on (Username, sessionid). Both indexes are partitioned on (DateTime). Since the query is limited to one partition and the columns being grouped on are either in the clustered index or the NC index key, the query optimizer should be able to generate a plan using a streaming aggregate that scans the NC index.

However, I am unable to produce the desired plan. I've tried 4 variations: with/without MAXDOP 1, with/without query "ORDER GROUP" query hint. The plans for these are in the attachment.

--This is the query that should be producing a streaming agg
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID

--Still get a Hash Aggregate without parallelism
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1)

--Prevent HASH Aggregation, and we get a distinct sort instead
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (ORDER GROUP)

--Still get distinct sort
SELECT userid, sessionid, username FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null
GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1, ORDER GROUP)


It there something preventing the use of a streaming aggregate for this query, or is this a bug/limitation of the QO?

Create Schema for the table:
CREATE PARTITION FUNCTION [PartitionByYear](datetime) AS RANGE RIGHT FOR 
VALUES (N'2005-01-01T00:00:00.000', N'2006-01-01T00:00:00.000', N'2007-01-01T00:00:00.000',
N'2008-01-01T00:00:00.000', N'2009-01-01T00:00:00.000', N'2010-01-01T00:00:00.000',
N'2011-01-01T00:00:00.000', N'2012-01-01T00:00:00.000', N'2014-01-01T00:00:00.000')
GO

/****** Object: PartitionScheme [PartYearsToArchiveAndPrimary] Script Date: 4/19/2013 11:04:53 PM ******/
CREATE PARTITION SCHEME [PartYearsToArchiveAndPrimary] AS PARTITION [PartitionByYear]
TO ([Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [PRIMARY], [PRIMARY])

GO
CREATE TABLE [dbo].[AttemptedLogin](
[ObjectId] [int] IDENTITY(1,1) NOT NULL,
[DateTime] [datetime] NOT NULL,
[UserId] [int] NOT NULL,
[UserName] [varchar](30) NULL,
[Password] [varchar](30) NULL,
[SessionID] [varchar](30) NULL,
[IPAddress] [varchar](16) NULL,
[ProxyAddress] [varchar](16) NULL,
[CountryCode] [char](2) NULL,
CONSTRAINT [PK_NewAttemptedLogin] PRIMARY KEY NONCLUSTERED
(
[ObjectId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PartYearsToArchiveAndPrimary]([DateTime])

GO
/****** Object: Index [CIX_AttemptedLogin] Script Date: 4/19/2013 11:02:19 PM ******/
CREATE CLUSTERED INDEX [CIX_AttemptedLogin] ON [dbo].[AttemptedLogin]
(
[UserId] ASC,
[DateTime] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON
) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO

/****** Object: Index [IX_AttemptedLogin_UserNameSessionID] Script Date: 4/19/2013 11:02:20 PM ******/
CREATE NONCLUSTERED INDEX [IX_AttemptedLogin_UserNameSessionID] ON [dbo].[AttemptedLogin]
(
[UserName] ASC,
[SessionID] ASC
)
WHERE ([SessionId] IS NOT NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90) ON [PartYearsToArchiveAndPrimary]([DateTime])
GO

Thanks!


  Post Attachments 
queries.sqlplan (0 views, 62.88 KB)
Post #1444673
Posted Friday, April 19, 2013 10:21 PM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, April 1, 2014 3:26 PM
Points: 316, Visits: 1,497
Forgot the include information about the machine:
Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64) 
Dec 28 2012 20:23:12
Copyright (c) Microsoft Corporation
Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

Post #1444674
Posted Saturday, April 20, 2013 12:05 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Yesterday @ 3:56 PM
Points: 11,192, Visits: 11,096
Scanning the nonclustered index, there is an implied first key on the partition ID, so the following can use a Stream Aggregate:

SELECT sessionid, username 
FROM dbo.AttemptedLogin
WHERE sessionid IS NOT null
GROUP BY $Partition.PartitionByYear([DateTime]), UserName, SessionID

Similarly, if a single partition is statically specified, the following also results in a Stream Aggregate, since the keys of the nonclustered index are (UserName, SessionID):

SELECT SessionID, UserName FROM dbo.AttemptedLogin 
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserName, SessionID

The problem in the original code is that it adds UserID to the mix, and fails to use a Stream Aggregate:

SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin 
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserId, UserName, SessionID

Now UserID forms part of the key of the nonclustered index since it is not declared as unique, but it comes after the nonclustered keys. Writing the GROUP BY clause to reflect the key order results in the Stream Aggregate you are looking for:

SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin 
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null
GROUP BY UserName, SessionID, UserId

Should the order of the GROUP BY keys matter? No. In that sense, it is a limitation, in the same way that writing a query with several window functions might result in an unnecessary sort depending on the written order of the query.

It may be more natural (and easier for future maintenance) if you explicitly include UserID in the nonclustered index definition.

edit: In response to your other Twitter question, yes the same considerations apply to DISTINCT:

-- Sort
SELECT DISTINCT UserId, UserName, SessionID FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

-- No sort
SELECT DISTINCT UserName, SessionID, UserId FROM dbo.AttemptedLogin
WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

Another option is to keep the GROUP BY or DISTINCT column order as it is, and add a presentation ORDER BY UserName, SessionID, UserId




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #1444683
Posted Saturday, April 20, 2013 6:36 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Friday, July 25, 2014 3:19 AM
Points: 4,320, Visits: 6,113
I am glad you saw this one Paul! I didn't think anyone else could give as clear of a description of why the OP was getting the observed behavior.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1444705
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse