Compatibility of streaming aggregate operator and partitioning.

  • I've got a 350 million row partitioned table that I'm trying to aggregate. The plan I get uses a HASH Aggregate. I would like to get a query plan that uses a streaming aggregate so results can be returned to the client faster. Here is the original query (Schema is at the end):

    SELECT userid, sessionid, username FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null

    GROUP BY userid, UserName, SessionID

    Schema for the Table and indexes are fairly straight forward. The clustered index is on (userid, datetime). There is a Non-clustered index on (Username, sessionid). Both indexes are partitioned on (DateTime). Since the query is limited to one partition and the columns being grouped on are either in the clustered index or the NC index key, the query optimizer should be able to generate a plan using a streaming aggregate that scans the NC index.

    However, I am unable to produce the desired plan. I've tried 4 variations: with/without MAXDOP 1, with/without query "ORDER GROUP" query hint. The plans for these are in the attachment.

    --This is the query that should be producing a streaming agg

    SELECT userid, sessionid, username FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null

    GROUP BY userid, UserName, SessionID

    --Still get a Hash Aggregate without parallelism

    SELECT userid, sessionid, username FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null

    GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1)

    --Prevent HASH Aggregation, and we get a distinct sort instead

    SELECT userid, sessionid, username FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null

    GROUP BY userid, UserName, SessionID OPTION (ORDER GROUP)

    --Still get distinct sort

    SELECT userid, sessionid, username FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear(datetime) = 1 AND sessionid IS NOT null

    GROUP BY userid, UserName, SessionID OPTION (MAXDOP 1, ORDER GROUP)

    It there something preventing the use of a streaming aggregate for this query, or is this a bug/limitation of the QO?

    Create Schema for the table:

    CREATE PARTITION FUNCTION [PartitionByYear](datetime) AS RANGE RIGHT FOR

    VALUES (N'2005-01-01T00:00:00.000', N'2006-01-01T00:00:00.000', N'2007-01-01T00:00:00.000',

    N'2008-01-01T00:00:00.000', N'2009-01-01T00:00:00.000', N'2010-01-01T00:00:00.000',

    N'2011-01-01T00:00:00.000', N'2012-01-01T00:00:00.000', N'2014-01-01T00:00:00.000')

    GO

    /****** Object: PartitionScheme [PartYearsToArchiveAndPrimary] Script Date: 4/19/2013 11:04:53 PM ******/

    CREATE PARTITION SCHEME [PartYearsToArchiveAndPrimary] AS PARTITION [PartitionByYear]

    TO ([Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [Archive], [PRIMARY], [PRIMARY])

    GO

    CREATE TABLE [dbo].[AttemptedLogin](

    [ObjectId] [int] IDENTITY(1,1) NOT NULL,

    [DateTime] [datetime] NOT NULL,

    [UserId] [int] NOT NULL,

    [UserName] [varchar](30) NULL,

    [Password] [varchar](30) NULL,

    [SessionID] [varchar](30) NULL,

    [IPAddress] [varchar](16) NULL,

    [ProxyAddress] [varchar](16) NULL,

    [CountryCode] [char](2) NULL,

    CONSTRAINT [PK_NewAttemptedLogin] PRIMARY KEY NONCLUSTERED

    (

    [ObjectId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,

    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PartYearsToArchiveAndPrimary]([DateTime])

    GO

    /****** Object: Index [CIX_AttemptedLogin] Script Date: 4/19/2013 11:02:19 PM ******/

    CREATE CLUSTERED INDEX [CIX_AttemptedLogin] ON [dbo].[AttemptedLogin]

    (

    [UserId] ASC,

    [DateTime] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON

    ) ON [PartYearsToArchiveAndPrimary]([DateTime])

    GO

    /****** Object: Index [IX_AttemptedLogin_UserNameSessionID] Script Date: 4/19/2013 11:02:20 PM ******/

    CREATE NONCLUSTERED INDEX [IX_AttemptedLogin_UserNameSessionID] ON [dbo].[AttemptedLogin]

    (

    [UserName] ASC,

    [SessionID] ASC

    )

    WHERE ([SessionId] IS NOT NULL)

    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

    DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,

    FILLFACTOR = 90) ON [PartYearsToArchiveAndPrimary]([DateTime])

    GO

    Thanks!

  • Forgot the include information about the machine:

    Microsoft SQL Server 2012 (SP1) - 11.0.3128.0 (X64)

    Dec 28 2012 20:23:12

    Copyright (c) Microsoft Corporation

    Enterprise Edition: Core-based Licensing (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

  • Scanning the nonclustered index, there is an implied first key on the partition ID, so the following can use a Stream Aggregate:

    SELECT sessionid, username

    FROM dbo.AttemptedLogin

    WHERE sessionid IS NOT null

    GROUP BY $Partition.PartitionByYear([DateTime]), UserName, SessionID

    Similarly, if a single partition is statically specified, the following also results in a Stream Aggregate, since the keys of the nonclustered index are (UserName, SessionID):

    SELECT SessionID, UserName FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

    GROUP BY UserName, SessionID

    The problem in the original code is that it adds UserID to the mix, and fails to use a Stream Aggregate:

    SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

    GROUP BY UserId, UserName, SessionID

    Now UserID forms part of the key of the nonclustered index since it is not declared as unique, but it comes after the nonclustered keys. Writing the GROUP BY clause to reflect the key order results in the Stream Aggregate you are looking for:

    SELECT UserId, SessionID, UserName FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear([DateTime]) = 1 AND SessionID IS NOT null

    GROUP BY UserName, SessionID, UserId

    Should the order of the GROUP BY keys matter? No. In that sense, it is a limitation, in the same way that writing a query with several window functions might result in an unnecessary sort depending on the written order of the query.

    It may be more natural (and easier for future maintenance) if you explicitly include UserID in the nonclustered index definition.

    edit: In response to your other Twitter question, yes the same considerations apply to DISTINCT:

    -- Sort

    SELECT DISTINCT UserId, UserName, SessionID FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

    -- No sort

    SELECT DISTINCT UserName, SessionID, UserId FROM dbo.AttemptedLogin

    WHERE $Partition.PartitionByYear([DateTime]) = 1 AND sessionid IS NOT null

    Another option is to keep the GROUP BY or DISTINCT column order as it is, and add a presentation ORDER BY UserName, SessionID, UserId

  • I am glad you saw this one Paul! I didn't think anyone else could give as clear of a description of why the OP was getting the observed behavior.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply