Group by field, include or not include

  • Guys,

    I found some strange behavior of indexes (or I just think it is strange).

    It looks like if I add field as a key field into index it will gain less performance then I add group by field as included one. Can anyone tell what to do with group by fields, add as key or add as included?

  • Key columns, because the ordering of them may be useful to the optimiser and query processor. Very much depends on the rest of the index and the rest of the query though.

    http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (6/3/2013)


    Key columns, because the ordering of them may be useful to the optimizer and query processor. Very much depends on the rest of the index and the rest of the query though.

    http://sqlinthewild.co.za/index.php/2011/11/07/sql-university-advanced-indexing-sorting-and-grouping/

    I also thought that it should be key column, but I did some tests and these tests shows that it will be better to include that column. Aslo SQL Server advice to create index with included column.

    Did you do any tests about? Particullary for SQL Server 2012...

  • Evgeny (6/3/2013)


    Did you do any tests about?

    You mean other than the ones on the blog post I referenced?

    Particullary for SQL Server 2012...

    No, but neither the index architecture nor the methods for aggregating have changed in 2012, so will be the same as 2008. As I said, you want key ideally to provide the sort for a stream aggregate, but it depends entirely on what else is in the index and what the query looks like.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • I did test in this way:

    create table:

    create table dbo.orders

    (

    id int identity,

    orderDate datetime,

    orderType int,

    orderAmount money,

    orderDiscount money

    );

    populate it with data:

    declare @i int=0;

    while @i<100000

    begin

    insert dbo.orders(orderDate, orderType, orderAmount, orderDiscount)

    values(

    DATEADD(mi, -1*ABS(CHECKSUM(NEWID()) % 10000), GETDATE()),

    ABS(CHECKSUM(NEWID()) % 100),

    0.01*ABS(CHECKSUM(NEWID()) % 10000),

    0.01*ABS(CHECKSUM(NEWID()) % 1000)

    );

    set @i=@i+1;

    end

    creating indexes:

    create index I1 on dbo.orders(orderDate, orderType) include(orderAmount);

    create index I2 on dbo.orders(orderDate) include(orderType, orderAmount);

    test using queries:

    set statistics time on;

    set nocount on;

    select orderType, SUM(orderAmount)

    from dbo.orders with (index(I1))

    where orderDate>'20130501'

    group by orderType

    select orderType, SUM(orderAmount)

    from dbo.orders with (index(I2))

    where orderDate>'20130501'

    group by orderType

    Result:

    SQL Server Execution Times:

    CPU time = 312 ms, elapsed time = 315 ms.

    SQL Server Execution Times:

    CPU time = 296 ms, elapsed time = 298 ms.

    Without index hint, SQL Server chose I2 index. Anybody can explain this behavior?

  • After executing both queries many times, sometimes I1 overweight I2 but very rare. Looks like I2 is better, but it goes across with my knowledge.

  • It's because of the query form you're using. Please read the blog post I referenced above, it's discussed in there.

    If you want a test that will show that putting the group by column as index key is better, use a query with an equality predicate. This is why I said it's very dependant on the query form and the index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Also I found that this index works much better for my query:

    create index I3 on dbo.orders(orderType, orderDate) include(orderAmount);

  • With the query form that you've got, you cannot both support the predicate and the group by with a single index (as explained in the blog post I referenced earlier), so it's one or the other. Usually it's the predicate that you want to support as that reduces the volume of data that you are reading off disk, through memory and into the query.

    I would imagine that in your case, with the way the data was populated, that the predicate matches most of the table, and hence there's little filtering to do and why indexing for the group by helps. This is specific to this query and data distribution and should not be assumed to be a general conclusion.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply