INCLUDE columns in nonclustered index which are in PK

  • Hi SSC,

    I was discussing  include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key), and perhaps he still should just for clarity, but I also looked at index stats and noticed that the two indices were identical in size and pages. the plus side, it would seem SQL is smart enough not to double count those columns.

    My question is this: Other than the fact that it's unnecessary, is there anywhere else SQL might stumble or actually incur extra overhead because of these columns being in the INCLUDE list? Or is it just a matter of style?

    FWIW, Here's the test I ran:

    use test
    go

    -- Create a table
    if object_id('Test.dbo.TestReindex') is not null drop table dbo.TestReindex
    create table Test.dbo.TestReindex
    (
      RID int identity(1,1),
      SomeGuid uniqueidentifier not null default newsequentialid(),
      SomeString varchar(36),
      SomeDate datetime2
      constraint PKC__TestReindex__RID primary key clustered (RID, SomeGuid)
    )

    -- Load some fake data
    insert into dbo.TestReindex (SomeString, SomeDate)
    select cast(newid() as varchar(36)), dateadd(minute, checksum(newid()) % 50, sysdatetime())
    from sys.all_objects a, sys.all_objects b

    -- Create a NCI with the PK columns INCLUDEd
    create nonclustered index IX__PKInclude on dbo.TestReindex (SomeString) include (RID, SomeGuid)

    -- Create a NCI without PK columns INCLUDED
    create nonclustered index IX__PKNoInclude on dbo.TestReindex (SomeString)

    -- Check index size
    select
      IndexName = i.Name,
      TableName = object_name(p.object_id),
      IndexRows = p.rows,
      DataCompression = data_compression_desc,
      SizeKB = 8 * au.used_pages,
      ps.in_row_data_page_count,
      ps.in_row_used_page_count,
      ps.in_row_reserved_page_count,
      ps.lob_used_page_count,
      ps.lob_reserved_page_count,
      ps.row_overflow_used_page_count,
      ps.row_overflow_reserved_page_count,
      ps.used_page_count,
      ps.reserved_page_count,
      ps.row_count
    from sys.partitions p
    inner join sys.dm_db_partition_stats ps
      on p.partition_id = ps.partition_id
       and p.object_id = ps.object_id
    inner join sys.indexes i
      on p.object_id = i.object_id
       and p.index_id = i.index_id
    inner join sys.allocation_units au
      on p.partition_id = au.container_id
    where p.object_id = object_id('Test.dbo.TestReindex')

    Executive Junior Cowboy Developer, Esq.[/url]

  • The index is using the columns that the clustered index is based on as the pointers to the row in the table.  Since by default the primary key is also clustered, most of the times you'll see that it is using the columns that the primary key is based on.  
     As you wrote SQL Server is smart enough not to use those columns twice.  If you include the columns that the clustered index is based on, it won't add them because it already has them in the index as the pointers to the real data.  Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.

    Adi

    --------------------------------------------------------------
    To know how to ask questions and increase the chances of getting asnwers:
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Adi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM

    ... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.

    Adi

    Interesting point; I had not thought about that.

    Executive Junior Cowboy Developer, Esq.[/url]

  • Xedni - Tuesday, June 27, 2017 9:38 AM

    Adi Cohn-120898 - Tuesday, June 27, 2017 9:30 AM

    ... Personally if those columns are needed in the index as included columns, I will add them. I do it just to be sure that if in the future someone will change the clustered index, then those columns will still be in the included columns.

    Adi

    Interesting point; I had not thought about that.

    Almost no one does. And almost no one really needs to. But every now and again someone gets blind-sided when some queries all of a sudden start going out to lunch because said columns are now "missing" from index(es).

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Xedni - Tuesday, June 27, 2017 9:21 AM

    I was discussing  include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key)

    Actually that's not technically accurate.  Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK.  Indeed, best overall performance often demands that it not be.  For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher - Tuesday, June 27, 2017 12:02 PM

    Xedni - Tuesday, June 27, 2017 9:21 AM

    I was discussing  include columns with a co-worker because he had put primary key columns in his list of INCLUDE columns in a non-clustered index he built. I was going to suggest he remove them (because a non-clustered index already HAS to have a reference to the primary key)

    Actually that's not technically accurate.  Non-clustered indexes will have all the key columns of the clustered index, but that does not have to be the PK.  Indeed, best overall performance often demands that it not be.  For example, you might cluster by ( datetime, location, ident ) or by ( location, datetime, ident ) but have ident alone as the PK.

    You're right, I misspoke. Side effect of most PK's being clustered 🙂

    Executive Junior Cowboy Developer, Esq.[/url]

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply