Question about included columns in nonclustered indexes: Where exactly are they stored?

  • I figured an indexing question would best be answered by folks who focus on performance tuning, if there was a better forum for this please let me know.

    Anyway I was looking at: http://msdn.microsoft.com/en-us/library/ms177484(SQL.90).aspx and an article talking about indexing.

    With included columns you can add them to a NONclustered index so that when a query uses that index additional columns the query SELECTs can be "included columns" and returned as part of the INDEX SEEK instead of having to hop over to the table, lookup the table row, and then scan that row of the table to get whatever columns are needed.

    Classic example:

    nonclustered index on: FirstName, LastName.

    included columns: EmailAddress, PhoneNumber

    The table would also of course have a PK which is part of the clustered index.

    The clustered index might have other columns, maybe LastName or "whatever".

    It sorts the table and is used to quickly locate records.

    Here is what I don't get:

    I do my query: SELECT FirstName, LastName, EmailAddress, PhoneNumber

    FROM table

    WHERE FirstName = 'John' AND LastName = 'Doe'

    It then uses my NONclustered index to perform my query AND it would then locate the record(s) that match my criteria.

    What happens then?

    It looks like it would then have a pointer to the clustered index which it would use to locate the record in the table, and retrieve the data.

    Except in this case it is a COVERING INDEX... and it has Email and Phone stored in the NONclustered index's leaf level.

    So... what does it do?

    Does it grab Email and Phone and THEN still jump to the clustered index to locate the row in the table so it can pull out First and Last Name?

    What I'm getting at is this:

    CLUSTERED indexes store the actual table at their leaf level, is that correct?

    NONclustered stores pointers to the clustered index (the table) or to artificial keys in a heap (tables with no clustered index), right?

    So... how does the included column on a NONclustered index allow the query to avoid having to lookup the data in the table itself?

    That's the part I don't get.

    It seems like it grabs Phone and Email from its leaf level, but still has to jump to the clustered index to get the Fname and Lname since it doesn't actually store those data values in the index, just pointers to them...

    So what am I not following/understanding?

    Thanks!

  • The columns defined in the INCLUDE statement are stored at the leaf level of the non-clustered index. There's no need to go to the cluster or a heap to retrieve the data. Storing the data increases the size of the index, but it doesn't change the size and levels of the B-Tree since the index keys are still the same. That means, for the most part, the index seek operations should be as fast as ever.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Maxer (10/1/2008)


    It seems like it grabs Phone and Email from its leaf level, but still has to jump to the clustered index to get the Fname and Lname since it doesn't actually store those data values in the index, just pointers to them...

    FName and LName are the index keys, so they are stored at all levels of the index. They have to be, how else would SQL be able seek on those columns.

    In a nonclustered index, the nonclustered index keys and the clustered index keys are stored at all levels of the index. The include columns are only stored at the leaf level. The clustering keys are there as a 'pointer' to the row, in case a bookmark lookup is required.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (10/1/2008)


    Maxer (10/1/2008)


    It seems like it grabs Phone and Email from its leaf level, but still has to jump to the clustered index to get the Fname and Lname since it doesn't actually store those data values in the index, just pointers to them...

    FName and LName are the index keys, so they are stored at all levels of the index. They have to be, how else would SQL be able seek on those columns.

    In a nonclustered index, the nonclustered index keys and the clustered index keys are stored at all levels of the index. The include columns are only stored at the leaf level. The clustering keys are there as a 'pointer' to the row, in case a bookmark lookup is required.

    Ah, of course. Indeed otherwise how would it be able to seek on those columns 🙂

    Well that explains that.

    Thanks for pointing out my obvious logical error there. I knew I was failing to take something into account, and that would certainly be it!

  • Actually, the included columns make the leaf index entries larger. Therefore, fewer entries per page, which translates to more B-tree above. So there is a size cost for the entire tree structure based on the percent increase in size in the index entry. Increase in tree depth and seek time will be logarithmic increase.

    Still, well defined include columns can yield a big improvement.

    David Lathrop
    DBA
    WA Dept of Health

  • I know it's an old topic but how is included columns affecting memory footprint? Some say higher use to put index in.memory cache. Some say lower since less pages are read and cached since it is in the index.

    K

  • Kristian Ask (11/16/2012)


    I know it's an old topic but how is included columns affecting memory footprint? Some say higher use to put index in.memory cache. Some say lower since less pages are read and cached since it is in the index.

    K

    There is less memory footprint because the idea behind a non-clustered index with INCLUDE columns is to help avoid bookmark lookups in the query plan. Bookmark lookups are operations that go and fetch data from the clustered index (ie. the actual table) to help return columns that are in the SELECT portion of the query. Imagine a very wide table with tens or hundreds of columns being loaded into memory for the bookmark lookup. The memory footprint will be higher than if only a select few columns are instead loaded into memory as include columns in a noncl. Index.

    Does my somewhat muddled explanation make sense? 🙂

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I got it... sound logical. One more thing though.

    If the included columns is not creating a covering index, will the included columns still be read or is it then only using the table?

  • The included columns will still be read.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Kristian Ask (11/19/2012)


    If the included columns is not creating a covering index, will the included columns still be read or is it then only using the table?

    yes they will be retrieved with the help of keylook up or RID lookup.

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply