Clustered and NonClustered Index

  • I need an explanation may be pointing to some whitepaper for the explanation of below:

    In a unique non clustered index, the clustered key comes to leaf level only

    in a non unique non clustered index, the clustered key comes to leaf and non leaf level

    I was watching Gail's video

    https://www.youtube.com/watch?v=y_bl9dArtmA

    So may be Gail can help me in this 🙂

  • er.mayankshukla (1/8/2016)


    I need an explanation may be pointing to some whitepaper for the explanation of below:

    In a unique non clustered index, the clustered key comes to leaf level only

    in a non unique non clustered index, the clustered key comes to leaf and non leaf level

    I was watching Gail's video

    https://www.youtube.com/watch?v=y_bl9dArtmA

    So may be Gail can help me in this 🙂

    I don't think this is correct. Any nonclustered index (whether unique or not) adds the clustered index column(s) to the leaf pages only.

    You probably misunderstood something Gail said. I started to watch the video but it's a one-hour talk and I don't have that much time to spare.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Yeah I am interested in this as I thought for a non unique nonclustered index that the CI key needs to go up the tree in order to efficiently locate rows in the leaf level.

  • Yes, May be Gail can answer this

  • Yeah, I too thought the keys were stored at the leaf level. I'm positive Gail knows that. It must be a misstatement or misinterpretation of what she said.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hugo Kornelis (1/8/2016)


    er.mayankshukla (1/8/2016)


    I need an explanation may be pointing to some whitepaper for the explanation of below:

    In a unique non clustered index, the clustered key comes to leaf level only

    in a non unique non clustered index, the clustered key comes to leaf and non leaf level

    I was watching Gail's video

    https://www.youtube.com/watch?v=y_bl9dArtmA

    So may be Gail can help me in this 🙂

    I don't think this is correct. Any nonclustered index (whether unique or not) adds the clustered index column(s) to the leaf pages only.

    You probably misunderstood something Gail said. I started to watch the video but it's a one-hour talk and I don't have that much time to spare.

    The subject matter for this post starts at time marker 11:50.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/8/2016)


    The subject matter for this post starts at time marker 11:50.

    Thanks, Jeff! Found it, watched it, and Gail is pretty explicit about it.

    Coming from a lot of people, I would have put this aside as a misunderstanding from the presenter. Coming from Gail who normally knows her stuff very well, I will postpone judgement until after I have double checked this in a test database.

    BRB!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • This seems relevant as well: http://www.sqlskills.com/blogs/kimberly/nonclustered-indexes-lookup-key-btree/

    Cheers!

  • Okay, did a small test and confirmed that my faith in Gail was justified (and my faith in my own understanding of index internals was not). After creating a test table and two nonclustered indexes, one unique and one nonunique, and adding 10,000 rows; I used DBCC IND to find the root pages and DBCC PAGE to examine their contents. The nonunique index did indeed have extra data added to each entry, and that data corresponds to the primary key values.

    I then decided to check the documentation where I got my understanding from - and though phrased in a hard-to-understand way, I must admit that it's there and that I just had overlooked it so far. (See https://msdn.microsoft.com/en-us/library/ms190620.aspx - step 2, item 2 in the list, first and second paragraph).

    Obviously, the next question is why this is implemented like this. Why would SQL Server need the clustered index key on the non-leaf levels? (Which is obviously also what the original question in this thread was after).

    NOTE: This is where I enter into speculation mode!

    Think about when a row is deleted (or updated affecting data in the nonclustered index). If the row is located through another index, then SQL Server needs to find it in this nonclustered index as well. The indexed value will be known (because updates and deletes always start from the clustered index), but now the right row has to be found. Now suppose that the row affected has a clustered index value of 12345, and the value in the nonclustered index is "green". The table has 1,000,000 entries, and approximately 25,000 of them have the value "green" in the indexed column. With the clustered index value in the leaf pages only, SQL Server has to seek to the first "green" row, read the clustered index value from the leaf page, check if it's the right one and repeat - it will probably have to visit a lot of leaf pages in that process.

    So what I guess is that the clustered index value is added to the nonclustered index, not as just some additional data in the index but as an additional indexed column, and hence the data is added to the nonleaf pages as well. Now SQL Server can directly seek for row "green / 12345", and save a lot of IO.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks for the link, Jacob! That post confirms what I put out as speeculation.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo Kornelis (1/8/2016)


    Okay, did a small test and confirmed that my faith in Gail was justified (and my faith in my own understanding of index internals was not). After creating a test table and two nonclustered indexes, one unique and one nonunique, and adding 10,000 rows; I used DBCC IND to find the root pages and DBCC PAGE to examine their contents. The nonunique index did indeed have extra data added to each entry, and that data corresponds to the primary key values.

    I then decided to check the documentation where I got my understanding from - and though phrased in a hard-to-understand way, I must admit that it's there and that I just had overlooked it so far. (See https://msdn.microsoft.com/en-us/library/ms190620.aspx - step 2, item 2 in the list, first and second paragraph).

    Obviously, the next question is why this is implemented like this. Why would SQL Server need the clustered index key on the non-leaf levels? (Which is obviously also what the original question in this thread was after).

    NOTE: This is where I enter into speculation mode!

    Think about when a row is deleted (or updated affecting data in the nonclustered index). If the row is located through another index, then SQL Server needs to find it in this nonclustered index as well. The indexed value will be known (because updates and deletes always start from the clustered index), but now the right row has to be found. Now suppose that the row affected has a clustered index value of 12345, and the value in the nonclustered index is "green". The table has 1,000,000 entries, and approximately 25,000 of them have the value "green" in the indexed column. With the clustered index value in the leaf pages only, SQL Server has to seek to the first "green" row, read the clustered index value from the leaf page, check if it's the right one and repeat - it will probably have to visit a lot of leaf pages in that process.

    So what I guess is that the clustered index value is added to the nonclustered index, not as just some additional data in the index but as an additional indexed column, and hence the data is added to the nonleaf pages as well. Now SQL Server can directly seek for row "green / 12345", and save a lot of IO.

    Very well done, Hugo. Your research and the link that Jacob posted saved me a whole lot of time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jacob Wilkins (1/8/2016)


    This seems relevant as well: http://www.sqlskills.com/blogs/kimberly/nonclustered-indexes-lookup-key-btree/

    Cheers!

    Excellent reference. You can't do much better that getting it straight from the horse's mouth. Thanks, Jacob.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • er.mayankshukla (1/8/2016)


    In a unique non clustered index, the clustered key comes to leaf level only

    in a non unique non clustered index, the clustered key comes to leaf and non leaf level

    Yup, that's correct. What clarification do you need?

    Basically, if the nonclustered index is defined as UNIQUE, the clustered index key is treated as include columns (unless explicitly specified in the nonclustered index key), if the nonclustered index is not defined as unique, the clustered index key is treated like key columns and added at the end of the explicitly specified key.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Edit: nm

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi All,

    I took some time to write a detail article on this, which might be useful for all for our reference

    https://sqlmayank.wordpress.com/

Viewing 15 posts - 1 through 15 (of 24 total)

You must be logged in to reply to this topic. Login to reply