Clustered indexes

  • This is the most difficult question I've seen to date, and I've been on SSC a few years now. It's an excellent compilation of information, though, and I will use the question, explanation, and comments almost as a small study guide or textbook for the topic of clustered indexes.

    Thanks,

    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Of 928 respondents only 21 have answered it correctly. A whopping 2% correct.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • The part of me ready for trick questions missed the part about NCL having CL pointers in the IX pages. I thought "well, only the leaf level pages have pointers to the CL, so he's trying to trick us". Is my understanding of the data structure correct? Just looking for a clarification. Good question, though.

    Thanks,

    Eric

  • Strommy (5/6/2010)


    The part of me ready for trick questions missed the part about NCL having CL pointers in the IX pages. I thought "well, only the leaf level pages have pointers to the CL, so he's trying to trick us". Is my understanding of the data structure correct? Just looking for a clarification. Good question, though.

    Thanks,

    Eric

    Yes, that was the same thing that caused me to miss.

  • Nadrek (5/6/2010)


    This is basically a difference in terminology, and Microsoft's BOL entry is badly written.

    So, we have a table, with pages, and a clustered index:

    Page 1 contains a-b

    Page 2 contains b-t

    Page 3 contains u-w

    Page 4 contains w-z

    All are full.

    The pages that make up the table are in "clustered index order".

    However, on disk (where "physical" starts to actually matter, particularly on rotating or tape media), if we have 2 data files for this database/filegroup and the data's split evenly, even assuming the table isn't fragmented and the files aren't fragmented (pretty unlikely without good planning and maintenance, or luck): we're likely to have something like:

    Page 1 on Sector 500

    Page 2 on Sector 800

    Page 3 on Sector 501

    Page 4 on Sector 801

    If we have fragmentation at either OS (which SQL Server cannot fix) or SQL Server level, we might have:

    Page 1 on Sector 999

    Page 2 on Sector 232

    Page 3 on Sector 555

    Page 4 on Sector 77

    You are forgetting page splits. If you fill up page 2 with "s" records and the next contiguous free page is after page 4 in the data file (page 5), then you get pointers between page 2 and 5, which now contains half of your [b-t] rows. Now your physical order is off, but the logical order is intact because of the pointers between page 2 and page 5.

    Thanks,

    Eric

  • Good question. The only place where I disagree with author is location of clustered index columns inside the non-clustered indexes. They are located in leaf pages not in index (B-tree) pages.

    Andrei

  • Whoa! I returned to the computer after a few hours absence to find my mailbox almost overflowing with message notifications!

    I won't respond to all comments individually. I read a lot of compliments, for which I thank you all.

    I don't agree with the comments I read stating that 2% correct answers is good. When I submit a question, I want it to be difficult, but not almost impossible. I shoot for 20-30% correct on this type of "select all that apply" questions, a little more on the multiple choice questions since even blind guessing generates a significant percentage of correct answers on those. With hindsight, I should have divided the info in this question over two or three seperate questions; that would have been hard enough. As it is, I am glad Steve only gave 1 point to this question; otherwise it would have felt like cheating you out of your points.

    I also hear a loud strong dislike of the "check all that apply" kind of questions. I'll try to use less of those and more of the "select the single correct answer" type questions for future QotD submissions.

    Several people have rightfully commented on the "All nonclustered indexes include the clustered index columns in their index pages" option. I managed to include two errors in this single option.

    The first one is caused by typing too fast and insufficient proofreading. I intended this option to refer to leaf pages. As several people have commented, the root and intermediate pages do not contain the clustered index columns, this is only the case on leaf pages.

    The other error I made was that I did not think of heaps when adding this option (which is of course quite foolish, considering that I do mention them in the explanation). Now technically, one might say that if there is no clustered index then all clustered index columns (i.e., none) are added to the leaf pages. But that would be nitpicking and wordplay - exactly the kind of thing I hate with a vengeance in every kind of test, exam, or question.

    My apologies to those who got tripped up or confused by my errors on this question.

    I have also seen people quoting various fragments from Books Online that imply or even straightforward state that a clustered index does dictate the physical ordering of data on the disk. All I can see to that is that those Books Online is misleading or even wrong in those places. This is very easy to confirm by making a test database, creating a table, filling it in a way that ensures high fragmentation, and then using DBCC PAGE to check the contents of the actual database pages. I can't post a repro for this right now as it's already 1:30 AM where I live and I need to get some sleep, but I'll be happy to put someting together tomorrow if you guys want me to.

    If I have missed any remark or question that you feel should be adressed, or if you have new questions or concerns, let me know. I'm going to sleep now, but I'll have more time for SQL Server Central tomorrow.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • An excellent question - or was it several questions all rolled into one?

    Some extremely important concepts - unbelievably assigned just a single point! Madness.

    I got this right, but I did pause over this one:

    All nonclustered indexes include the clustered index columns in their index.

    ...not just because of the heap option but because the clustering keys will not be present in the non-leaf levels, if the non-clustered index is UNIQUE. The clustering key will still be at the leaf level - every index leaf row needs a row locator - so I decided to tick the box and was rewarded with a green tick and one point.

    On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.

    Thanks Hugo.

  • Juts wanted to add my comment about the physical order of rows in table.

    In case when clustered index defined on table they are in "physical order" (as per BooksOnline), and this is true to some degree, as the word "physical" here does not mean that data in physical order as

    a) rows inside the data page may be in different physical order

    b) pages may not be in physical order inside the data file

    c) the data file may be fragmented and not in physical order on the disk

    d) the disk may be physically dispersed across a number of physical disks

    and so on ...

    This sequence may continue even further, but all this shows that any "physical" may happen to be logical if we look at this closer.

    😉

    Andrei.

  • UMG Developer (5/6/2010)


    I missed it just because of the physical order part... They are physically stored in the clustered order, maybe just not contiguously/in sequence. 😉 The BOL docs don't help much.

    Thats the part that I got wrong too. There isn't much on Physical order - we just have to infer it I think. One would really have to know about disk fragmentation to get that right I guess.

  • thanks for this wonderful question 🙂

  • Paul White NZ (5/6/2010)


    On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.

    Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.

    Hugo, thanks for a great question, learned something new again.

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

  • Jan Van der Eecken (5/7/2010)


    Paul White NZ (5/6/2010)


    On the physical ordering thing, I would just add that individual rows on the page may not be in physical clustered index either - only the page offset array entries are guaranteed to be in logical and physical order.

    Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.

    Hugo, thanks for a great question, learned something new again.

    You're welcome!

    I'm not Paul, but I can elaborate as well. Suppose that a table has 500-byte rows, and a page is filled with 5 of them, that have been added in sequence of the clustered index key. Examining the page with DBCC PAGE will probably reveal a structure like this (for simplicity, I leave out the various page header fields and start counting bytes at the start of the actual row data).

    Bytes 1 - 500: Row #1

    Bytes 501 - 1000: Row #2

    Bytes 1001 - 1500: Row #3

    Bytes 1501 - 2000: Row #4

    Bytes 2001 - 2500: Row #5

    Bytes 2501 - 8000: unused

    Last ten bytes of page: five 2-byte values representing the decimal values 2001, 1501, 1001, 501, 1.

    These last 10 bytes are the "Row Offset Array". Reading from last to first, they tell SQL Server the start position of the first, second, third, fourth, and fifth row on the page.

    Now I add a new row that, according to clustered index order, sits between #2 and #3 (let's call the now one Row #2.5). Instead of moving 1500 bytes to a different location on the page, SQL Server simply adds the new row in the unused space and moves only the 2-byte locators in the Row Offset Array. The new page layout will be:

    Bytes 1 - 500: Row #1

    Bytes 501 - 1000: Row #2

    Bytes 1001 - 1500: Row #3

    Bytes 1501 - 2000: Row #4

    Bytes 2001 - 2500: Row #5

    Bytes 2501 - 5000: Row #2.5

    Bytes 3001 - 8000: unused

    Last twelve bytes of page: six 2-byte values representing the decimal values 2001, 1501, 1001, 2501, 501, 1.

    Reading the Row Offset Array backwards and locating the row, you get this order:

    From position 1: Row #1

    From position 501: Row #2

    From position 2501: Row #2.5

    From position 1001: Row #3

    From position 1501: Row #4

    From position 2001: Row #5

    So the physical order of the Row Offset Array matches the logical order imposed by the clustered index; the physical order of rows in the page does not.

    I hope this clarifies your question.

    (And Paul - thanks for bringing up this excellent point).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Jan Van der Eecken (5/7/2010)


    Paul, could you elaborate a bit more on this? BTW, thanks for the great articles on APPLY and Server-Side Paging.

    Gosh thanks Jan! Haven't seen you around for a while, so good to see you again.

    @hugo: Thank you for elaborating so clearly - saved me quite a lot of typing - win!!!

  • Thanks, Hugo, nicely explained.

    Paul, I haven't been away, just been sitting quietly on the sidelines observing and learning.

    --------------------------------------------------------------------------
    A little knowledge is a dangerous thing (Alexander Pope)
    In order for us to help you as efficiently as possible, please read this before posting (courtesy of Jeff Moden)[/url]

Viewing 15 posts - 31 through 45 (of 54 total)

You must be logged in to reply to this topic. Login to reply