Clustered indexes

  • @hugo

    Thanks for that - I managed to follow your explanation - and I'm relieved to find out it has nothing to do with disk fragmentation (which was all I could think of for the physical difference)

  • Good question Hugo! I like these kind of questions that really make us think about our conceptions/misconceptions of how things work in SQL Server due to things we have read straight from BOL. This is an excellent example of that. As you seem to point out, we have to be very careful about taking every thing we read in BOL literally. Even Mickeysoft can mislead us about their own product. 😀

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • I believe this explanation is incorrect for the way the question was asked. Your question asked if the rows were physically or logically ordered, not if the pages were physically or logically ordered. All the rows are indeed physically ordered on their pages based on the clustered index.

    I answered the question correctly based on your explanation, and other research, because it is a GOOD question, I'm just putting in my 2 cents.

    Hugo Kornelis (5/6/2010)


    satya11001-1013569 (5/5/2010)


    I have a doubt regarding the logical order of rows for Clustered Index.

    Main difference between Clustered and Non-Clustered is Clustered is physical arrangement of rows and Non-Clustered is logical arranging of rows .

    Hi Satya,

    This is not correct. Clustered and non-clustered indexes are built very similar. The only difference is the actual contents of the leaf pages.

    For a clustered index, root and intermediate pages contain the index key and a pointer to the lower-level page; leaf pages contain the all the columns (except LOB data, such as varchar(max) or xml).

    For a nonclustered index, root and intermediate pages contain the index key and a pointer to the lower-level page; leaf pages contain the index key and a pointer to the data page where the complete row can be found. This pointer is either the clustered index key, or (if the table does not have a clustered index) the RID.

    Can you explain a bit more on the arrangement of actual data rows for Clustered and Non-Clustered index.

    A sketch of the index structure (here for a clustered index, but as I said: the only difference for a nonclustered index is the actual contents of the leaf pages) can be found on http://msdn.microsoft.com/en-us/library/ms177443.aspx. The blue pointers are the pointers to lower level index pages just mentioned. The black arrows indicate the "next page"" / "previous page" pointers found on every page in an index. These pointer chains are used when an index is processed in its logical order.

    The physical arrangement of these pages can be completely different. As an example, let's suppose that an existing table happens to have all its 10,000 leaf pages on the first 10,000 pages in the database file (very unlikely in reality, but work with me). Now an INSERT is executed and the new row should be inserted in the fifth page - but that page is already full, so it has to be split - half the data remains on page 5, half the data goes to a new page that, logicallly, belongs between page 5 and the "old" page 6. SQL Server will not physically move "old" pages 6 through 10,000 up one location - that would really kill performance! Instead, a new page will be allocated "somewhere" in the data file. This new page will have its "previous page" and "next page" pointers pointing to the "old" pages 5 and 6 respectively, and the "next page" pointer on the old page 5 and the "previous page" pointer on the old page 6 will point to the new page. The result is that the pointer chain now still implements the logical order of the index; the physical location of pages however does not.

    (And since this same allocation mechanism has been used when the first 10,000 pages were allocated, it is indeed extremely unlikely to find 10,000 data pages allocated consecutively.

  • dunnjoe (5/7/2010)


    I believe this explanation is incorrect for the way the question was asked. Your question asked if the rows were physically or logically ordered, not if the pages were physically or logically ordered. All the rows are indeed physically ordered on their pages based on the clustered index.

    Thanks for the feedback, dunnjoe!

    However, the question was about the ordering of rows in a table, not rows on a page, so I don't think that part of the question was phrased incorrect.

    And even if it had been about rows on a page - not even those are physically ordered; see my reply to Jan, a few posts back (near the top of this page).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Hugo,

    Thanks for the response. I stand corrected and will recreate my clustered indexes more frequently 😛

    Thanks,

    Joe

    Hugo Kornelis (5/7/2010)


    dunnjoe (5/7/2010)


    I believe this explanation is incorrect for the way the question was asked. Your question asked if the rows were physically or logically ordered, not if the pages were physically or logically ordered. All the rows are indeed physically ordered on their pages based on the clustered index.

    Thanks for the feedback, dunnjoe!

    However, the question was about the ordering of rows in a table, not rows on a page, so I don't think that part of the question was phrased incorrect.

    And even if it had been about rows on a page - not even those are physically ordered; see my reply to Jan, a few posts back (near the top of this page).

  • Cliff Jones (5/6/2010)


    What threw me off was "All nonclustered indexes include the clustered index columns in their index pages" . I thought this was incorrect because the clustered index columns are contained in the Leaf Node Data Pages and not every index page.

    This nearly threw me off too, but I decided that as it didn't say "in all their index pages" it was actually correct and picked the right answer. I can't say that I particularly like questions where one has to take a lawyer-like attitude to the phrasing, but any dba or developer has to read the small print in requirement statements so maybe it's good practice for us and therefor I don't particularly dislike them either (unless they are really silly extreme cases of it, which this one isn't).

    Tom

  • Tom.Thomson (5/11/2010)


    Cliff Jones (5/6/2010)


    What threw me off was "All nonclustered indexes include the clustered index columns in their index pages" . I thought this was incorrect because the clustered index columns are contained in the Leaf Node Data Pages and not every index page.

    This nearly threw me off too, but I decided that as it didn't say "in all their index pages" it was actually correct and picked the right answer. I can't say that I particularly like questions where one has to take a lawyer-like attitude to the phrasing, but any dba or developer has to read the small print in requirement statements so maybe it's good practice for us and therefor I don't particularly dislike them either (unless they are really silly extreme cases of it, which this one isn't).

    I missed this one as well, because I read the clustered index columns to mean all columns, not just the columns comprising the clustered index key. Since the clustered index is the base table, and since all non-clustered indexes do NOT include all columns in the base table... I died.

    Really good question though.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The Dixie Flatline (5/14/2010)


    Tom.Thomson (5/11/2010)


    Cliff Jones (5/6/2010)


    What threw me off was "All nonclustered indexes include the clustered index columns in their index pages" . I thought this was incorrect because the clustered index columns are contained in the Leaf Node Data Pages and not every index page.

    This nearly threw me off too, but I decided that as it didn't say "in all their index pages" it was actually correct and picked the right answer. I can't say that I particularly like questions where one has to take a lawyer-like attitude to the phrasing, but any dba or developer has to read the small print in requirement statements so maybe it's good practice for us and therefor I don't particularly dislike them either (unless they are really silly extreme cases of it, which this one isn't).

    I missed this one as well, because I read the clustered index columns to mean all columns, not just the columns comprising the clustered index key. Since the clustered index is the base table, and since all non-clustered indexes do NOT include all columns in the base table... I died.

    Really good question though.

    Hugo concedes that that particular answer was not worded correctly, so that is good enough for me.

  • Hmm...

    "A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. ."

    "A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent."

    http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx

  • m.bouffard (8/25/2010)


    Hmm...

    "A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory, which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table, a table can contain only one clustered index. ."

    "A clustered index is particularly efficient on columns that are often searched for ranges of values. After the row with the first value is found using the clustered index, rows with subsequent indexed values are guaranteed to be physically adjacent."

    http://msdn.microsoft.com/en-us/library/aa933131(SQL.80).aspx

    The first quote is correct. The clustered index does "dictate" the physical storage order. That does not imply that the ordering of the physical storage matches that of the index. Data pages can be scatered through the file; traversing the index in order is done by following the "next page"/"previous page" pointers in the data pages.

    The second quote is incorrect. Rosw with subsequent indexed values are guaranteed to be on the same page or on the next page when following the the pointer chain. They might be physically adjacent, but there is no guarantee at all.

    Here is how the current version of Books Online describes clustered indexes: http://msdn.microsoft.com/en-us/library/ms177443.aspx


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 10 posts - 46 through 54 (of 54 total)

You must be logged in to reply to this topic. Login to reply