Clustered Indexes

  • Good question, good lively discussion. Thanks.

  • Hugo, thanks VERY much for your script and, even more, your explanation. Simple and to the point, thorough.

    Posts like this one are why I sometimes spend more time than I should at this website! 🙂

    Rich

  • Good questions..and thanks to the comments a much more detailed explaination was provided!!

  • I answered this question using the common misconception as a guide. Usually second-guessing the QotD is guaranteed to trap us.

    The discussion of physical vs. logical ordering is certainly interesting in an academic sense. If I was encountering problems with the edge case where physical order was actual impacting performance, I would appreciate finding this distinction quickly rather than delving into deeply esoteric notes.

    However, for the day-to-day operation of a developer and database steward [not quite as lofty as DBA but still responsible for data health] - is there a compelling reason to be pedantic about the "sometimes physical disorder" of a clustered index? Doesn't the DB engine have enough optimization intelligence to ensure that the clustered index performs as well as possible across the largest number of use-cases? Armed with this new information about clustered indexes, am I going to do anything differently?

    I'm not asking this to be a griefer. I am curious if I am being negligent of my responsibility for data if I don't really care that I don't know this level of minutiae. I am grateful that others have thought about (and managed) this for me. I build framework code so other developers don't need to worry about the heavy-lifting details; I am content to accept the same from grown-up DBAs and DBMS developers. Standing on the shoulders of giants, etc. 🙂

  • Very good question - actually gets the blood flowing.

    Steve Jimmo
    Sr DBA
    “If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan

  • the cost of changing the physical order within a page is not that much; the page is in cache anyway, it is alwayys read and written as a whole, so the only cost is that of moving at most 8,000 bytes around in a memory block.

    I have to agree with Hugo on this. I've done a lot of work in C++ with parsing text files and sorting and caching data. I/O (disk/network) is where you really take the hit. In-memory manipulations using efficient, low-level code can be very fast.

  • Nice question.

    Thanks Hugo and Gail for explaining this quite nicely.

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • I regretfully find myself in the 66% majority who answered this question incorrectly. Hopefully, all 529 of us learned something. I know I did. Thanks for the question.

  • wware (10/5/2010)


    I regretfully find myself in the 66% majority who answered this question incorrectly. Hopefully, all 529 of us learned something. I know I did. Thanks for the question.

    +1

    Best QotD in a long time.

  • Whether you think SQL Server should be preserving physical order of the data or not doesn't matter. The facts of the case is that it only preserves logical ordering of the data. If a new record is added to an existing page, SQL Server puts it in the first spot with enough space whether that is in the correct physical order or not.

    Additionally, if SQL Server needs to split the page to insert a record, it does not split the page at the point where it needs to insert the record. It splits it approximately in half and then inserts the record into the page in which it logically fits. Again, this would be at the first slot big enough for it fit the record in and may even be the last record in the page physically even though logicially is in the middle or beginning or anywhere else.

    Also, SQL Server NEVER reads in records from disk. It reads in pages, the location of the record on the page does NOT affect the number disk IO's required to read in the page. Reading in a single page of data is 1 IO whether it is physically sorted by the data or not. So maintaining the physical order of the data in memory would increase the memory workload for the server wihtout getting any benefit.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • rtelgenhoff (10/5/2010)


    I have to agree with Hugo on this. I've done a lot of work in C++ with parsing text files and sorting and caching data. I/O (disk/network) is where you really take the hit. In-memory manipulations using efficient, low-level code can be very fast.

    Since the data is written and read in whole pages only, why would you add extra overhead to your memory operations when you wouldn't be getting any benefit from it?

    In-memory manipulations are indeed fast, but SQL is not written to do 1 thing at a time. We're talking about hundreds, thousands, or even millions of memory operations per second. Do you really want to increase the workload for memory operations by a factor of as much as 8000 times?

    Additionally, if I'm inserting a row, and SQL has to reorder all of the data in the page in memory, I now have to be able to get a lock on the whole page in order to insert a single row that no other process is trying to use. I've now lost all ability to lock only a single row.


    My blog: SQL Soldier[/url]
    SQL Server Best Practices:
    SQL Server Best Practices
    Twitter: @SQLSoldier
    My book: Pro SQL Server 2008 Mirroring[/url]
    Microsoft Certified Master: SQL Server, Data Platform MVP
    Database Engineer at BlueMountain Capital Management[/url]

  • Thanks for the question, I think it helps re-enforce some of the less known internal workings of SQL Server.

  • I got it right, but a question regarding dedicated extents. I know page splits will happen, etc etc, but does an extent stay compact for a range of data, or will it split out of the extent and return to it with page pointers?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (10/5/2010)


    I got it right, but a question regarding dedicated extents. I know page splits will happen, etc etc, but does an extent stay compact for a range of data, or will it split out of the extent and return to it with page pointers?

    If you ask what I think you ask, then the answer is that, if a page has to be split, the new page will almost certainly be allocate in a different extent.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Craig Farrell (10/5/2010)


    I got it right, but a question regarding dedicated extents. I know page splits will happen, etc etc, but does an extent stay compact for a range of data, or will it split out of the extent and return to it with page pointers?

    Test it out and see (hint DBCC IND and DBCC PAGE)

    Think about it. What happens if all 8 pages in the dedicated extent are full and a row is inserted that has to go onto page 4 of the extent?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply