clustered index usage

  • p.s.  We did do something great on this thread... although it's gotten a bit confrontational, it IS bringing a lot of things to light that people might not have known about and it IS (hopefully) getting them to think a whole lot more things on this subject that they probably ever thought about before.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    p.s.  We did do something great on this thread... although it's gotten a bit confrontational, it IS bringing a lot of things to light that people might not have known about and it IS (hopefully) getting them to think a whole lot more things on this subject that they probably ever thought about before.

    Indeed Jeff. I am certainly still learning.

    ----------------------------------------------------

  • Brahmanand Shukla wrote:

    Brahmanand Shukla wrote:

    I've worked on several applications which ride on the external data being received as a flat file every day on an SFTP. The processing time was going beyond 4-5 hours, leading to the unavailability of the system during business hours. CI's were the key contributors here.

    Hi Brahmanand : Just for my own curiousity >

    Was the index already existing on the table as you would insert data, or was there data thrown into a heap and then a clustered index built on top of it? The problem I would see with the first scenario is that it may be maintaining a lock at the source and destination, unlike the 2nd scenario. Again, just for my own curiosity as I dont want to move to far off topic on the post 🙂

    ----------------------------------------------------

  • MMartin1 wrote:

    Brahmanand Shukla wrote:

    Brahmanand Shukla wrote:

    I've worked on several applications which ride on the external data being received as a flat file every day on an SFTP. The processing time was going beyond 4-5 hours, leading to the unavailability of the system during business hours. CI's were the key contributors here.

    Hi Brahmanand : Just for my own curiousity >

    Was the index already existing on the table as you would insert data, or was there data thrown into a heap and then a clustered index built on top of it? The problem I would see with the first scenario is that it may be maintaining a lock at the source and destination, unlike the 2nd scenario. Again, just for my own curiosity as I dont want to move to far off topic on the post 🙂

    It was the first scenario as you mentioned.

  • So just to clarify. if i have a clustered index on say contactid and invoiceid however none of the updates are on those columns then there wouldn't be a need to reorder on update correct? is there any overhead related to the clustered index if updates are applied to columns that arent part of the clustered index?

  • Snargables wrote:

    So just to clarify. if i have a clustered index on say contactid and invoiceid however none of the updates are on those columns then there wouldn't be a need to reorder on update correct? is there any overhead related to the clustered index if updates are applied to columns that arent part of the clustered index?

    No more than anything else, and, less than some because you can use the key values of the index to find the row(s) that have to updated.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Snargables wrote:

    So just to clarify. if i have a clustered index on say contactid and invoiceid however none of the updates are on those columns then there wouldn't be a need to reorder on update correct? is there any overhead related to the clustered index if updates are applied to columns that arent part of the clustered index?

    Potentially.  If they increase the length of those columns, especially to any significant degree, then the clus index can suffer a lot of page splits related to the UPDATEs.

    The most common examples are where varchar() columns are initially inserted as NULL (or '') and then later expanded to full text.  For example, a varchar(30) column that is initially NULL but later updated with 20 chars.  Several of those on a page could easily force a page split.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Snargables wrote:

    So just to clarify. if i have a clustered index on say contactid and invoiceid however none of the updates are on those columns then there wouldn't be a need to reorder on update correct? is there any overhead related to the clustered index if updates are applied to columns that arent part of the clustered index?

    If those are the key columns in the Clustered Index and they never get updated, only inserted, AND they are inserted in an "ever-increasing" order, then there won't be any extra overhead there because they won't be causing any page splits.

    If contactid and invoiceid are the keys to the clustered index, and you change either, especially the one that is the first column of the clustered index, then that WILL lead to some pretty massive fragmentation due to page splits because the rows will actually have to move to another page in most cases for the lead column and in some cases for the second column.  If either are NULL and you update them to something else, that almost guarantees rows being moved from one page to another and that will almost guarantee a page split.  It'll also guarantee some partially full pages so you could end up with some pretty low page densities where rows used to be NULL but were updated to something else.

    If contactid and invoiceid are of a numeric datatype and you don't have row or page compression happening and they aren't part of the clustered index keys, then they won't cause row movement and they won't constitute an "ExpAnsive" update and that won't cause any fragmentation.

    All that's fun until someone starts throwing updates on other columns...

    Read the second paragraph in Scott's post above (and to emphasize what he has correctly stated)...  It doesn't matter if columns are part of the Clustered Index keys or not.  If ANY column is variable width and it's updated from NULL or something small to something bigger (maybe even only by 1 byte), it might only take ONE of those to cause a page split and the resulting 2 types (logical and page density) of fragmentation.  That also includes many normally fixed length columns if you have either row or page compression going on the table.

    And, lowering the FILL FACTOR WILL NOT HELP IF you do INSERTs on an ever increasing Clustered Index and then turn right around an do such an "ExpAnsive" update before the next rebuild because all such INSERTs will be going in at the "hot spot" at the end index and they will fill those pages as close to 100% as possible.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply