Create Clustered Index on a Very Large Table (500 GB)

  • @jeff,

    Sorry for the late response. This table is partitioned by week, and we are mostly just inserting into the current weekly partition and then aggregating the data, which updates some of the data. Also, since we are almost always inserting a new quarter's data, we are basically creating new pages at the end of the table. Because of this special business scenario, the inserts are fairly fast for a 4 TB SQL 2012 database. The business ran on a heap with NCIs for years, but some of the larger reports are now suffering performance problems, which is why we are switching to this compressed CI setup.

    This setup is still in the development/testing phase and has not been deployed to production yet, so I will not have real PROD stats on this until we finish the implementation.

    Thanks again,

    -A


    Abdel Ougnou

  • Jeff, I wanted to get your take on something Gail posted once :

    http://www.sqlservercentral.com/Forums/FindPost485499.aspx

    The way I read it, a clustered index (pk or not) won't necessarily cause a huge slowdown to the insert process. I think it would lead to fragmentation if done often enough, due to new pages everywhere. Is there a balance to reach here?

    ----------------------------------------------------

  • MMartin1 (8/7/2014)


    Jeff, I wanted to get your take on something Gail posted once :

    http://www.sqlservercentral.com/Forums/FindPost485499.aspx

    The way I read it, a clustered index (pk or not) won't necessarily cause a huge slowdown to the insert process. I think it would lead to fragmentation if done often enough, due to new pages everywhere. Is there a balance to reach here?

    Correct. Adding rows to a clustered index won't cause any slowdown to an insert process if the insert process is in the same order as the CI and it's at the logical end of the table.

    If it's not in the same order or not at the logical end of the table, you get page splits. Those not only cause fragmentation but can also cause huge slowdowns because, on average, roughly half of the rows on the page need to be copied to a new page. None of that is contrary to what Gail stated in that post.

    Non-clustered indexes suffer the same fate.

    I have seen someone add an NCI with a column of very low selectivity as the leading column and suddenly cause massive timeouts on a busy app. Heh... I know this to be true because I was that "someone" in my early days.

    The same thing can happen with clustered indexes.

    The "balance" to be sought is to determine whether the table will suffer more inserts than selects and to write your indexes accordingly.
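    The difference between end-of-index inserts and out-of-order inserts described above can be illustrated with a toy model. This is only a sketch under stated assumptions: a "page" is a sorted Python list with a fixed row capacity, a full page splits 50/50 unless the new key belongs after the last key of the last page, and the function name `simulate_inserts` and the capacity of 8 rows are hypothetical. It is not how the storage engine is implemented, only a way to count how often rows have to be moved.

```python
import bisect
import random


def simulate_inserts(keys, page_capacity=8):
    """Insert keys one at a time into fixed-capacity sorted pages.

    Returns (splits, rows_moved, page_count). Toy model only:
    page_capacity is an assumed row count, not a real 8 KB page.
    """
    pages = [[]]      # pages are kept in key order; each page is a sorted list
    splits = 0
    rows_moved = 0
    for key in keys:
        # Locate the page whose key range should hold this key.
        first_keys = [page[0] for page in pages if page]
        idx = max(bisect.bisect_right(first_keys, key) - 1, 0)
        page = pages[idx]

        if len(page) < page_capacity:
            bisect.insort(page, key)          # room on the page, no split
            continue

        if idx == len(pages) - 1 and key > page[-1]:
            pages.append([key])               # logical end: new page, nothing copied
            continue

        # Mid-index insert into a full page: move the upper half to a new page.
        half = page_capacity // 2
        new_page = page[half:]
        del page[half:]
        pages.insert(idx + 1, new_page)
        splits += 1
        rows_moved += len(new_page)
        target = new_page if key >= new_page[0] else page
        bisect.insort(target, key)
    return splits, rows_moved, len(pages)


ordered_keys = list(range(1000))              # inserts arrive in index order
shuffled_keys = ordered_keys[:]
random.Random(42).shuffle(shuffled_keys)      # inserts arrive in random order

ordered_result = simulate_inserts(ordered_keys)
shuffled_result = simulate_inserts(shuffled_keys)
print("ordered :", ordered_result)
print("shuffled:", shuffled_result)
```

    In this model the ordered run never splits a page, while the shuffled run splits repeatedly, moves rows each time, and ends up with more, partly empty pages. That extra row movement is the cost the "balance" question is weighing.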

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Thanks Jeff for sharing your knowledge here.

    Kevin, I'll have to read up on Latching. Thanks guys.

    ----------------------------------------------------

  • TheSQLGuru (8/7/2014)


    I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

    Kevin, won't you get LATCH contention at the Extent tail either way?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (8/8/2014)


    TheSQLGuru (8/7/2014)


    I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

    Kevin, won't you get LATCH contention at the Extent tail either way?

    Not if you use a non-"tail-end" type key. This is often done in high-volume insert scenarios to spread the latching around the range of values, at least a bit. Hashes or mods or even GUIDs (yes, it hurt just a little bit to say that :w00t:) can be a win in scenarios like that.
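    The "hashes or mods" idea can be sketched as follows. Assume a burst of rows with consecutive ascending IDs arrives at roughly the same time. With a plain ascending key, every row in the burst targets the same insertion point (the tail of the index). If the leading key column is instead a bucket derived from the ID, for example `row_id % bucket_count`, the same burst is spread over `bucket_count` insertion points. The function name `insertion_points`, the bucket count of 8, and the ID range are hypothetical choices for illustration only.

```python
from collections import Counter


def insertion_points(row_ids, bucket_count=1):
    """Count how many incoming rows land at each insertion point.

    bucket_count=1 models a plain ascending key (one hot tail).
    bucket_count=N models a leading bucket column of row_id % N,
    giving N separate tails to insert into.
    """
    return Counter(row_id % bucket_count for row_id in row_ids)


# A burst of 64 consecutive IDs arriving together (hypothetical values).
burst = range(1_000_000, 1_000_064)

plain = insertion_points(burst)                  # single hot spot
spread = insertion_points(burst, bucket_count=8) # eight insertion points

print("plain key  :", dict(plain))
print("mod-8 key  :", dict(sorted(spread.items())))
```

    The trade-off is that the rows are no longer stored in pure ID order, so range scans on the ID have to visit every bucket, and inserts are no longer strictly at the logical end of the index.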

    Best,
    Kevin G. Boles

  • TheSQLGuru (8/8/2014)


    Evil Kraig F (8/8/2014)


    TheSQLGuru (8/7/2014)


    I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

    Kevin, won't you get LATCH contention at the Extent tail either way?

    Not if you use a non-"tail-end" type key. This is often done in high-volume insert scenarios to spread the latching around the range of values, at least a bit. Hashes or mods or even GUIDs (yes, it hurt just a little bit to say that :w00t:) can be a win in scenarios like that.

    Sorry, I wasn't clear. I meant that, to my knowledge, a HEAP and a tail-end CI would behave almost the same with respect to hot-spot contention.


    - Craig Farrell


  • TheSQLGuru (8/7/2014)


    I think this is slightly mis-stated Jeff. While you may not get fragmentation from end-of-index inserts, you certainly can get page LATCH contention, which can be a significant bottleneck to good insert performance in such a scenario.

    Thanks for the feedback, Kevin. It is certainly possible that I've not worked on a system with a high enough rate of tail-end inserts to matter. Expedia.com (we took care of the "yellow box" ads at the top and the bottom) was the biggest I've worked on, and the number of inserts wasn't what I'd call huge (just a couple of hundred every couple of seconds). That's also where the incident with the application timeouts due to page splits on an NCI with low selectivity occurred, and that's what I was basing my comment on. It was amazing just how quickly it happened. As soon as the index committed, WHAM! As soon as we dropped the index, the problem went away just as quickly. The table did have a CI on an IDENTITY column.

    --Jeff Moden



    Hey guys, sorry I'm a bit late on this, but I had a question regarding online re-indexing in Enterprise. Does ONLINE=ON completely eliminate the possibility of table locks while the process is running, or does it just make them less likely?

    Thanks 🙂

  • SQLnbe (8/8/2014)


    Hey guys, sorry I'm a bit late on this, but I had a question regarding online re-indexing in Enterprise. Does ONLINE=ON completely eliminate the possibility of table locks while the process is running, or does it just make them less likely?

    Thanks 🙂

    Maybe you should open a new thread for that? But the answer is no. Some locks are still needed, but the operation is less intrusive.

    Grab a cup of coffee and read this: How Online Index Operations Work

  • sql-lover (8/8/2014)


    SQLnbe (8/8/2014)


    Hey guys, sorry I'm a bit late on this, but I had a question regarding online re-indexing in Enterprise. Does ONLINE=ON completely eliminate the possibility of table locks while the process is running, or does it just make them less likely?

    Thanks 🙂

    Maybe you should open a new thread for that? But the answer is no. Some locks are still needed, but the operation is less intrusive.

    Ok thanks
