How much will it cost or save to rebuild that index? (SQL Oolie)

  • Comments posted to this topic are about the item How much will it cost or save to rebuild that index? (SQL Oolie)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for a great article. I am not a performance tuning expert, but I have learnt enough here to be dangerous.

    I don't want to be "that" guy and it doesn't bother me when others do it, but I know when I do it, it irritates me to no end. I assume "justification will be to identity how much disk space" should read "justification will be to identify how much disk space".

  • Fat Uncle Dan wrote:

    Thanks for a great article. I am not a performance tuning expert, but I have learnt enough here to be dangerous.

    I don't want to be "that" guy and it doesn't bother me when others do it, but I know when I do it, it irritates me to no end. I assume "justification will be to identity how much disk space" should read "justification will be to identify how much disk space".

    Your correction is correct.  Thanks for the feedback both on the correction and on the article.  Much appreciated.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Interesting...  At the time of this post, there were 3 votes with an average of 3.67.  That might mean two 5's and a 1 or two 4's and a 3.  It would really be nice if someone told me why they rated the article as a "1" (if that's what happened) so that I can improve either the subject matter or the methods used in writing the article and proving the findings in future articles.

    If it turns out to be two 4's and a 3, that's not so bad but it still leaves me wondering what I may have done wrong.  Was it the subject, the proofs, too long for the subject, or ???

    Thanks for for any additional feedback on any problems observed, whomever/whoever you are.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I just now rated the article 3 out of 5, because while it's well written, no discussion of very large table index maintenance would be complete without mentioning the PARTITION = {partition_number} option. When rebuilding or reorganizing a partitioned table, it's essential to do this incrementally, or even better, do it selectively based on which specific partitions exceed the threshold for fragmentation.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell wrote:

    I just now rated the article 3 out of 5, because while it's well written, no discussion of very large table index maintenance would be complete without mentioning the PARTITION = {partition_number} option. When rebuilding or reorganizing a partitioned table, it's essential to do this incrementally, or even better, do it selectively based on which specific partitions exceed the threshold for fragmentation.

    Great point.  Thanks for the feedback, Eric.  I'll add that to the article and resubmit it tonight.  Thanks for your time.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for a useful article.

    A related question (please advise if it would be better as a separate topic) would be if it's possible and, if it is, how to work out how long a reorganise or rebuild will take on a particular system.

    I/O time might be possible to work out by using the page count and historical disk performance from performance monitor on a Windows system (I'm not sure of the equivalent on Linux - maybe the top command?)? I am not sure about CPU time though?

    I'm just thinking about the user/management question - "OK, so you want downtime, how long for?".

    Sorry if this is the wrong place for the question or the information is in your article and I missed it.

  • @as1981 ,

    Apologies for missing your question on this.

    Yes, it's possible to work it out from historical information as you've said but it's still pretty much of a SWAG simply because you can't control what else is happening on your server.  For example, a rebuild of a given index might normally take X minutes but, if someone has a lock on the index for some reason, the rebuild won't even start until that lock clears up.  Even if you rebuild the index "ONLINE", the activity in the table may impact the index rebuild in that it needs to update that "update" table when such online actions occur.  Even if you're the only one using the entire database, if something like a backup occurs, that will impact the rebuild.

    So, the bottom line is that you can make a guess based on things like history, column widths, column types, etc but there is so much that could affect the amount of time to do a REBUILD or REORGANIZE that the guess could be wildly different that what actually happens even if the index isn't a part of what else is happening.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No problem and thanks for your advice

  • I realise that I am a bit late to the party. The percentage increase is one that many people get wrong.

    When I demonstrate similar issues, I like to use 100% vs. 50% because more people can intuitively understand that having only half of data filled up makes the whole thing take up twice as much space.

    But to make the points come across loudly, I often use extreme points like 10% taking up 10 times as much space (and 1% taking up 100 times as much space - which is so far from the 99% coming from the misguided calculation of 100%-1%). When people agree to that calculation it is usually easier to work "up" to 70%.

  • Rasmus Remmer Bielidt wrote:

    I realise that I am a bit late to the party. The percentage increase is one that many people get wrong.

    When I demonstrate similar issues, I like to use 100% vs. 50% because more people can intuitively understand that having only half of data filled up makes the whole thing take up twice as much space.

    But to make the points come across loudly, I often use extreme points like 10% taking up 10 times as much space (and 1% taking up 100 times as much space - which is so far from the 99% coming from the misguided calculation of 100%-1%). When people agree to that calculation it is usually easier to work "up" to 70%.

    Thanks for the feedback.  I just try to keep it real instead of going to extremes because a lot of people will actually dismiss an article if they see it go to such extremes.  If you demonstrate that a very common value, like a 70% Fill Factor, is going to cost almost 43% more space ("almost half" is what many perceive) rather than 30% ("only" about a third is what most perceive), the point is well made and you don't actually have to "work up to 70%", which saves the reader time on what is also meant to be a relatively short article.

    People also need to know that it applies for other starting values other than 100% because that also happens and they don't necessarily understand that the 100% isn't a fixed value.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Stupid question, but why are some people use a low fill factor as 70 or 80%, I'm usually fine with something as 95-98 (or even 100 if it is on a identiy column and has no varchar() columns that WILL (not could) be updated by a longer value).

    I understand, that there may be exception, as when you use an UNID as primary key (maybe because you are using some sort of replication / synchronisation over several sides), but in this case I'd use the UNID-column (usually) only as nonclustered index.

    Usually > 90 % of your database are old data that will almost never be updated. When your main key is ascending (dates, own or "foreign" (as the order_id in the order_position table) identity columns etc.) you will never insert in the middle of the index. And even if you have an index on the StackOverflows Posts table on the user_id or cluster your order table (for whatever reasons) by the customer_id it will be usually only one or two inserts per day / week / whatever your index optimize window is, so 95% gives you usually more then enough space for those inserts.

    God is real, unless declared integer.

  • Thomas Franz wrote:

    Stupid question, but why are some people use a low fill factor as 70 or 80%, I'm usually fine with something as 95-98 (or even 100 if it is on a identiy column and has no varchar() columns that WILL (not could) be updated by a longer value).

    I understand, that there may be exception, as when you use an UNID as primary key (maybe because you are using some sort of replication / synchronisation over several sides), but in this case I'd use the UNID-column (usually) only as nonclustered index.

    Usually > 90 % of your database are old data that will almost never be updated. When your main key is ascending (dates, own or "foreign" (as the order_id in the order_position table) identity columns etc.) you will never insert in the middle of the index. And even if you have an index on the StackOverflows Posts table on the user_id or cluster your order table (for whatever reasons) by the customer_id it will be usually only one or two inserts per day / week / whatever your index optimize window is, so 95% gives you usually more then enough space for those inserts.

    The subject of what Fill Factor to use is a huge one.

    If, for example, you have an ever-increasing index that you do inserts to and follow that (before the next index maintenance) with "ExpAnsive" Updates, using a lower Fill Factor to try to avoid page splits is a totally futile effort because the new inserts go in at 100% no matter what the Fill Factor is.  And, as you point out and especially for ever-increasing indexes, it's actually an ignorant waste to have a Fill Factor on a table where only the latest "tip" of the index is active.

    If, on the other hand (for example), you have a Random GUID keyed index (or any other index that suffers fairly evenly distributed inserts or "ExpAnsive" updates), then a lower Fill Factor (like 70%) can prevent page splits for millions of inserts or updates for months with no index maintenance.  Row size also plays an important consideration at to what the Fill Factor should be, as well.

    Of course, there are some crazy circumstances where your index massively fragments but with absolutely no page splits.  No... that's not impossible.  I see it happen a whole lot.  That also means no page density (physical fragmentation) problems.  Assigning a low Fill Factor to those is also an ignorant waste.  I have a "numbering system" for Fill Factors to identify what type insert/update/frag pattern is exhibited by an index.  It call the ones I'm talking about "Type 98".  The 8 is like an infinity symbol stood on end and is what I use to mark indexes that are "fragmented forever" and aren't an "Even Increasing Index that could actually be repaired (which is a "Type 97" to me)

    You say you're usually fine with a 95-98% Fill Factor but under what conditions are those "fine"?

    And I'm not sure what you mean by an "UNID".  Do you mean a UUID (also known as a GUID in SQL Server)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thomas Franz wrote:

    Stupid question, but why are some people use a low fill factor as 70 or 80%, I'm usually fine with something as 95-98 (or even 100 if it is on a identiy column and has no varchar() columns that WILL (not could) be updated by a longer value).

    I understand, that there may be exception, as when you use an UNID as primary key (maybe because you are using some sort of replication / synchronisation over several sides), but in this case I'd use the UNID-column (usually) only as nonclustered index.

    Usually > 90 % of your database are old data that will almost never be updated. When your main key is ascending (dates, own or "foreign" (as the order_id in the order_position table) identity columns etc.) you will never insert in the middle of the index. And even if you have an index on the StackOverflows Posts table on the user_id or cluster your order table (for whatever reasons) by the customer_id it will be usually only one or two inserts per day / week / whatever your index optimize window is, so 95% gives you usually more then enough space for those inserts.

    There are some assumptions here that are not universal - for example, a document table in a patient centric system should not be clustered on the DocumentID (identity) - rather it should be clustered on the PatientID and DocumentID.  This ensures that requests for patients documents will be retrieved from the fewest possible contiguous pages.  With this type of table - you definitely want a lower fill factor and a scheduled rebuild of the index to manage fragmentation and allow for new documents to be added to the patient data.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    Thomas Franz wrote:

    Stupid question, but why are some people use a low fill factor as 70 or 80%, I'm usually fine with something as 95-98 (or even 100 if it is on a identiy column and has no varchar() columns that WILL (not could) be updated by a longer value).

    I understand, that there may be exception, as when you use an UNID as primary key (maybe because you are using some sort of replication / synchronisation over several sides), but in this case I'd use the UNID-column (usually) only as nonclustered index.

    Usually > 90 % of your database are old data that will almost never be updated. When your main key is ascending (dates, own or "foreign" (as the order_id in the order_position table) identity columns etc.) you will never insert in the middle of the index. And even if you have an index on the StackOverflows Posts table on the user_id or cluster your order table (for whatever reasons) by the customer_id it will be usually only one or two inserts per day / week / whatever your index optimize window is, so 95% gives you usually more then enough space for those inserts.

    There are some assumptions here that are not universal - for example, a document table in a patient centric system should not be clustered on the DocumentID (identity) - rather it should be clustered on the PatientID and DocumentID.  This ensures that requests for patients documents will be retrieved from the fewest possible contiguous pages.  With this type of table - you definitely want a lower fill factor and a scheduled rebuild of the index to manage fragmentation and allow for new documents to be added to the patient data.

    Can we please NOT have a discussion about what a Clustered Index should or should not contain on this there... PLEASE!  😉

    Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply