Index rebuild / stats update taking lot of disk space.

  • Dear Friends,

    Please advise why the Index rebuild and Update stats  jobs on one server consume lots of disk space after they run..I have to manually shrink the data file to release the space....Thank you in advance.

     

     

  • Oh man never (unless it is a drastic data delete and a one off shrink) shrink the data file, your getting yourself into a shrink rebuild shrink cycle and that is not good for you or your database.

    The rebuild requires space to sort the index before the old and new index is swapped in/out, especially if the index is rebuilt online.

    The old index remains in place and a new index is built, then when ready the index pages are swapped around.

    So you could end up using around 2x the amount of space of your largest index and then that becomes white space after the pages are swapped and dropped.

    If your largest index is say 100GB, you can use up to around 200GB extra just to sort and store that while the rebuild is running.

     

     

    Now you need to ask yourself why are you rebuilding indexes, is it for performance, if so what type of storage are you on?  Maybe you never need to rebuild indexes ever and just use statistics maintenance instead.

    Even Microsoft say you now may never need to rebuild indexes, I would look at what you are doing and why you are doing it and potentially re-work your maintenance routines.

    https://learn.microsoft.com/en-us/sql/relational-databases/indexes/reorganize-and-rebuild-indexes?view=sql-server-ver16#index-maintenance-strategy

  • Arsh wrote:

    Dear Friends,

    Please advise why the Index rebuild and Update stats  jobs on one server consume lots of disk space after they run..I have to manually shrink the data file to release the space....Thank you in advance.

    As Ant-Green correctly advises (and has sited the correct documentation), I'd recommend that you actually stop doing index maintenance until you know more about it.  The reason why Microsoft finally changed this is that a lot of us have proven that the supposed "Best Practices" than 98% of the world has been following is horribly incorrect and has been wrong since the beginning.  Even Paul Randal, the fellow that came up with those number under gun-point by Microsoft said way back in 2008 "to take those [5/30] numbers with a grain of salt and don't treat them as absolute".

    https://www.sqlskills.com/blogs/paul/where-do-the-books-online-index-fragmentation-thresholds-come-from/

    What's worse is that the documentation on REORGANZE and WITH ONLINE = ON is a bit confusing to most.  In the last couple of years, I've proven that REORGANIZE does NOT do what they think it does (like I said, although the documentation on it is correct, it leaves most folks thinking that it "follows the Fill Factor just like REBUILD does and it does NOT) nor is "less resource intensive" as both the documentation and the rest of the world claim.  It will cause log file entries many times the size of whatever index you're using it on including some huge Clustered Indexes.

    I've also proven that it's MUCH better to do no index maintenance that it is to do it wrong.  That's why I stopped doing any general index maintenance on Monday, the 18th of January, 2016.  One of the reasons is because of the massive "morning after blocking" that will occur.

    I and a lot of other good folks have also proven that logical fragmentation mostly does NOT matter and, if you look at all the 'tubes and blogs and articles on the subject of index maintenance, none of them actually prove that it does matter.  They just repeat the same old crap about how it CAN matter but never do a demonstration.

    The information about index maintenance is so wrong that people also believe that Random GUIDs are a fragmentation problem because they're Random... in fact, them being so evenly Random makes them the very epitome of how people think an index should behave and they can have very high insert rates and, quite literally, go for month with < 1% logical fragmentation.  And, I PROVE all that and more in the following video.  Watch it all because it's not just about GUIDs.  And watch the out-take after the Q'n'A section.

    https://www.youtube.com/watch?v=rvZTwMNJxqVo

    So, my advice to you is to stop doing index maintenance and pay more attention to rebuilding stats.  Once you know more about your indexes, science out what they're doing and only rebuild indexes when their average page density gets below 80%. And, even then, you've got to be careful... if they're fragmenting like that and you rebuild them at the default of "0", will that cause more blocking and you need to reduce the fill factor or will reducing the fill factor do NOthing to help?

    Whatever you do, STOP following the supposed "Best Practices"... they're actually a WORST practice and many of us have proven it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I will add that rebuilding indexes is not about performance - and should not be considered as a method to improve performance.  Index maintenance is about resource usage.

    How much wasted storage and wasted memory do you have if you have a 1TB clustered index that has an average page density of 50%?  For any cloud based systems, paying for an additional 300GB of space just because you are not performing index maintenance is an expensive cost for a lot of wasted space.

    One rule of thumb I use is that if an index is selected to be rebuilt (doesn't matter what you use to determine this) - do not rebuild that index with a FILLFACTOR of 0 (100) unless you know exactly how and when data is inserted/updated/deleted.  For example, if the process is part of an ETL and the data is only inserted/updated during the ETL process - then you can rebuild after the load to 100 since you also know the data won't change until the next load.

    The next thing you need to determine is how fast the index is fragmenting and what you can do to reduce that frequency.  That is what FILLFACTOR is all about - making sure you have enough space available on each page to support new rows and updates to avoid expensive page splits.  If you are able to set a reasonable fill factor to get you to a day/week/month/quarterly schedule for that index - without consuming too much space/memory then you can schedule those processes to manage those indexes.

    Final Note: you can reduce the amount of space used in the data file by setting SORT_IN_TEMPDB on.  But be aware this will require much more space in your tempdb and you need to ensure tempdb is capable of handling the additional workload.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Ya gotta be careful, though.  If you do the index maintenance incorrectly, you could be setting up the index to do massive page splits which will cost you dearly in the hours and, sometimes, days after.  If you do have an index with a 50% average page density and you don't actually know if the fragmentation occurred only at the broom tip (hot spot at the insertion point of the end of the index), then rebuild it at 82%.  You be gaining a lot in space savings without the possibility of it killing you with page splits because you made the mistake of rebuilding it with a "0" fill factor, leaving no space.  If you CAN prove that the fragmentation only occurs in the hot-spot, the rebuild it at 97 % so that you know just by looking at the fill factor, what kind of index it is.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Ya gotta be careful, though.  If you do the index maintenance incorrectly, you could be setting up the index to do massive page splits which will cost you dearly in the hours and, sometimes, days after.  If you do have an index with a 50% average page density and you don't actually know if the fragmentation occurred only at the broom tip (hot spot at the insertion point of the end of the index), then rebuild it at 82%.  You be gaining a lot in space savings without the possibility of it killing you with page splits because you made the mistake of rebuilding it with a "0" fill factor, leaving no space.  If you CAN prove that the fragmentation only occurs in the hot-spot, the rebuild it at 97 % so that you know just by looking at the fill factor, what kind of index it is.

    Agreed - I would only ever rebuild an index to 100 if I absolutely know that there are no inserts, updates or deletes that can be performed on that table by any other process.  Generally, that is going to be some type of ETL process where we have absolute control of the population of the tables.  Another place where a 100 fill factor is reasonable would be lookup tables, which generally don't change very often and are normally smaller in size.

    As to what fill factor to use - that is where the analysis needs to be done.  You need to know how the application(s) insert, update and delete data for that table.  You also need to know the volume of changes and determine the best fill factor for that system.  If you have a large enough maintenance window and can rebuild the indexes on that table every night then you can get away with a much higher fill factor - but if that isn't the case then lowering the fill factor *may* allow time to put off rebuilding for a week or month or longer.

    Frankly, not rebuilding indexes is going to lead to much higher costs - especially considering how cloud services charge based on compute resources and memory, storage, IO and CPU all factor into the total cost.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Here's the "formula" I've been using in production

    FF

    100 - Static or usually static.

    99 - Usually, some form of ever-increasing keyed index that suffers no fragmentation, which also means no "ExpAnsive" updates anywhere.

    98 - Permanently fragmented "Sequential Silos".  The odd critters fragment massively but stay near 100% full all the time because, oddly enough, there is no page splitting.  Sounds impossible but it's not.

    97 - Like type 99 BUT they suffer from "broom tip" fragmentation by the bad pattern of doing inserts followed soon after by "ExpAnsive" updates.  It's just a huge waste to use a lower fill factor because none of the rest of the index is affected.  I rebuild these when they're wasting enough memory/disk space to warrant it.

    96 - Random Silos... fragment forever and might not be able to prevent page splits but need space recovery .  Usually not worth leaving any headroom but could become more like a TYPE 1 index when analyzed if the Silos are wide enough to benefit from.

    Type 1 - these are usually evenly distributed fragmenters like Random GUIDs and other things.  Depending on their insert rates, I'll rebuild them at 91, 81, or 71 FF (always ending with a "1" to identify what they are.  The big reason to rebuild these is to prevent the massive page splits they would otherwise suffer.  If the Logical fragmentation hit's 1%, it means it's time to rebuild because the area above the fill factor is full and all hell is getting ready to break loose.

    Type 2 - Usually set to 82 (ending with a "2")  These are indexes that fragment and you need to recover space but you haven't analyzed then yet.  The "2" is to remind you that you still have an analysis "TO DO". 😉  They frequently turn out to be a "97" but haven't done the analysis yet.

    I don't leave "o" fill factor indexes... those are new indexes that need to be classified.  The same goes for existing indexes that don't have the values above.  Someone else added a Fill Factor that usually ends in a 0 or a 5 and they need to be identified and checked.

    I also use the Swap'n'Drop method when recovering space on monster indexes to that I don't leave a bunch of totally unnecessary free space.

    And... I might do index maintenance somewhere between once a month and once a quarter or so.  I don't do this on a schedule.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply