What would Fill Factor Do?

I just had a call from an external vendor who mentioned they want to change the fill factor setting for a server.

Their database shares this SQL 2008 R2 server with 10 other databases that support 5 other applications.

They have 265 indexes that are rebuilt at least once a week because they are more than 10% fragmented.

    They are asking for this change to increase database/Index performance and decrease the space wasted in their indexes.

I can see how setting the default index fill factor to 0 (100% full) will decrease empty space in the index pages and reduce index size.

    I have recommended that we leave the defaults alone and create a nightly job to rebuild Indexes.

    Any Index with 5 to 24.5% fragmentation gets rebuilt with a fill factor of 85%.

    Any Index with fragmentation of over 24.5% gets rebuilt with a fill factor of 80%.
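The nightly job described above could be sketched roughly as follows. This is only an illustration of the stated thresholds (5 to 24.5% gets FILLFACTOR 85, above 24.5% gets FILLFACTOR 80); the decision to always rebuild (rather than reorganize), the SORT_IN_TEMPDB setting, and the scope of the current database are assumptions, not details from the post:

```sql
-- Sketch of a nightly rebuild job using the thresholds described above.
-- Runs in the current database; skips heaps and indexes under 5% fragmentation.
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
     + N'ALTER INDEX ' + QUOTENAME(i.name)
     + N' ON ' + QUOTENAME(s.name) + N'.' + QUOTENAME(o.name)
     + N' REBUILD WITH (FILLFACTOR = '
     + CASE WHEN ps.avg_fragmentation_in_percent > 24.5 THEN N'80' ELSE N'85' END
     + N', SORT_IN_TEMPDB = ON);' + CHAR(13)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
JOIN sys.objects AS o ON o.object_id = i.object_id
JOIN sys.schemas AS s ON s.schema_id = o.schema_id
WHERE ps.avg_fragmentation_in_percent >= 5
  AND ps.index_id > 0;   -- index_id 0 is a heap, which has no index to rebuild

EXEC sys.sp_executesql @sql;
```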

What are your comments about this plan?

I just had a call from an external vendor who mentioned they want to change the fill factor setting for a server.

In my experience it's better to leave the server setting alone and deal with each index individually.

They have 265 indexes that are rebuilt at least once a week because they are more than 10% fragmented.

Indexes that are only 10% fragmented should be reorganized, not rebuilt; the MS recommendation is to rebuild at 30% or more. Also, depending on the size of the indexes, reorgs or rebuilds might be a waste of time.

    They are asking for this change to increase database/Index performance and decrease the space wasted in their indexes.

Unless we're talking about some big indexes, 10% fragmentation is not a big deal.

    I have recommended that we leave the defaults alone and create a nightly job to rebuild Indexes.

    Any Index with 5 to 24.5% fragmentation gets rebuilt with a fill factor of 85%.

    Any Index with fragmentation of over 24.5% gets rebuilt with a fill factor of 80%.

Nothing you have said in your post indicates that changing the fill factor will help you. I think more data collection, research, and testing should be done based on what you have said.

Remember: if the index is < 30% fragmented you need not rebuild it; you can reorganize it instead.
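That rule of thumb comes down to a one-line choice per index. The statements below are just an illustration of the guidance being cited; the table and index names are placeholders:

```sql
-- Microsoft's general guidance: reorganize at roughly 5-30% fragmentation,
-- rebuild above roughly 30%. Names below are placeholders for illustration.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;   -- 5-30% fragmented

ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD;      -- more than ~30% fragmented
```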

A couple of good links:

Rebuild or Reorganize: SQL Server Index Maintenance (http://www.brentozar.com/)

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (3/23/2015)


Indexes that are only 10% fragmented should be reorganized, not rebuilt; the MS recommendation is to rebuild at 30% or more. Also, depending on the size of the indexes, reorgs or rebuilds might be a waste of time.

    Why? It takes less than 20 minutes a night to rebuild all the Indexes that are marked as needing to be defragmented. Why would you defrag instead of just rebuilding?

PS: I read the Brent Ozar link you provided. We already had, and decided to just rebuild instead of dealing with a defrag.

EDIT: I should have mentioned the smallest of these 265 indexes is over 200 MB and the largest is 9 GB. Both of those are non-clustered indexes.

    Can you explain why you say nothing in my post relates to needing to change the Fill Factor?

The fill factor setting directly affects how long it takes before new entries added to an existing index go into empty space or trigger a page split.

So why would you say that changing the fill factor would have no effect on an index that becomes more than 10% fragmented every day?

Just asking these things so I can understand your view and where it comes from.

    The concepts in the question posted come mostly from here.

    http://www.sqlskills.com/blogs/kimberly/database-maintenance-best-practices-part-ii-setting-fillfactor/
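Since the debate here is server-level default vs. index-specific settings, it's worth knowing what is actually in effect. A sketch of how to check both (standard catalog views; nothing here is specific to this environment):

```sql
-- The instance-wide default fill factor (0 or 100 both mean "fill pages 100%").
SELECT value_in_use AS server_default_fill_factor
FROM sys.configurations
WHERE name = 'fill factor (%)';

-- Each index's own setting; fill_factor = 0 means it inherits the server default.
SELECT OBJECT_NAME(object_id) AS table_name,
       name                   AS index_name,
       fill_factor
FROM sys.indexes
WHERE index_id > 0;   -- exclude heaps
```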

My apologies - I misread your original post a little. I was under the impression that some of these indexes were 10% fragmented weekly, not more than 10% as you clearly stated. That changes things a little.

    Why? It takes less than 20 minutes a night to rebuild all the Indexes that are marked as needing to be defragmented. Why would you defrag instead of just rebuilding?

Reorgs are faster and keep the index online - that's why I suggested doing a reorg vs. a rebuild. If it takes 20 minutes and that is acceptable, then a rebuild is fine; you did not state in your original post that it took just 20 minutes (and that this was acceptable to you). I generally do reorgs when indexes are lightly fragmented because it's faster, less resource intensive, and keeps the index online. That's why I believe the recommendation is not to rebuild unless the index is 30% or more fragmented. In my experience, rebuilds (especially online rebuilds) are much more resource intensive.

EDIT: I should have mentioned the smallest of these 265 indexes is over 200 MB and the largest is 9 GB. Both of those are non-clustered indexes.

That's helpful to know. For non-clustered indexes on frequently updated OLTP tables that get fragmented quickly, 80 (or even 70) is probably not a bad idea. If it's a clustered index based on an identity column, for example, I generally set it to 100. My tally table is set to 100 because it's never updated. I guess I should have said that I would need more information about your indexes.

So why would you say that changing the fill factor would have no effect on an index that becomes more than 10% fragmented every day?

I did not say that; I said that nothing you said indicated that changing the fill factor would help you. I guess I should have asked for more info about the indexes (e.g. clustered vs. non-clustered, size, and how often they are updated). A 9 GB index that is updated frequently and gets fragmented very quickly would likely benefit from being rebuilt with a lower fill factor. In that case the approach you were considering is much, much better than just blindly changing the fill factor for the entire server as your vendor suggested. The only place I have seen that make sense is in my data warehouses, where fragmentation is not an issue because those indexes don't change after they're built.

I have not read the article you referenced but will do so now to better understand your approach (and because Kimberly Tripp's articles are generally very good).

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Just to add some information to this thread.

    REORGANIZE

    1. Is not always faster. If an index is heavily fragmented, it can be quite a bit slower.

    2. Reorganize is the only way to compress BLOBs, if they need it. Typically, this only affects the Clustered Index because most people don't INCLUDE blobs in NCIs. As a bit of a side bar, storing BLOBs in a sister table can be a real benefit because it won't interfere with ONLINE rebuilds if you need to do one.

    3. Reorganize is FULLY logged. If you reorganize a huge clustered index, better make sure that you have the log file space and that you're doing frequent backups.

    4. Reorganize does most everything "in place", which has the nice side effect of not causing growth of MDF/NDF files.

    5. Reorganize really only affects the Leaf Level. It doesn't affect the B-TREE except to compress it if it has gaps. That may or may not be a problem for you.

6. Reorganize does not rebuild stats. Any time you save with Reorganize might be consumed by rebuilding the stats of the indexes that you've Reorganized. Not rebuilding the stats is probably not a great idea.

    7. Reorganize is ALWAYS done ONLINE even if in the SIMPLE recovery model. That may or may not be a feature because, like I said, Reorganize is ALWAYS fully logged, as well. Don't forget that also affects the size of the log file backups on disk and on tape. If you Reorganize the whole database, the log file backup could be as large as a full backup and sometimes larger depending on what's going on.
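Point 6 above is easy to overlook in practice: REORGANIZE leaves the statistics untouched, so the two commands are usually paired. A minimal illustration (the table and index names are placeholders):

```sql
-- REORGANIZE does not touch statistics, so follow it with an explicit update.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REORGANIZE;
UPDATE STATISTICS dbo.SomeTable IX_SomeIndex;   -- add WITH FULLSCAN for best accuracy
```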

    REBUILD

1. For any index containing more than 128 extents (which is just 8MB), the new index will be fully built and engaged before the old index is dropped. For large indexes (usually clustered indexes), that can cause some seriously unwanted growth of the MDF/NDF files. A lot of people say "Well it has to grow sometime. What's the big deal?" The answer is footprint size and duration of restores if you need to restore to a different box. This can certainly affect smaller environments like Dev boxes or a slightly smaller DR box if it ever comes to that. It won't, however, affect the size of the backup files themselves.

    2. Online rebuilds are actually less effective than offline rebuilds and can take a lot of extra space because online rebuilds will build kind of a transaction table to keep track of any changes that occur during the rebuild. That also takes extra time, which is also a part of the reason that people think that Reorganize is faster than a Rebuild. You'd be surprised at how fast an offline rebuild is compared to a Reorg.

3. Rebuilds are fully logged only if you're in the FULL recovery model. Since a Rebuild can't be interrupted without doing a full rollback, it's frequently quite effective to switch to the BULK LOGGED model, where index Rebuilds are "minimally logged". Of course, minimally logged operations will interfere with the ability to do a point-in-time restore from log files that contain minimally logged operations. In such cases, you either have to restore the whole log file or stop before it.

    4. Rebuilds auto-magically rebuild stats and they rebuild the BTREE, as well. Both can be an advantage.
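The recovery-model switch in point 3 looks like the following in outline. This is a sketch only; the database and index names are placeholders, and the log-backup steps are noted as comments because they depend entirely on the local backup strategy:

```sql
-- Point 3: switch to BULK_LOGGED so the rebuild is minimally logged,
-- then switch back. Names are placeholders.
-- (Take a log backup first, so the minimally logged work is bracketed.)
ALTER DATABASE SomeDatabase SET RECOVERY BULK_LOGGED;

ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD;

ALTER DATABASE SomeDatabase SET RECOVERY FULL;
-- Take another log backup now; the backup spanning the rebuild cannot be
-- used for a point-in-time restore into the middle of that operation.
```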

It's a personal opinion, for sure, but I don't agree with the MS recommendations on when to use Reorganize or Rebuild. Since many procs that are designed to do all of this for you follow the same basic pattern, I disagree with those, as well.

For non-huge NCIs, I'll generally do a Rebuild if I need to defrag it, as I sometimes will for larger indexes. It depends on the "hit frequency" on the index, which you can get by tracking the usage stats of indexes. If the hit frequency is low enough, or is low during the time that I'm going to do the defrag, there are a lot of times where I'll do the rebuild offline even though I have the Enterprise Edition, just because of the speed and the fact that offline rebuilds do a better job of defragmentation than online rebuilds do. Obviously, you don't want to do offline rebuilds on large CIs if you can help it because that will make the data in the table inaccessible for the duration.

    If CIs need defragging, I look at the size and whether or not they have blobs in them. If they have blobs, I might bite the bullet and do a Reorganize and then rebuild the stats. If the CIs are small and don't have blobs, then I'll usually do an online rebuild. Since the databases I work with suffer large OLTP loads and large batch loads, part of the decision to rebuild or reorganize is also based on how fragmented the lower levels of the BTREE are.

    Because we're not using 2014, yet, ONLINE rebuilds of partitioned indexes aren't available so I'm pretty much forced into using Reorganize for those until I do the "final packing" of a partition for the previous month. I have to tell you that it's a whole 'nuther ball game there to keep from making a ton of free space in the partition's file/filegroup read-only. About a third of my 2 hour session on partitioning "worm" tables is dedicated to that very subject.

    Getting back to the subject of FILL FACTOR changes. It's really a difficult decision on very large tables. If the CI is quickly fragmented, then there's a very good chance that you've used the wrong keys for the CI. I follow the general recommendation that a CI must be narrow, unique, not null, ever-increasing, and non-updateable especially on high volume OLTP tables. If I need something like a CI for most of my queries, I may elect to simulate one (reduced number of columns, of course) by using a multi-key index with INCLUDES. It really does depend because I don't do that often nor is there a need to in a lot of cases. If you followed the recommendations for the CI, then it shouldn't become fragmented (unless you do deletes or a lot of updates on variable length columns that cause the "cell" to grow). Having anything less than a 100% fill factor in such cases is usually nothing more than a horrible waste of space. If the width of the table is approaching a size where only one row will fit on a page, it's not going to matter what you set the fill factor to, anyway.

For large NCIs, you can run into similar huge wastes of space, and it's more common to do so because NCIs can rarely follow the same recommendations as CIs. The problem there is that if you reduce the fill factor to 80%, then you've just made any multipage lookups 20% slower (not to mention making the footprint of the index 20% larger)... which could be a real killer for reports and other selects that are poorly written and already take a long time. Since fragmentation of NCIs is due to rows being inserted or modified (or deleted, but not normally because everyone keeps everything forever :-D), that would also mean that stats need to change. Since it takes a hell of a lot of such changes to get the auto rebuild of stats to kick in on a large table, it pretty much means that you're going to have to either do a Reorg followed by a stats rebuild or just bite the bullet and do a rebuild.

    To get back to fill factor once again, if it's a real problem on some indexes due to inserts, consider building a staging table to handle the inserts for the day and then incorporating them into the main table at night followed by a very selective index rebuild. That way you might avoid a huge amount of wasted space and the related sacrifice in performance.

    And now for the black arts part of all this. If you're going to do offline rebuilds anyway, consider disabling the index first. That effectively removes all of the rows from the index while keeping the definition so that only one copy of the index will exist and that will keep your MDF/NDF from unnecessarily expanding. Of course, you should NEVER EVER do that with a clustered index because the heap will be preserved while you rebuild the clustered index, which is just as bad as doing the rebuild directly.
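The disable-then-rebuild trick above is two statements. As a sketch (placeholder names, and, as stated above, for non-clustered indexes only):

```sql
-- Disabling a NONCLUSTERED index drops its rows but keeps its definition,
-- so the offline rebuild never needs space for two copies at once.
-- NEVER do this to a clustered index: disabling a CI makes the table unreadable.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable DISABLE;
ALTER INDEX IX_SomeIndex ON dbo.SomeTable REBUILD;  -- the rebuild re-enables it
```

Note that while the index is disabled it cannot be used by queries, so this belongs in a maintenance window.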

As for the vendor's recommendation that decreasing the fill factor will save space, they could actually be correct. When a page split occurs, you end up with two pages that are half filled. That's like having a 50% fill factor for those two pages. An 80% fill factor is a whole lot less wasted space than that and won't cause timeouts like a series of page splits can. Unless the NCI is ever-increasing in nature (which wouldn't become fragmented due to inserts anyway), setting the index to a lower fill factor can cut the size of the NCI almost in half.

    Apologies for the long winded answer.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Almost forgot. You can also "pad" the BTREE with the same fill factor. Again, that would be a judgment call and should not necessarily be a blanket policy.

    --Jeff Moden



  • Jeff Moden (3/23/2015)


    Just to add some information to this thread.

    .....

    To get back to fill factor once again, if it's a real problem on some indexes due to inserts, consider building a staging table to handle the inserts for the day and then incorporating them into the main table at night followed by a very selective index rebuild. That way you might avoid a huge amount of wasted space and the related sacrifice in performance.

    .....

    Apologies for the long winded answer.

    Jeff,

    No apologies needed. Everything you are saying matches up with what I reviewed to make these determinations. This would be the only application like this.

In response to your suggestion above: this database supports a patient-data medical application, an EMR to be specific, and using staging tables is not an option.

Also, we have tempdb on a separate volume, so any index rebuild is executed with the SORT_IN_TEMPDB option on.

    The "fun" part of all that I should have mentioned was that this Vendor demanded we set the Default Index Fill Factor = 10 on the server last July.

    Of course we recommended against changing this.

However, after 9 months of index data collection, it appears that this is helping them more than harming.

Also, they seem to see this as a "Magic Bullet" to resolve not a performance issue, but an issue with an increase in table space.

    This is the data size of the tables, not the Index size.

    Very peculiar. Sometimes you wonder if the people you are talking to actually understand the technology they are talking about.

  • Jeff Moden (3/23/2015)


    Almost forgot. You can also "pad" the BTREE with the same fill factor. Again, that would be a judgment call and should not necessarily be a blanket policy.

    Please explain. I think I like the way this sounds.

  • Alan.B (3/23/2015)


My apologies - I misread your original post a little. I was under the impression that some of these indexes were 10% fragmented weekly, not more than 10% as you clearly stated. That changes things a little.

Alan, thanks for your response. I did typo "weekly" when I meant "daily".

    I also did not mention many other things from the last 9 months that went into my decision to make this update.

Also did not mention that one of the databases on this server has a table that stores the text BLOBs of all our HL7 messages.

    So I liked what Jeff had to say about all of that.

  • Jeff Moden (3/23/2015)


    .....

    And now for the black arts part of all this. If you're going to do offline rebuilds anyway, consider disabling the index first. That effectively removes all of the rows from the index while keeping the definition so that only one copy of the index will exist and that will keep your MDF/NDF from unnecessarily expanding. Of course, you should NEVER EVER do that with a clustered index because the heap will be preserved while you rebuild the clustered index, which is just as bad as doing the rebuild directly.

As for the vendor's recommendation that decreasing the fill factor will save space, they could actually be correct. When a page split occurs, you end up with two pages that are half filled. That's like having a 50% fill factor for those two pages. An 80% fill factor is a whole lot less wasted space than that and won't cause timeouts like a series of page splits can. Unless the NCI is ever-increasing in nature (which wouldn't become fragmented due to inserts anyway), setting the index to a lower fill factor can cut the size of the NCI almost in half.

    Jeff - Triple thanks for these two paragraphs mate. These are the exact thoughts I had when determining that what we needed was better Index Level Fill factor settings not server level settings.

  • PHYData DBA (3/24/2015)


In response to your suggestion above: this database supports a patient-data medical application, an EMR to be specific, and using staging tables is not an option.

    Shouldn't matter. The staging table would be included in a view for the reads and could be made updateable by adding an Instead Of trigger to the view. The advantage would be that the staging table would remain comparatively small and could have different indexing to support very high performance inserts without having to have unreasonably small fill factors. It would also have an identity column to control where the data was inserted (at the logical end of the table for performance) that wouldn't be used anywhere in the main table.
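The staging-table pattern described here can be sketched as below. Every name is invented for illustration (the real EMR schema is obviously different), and the nightly merge step is left as a comment because its details depend on the application:

```sql
-- Sketch of the pattern: a small, fast-insert staging table unified with the
-- main table behind a view, with an INSTEAD OF trigger redirecting inserts.
CREATE TABLE dbo.MainTable    (ID int NOT NULL PRIMARY KEY, Payload varchar(100));
CREATE TABLE dbo.StagingTable (ID int NOT NULL PRIMARY KEY, Payload varchar(100));
GO
-- Readers see one logical table...
CREATE VIEW dbo.CombinedView
AS
SELECT ID, Payload FROM dbo.MainTable
UNION ALL
SELECT ID, Payload FROM dbo.StagingTable;
GO
-- ...while inserts land in the small staging table, which can keep a 100%
-- fill factor and lean indexing for very fast inserts.
CREATE TRIGGER trg_CombinedView_Insert ON dbo.CombinedView
INSTEAD OF INSERT
AS
BEGIN
    INSERT INTO dbo.StagingTable (ID, Payload)
    SELECT ID, Payload FROM inserted;
END;
GO
-- Nightly job: fold staging rows into dbo.MainTable, do a selective index
-- rebuild there, then empty dbo.StagingTable.
```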

    The "fun" part of all that I should have mentioned was that this Vendor demanded we set the Default Index Fill Factor = 10 on the server last July.

    Of course we recommended against changing this.

However, after 9 months of index data collection, it appears that this is helping them more than harming.

    If that's true, then they've made a serious mistake in the design of the database and the code. Things like the staging table that I mentioned can resolve a hell of a lot of this. They just need to realize that they're in a box and start thinking outside the box.

Also, they seem to see this as a "Magic Bullet" to resolve not a performance issue, but an issue with an increase in table space.

    This is the data size of the tables, not the Index size.

    If the table has a clustered index, the issue actually is with an index. If the data is getting that big, they should consider temporal horizontal partitioning to ease the index maintenance requirements.

    Very peculiar. Sometimes you wonder if the people you are talking to actually understand the technology they are talking about.

Heh... hopefully I'm not included in that statement. While there can be some nuances as to how the data should be handled because of the type of data it is, it's still "just data". That being said, I suppose that medical data can require a more heterogeneous approach but, if the app is designed to use a database, then it's still "just data" and has a solution. Again, I'll bring up the idea of using an online readable staging table to accept high performance inserts and a separate main table that would be updated from the staging table at night or on a regular basis, so that low fill factors don't need to be used for most of the tables. I'll also bring up temporal horizontal partitioning, as well.

As for the padding thing in the BTREE, that's done during an index rebuild using the PAD_INDEX option. Note that it will not help performance of batch runs that should be doing an index seek with a scan of the leaf level when the start of the scan is determined.
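For reference, in T-SQL that padding is applied with the PAD_INDEX option, which makes the intermediate B-tree levels honor the same fill factor as the leaf level. Names below are placeholders:

```sql
-- PAD_INDEX = ON applies the FILLFACTOR value to the non-leaf (B-tree) pages
-- as well as the leaf pages. Without it, intermediate pages are packed full.
ALTER INDEX IX_SomeIndex ON dbo.SomeTable
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```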

One of the other considerations is that splits happen, index maintenance is going to happen, and people are going to write code that sucks for performance here and there. But if the biggest concern is the size of the tables, then people should simply plan on buying one of the cheaper commodities (more disk space) as well as one of the better investments for performance (more memory), with the possibility of using SSDs for the more frequently hit tables, which can be isolated to a separate filegroup for just such a thing.

    --Jeff Moden



  • Jeff Moden (3/24/2015)


    PHYData DBA (3/24/2015)


In response to your suggestion above: this database supports a patient-data medical application, an EMR to be specific, and using staging tables is not an option.

    Shouldn't matter. The staging table would be included in a view for the reads and could be made updateable by adding an Instead Of trigger to the view. The advantage would be that the staging table would remain comparatively small and could have different indexing to support very high performance inserts without having to have unreasonably small fill factors. It would also have an identity column to control where the data was inserted (at the logical end of the table for performance) that wouldn't be used anywhere in the main table.

Not sure who would pay for the multi-million-dollar rewrite of this application, which uses dynamic SQL, ERM, ORM, and no RM to store and retrieve data, so that a staging table could be used.

That was why I mentioned it is not an option. If it were that easy to change how the data is being stored for this application, we would solve this issue by creating a proper table structure and relations.

Instead we are having to adapt to 7 Oracle databases that were merged into a single MySQL database over 7 years' time, then migrated to MS SQL, back to MySQL, and then to MS SQL again.

We were happy when, a week after the last migration, they listened to us and created the missing clustered or PK indexes and some indexes for the related tables.

    They are using some type of free open source tool to determine what non-clustered indexes are needed.

    Unfortunately they are only using this in their dev environment on tiny copies of the database filled with test data.

    Thanks again for all the comments etc...

It is good to hear reasonable suggestions coming from a place of knowledge after spending a day discussing the difference between server-level default and index-specific settings for fill factor.

I like the padding idea. It's a much cleaner solution, and we have added it to our testing for when these indexes hit over 24.5% fragmentation daily instead of weekly.

  • Ah... I should have known that such would be the case. And if you touch it, you void the support.

I agree with you that a 10% fill factor is insane. Just restating what you already know: if done on the CIs, that would cause table sizes to explode to 10 times their current size, as well as the time it would take to do a restore if it ever comes to that. It would also cause the NCIs to explode by a similar amount for questionable benefit. Page splits "only" cause the size to double per affected page (but do have an insert/update performance problem to contend with), so I wouldn't let them go any lower than a 50% fill factor.

    70-80% with fairly regular very targeted online maintenance sounds like the short-term cure. Based on the situation with the software vendor, buying more disk space, memory, and the online maintenance might be the long term cure... heh... at least up to the next day. 😀

    Also and just to be clear, the "padding" only affects the BTree. That may or may not be helpful because singleton lookups don't usually need a nicely defragmented table or BTree. Only reporting and nightly batches would benefit. It may help for singleton inserts/updates, though.

    --Jeff Moden



  • PHYData DBA (3/24/2015)


... Also, they seem to see this as a "Magic Bullet" to resolve not a performance issue, but an issue with an increase in table space.

    This is the data size of the tables, not the Index size.

    Very peculiar. Sometimes you wonder if the people you are talking to actually understand the technology they are talking about.

    I'm confused. How is changing the indexes supposed to resolve an issue with an increase in table space?

    And man, you have my sympathy! Oracle -> MySQL -> SQL Server -> MySQL -> SQL Server?! I don't think I'd want to see your schema! :crazy:

    -----
Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Wayne West (3/25/2015)


    PHYData DBA (3/24/2015)


... Also, they seem to see this as a "Magic Bullet" to resolve not a performance issue, but an issue with an increase in table space.

    This is the data size of the tables, not the Index size.

    Very peculiar. Sometimes you wonder if the people you are talking to actually understand the technology they are talking about.

    I'm confused. How is changing the indexes supposed to resolve an issue with an increase in table space?

    And man, you have my sympathy! Oracle -> MySQL -> SQL Server -> MySQL -> SQL Server?! I don't think I'd want to see your schema! :crazy:

    Go back and read what I wrote about them decreasing the fill factor to reduce space used. Seems contrary but can work out in the end. Definitely NOT a panacea, though.

    --Jeff Moden



