Indexes & FillFactor

  • I am not a DBA but play one at work and not very well.

    I have a very large DB (1 billion rows) in several tables. The average table size is about 75mil rows. I append about 15 million rows a week to several tables. My question revolves around fill factor. My limited knowledge tells me if i have a low fill factor, index fragmentation will remain low but I/O could suffer due to more pages. If i go high the opposite will happen.

    How does the fill factor impact my inserts?? If fill factor is low, do inserts take longer??

    Also-

    If i schedule a job/maintenance plan to drop and rebuild or create with drop_existing on, do i need to worry about fragmentation and therefor set my fillfactor to 100?

    Any insight would be much appreciated

    Thanks

    Peter

  • Here is part of where being a DBA can be a bit of an art.

    Indexes and fragmentation can be avoided in several ways. First - let's get the clustered index out of the way - 99.99% of the time every table should have a clustered index. This orders the physical table data in the order of the index. Since this is the order of all of the data and every other index uses the clustered index to retrieve data, it is super important to not fragment it - so make sure to think that one through most carefully.

    Here is a simplified version:

    If you add data to a table in the same order an index is in, it will essentially not fragment during your inserts. You will always be adding data to the end of the index so you will not split pages. So, try to plan your indexing and your inserts together. If you have a bulk insert happening regularly, try to insert in the order of your clustered index and it will save you lots of reindexing.

    If you have a lot of sparse inserts or indexes that contradict the order you add data, these are the ones to use a fillfactor to help you out. Having a fillfactor basically just leaves space when an index is created so you have the ability to add data without splitting a page to get more space in your index. Every time you have to split a page you hurt performance because the new page ends up in a poorer position on the physical disk - requiring the read head to move really far.

    Having a ton of empty space in your indexes can slow you down a bit, but the pages being in the correct place on disk is so helpful that the performance hit you take moving from one page to another is usually pretty low. If you have a lot of queries that return records that are immediately next to each other, it can be helpful to have them on the same page, but a bigger fillfactor is usually a safer performance approach. The big problem is you will drastically increase the size of your database with a really low fillfactor, so be careful.

    Now, taking this and other posted information into account, you should be able to plan a bit. Focus on ensuring you don't fragment your clustered indexes. For the other indexes, plan a fillfactor that keeps fragmentation down and reindex just the ones that fragment frequently when you have to. Don't reindex everything all at once, schedule and plan to match when they are fragmenting.

  • Fill factor is not a representation of index fragmentation. The fill factor setting tells SQL Server how full to fill the index pages when creating a new index. If you specify 80%, the index building operation will fill each page to 80% of capacity and then move on to a new page. The idea of leaving free space is that eventually, in the event of a clustered index since it is the data row, you'll be updating the row and the row update could require more space. If there if free space in the page, the update can happen right there with no problem. If there is not free space, you end up with what we call a page split. Page splits have a negative impact on I/O. You can also get page splits when INSERTing data if the clustered index is build on an non-sequential column. Since the clustered index physically orders the data, creating a clustered index on an non-sequential value means that the data must be moved around so the the newly inserted row has room in the correct order on that page.

    Long story short, the free space is there so as to avoid page splits and increase insert/update performance. Having the fill factor set for too much free space will have a negative impact on your SELECT queries as more data pages will need to be read into the buffer to satisfy the query results. This leaves us, as the DBAs, with the task of balancing out the negative side effects of page splits with the negative side effects of too much free space.

    Keep in mind that the fill factor setting is only relevant when creating (or re-creating) an index. Once the index has been created, SQL Server will not keep the pages at that percentage of fullness. This is why we do maintenance on indexes.

    Make sense?

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Ah, Michael, you must have posted whilst I was typing....I'll second everything that you've said as well!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • I knew i could count on you smart people!!!! 🙂

    A few follow ups---

    What is the best way to view fragmentation? Is it sys.dm_db_index_physical_stats??

    If it is, what is the key indicator i.e. avg_fragmentation_in_percent

    or other? what values would trigger a rebuild?

    Michael - you mentioned adding data in the same order as the clustered index. Does that mean my insert statement should be:

    INSERT (Field1,Field7,Field8,Field2,Field3...) if Field1,Field7,Field8 is my clustered index??

    Again - TY all for the insight.

    Peter

  • Yes, that dynamice management view is specifically created to watch fragmentation. the avg_fragmentation_in_percent represents the percentage of logical or extent fragmentation within your index/table. The closer this value to zero, the better. I would start thinking index maintenance when this gets to 10% or more.

    Another value to watch is avg_page_space_used_in_percent. This represents how full your pages are. If your fill factor is set at 80%, this value should be somewhere between 80-100%. If, in this example, you see this value 75% or less, you are seeing the result of page splits (when the page split happens, SQL Server splits off 1/2 of the page so this value will appear between 50% and your fill factor value).

    Reading up on fill factor and internal/external fragmentation in BOL will help you with this.

    Also, Michael was referring to the order of the data, not the columns. For example, if your clustered index was on an int column, you would want to see this:

    INSERT INTO @Table (IntValue)

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3

    as opposed to this

    INSERT INTO @Table (IntValue)

    SELECT 3 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Hurray.. to John and Michael.. Great Explanation.

    I am Just curious to know.. how long does it take you to REBUILD your indexes on these BIG Tables.. and How often do you rebuild your indexes.

    with such big tables , you Database Size should be HUGE, so if you RUN SHRINKFILE or SHRINKDATABASE, it can introduce Fragmentation too...

    So you should consider running REINDEX after you shrink operations.

    SQL 2005 has some super DMV's and can give you lots of info as already discussed.

    sys.dm_db_index_physical_stats

    sys.dm_db_index_operational_stats

    sys.dm_db_index_usage_stats

  • Mani Singh (7/9/2008)


    So you should consider running REINDEX after you shrink operations.

    You shouldn't be shrinking your databases in the first place.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • ... unless there is a huge problem of disk storage free space.

    -Satya SK Jayanty
    SQL Server MVP (Follow me @sqlmaster)
    Author of SQL Server 2008 R2 Administration CookBook
    SQL Server Knowledge Sharing network

  • Satya_skj (7/10/2008)


    ... unless there is a huge problem of disk storage free space.

    In which case you get more disks. It's not as if storage is that expensive these days.

    Shrinking is a short term solution, not a long term one. Databases tend to grow. Shrinking a database that is going to get new data is just going to force a grow, possibly at a time where that grow will affect users on the system, possibly causing external (file level) fragmentation. Reindexing to fix the fragmentation thatthe shrink caused is also going to force a grow.

    The only time I recommend a shrink is after an archive and purge of data (in which case there will be lots of space free that may not be reused for months or longer) or when transfering a database down to dev with just a fragment of the data

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • It Depends, and falls back to how flexiable your organization is in spending $$ and also depends if there are any unsed bays.

  • if you are going to shrink, at least do it after the index rebuild as pages will be moved around. If you specified a fill factor of 50 for instance the file would only grow again.

    get into the habit of sizing databases and allocating the appropriate disk space and dont use autogrow

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • Perry Whittle (7/10/2008)


    if you are going to shrink, at least do it after the index rebuild as pages will be moved around. If you specified a fill factor of 50 for instance the file would only grow again.

    If you shrink after an index rebuild, the shrink will unorganise all the pages that your reindex organised. You could easily end up with worse framentation than before the rebuild.

    I did a test of that - Shrinking Databases[/url]

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • wow, never tested it Gail. How would it account for a different fill factor though?

    Agree with the last point though, dont shrink!

    -----------------------------------------------------------------------------------------------------------

    "Ya can't make an omelette without breaking just a few eggs" 😉

  • All good stuff here. Again, i'm the dba-wanna-be.

    So as i start to learn and see that every index and key is completely fragmented, i am doing defrag. My database before i started was 400Gb and is now down to 325 just through defragging and i have more to go. Should i still stay away from a shrinkdb?? is the defragging taking care of recovering the lost space?? if i do shrink at the end of my defragging, should i do another defrag??

    thanks guys.

    peter

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply