A Self-Tuning Fill Factor Technique for SQL Server – Part 2

  • Comments posted to this topic are about the item A Self-Tuning Fill Factor Technique for SQL Server – Part 2

    Mike Byrd

  • Nice article.

    My main problem with the fill factor is, that I can't set it per partition. So I have either to waste a lot of space in my old, cold data from 3 years ago that will never be updated / inserted again or have a low fill factor and many splitts on my current hot data with a fill factor of e.g. 99 %...

    God is real, unless declared integer.

  • very interesting, indeed.

    at least with additional non-clustered indexes you can use different, filtered ones. one for the hot data, another for the cold data. you will need to be creative if reporting stretches into the cold data, though.

  • Stand by for an undated script in GitHub.  You can find it with my UserName: MByrdTX

    I also have listed there this 2 parter in Word format, a PowerPoint presentation I gave last week to the Austin SQL Server User's Group (CACTUSS), and a link to a video created for my Austin presentation.  Newer scripts are there from the one submitted with Part 1, but standby for the new one later this week.

    The new script should be uploaded NLT the end of the week.  I am still doing some testing with additional inputs from previous reviewers.

    I feel the pain of not being able to specify different fill factors for partitions within an index.  It should be on Microsoft's wish list.

    Cheers,

    Mike

    Mike Byrd

  • Apologies... I posted a reply on the wrong thread and removed it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thomas Franz wrote:

    Nice article.

    My main problem with the fill factor is, that I can't set it per partition. So I have either to waste a lot of space in my old, cold data from 3 years ago that will never be updated / inserted again or have a low fill factor and many splitts on my current hot data with a fill factor of e.g. 99 %...

    That's one of the reasons why I prefer Partitioned VIEWS compared to Partitioned TABLES.  They have a lot of other advantages over Partitioned tables, as well.  I hope MS doesn't see fit to someday deprecate them like they have a whole lot of other incredible tools.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, a Deep Dive thread!

    Do you can also comment of PAD_INDEX settings in this all POC discussion? Currently in either fill factor settings your script is setting PAD_INDEX=OFF .

    And please correct; in overall POC the Fill factor on average laid on 95%+ for concerned indexes?

    Thanks,

  • Abrar Ahmad_ wrote:

    Yes, a Deep Dive thread!

    Do you can also comment of PAD_INDEX settings in this all POC discussion? Currently in either fill factor settings your script is setting PAD_INDEX=OFF .

    And please correct; in overall POC the Fill factor on average laid on 95%+ for concerned indexes?

    Thanks,

    Since you pinged me on this with a PM, I have to ask, who are you asking?  The author of this article or me?

    As for my thoughts on the subject of PAD_INDEX, while padding the intermediate levels of a B-Tree Index can help reduce page splits, it pales in comparison to preventing page splits in the leaf level, which would also prevent page splits in the intermediate levels.

    I'll let the author of the article explain the apparent "Fill factor on average laid on 95%+ for concerned indexes" queestion because I'll say "It Depends" entirely on the insert/update pattern and I follow much different methodologies.  For example, a 95%+ Fill Factor for GUIDs will probably not be the way to go but is absolutely the way to go for ever-increasing keyed indexes with the caveat that it won't prevent bad page splits during expansive updates.  Even setting a FILL FACTOR of 50% won't prevent page splits if the expansive updates are made on the most recent data inserted prior to the index being rebuilt.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Abrar Ahmad_ wrote:

    Yes, a Deep Dive thread!

    Do you can also comment of PAD_INDEX settings in this all POC discussion? Currently in either fill factor settings your script is setting PAD_INDEX=OFF .

    And please correct; in overall POC the Fill factor on average laid on 95%+ for concerned indexes?

    Thanks,

    Since you pinged me on this with a PM, I have to ask, who are you asking?  The author of this article or me?

    As for my thoughts on the subject of PAD_INDEX, while padding the intermediate levels of a B-Tree Index can help reduce page splits, it pales in comparison to preventing page splits in the leaf level, which would also prevent page splits in the intermediate levels.

    I'll let the author of the article explain the apparent "Fill factor on average laid on 95%+ for concerned indexes" queestion because I'll say "It Depends" entirely on the insert/update pattern and I follow much different methodologies.  For example, a 95%+ Fill Factor for GUIDs will probably not be the way to go but is absolutely the way to go for ever-increasing keyed indexes with the caveat that it won't prevent bad page splits during expansive updates.  Even setting a FILL FACTOR of 50% won't prevent page splits if the expansive updates are made on the most recent data inserted prior to the index being rebuilt.

    🙂

    Actually just went through from your earlier comments in this thread and following article with your astonishing presentation

    https://www.sqlsaturday.com/830/Sessions/Details.aspx?name=black-arts-index-maintenance-1-how-the-best-practice-methods-are-silently-killing-performance&sid=88238

    Thank you for your valuable feedback!

     

  • You are correct in asking about PAD_INDEX setting.  While I was still in the experimental and analysis part of this project I was trying to keep minimal variations on what I was seeing.  But I did still see page splits in the intermediate levels (especially in my other article: https://www.sqlservercentral.com/articles/how-bad-are-bad-page-splits). Kimberly Tripp posted on this at https://www.sqlskills.com/blogskimberly/msdn-webcast-qa-index-defrag-best-practices-fragmentation-questions-part-i/. From that post she suggests " PAD_INDEX is not needed unless you have extremely unbalanced activity and are unable to rebalance the tree (meaning you need to achieve high availability)."  I never did see very many bad page splits at the intermediate level and for now am content to leave as is.  However, there may be conditions where it may be necessary.

    I'm not sure what you are asking "And please correct; in overall POC the Fill factor on average laid on 95%+ for concerned indexes?"  All indexes had a fill factor = 100% when I started.  The code then adjusted on a daily basis to collect subsequent parametrics on bad page splits and fragmentation.

    The code for the SQL Agent job is still evolving.  Attached is latest code as of 2/10/2020.

    Mike

     

    • This reply was modified 4 years, 2 months ago by  Mike Byrd.
    Attachments:
    You must be logged in to view attached files.

    Mike Byrd

  • I continue to be amazed by the overall reduction in wait times (from DPA free) for the production database.  There are still spikes and sometimes short plateaus, but overall during the day the whole database wait times are hovering in the 200-300 second range over 10 minutes (this is with 32 logical processors).  What is amazing to me is that since early last fall I've only gotten 2 calls (during non-business hours) for slowness and both could be attributed to stored procedure recompiles with the wrong query plan (we are still on SS2012 🙁    ).

    Mike Byrd

  • Mike Byrd wrote:

    You are correct in asking about PAD_INDEX setting.  While I was still in the experimental and analysis part of this project I was trying to keep minimal variations on what I was seeing.  But I did still see page splits in the intermediate levels (especially in my other article: https://www.sqlservercentral.com/articles/how-bad-are-bad-page-splits). Kimberly Tripp posted on this at https://www.sqlskills.com/blogskimberly/msdn-webcast-qa-index-defrag-best-practices-fragmentation-questions-part-i/. From that post she suggests " PAD_INDEX is not needed unless you have extremely unbalanced activity and are unable to rebalance the tree (meaning you need to achieve high availability)."  I never did see very many bad page splits at the intermediate level and for now am content to leave as is.  However, there may be conditions where it may be necessary.

    I'm not sure what you are asking "And please correct; in overall POC the Fill factor on average laid on 95%+ for concerned indexes?"  All indexes had a fill factor = 100% when I started.  The code then adjusted on a daily basis to collect subsequent parametrics on bad page splits and fragmentation.

    The code for the SQL Agent job is still evolving.  Attached is latest code as of 2/10/2020.

    Mike

    Mike; No doubt good answer.

    But I wonder that we are still referring to some master articles written in way back 2004. Do the advancements in IO/Storage , CPUs threading controls and especially the DBMS; do the problems still remained same as of earlier period?

    🙂 do we need to reinvent the wheel???? 😀

    Practically; we are sitting on new streamlined servers but still bottlenecked in spikes; from no-where or at whatever time? 🙂

    Thank You!

     

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply