How much will it cost or save to rebuild that index? (SQL Oolie)

  • @jeff - but both it is linked together 🙂

    @Jeffrey: of course I'd cluster your table by PatientID, DocID too. But when you are not a very new hospital 99.x% of your former patients will already have left it again and their data are still in your database (because of legal issues or to have it available when they are coming back some time later ...)

    So you have e.g. 1000 pages with maybe each 20 patients in your table (= 20k total), you have currently maybe 500 patients, 100 are leaving per day and being replaced with new ones.

    With 100 % fillfactor you would have about 100-500 page splits per day (when the data are equally distributed), which is not the best idea.

    With 95 % fillfactor, you'd have 1053 pages (100% / 95% * 1000 original pages - see the orginal article) with 19 patients on each. You get daily 100 (of 20k) new patients (=0,5 %) and with 5 % free space there is only a chance of about 1:10 that you get a single page split per day for one of them (if two patients are stored into the same "bucket" (page) or you add a ton of stuff to the page of one of your other 400 current patients).

    The only reason for very low fillfactors as 70 % I can see is that you are really inserting / replacing 20-30 % of your data in not ascending order in the time between two index maintenance windows. This MAY happen in special cases but never occured to me.

    Edit: the documents (varchar or varbinary(max) itself will be automatical be stored in separate LOB-pages, the record itself contains only a pointer to it. Exception is, when the document is < 8 kb and will fit (together with the other columns of the record) onto one page. And exception of this exception is, that you forced even small varchar(max) columns onto the LOB-Pages using sp_tableoption.

    • This reply was modified 2 years, 9 months ago by  Thomas Franz.

    God is real, unless declared integer.

  • Jeffrey Williams wrote:

    Thomas Franz wrote:

    Stupid question, but why are some people use a low fill factor as 70 or 80%, I'm usually fine with something as 95-98 (or even 100 if it is on a identiy column and has no varchar() columns that WILL (not could) be updated by a longer value).

    I understand, that there may be exception, as when you use an UNID as primary key (maybe because you are using some sort of replication / synchronisation over several sides), but in this case I'd use the UNID-column (usually) only as nonclustered index.

    Usually > 90 % of your database are old data that will almost never be updated. When your main key is ascending (dates, own or "foreign" (as the order_id in the order_position table) identity columns etc.) you will never insert in the middle of the index. And even if you have an index on the StackOverflows Posts table on the user_id or cluster your order table (for whatever reasons) by the customer_id it will be usually only one or two inserts per day / week / whatever your index optimize window is, so 95% gives you usually more then enough space for those inserts.

    There are some assumptions here that are not universal - for example, a document table in a patient centric system should not be clustered on the DocumentID (identity) - rather it should be clustered on the PatientID and DocumentID.  This ensures that requests for patients documents will be retrieved from the fewest possible contiguous pages.  With this type of table - you definitely want a lower fill factor and a scheduled rebuild of the index to manage fragmentation and allow for new documents to be added to the patient data.

    Thomas Franz wrote:

    @Jeff - but both it is linked together 🙂

    @Jeffrey: of course I'd cluster your table by PatientID, DocID too. But when you are not a very new hospital 99.x% of your former patients will already have left it again and their data are still in your database (because of legal issues or to have it available when they are coming back some time later ...)

    So you have e.g. 1000 pages with maybe each 20 patients in your table (= 20k total), you have currently maybe 500 patients, 100 are leaving per day and being replaced with new ones.

    With 100 % fillfactor you would have about 100-500 page splits per day (when the data are equally distributed), which is not the best idea.

    With 95 % fillfactor, you'd have 1053 pages (100% / 95% * 1000 original pages - see the orginal article) with 19 patients on each. You get daily 100 (of 20k) new patients (=0,5 %) and with 5 % free space there is only a chance of about 1:10 that you get a single page split per day for one of them (if two patients are stored into the same "bucket" (page) or you add a ton of stuff to the page of one of your other 400 current patients).

    The only reason for very low fillfactors as 70 % I can see is that you are really inserting / replacing 20-30 % of your data in not ascending order in the time between two index maintenance windows. This MAY happen in special cases but never occured to me.

    Edit: the documents (varchar or varbinary(max) itself will be automatical be stored in separate LOB-pages, the record itself contains only a pointer to it. Exception is, when the document is < 8 kb and will fit (together with the other columns of the record) onto one page. And exception of this exception is, that you forced even small varchar(max) columns onto the LOB-Pages using sp_tableoption.

    Heh... OK.  Get ready for a round of pork chops at point blank range, then. 😉

    Let's say you do create a CI on the combination of PatientID and DocumentID.  Now... ask yourself, what is the basis of each?  If they are as so many would make them, Ever-Inceasing Integers, what would happen to your nice Clustered Index on that table?

    The answer is that it would become what I classify as a "Type 98" index (A "Sequential Silos" index).  The "8" is there because it looks like an "infinity" symbol stood up on end and is highly appropriate for what will occur on that index for all time.  It will suffer from extremely massive perpetual logical fragmentation and almost never suffer from Page Density (a.k.a. Physical Fragmentation) problems.  Nothing you can do, including lowering the Fill Factor to any level can be done to prevent the fragmentation.  The only good thing about the index will be that it won't suffer from any "Bad" page splits so at least you won't suffer from any huge delays in Inserts nor suffer from any physical fragmentation that cause low page densities.  This is also why I assign a Fill Factor of 98 to such indexes.  It identifies the index as to what type it is (infinitely hopeless when it comes to logical fragmentation) and keeps the Fill Factor near the 100% that the inserts to it will naturally cause.

    Now... that's not a problem at all except for one thing... like most of the bloody world, you probably subscribe to the fact that it's a "Best Practice" to REORGANIZE at between 5 and 30% logical fragmentation and REBUILD at 30%.  The trouble here is that your entire index is going to suffer from logical fragmentation and it will do so very quickly and perpetually and there you are... REORGANIZING this index on a daily basis if you run index maintenance every day (and with a HUGE amount of LOGFILE that will make your REBUILDs look quite trivial in comparison) and, if you only do index maintenance once per week, you'll end up being nearly guaranteed to doing a REBUILD of this Clustered Index every week or at least every other week and wonder why.

    The real issue that I was trying avoid in discussing what to select for Clustered Index is that you're absolutely correct... such a selection IS, in fact, directly related to what Fill Factor you're going to use AS WELL AS TO WHAT YOUR GOING TO USE FOR INDEX MAINTENANCE and until you actually realize that this is going to end up being a "TYPE 98" index, EVERYONE is going to screw up the index maintenance and waste huge amounts of time trying to maintain an index that simply cannot be maintained if you expect the supposed "BEST PRACTICE" index maintenance, which I submit to you is actually a WORST PRACTICE for almost all indexes, to do anything for you.

    If you expect to keep this index from logical fragmentation, then it is quite ignorant of you to build such an index.  It will NOT perform in the manner you think it will especially if it has any row size to it except right after you have either rebuilt it or made the serious mistake of reorganizing it.  Unless you have just done a rebuild or reorg, it will have the "Read Ahead" performance of an unmaintained Random GUID Clustered Index.

    Now, do you STILL think that this conversation is appropriate for a thread that is supposed to be dedicated to simply calculating how much lowering the Fill Factor will cost you for an index where it will prove to be stupid to do so? 😉

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Let's say you do create a CI on the combination of PatientID and DocumentID.  Now... ask yourself, what is the basis of each?  If they are as so many would make them, Ever-Inceasing Integers, what would happen to your nice Clustered Index on that table? 

    Would an NCI with the same keys be like this? Is there a solution or it is what it is and you have to live with it? How will SELECTs on either part of the key be affected?

    Maybe I should start reviewing my indexes :-/

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Thank you for the nice article and breaking things down for us. Yes the mathematics of how much % gain is a interesting one. As I like to see it :

    If you have 1 MB of data in some storage area (page, container, .. doesnt matter) and that 1 MB takes up 100% of your space, then how big does your (page, container) now have to be for 1MB to take 70% of your space?

    Ratios come in handy: our unknown here is our new storage size.

    1Mb / 70% = ? mb / 100% . In other words, if 1MB is now 70% of your storage size then how many MB are 100% ?

    Basic math from here, cross multiply gives the expression (1mb * 100 %) =( ?mb * 70%)

    100=?*70 ;

    (divide both sides by 70)  100/70 = ? =1.429mb. So you would need to add 0.429 MB or ~ 43% more space.

     

     

     

    • This reply was modified 2 years, 8 months ago by  MMartin1.

    ----------------------------------------------------

  • This was removed by the editor as SPAM

Viewing 5 posts - 16 through 19 (of 19 total)

You must be logged in to reply to this topic. Login to reply