Multicolumn indexes and fragmentation

  • Is there anything special you should think of concerning index fragmentation and multicolumn indexes (sorry for possible bad terminology)? I am not sure how to think here, if I need to think about it all. Currently I've got a index that looks like this:  int, int, bit, datetime. I think it's easier to understand when there's only one column involved.. 🙂

    Currently page fullness is about 70% and fragmentation about 60%. The index itself is quite small, below 20MB.

  • Indexes art there for two reasons:

    • to enforce a constraint , needed for the data model ( unique - primary key )
    • to support queries.
    • Depending on the consumption pattern, column order is of importance ( = operator before  not equal ) most filtering column first, ...
    • a 20MB index is small indeed , but can still messup your system if used very frequently and e.g. implicit conversions involved.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Yes, but what about fragmentation? Should I bother? FillFactor? It's easier with just one index-column, not multiple (in my head anyway... :))

  • It's likely the fragmentation is having some effect on overall performance by causing some small additional I/O.

    To clarify, for a single keyed lookup, the index performance is not affected at all by fragmentation.  However, typically with multi-key-column indexes, we see partial key lookups that then scan from a point forward.  Those type of lookups can be affected by index fragmentation.

    If you're concerned, you could rebuild the index with, say, a 99% fillfactor.  For that small an index, the rebuild will be very fast anyway.

     

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • The broader the key, the more likely an index is to fragment. More keys, more fragmentation. Overall though, I'd only worry about fragmentation if you can prove, and I mean prove, that defragging the index improves performance. Generally speaking, exceptions abound, you're better off focusing on ensuring very good statistics and statistics maintenance as opposed to worrying about fragmentation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • The broader the key, the more likely an index is to fragment. More keys, more fragmentation. Overall though, I'd only worry about fragmentation if you can prove, and I mean prove, that defragging the index improves performance. Generally speaking, exceptions abound, you're better off focusing on ensuring very good statistics and statistics maintenance as opposed to worrying about fragmentation.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Page fullness is more a problem than fragmentation, but where you are at, at 70% is generally still pretty good.

    I would not even consider going to single column indexes only. It all depends on the workload, however, commonly single column indexes are that useful, and SQL may ignore them determining that doing a scan is more efficient.

  • I don't agree with the rules about the order of columns in a multi-column index. It is correct that it depends on the consumption pattern with "equal to" operator before "not equal to". But "most filtering column first" doesn't mean anything. Instead, "equal to" columns must be sorted depending on any other indexes. If there are no other indexes, we can do as MS does when creating Missing Index proposals and simply list the columns in order of definition. MS knows which column is the "most filtering column" through statistics, but since all columns are used by seek, the order is subordinate and then not used. It is also necessary to decide the order of the Clusterkey - not necessary last as is the default.

  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • Logical fragmentation is seriously overrated as a concern.  As Grant said, unless you can seriously prove that it's causing an issue, don't worry about it.  I'm working on some articles that prove that.

    So far as which Fill Factor to use if you do rebuild it to recover some space, you need to know HOW the index is being fragmented before any recommendation can be made there.  Frequently, lowering the Fill Factor only wastes space.  In a whole lot other cases, it's critical to use a Fill Factor or you'll suffer the proverbial "Morning After" syndrome after blanket index maintenance.  That's when massive blocking occurs until the index fragments enough to make the necessary space to prevent the page-splits (which are a lot more expensive than people have been led to believe) that result in fragmentation.

    With the exception of random but evenly distributed indexes, I wouldn't use logical fragmentation to determine if an index needs to be rebuilt.  In the case mentioned, you actually need to rebuild (NOT reorganize) when the index hits 1% logical fragmentation because a page-split avalanche is getting ready to occur.

    If you want a decent introduction into the myths concerning supposed "Best Practice" index maintenance, how REORGANIZE really works and why you shouldn't generally use it, and much more, please see the following 82 minute video.  Despite its title, its NOT just about Random GUIDs... Random GUIDs were used for most of the demo's because they're the supposed "poster child" for fragmentation.  In the end, I show the results of inserting 100,000 rows per day into a Random GUID clustered index for 58 simulated days and producing less than 1% fragmentation for the 58 days without doing ANY index maintenance at all.

    https://www.youtube.com/watch?v=rvZwMNJxqVo

    If you're listening with a headset, be aware that they added 3 advertisements at the 15:00, 30:15, and 45:25. The advertisements are important to help support this event but I wanted to let you know that they're sudden and they're loud! They WILL lift your headset!

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    With the exception of random but evenly distributed indexes, I wouldn't use logical fragmentation to determine if an index needs to be rebuilt.  In the case mentioned, you actually need to rebuild (NOT reorganize) when the index hits 1% logical fragmentation because a page-split avalanche is getting ready to occur.

    But doesn't that assume that all rows are always identical in length?  I would think that might not be true.  A given new row could be a (lot) longer than a typical row, in which case that one page could split when other pages are not about to split.  Particularly if the table is not using data compression.

    Also, for child tables in particular, the typical usage would be a seek and then a scan of contiguous rows.  In those situations, any significant logical fragmentation could cause some overhead: maybe or maybe not enough to require a rebuild, but it's worth considering.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • ScottPletcher wrote:

    Jeff Moden wrote:

    With the exception of random but evenly distributed indexes, I wouldn't use logical fragmentation to determine if an index needs to be rebuilt.  In the case mentioned, you actually need to rebuild (NOT reorganize) when the index hits 1% logical fragmentation because a page-split avalanche is getting ready to occur.

    But doesn't that assume that all rows are always identical in length?  I would think that might not be true.  A given new row could be a (lot) longer than a typical row, in which case that one page could split when other pages are not about to split.  Particularly if the table is not using data compression.

    Also, for child tables in particular, the typical usage would be a seek and then a scan of contiguous rows.  In those situations, any significant logical fragmentation could cause some overhead: maybe or maybe not enough to require a rebuild, but it's worth considering.

    To be honest, I don't know for sure but I can't see how it would be a problem unless you ran into the issue of trapped short rows, which will bugger up any type of index.  I'll add that to my list of things to do.  If you beat me to it, post the results so we can all learn.  Thanks, Scott.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ScottPletcher wrote:

    Also, for child tables in particular, the typical usage would be a seek and then a scan of contiguous rows.  In those situations, any significant logical fragmentation COULD cause SOME overhead: maybe or maybe not enough to require a rebuild, but it's worth considering.

    Totally understood about your concerns.  This is the same suggestion that I've run across in tens, if not, hundreds of places on the internet.  All but two of them do any kind of proofing one way or another, although not substantial enough, IMHO.  The other thing is that most everyone also use SOME in terms of overhead.  They also say nothing about how much overhead that will be actually proving that it's worth doing index maintenance to avoid. So, yes, I agree with you where you said "maybe or maybe not enough to require a rebuild, but it's worth considering."  The trouble is, most people don't "consider" anything at the single index level.  Instead, they use a "Best Practice" blanket job that was never intended to be a "Best Practice" and is, in fact, a worst practice that tends to perpetuate the causes of pages splits, even for Non-Random GUID keyed indexes.

    I'll see if I can dig up the two threads I'm talking about where one proves that rebuilding and index actually made scan performance much worse and the other showed that logical fragmentation really didn't matter and that the related index maintenance really didn't help.  Again, I consider these to articles to be "teasers" that all that is believed about fragmentation isn't true but they do follow a lot of the testing that I've been doing.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply