Indexes: When Column Selectivity Is Not Always A Requirement

  • Great article, thank you!

    I think this sentence has a typo: "Now we have a Seek Predicate on DateAdded and a Predicate filter on DelFlag with a much-improved query cost (and less logical reads)."  -- looks like "DelFalg" should be "Gender" instead

  • Great catch!  And that was with 4 other sets of eyes reviewing the article before I submitted it.

     

    Mike Byrd

  • Comments posted to this topic are about the item Indexes: When Column Selectivity Is Not Always A Requirement

    Mike Byrd

  • I think this shows why indexes should lead with the equality columns, then inequality columns.  I used to be in that camp; order the index columns by selectivity.  Now I avoid date fields for leading columns of indexes even if they're very selective.

  • I would rephrase your first sentence to ..."why indexes should lead with equality columns and then range searches."  Inequality columns (from your query) will almost always cause a scan rather than a seek.

    Mike Byrd

  • Hi

    There is no index recomendation because optimisation level = TRIVIAL. If you change the query so that its level is FULL, without changing the results, you will also see the index suggestion.

    SELECT LastName,FirstName,DateAdded

    FROM dbo.Customer

    WHERE DateAdded >= '6/1/2019'

    AND DateAdded < '7/1/2019'

    AND Gender = 'F'

    and 1 = (select 1) 

     

  • Sigh, good catch.  It makes you wonder why the optimizer code didn't catch this.

    Mike Byrd

  • This reply has been reported for inappropriate content.

    Thanks for sharing, Mike.

    My understanding is that you have explained that indexes with low selectivity columns can still be useful as the first column in a multicolumn index. This is dependent on the query we are dealing with and in this case using the ASC clause in the index creation seemed very important too, right?

    Could there be other examples (different scenarios) in future articles perhaps?

    "In the Seek Predicate, all the Female rows are grouped together and the DateAdded column is in ascending sequence. Obviously this makes it easier for the query optimizer to go straight to the applicable rows with only one pass rather than two and in the original nonclustered index based on"

    Br. Kenneth Igiri
    https://kennethigiri.com
    All nations come to my light, all kings to the brightness of my rising

  • @ Mike Byrd... thank you for the great article.  I obviously missed it when it first came out and I'm glad they republish such things to give such articles the attention they deserve.

    As a bit of a sidebar, these types of indexes create a bit of "excitement" for folks that are into doing Index Maintenance based on Logical Fragmentation.

    Since the "F" and "M" in this index almost never changes and the index is based on Gender (which form "silos" or "partitions" within the index") and, within the silos, the rows are inserted into the NCI in an ever increasing fashion because the second column is DateAdd, which is "ever-increasing" and the CustomerID key inherently included from the CI, is also "ever increasing".  That means that each "silo" (and I call them that to avoid confusion with "Partitioned Indexes") forms a "perfect index" that will never fragment and will always have a page density near 100 because only "good" page splits (page splits that are append only and don't actually split a page and move half the data from one to another) occur.

    The "excitement" comes from the fact (especially if you had more than just two silos in different "low cardinality" indexes), sys.dm_db_index_physical_stats() will very quickly report them as having very high logical fragmentation because of the mid-index inserts at the end of each silo.

    The truth is, they never need to be defragmented because they're always referenced by "silo" and each silo is permanently NOT fragmented.  My recommendation is that, once you identify them, rebuild them just once (hopefully when you first build them) with a FillFactor of 98 (the "8" looks like the infinity symbol but stood up on end), and then exclude them from your index maintenance.

    To make my life easier, I actually do use the FillFactor to mark the "type" of index as follows...

    100 - Static. These are typically "Reference" tables that almost never change unless a category or status or etc is added or a "description" is updated.

    99   - Ever increasing/append only where rows are added only at the end of the index and no "ExpAnsive" updates occur.  It can take years before the "interleaving" of extents with other indexes causes enough fragmentation to get to 5% fragmentation.  Interestingly, such "interleaving" is actually deadly to performance and you should watch the number of "segments" such indexes have.  It used to be known as "Extent Fragmentation" and is reported in DBCC ShowContig under a similar name and is reported as "fragment_count" in sys.dm_db_index_physical_stats().

    98 - Sequential Silos, which I just explained above.

    There are other FillFactor values that I define for other causes but I won't bore anyone with the rest.  I just wanted to show the relationship of the "Sequential Silos" with other indexes that don't actually need to be defragmented on any kind of regular basis.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply