Index with leading BIT column

  • I've seen articles that say you shouldn't make the leading column in your index a BIT column. There are, of course, other articles pointing out the eternal truth of SQL Server "It Depends".

    However, Microsoft's Missing Index suggestion continually leads the indexes it suggests with BIT columns. I would like to push the BIT column further down in the index column order, but I need to be able to argue against what's seen as Microsoft suggestions.

    For example, the column it suggests is Obsolete. On some tables it can be a 50/50 split between 1 & 0, on others it can be 90/10. On still others it is nullable. It suggests the Obsolete column first on all of these.

    Any suggestions, philosophies, dictates, or musings are welcome.

    --------------------------------------
    When you encounter a problem, if the solution isn't readily evident go back to the start and check your assumptions.
    --------------------------------------
    It’s unpleasantly like being drunk.
    What’s so unpleasant about being drunk?
    You ask a glass of water. -- Douglas Adams

  • I don't see any problem with using a leading bit column for an index. I can think of many more examples where it would be appropriate as part of a non-clustered index. I sometimes work with data with an IsActive bit column and will create a covering index with IsActive as the first key for supporting queries where we only need active or inactive clients, products, etc.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Going back a few employers I combined Current and Historical data into the same tables when downloading and importing data from our ISAM databases into SQL Server. Doing this I had to flag the data as Current (0) or Historical (1). Knowing that the ratio between Current and Historical was 10% Current and 90% Historical, I put the flag as the leading column in the index since 90% of my queries were against the 10% of the data. This limited what data had to be searched.

    I would probably do it if the distribution was 50/50 and 90% of my queries were still against just Current data.

    What does this mean? Just what you said, it depends. It isn't just the distribution of the data, it also depends on the distribution of the data accessed and how it is accessed.

  • If the flag is significant enough, make it the first column in the clustered index. For example, in a 90%-10% situation where you mostly process the 10%.

    Be aware that SQL will (almost) always prefer equality matches, such as bit_column = 0|1, rather than range conditions, such as int_column >= value1 and int_column < value2, even if sometimes the latter condition is more selective. In those cases, you could cluster first on int_column, no matter what the index suggestion was. Remember: it is ultimately just a suggestion, and even MS themselves recommend thoroughly vetting the suggestions before adding/altering/dropping indexes.

    SQL DBA,SQL Server MVP(07, 08, 09) A socialist is someone who will give you the shirt off *someone else's* back.

  • Sioban Krzywicki (12/21/2016)


    I've seen articles that say you shouldn't make the leading column in your index a BIT column. There are, of course, other articles pointing out the eternal truth of SQL Server "It Depends".

    However, Microsoft's Missing Index suggestion continually leads the indexes it suggests with BIT columns. I would like to push the BIT column further down in the index column order, but I need to be able to argue against what's seen as Microsoft suggestions.

    For example, the column it suggests is Obsolete. On some tables it can be a 50/50 split between 1 & 0, on others it can be 90/10. On still others it is nullable. It suggests the Obsolete column first on all of these.

    Any suggestions, philosophies, dictates, or musings are welcome.

    Your looking at the guy that accidentally paralyzed Expedia.com for two minutes by adding an NCI that had a bit column for the leading column. The pages splits on that index during the INSERTs took performance and threw it on the flow.

    I don't care what Microsoft or anyone else says, don't use bit columns or other very low cardinality columns as the leading column of any index. Use a filtered index instead. Just make sure that any SQL Agent jobs that run a script against the table also include SET QUOTED_IDENTIFIER ON or the jobs will fail with a message reminding you of that fact. It doesn't bother jobs that use stored procedures that were created in SSMS if that's the default for whatever SSMS connections are made.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/26/2016)


    Sioban Krzywicki (12/21/2016)


    I've seen articles that say you shouldn't make the leading column in your index a BIT column. There are, of course, other articles pointing out the eternal truth of SQL Server "It Depends".

    However, Microsoft's Missing Index suggestion continually leads the indexes it suggests with BIT columns. I would like to push the BIT column further down in the index column order, but I need to be able to argue against what's seen as Microsoft suggestions.

    For example, the column it suggests is Obsolete. On some tables it can be a 50/50 split between 1 & 0, on others it can be 90/10. On still others it is nullable. It suggests the Obsolete column first on all of these.

    Any suggestions, philosophies, dictates, or musings are welcome.

    Your looking at the guy that accidentally paralyzed Expedia.com for two minutes by adding an NCI that had a bit column for the leading column. The pages splits on that index during the INSERTs took performance and threw it on the flow.

    I don't care what Microsoft or anyone else says, don't use bit columns or other very low cardinality columns as the leading column of any index. Use a filtered index instead. Just make sure that any SQL Agent jobs that run a script against the table also include SET QUOTED_IDENTIFIER ON or the jobs will fail with a message reminding you of that fact. It doesn't bother jobs that use stored procedures that were created in SSMS if that's the default for whatever SSMS connections are made.

    I'd love to hear more about that. What data was in that bit column, why it lead to page splits. It feels like the full context might be an important consideration.

    Also what the connection between filtered indexes, and ensuring quoted identifiers are on in SQL agent jobs. The connection is not immediately clear

  • Sioban Krzywicki (12/21/2016)


    I've seen articles that say you shouldn't make the leading column in your index a BIT column. There are, of course, other articles pointing out the eternal truth of SQL Server "It Depends".

    However, Microsoft's Missing Index suggestion continually leads the indexes it suggests with BIT columns. I would like to push the BIT column further down in the index column order, but I need to be able to argue against what's seen as Microsoft suggestions.

    For example, the column it suggests is Obsolete. On some tables it can be a 50/50 split between 1 & 0, on others it can be 90/10. On still others it is nullable. It suggests the Obsolete column first on all of these.

    Any suggestions, philosophies, dictates, or musings are welcome.

    The first thing to recognize is that the missing index suggestions you see in an execution plan or in the dynamic management views DO NOT specify an order for the columns:

    https://technet.microsoft.com/en-us/library/ms345485(v=sql.105).aspx

    What is the best thing to do is consider how the most frequently used queries against this table join to it or filter data in the WHERE clause. The columns that are most frequently used and are the most selective are the best candidates to be the first columns in the index.

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply