Can NULLs Exist in the Columns of a Non-Clustered Index?

  • A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

    Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
     
    non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, non-clustered index on col1 will not be used for the search

    Fixes for cases involving nullable columns:

    • Modify the table definition of the referenced column to be NOT NULL.
    • Do not use a nullable column as a search component of a non-clustered index.
    This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.

    Thanks

  • I ran the following test on a SQL 2016 box and the query used an index seek, so this claim is false.

    CREATE TABLE #TEST ( id TINYINT NULL )

    CREATE NONCLUSTERED INDEX PK_Test ON #Test(id)

    INSERT #TEST (id)
    SELECT NULL
    UNION ALL
    SELECT TOP 100 ROW_NUMBER() OVER(ORDER BY @@VERSION)
    FROM sys.columns

    SELECT *
    FROM #TEST
    WHERE id IS NULL

    DROP TABLE #TEST

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thanks for the query, Drew.
    I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
    The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.
    Can you think of any other way to test this? I've tried a half a dozen implementations but never was able to prove the actual number of rows that physically exist in the non-clustered index.

    In my implementations, I thought using a table hint specifying an index would force the Optimizer to use my non-clustered index. But then, I read this in Books Online:
    "Table Hints" - If the collection of hinted indexes do not include all columns referenced by the query, a fetch is performed to retrieve the remaining columns after the SQL Server Database Engine retrieves all the indexed columns.

  • Gail Wanabee - Wednesday, January 9, 2019 2:42 PM

    A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

    Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
     
    non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, non-clustered index on col1 will not be used for the search

    Fixes for cases involving nullable columns:

    • Modify the table definition of the referenced column to be NOT NULL.
    • Do not use a nullable column as a search component of a non-clustered index.
    This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.

    Thanks

    Completely false, total garbage. The only time an index has fewer rows than the table is when the index is filtered.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Wanabee - Wednesday, January 9, 2019 4:39 PM

    I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
    The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.

    The execution plan makes it very clear that there's a seek on the nonclustered index happening.

    If you comment out the create index line, you'll get a table scan instead.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Gail Wanabee - Wednesday, January 9, 2019 4:39 PM

    Thanks for the query, Drew.
    I believe you are correct. However, I commented out this line of code: --CREATE NONCLUSTERED INDEX PK_Test ON #Test(id) and re-executed the query.
    The execution results are identical, so I'm not sure your code proves that the non-clustered index actually contains a NULL.
    Can you think of any other way to test this? I've tried a half a dozen implementations but never was able to prove the actual number of rows that physically exist in the non-clustered index.

    In my implementations, I thought using a table hint specifying an index would force the Optimizer to use my non-clustered index. But then, I read this in Books Online:
    "Table Hints" - If the collection of hinted indexes do not include all columns referenced by the query, a fetch is performed to retrieve the remaining columns after the SQL Server Database Engine retrieves all the indexed columns.

    The execution plans cannot possibly be identical.  Remember, there are two parts of the query: inserting the data and selecting the data.  The first part will be essentially the same, because that part of the query hasn't changed, but the second part of the query cannot be identical, because one instance uses the index that doesn't exist in the other instance.  Here is a better test:

    CREATE TABLE #TEST ( id TINYINT NULL )

    INSERT #TEST (id)
    SELECT NULL
    UNION ALL
    SELECT TOP 100 ROW_NUMBER() OVER(ORDER BY @@VERSION)
    FROM sys.columns
    SELECT *
    FROM #TEST
    WHERE id IS NULL
    CREATE NONCLUSTERED INDEX PK_Test ON #Test(id)
    SELECT *
    FROM #TEST
    WHERE id IS NULL
    DROP INDEX PK_Test ON #Test
    SELECT *
    FROM #TEST
    WHERE id IS NULL
    DROP TABLE #TEST

    If you look at the execution plan, you'll see that the first SELECT uses a TABLE SCAN, the second uses an INDEX SEEK and the third goes back to a TABLE SCAN.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Gila, Drew,

    Thank you both for your responses.

  • Gail Wanabee - Wednesday, January 9, 2019 2:42 PM

    A fairly credible source (SolarWinds, DPA help function) had this advice for determining the causes of table scans:

    Investigate Non-Clustered Indexes That Might Contain Nullable Columns as Search Criteria:
     
    non-clustered index on a nullable column does not contain entries for rows with NULL values. Therefore, and for example, if a "SELECT" query uses a WHERE clause that references a nullable column in an existing non-clustered index, SQL Server will not use the non-clustered index for its search. Instead, SQL Server will perform a full table scan for the search. In another example, if a query contains "WHERE col1 IS NULL" and the col1 column allows NULLs, non-clustered index on col1 will not be used for the search

    Fixes for cases involving nullable columns:

    • Modify the table definition of the referenced column to be NOT NULL.
    • Do not use a nullable column as a search component of a non-clustered index.
    This is the first time in 20+ years of using SQL Server that I have read this. Can someone verify the accuracy/truth of this? I've attempted to contrive a test to prove or disprove it with inconclusive results.

    Thanks

    Please provide a link to the article of which you speak.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • It wasn't an article, Jeff. 

    I copied the text excerpt from a Help function write-up in Solarwinds DPA (Database Performance Analyzer) product which my company uses to monitor our production OLTP server.

    I was investigating a blocking chain problem. Over 80 queries were involved in a blocking event that persisted for over 3 hours and 36 minutes in the middle of the night. For one of the queries involved in the blocking, DPA identified 3 table scans on 3 different non-clustered indexes, even though the query plan DPA showed for the query did not have table scans on the referenced table. I wondered how that was possible? So I clicked on the Help function for that section of the UI and it displayed a long write-up on what could be causing the table scans (in general) and on non-clustered indexes (specifically), and how to prevent them.

  • Gail Wanabee - Saturday, January 12, 2019 4:54 PM

    It wasn't an article, Jeff. 

    I copied the text excerpt from a Help function write-up in Solarwinds DPA (Database Performance Analyzer) product which my company uses to monitor our production OLTP server.

    I was investigating a blocking chain problem. Over 80 queries were involved in a blocking event that persisted for over 3 hours and 36 minutes in the middle of the night. For one of the queries involved in the blocking, DPA identified 3 table scans on 3 different non-clustered indexes, even though the query plan DPA showed for the query did not have table scans on the referenced table. I wondered how that was possible? So I clicked on the Help function for that section of the UI and it displayed a long write-up on what could be causing the table scans (in general) and on non-clustered indexes (specifically), and how to prevent them.

    Interesting.  Thanks for the feedback.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I chatted with someone from SolarWinds. Please contact their customer support people and open a request to get this help file corrected

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • By default, non-clustered indexes do in fact keep track of NULL values, and can answer for both "id IS NULL" or "id IS NOT NULL" predicates, regardless of the null-ability constraint on the column definition.
    However, if you have a FILTERED non-clustered index (ex: "WHERE id > 1000"), and the query uses a predicate not answerable by the filter expression (ex: "id IS NOT NULL" or "id = 120"), then that would be a scenario where the non-clustered index is ignored.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply