Counting NULLs

  • I have a table where I cannot get a reliable answer for how many rows have a NULL value for a key field.

    CREATE TABLE [dbo].[EBTable] (

     [EBID] [int] IDENTITY (1, 1) NOT NULL ,

     -- Other fields omitted for clarity

     [KeyField] [char] (54) NULL ,

     CONSTRAINT [PK_EBTable] PRIMARY KEY CLUSTERED ([EBID]) WITH FILLFACTOR = 100  ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE INDEX [IX_EBTable_KeyField] ON [dbo].[EBTable] ([KeyField]) WITH  FILLFACTOR = 100 ON [PRIMARY]

    GO

    -- Counting all rows in the table is always correct

    SELECT COUNT(*) FROM dbo.EBTable

    -- -----------

    -- 12808937

    GO

    -- Counting rows with non-null values always works

    SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NOT NULL

    -- -----------

    -- 11135154

    GO

    -- Counting rows with null values doesn't work

    SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NULL

    -- -----------

    -- 10931004

    GO

    -- Counting both null and non-null at the same time always works

    SELECT NullKey, COUNT(*) as Records

    FROM (

     SELECT CASE WHEN KeyField IS NULL THEN 1 ELSE 0 END AS NullKey

     FROM dbo.EBTable

    ) x

    GROUP BY NullKey WITH ROLLUP 

    NullKey     Records    

    ----------- -----------

    0           11135154

    1           1673783

    NULL        12808937

    Obviously, if there are 12.8 million rows and over 11 million have a KeyField value it is impossible to have another 10 million with no KeyField value.  What's worse is the NULL row count varies from 9+ million to 12+ million at different times.

    The execution plan says all queries are using the index on KeyField.  I have used DBCC DBREINDEX, and even dropped and recreated the index, but the only effect is a slight variation in the incorrect answer.  If the queries are run without the secondary index, forcing a table scan, the answer is correct.

    There is no other activity in the table at this time.  CHECKTABLE and CHECKDB can't find any errors in the table or database.

    Any ideas?

  • Key field has value as 'NULL'.

    It is not counted as NULL. It happens when you make a wrong query to insert nulls

    SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NULL OR KeyField = NULL OR KeyField = 'NULL'

    Added

    SELECT COUNT(*) FROM dbo.EBTable WHERE KeyField IS NOT NULL AND KeyField <> NULL AND KeyField <> 'NULL'

    might give the results you are expecting

    Regards,
    gova

  • For all things that are null, err not null, ergo Unknown.

    read todays article

    http://www.sqlservercentral.com/columnists/mcoles/fourrulesfornulls.asp

  • Note that all queries use "IS NULL" and "IS NOT NULL", never "=NULL" or "<>NULL".

    And I can assure you that none of the fields contain the character string 'NULL'

     

  • I mis-read the original post.

    Can you play around like this and post the results

    /* Just to analize the data */

    SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NULL -- yes million rows will kill the machine

    /* logically should return 0 rows */

    SELECT A.*

    FROM

     (SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NOT NULL) A

    JOIN    (SELECT KeyField, * FROM dbo.EBTable WHERE KeyField IS NULL) B

    ON A.KeyField = B.KeyField

    Regards,
    gova

  • Have you tried to recreate the non clustered index? Checked for table inconsistency? Maybe you get different query plans for your queries and your index/table is corrupt (which could lead to different resuts)?

  • As stated initially, I have run DBCC CHECKDB and CHECKTABLE with no errors found.  I have also both reindexed and drop/created the secondary index.

    An execution plan that involves an index seek on the secondary index returns the wrong count, while one that uses a table scan or an index scan on the secondary index gives the correct count.

    The table had to be updated yesterday so there are no more NULL KeyField values, but as an experiment I copied it and set KeyField = NULL in 1,605,243 rows.  The first SELECT COUNT(*) WHERE IS NULL returned 12,284,824 so the problem is persistent.

    When I look at the execution plan for SELECT COUNT(*) ...IS NULL, if I hover over the arrow coming from the index seek the popup correctly shows an estimated row count of 1,605,243.  But when I executed it, this time the answer was 12,572,104.

    I tried SELECT <non-key field> ... WHERE KeyField IS NULL.  The execution plan used an index scan to build a hash table and used a parallel hash join with the full table to get the results.  The estimated row size is 1,605,243 but it returned 12,500,284 rows.

  • Hehe, sorry. Obviously I forgot part that before I started answering

    I don't see anything else that I might point out though... But since the value of the COUNT with the IS NULL is varying from execution to execution maybe this is a good time to bring in Microsoft PSS.

  • Have you tried your query with (MAXDOP = 1)?

     


    * Noel

  • This is on a 2-CPU (hyperthreaded) server.

    With MAXDOP=1, I get the correct result for SELECT COUNT(*) ... IS NULL (1.6 million).  With MAXDOP=2,3, or 4 I get various numbers in the 10-12 million range.

  • Hey Noeld, can you explain why the dual processor is screwing this operation??

  • Assuming that the Poster is not using SP4 (most of us aren't ) he just hit and old bug in SMP systems:

    http://support.microsoft.com/default.aspx?scid=kb;en-us;814509

     


    * Noel

  • Thanx... aren't I glad I don't have a dual proc .

  • Crap, I just remembered that the new unoperational server is dual proc .

  • Unfortunately for me I have hit that bug times, in the begining it cost me some hair  


    * Noel

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply