T-SQL counts

  • TomThomson

    SSC Guru

    Points: 104773

    Jeff Moden (10/1/2013)


    Actually, this is going to cause a mess for me when we upgrade from 2005 because NULL is considered to be a DISTINCT value. For 2005, the correct answer is "4". For 2008 and above, the correct answer is 3. The 2008+ version works as I would expect because just a simple COUNT(ID) would ignore the NULL. In this particular case, that would still return a 4 because even though the "2" is duplicated, there are 4 non-null values.

    The reason why it's going to make a mess for me is because a lot of the developers used it fully expecting NULL to be considered a DISTINCT value. This is one of those changes in how SQL Server works that could produce some serious undetected problems if you built your code in 2005 and you upgrade later.

    Your developers must be doing smething pretty queer. BOL for 2005 says COUNT(DISTINCT x) returns a count of distinct non-null values. SQL 2000 always returned a count of distinct non-null values for me, and I used it throuhghout its fully supported life and a bit longer, it was the same whether you had enterprise, standard, or MSDE versions and teh same whichever service pack you had. Maybe they are setting some queer options that break it (like ansi nulls off or something? although it would be surprising if that change the behaviour of COUNT). And vv_kirov's test shows that BOL is right for at least for one version of 2005, and Jens-Peter hasthe same with a later 2005 build.

    Tom

  • PChiragS

    SSCarpal Tunnel

    Points: 4965

    easy one..

    Thanks Smith..

  • Jeff Moden

    SSC Guru

    Points: 996645

    I struck out my previous comments. I don't know what happened last night but I can't seem to duplicate the problem today. I hate it when this happens.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.
    "Change is inevitable... change for the better is not".
    "If "pre-optimization" is the root of all evil, then what does the resulting no optimization lead to?"

    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • SQLRNNR

    SSC Guru

    Points: 281243

    Jeff Moden (10/2/2013)


    I struck out my previous comments. I don't know what happened last night but I can't seem to duplicate the problem today. I hate it when this happens.

    I think we have all done that somewhere along the way 😉

    Jason...AKA CirqueDeSQLeil
    _______________________________________________
    I have given a name to my pain...MCM SQL Server, MVP
    SQL RNNR
    Posting Performance Based Questions - Gail Shaw[/url]
    Learn Extended Events

  • Dana Medley

    SSCertifiable

    Points: 6764

    Great question. Almost overlooked the distinct in the count().



    Everything is awesome!

  • Revenant

    SSC-Forever

    Points: 42467

    Nice and easy - thanks, Samith!

  • Mr. Kapsicum

    SSCertifiable

    Points: 6128

    didn't need to think twice,,,

    easy one. 🙂

  • marlon.seton

    SSCrazy

    Points: 2623

    Hany Helmy (10/2/2013)


    Easy.

    Ditto.

  • Jamsheer

    Ten Centuries

    Points: 1136

    Nice and simple question. Lot to think.. thanks Samith..

  • Jamsheer

    Ten Centuries

    Points: 1136

    tom.w.brannon (10/2/2013)


    I believe the problem is a difference in how distinct gets used in different contexts. For the following queries

    select count(distinct id) from #temp_test;

    select distinct id from #temp_test;

    select count(*) from (select distinct id from #temp_test) a;

    the results are 3, 4 rows returned, and 4. So NULL is a distinct value but not counted as a distinct value. Maybe somebody else can answer why this makes sense.

    Good work Tom.. If you use 'select count(id)' in the last select query instead of 'select count(*)' you will get the result as 3. If we don't use the field name in count(distinct) clause, it will not eliminate NULL.

  • Tee Time

    Hall of Fame

    Points: 3693

    Thanks!

  • sqlnaive

    SSCoach

    Points: 17435

    Jamsheer (10/3/2013)


    tom.w.brannon (10/2/2013)


    I believe the problem is a difference in how distinct gets used in different contexts. For the following queries

    select count(distinct id) from #temp_test;

    select distinct id from #temp_test;

    select count(*) from (select distinct id from #temp_test) a;

    the results are 3, 4 rows returned, and 4. So NULL is a distinct value but not counted as a distinct value. Maybe somebody else can answer why this makes sense.

    Good work Tom.. If you use 'select count(id)' in the last select query instead of 'select count(*)' you will get the result as 3. If we don't use the field name in count(distinct) clause, it will not eliminate NULL.

    Tom, a value of NULL indicates that the value does not exists and is unknown. For the DISTINCT keyword, null values are considered to be duplicates of each other. When DISTINCT is included in a SELECT statement, only one NULL is returned in the results, regardless of how many null values are encountered.Check the following link:

    http://technet.microsoft.com/en-us/library/ms187831(v=sql.105).aspx

    In case of count(DISTINCT), it returns the number of unique non null values. See the following link for count(DISTINCT):

    http://technet.microsoft.com/en-us/library/ms175997.aspx

  • sqlnaive

    SSCoach

    Points: 17435

    In between, simple and good QOTD. 🙂

  • T.Ashish

    SSCarpal Tunnel

    Points: 4246

    easy

  • jfgoude

    SSCrazy

    Points: 2586

    easy one

    Distinct uses null values normally

    count() eliminates them as well as all the other arrgegates functions

Viewing 15 posts - 16 through 30 (of 30 total)

You must be logged in to reply to this topic. Login to reply