Distinct clause in Select statement

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    A good question, highlighting a very common misconception. I've seen so many people think that there is a way to limit DISTINCT to only some of the SELECTed columns - and yet, when I ask them what SQL Server should return for the other columns, they are never able to answer.

    webrunner (7/26/2010)


    I have one question, though. I've seen people show me queries where they write SELECT COUNT(DISTINCT userid) or something like that. Is that also the same as SELECT DISTINCT COUNT(userid), or is there a difference?

    COUNT(*) returns the number of columns (regardless of values); COUNT(userid) counts the number of non-NULL vallues in the userid column; and COUNT(DISTINCT userid) counts the number of distinct non-NULL values in the userid column. This DISTINCT keyword is different from the one at the start of the SELECT list.

    SELECT DISTINCT .... means that at the end of the query evaluation, rows that are complete duplicates of another row (in all columns) are removed.

    So SELECT DISTINCT COUNT(userid) FROM SomeTable will first count the number of rows where userid is not NULL, yielding a single row as result (with that number as the value in its only column). The DISTINCT will then remove duplicates - which don't exist as the COUNT without GROUP BY guarantees a single row result set.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • TomThomson

    SSC Guru

    Points: 104773

    bitbucket-25253 (7/26/2010)


    webrunner

    Easy enough to find out ...

    CREATE TABLE #T(userid INT)

    INSERT INTO #T

    SELECT 1 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5 UNION ALL

    SELECT 6 UNION ALL

    SELECT 7

    SELECT COUNT(DISTINCT userid) AS '(1)' FROM #T

    SELECT DISTINCT COUNT(userid)AS '(2)' FROM #T

    Results:

    (1) (2)

    7 8

    Looks wrong to me. Surely that 8 should be a 1, or the DISTINCT keyword ahould not be present in the second select?

    Tom

  • Toreador

    SSChampion

    Points: 11261

    A bit of a strange question.

    I'd never come across the user of parantheses with DISTINCT so assumed it must be some sort of extended syntax I didn't know about. So guessed wrongly that it must have different behaviour.

    But it turned out that the reason I'd never come across it was that there is no alternative syntax at all!

    So an interesting problem to think about, but I can't say I learned anything from it (other than that there are apparently people out there who unnecessarily put brackets with a distinct!)

  • Luke L

    SSC-Dedicated

    Points: 35530

    Tom.Thomson (7/26/2010)


    bitbucket-25253 (7/26/2010)


    webrunner

    Easy enough to find out ...

    CREATE TABLE #T(userid INT)

    INSERT INTO #T

    SELECT 1 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5 UNION ALL

    SELECT 6 UNION ALL

    SELECT 7

    SELECT COUNT(DISTINCT userid) AS '(1)' FROM #T

    SELECT DISTINCT COUNT(userid)AS '(2)' FROM #T

    Results:

    (1) (2)

    7 8

    Looks wrong to me. Surely that 8 should be a 1, or the DISTINCT keyword ahould not be present in the second select?

    Nope, run the code you get 7 and 8 respectively. The Distinct in the second query is applied after the rows are created. So since there is no group by clause it counts all of the userid's and returns 1 row. The DISTINCT in that instance makes sure there are distinct rows, which there are because there is only one. So Yes the DISTINCT keyword doesn't really do anything in the second select, but that was more or less the point of the example that bitbucket gave, to illustrate the difference between Count(DISTINCT expression ) and SELECT DISTINCT count(expression).

    -Luke.

    To help us help you read this[/url]

    For better help with performance problems please read this[/url]

  • Carlo Romagnano

    SSC-Insane

    Points: 22011

    Hugo Kornelis (7/26/2010)COUNT(*) returns the number of columns (regardless of values);

    That's false: COUNT(*) returns the number of ROWS regardless of values of columns. :w00t:

  • TomThomson

    SSC Guru

    Points: 104773

    Luke L (7/27/2010)


    Nope, run the code you get 7 and 8 respectively. The Distinct in the second query is applied after the rows are created. So since there is no group by clause it counts all of the userid's and returns 1 row. The DISTINCT in that instance makes sure there are distinct rows, which there are because there is only one. So Yes the DISTINCT keyword doesn't really do anything in the second select, but that was more or less the point of the example that bitbucket gave, to illustrate the difference between Count(DISTINCT expression ) and SELECT DISTINCT count(expression).

    -Luke.

    Just me being stupid. I somehow read it an a totally insane way.

    Tom

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    Carlo Romagnano (7/27/2010)


    Hugo Kornelis (7/26/2010)COUNT(*) returns the number of columns (regardless of values);

    That's false: COUNT(*) returns the number of ROWS regardless of of columns. :w00t:

    Yes, of course - I typed columns, but intended to write rows. Thanks for catching my mistake.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • m--S3qU3L

    Hall of Fame

    Points: 3706

    Explanation: The distinct clause guarantees all output records will be unique, irrespective of how you write the select statement.

    CREATE TABLE TABLE2(

    Col1 int

    ,Col2 int)

    INSERT INTO table2 VALUES(1,3)

    INSERT INTO table2 VALUES(1,4)

    SELECT DISTINCT Col1,Col2 FROM table2

    I agree the use of parenthesis wont affect the output.

    But while using 2 columns with DISTINCT can produce non unique rows (ateleast in one columm...:-))

    John

  • Hugo Kornelis

    SSC Guru

    Points: 64685

    $partacu$ (8/2/2010)


    But while using 2 columns with DISTINCT can produce non unique rows (ateleast in one columm...:-))

    As long as there is a difference in at least one column, the rows are not identical.

    (Or would you describe a blond-haired, blue-eyed guy, aged 34, weighing 95 kg, and measuring 1.95 m as "identical to" a blond-haired, brown-eyed girl, aged 7, weighing 23 kg and measuring 1.23 m, just because they both have blond hair?)


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Dhruvesh Shah

    SSCrazy

    Points: 2303

    Simple

  • zymos

    SSCommitted

    Points: 1960

    Nice and easy question that should result to same answer regardless due to the use of the SELECT DISTINCT. Thanks.

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply