Distinct clause in Select statement

Question

Distinct clause in Select statement

Viewing 11 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 1

A good question, highlighting a very common misconception. I've seen so many people think that there is a way to limit DISTINCT to only some of the SELECTed columns - and yet, when I ask them what SQL Server should return for the other columns, they are never able to answer.

webrunner (7/26/2010)
I have one question, though. I've seen people show me queries where they write SELECT COUNT(DISTINCT userid) or something like that. Is that also the same as SELECT DISTINCT COUNT(userid), or is there a difference?

COUNT(*) returns the number of columns (regardless of values); COUNT(userid) counts the number of non-NULL vallues in the userid column; and COUNT(DISTINCT userid) counts the number of distinct non-NULL values in the userid column. This DISTINCT keyword is different from the one at the start of the SELECT list.

SELECT DISTINCT .... means that at the end of the query evaluation, rows that are complete duplicates of another row (in all columns) are removed.

So SELECT DISTINCT COUNT(userid) FROM SomeTable will first count the number of rows where userid is not NULL, yielding a single row as result (with that number as the value in its only column). The DISTINCT will then remove duplicates - which don't exist as the COUNT without GROUP BY guarantees a single row result set.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

TomThomson SSC Guru Points: 104773 More actions · Answer 2

bitbucket-25253 (7/26/2010)
webrunner
Easy enough to find out ...
CREATE TABLE #T(userid INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7
SELECT COUNT(DISTINCT userid) AS '(1)' FROM #T
SELECT DISTINCT COUNT(userid)AS '(2)' FROM #T
Results:
(1) (2)
7 8

Looks wrong to me. Surely that 8 should be a 1, or the DISTINCT keyword ahould not be present in the second select?

Tom

Toreador SSChampion Points: 11380 More actions · Answer 3

A bit of a strange question.

I'd never come across the user of parantheses with DISTINCT so assumed it must be some sort of extended syntax I didn't know about. So guessed wrongly that it must have different behaviour.

But it turned out that the reason I'd never come across it was that there is no alternative syntax at all!

So an interesting problem to think about, but I can't say I learned anything from it (other than that there are apparently people out there who unnecessarily put brackets with a distinct!)

Luke L SSC-Dedicated Points: 35547 More actions · Answer 4

Tom.Thomson (7/26/2010)
bitbucket-25253 (7/26/2010)
webrunner
Easy enough to find out ...
CREATE TABLE #T(userid INT)
INSERT INTO #T
SELECT 1 UNION ALL
SELECT 1 UNION ALL
SELECT 2 UNION ALL
SELECT 3 UNION ALL
SELECT 4 UNION ALL
SELECT 5 UNION ALL
SELECT 6 UNION ALL
SELECT 7
SELECT COUNT(DISTINCT userid) AS '(1)' FROM #T
SELECT DISTINCT COUNT(userid)AS '(2)' FROM #T
Results:
(1) (2)
7 8
Looks wrong to me. Surely that 8 should be a 1, or the DISTINCT keyword ahould not be present in the second select?

Nope, run the code you get 7 and 8 respectively. The Distinct in the second query is applied after the rows are created. So since there is no group by clause it counts all of the userid's and returns 1 row. The DISTINCT in that instance makes sure there are distinct rows, which there are because there is only one. So Yes the DISTINCT keyword doesn't really do anything in the second select, but that was more or less the point of the example that bitbucket gave, to illustrate the difference between Count(DISTINCT expression ) and SELECT DISTINCT count(expression).

-Luke.

To help us help you read this[/url]For better help with performance problems please read this[/url]

Carlo Romagnano SSC-Insane Points: 22770 More actions · Answer 5

Hugo Kornelis (7/26/2010)COUNT(*) returns the number of columns (regardless of values);

That's false: COUNT(*) returns the number of ROWS regardless of values of columns. :w00t:

TomThomson SSC Guru Points: 104773 More actions · Answer 6

Luke L (7/27/2010)
Nope, run the code you get 7 and 8 respectively. The Distinct in the second query is applied after the rows are created. So since there is no group by clause it counts all of the userid's and returns 1 row. The DISTINCT in that instance makes sure there are distinct rows, which there are because there is only one. So Yes the DISTINCT keyword doesn't really do anything in the second select, but that was more or less the point of the example that bitbucket gave, to illustrate the difference between Count(DISTINCT expression ) and SELECT DISTINCT count(expression).
-Luke.

Just me being stupid. I somehow read it an a totally insane way.

Tom

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 7

Carlo Romagnano (7/27/2010)
Hugo Kornelis (7/26/2010)COUNT(*) returns the number of columns (regardless of values);
That's false: COUNT(*) returns the number of ROWS regardless of of columns. :w00t:

Yes, of course - I typed columns, but intended to write rows. Thanks for catching my mistake.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

m--S3qU3L Hall of Fame Points: 3706 More actions · Answer 8

Explanation: The distinct clause guarantees all output records will be unique, irrespective of how you write the select statement.

CREATE TABLE TABLE2(

Col1 int

,Col2 int)

INSERT INTO table2 VALUES(1,3)

INSERT INTO table2 VALUES(1,4)

SELECT DISTINCT Col1,Col2 FROM table2

I agree the use of parenthesis wont affect the output.

But while using 2 columns with DISTINCT can produce non unique rows (ateleast in one columm...:-))

John

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 9

$partacu$ (8/2/2010)
But while using 2 columns with DISTINCT can produce non unique rows (ateleast in one columm...:-))

As long as there is a difference in at least one column, the rows are not identical.

(Or would you describe a blond-haired, blue-eyed guy, aged 34, weighing 95 kg, and measuring 1.95 m as "identical to" a blond-haired, brown-eyed girl, aged 7, weighing 23 kg and measuring 1.23 m, just because they both have blond hair?)

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Dhruvesh Shah SSCrazy Points: 2303 More actions · Answer 10

Dhruvesh Shah

SSCrazy

Points: 2303

November 4, 2010 at 9:04 pm

#1246981

Simple

zymos SSCommitted Points: 1960 More actions · Answer 11

Nice and easy question that should result to same answer regardless due to the use of the SELECT DISTINCT. Thanks.

Distinct clause in Select statement

Cookies on SQLServerCentral