A good question, highlighting a very common misconception. I've seen so many people think that there is a way to limit DISTINCT to only some of the SELECTed columns - and yet, when I ask them what SQL Server should return for the other columns, they are never able to answer.
I have one question, though. I've seen people show me queries where they write SELECT COUNT(DISTINCT userid) or something like that. Is that also the same as SELECT DISTINCT COUNT(userid), or is there a difference?
COUNT(*) returns the number of columns (regardless of values); COUNT(userid) counts the number of non-NULL vallues in the userid column; and COUNT(DISTINCT userid) counts the number of distinct
non-NULL values in the userid column. This DISTINCT keyword is different from the one at the start of the SELECT list.
SELECT DISTINCT .... means that at the end of the query evaluation, rows that are complete duplicates of another row (in all columns) are removed.
So SELECT DISTINCT COUNT(userid) FROM SomeTable will first count the number of rows where userid is not NULL, yielding a single row as result (with that number as the value in its only column). The DISTINCT will then remove duplicates - which don't exist as the COUNT without GROUP BY guarantees a single row result set.
Hugo Kornelis, SQL Server MVP
Visit my SQL Server blog: http://sqlblog.com/blogs/hugo_kornelis