Which will execute first DISTINCT or TOP, if used in one query.

Question

Post reply

Which will execute first DISTINCT or TOP, if used in one query.

varun.dubey29

SSC Veteran

Points: 212
More actions
April 13, 2011 at 2:44 am

#235649

Hi All,
I am using DISTINCT and TOP in same query as :
SELECT DISTINCT TOP 100 ValidationDate FROM ValidatorDetails.
ValidatorDetails table contains 2000 records.
While looking at the execution plan, I found that DISTINCT is executed first but it returned only 100 records to the TOP execution block and then it show that TOP is executed, which returns 100 records to SELECT.
Kindly help, Thanks in advance.
Regards
Varun

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

HowardW One Orange Chip Points: 29109 More actions · Answer 1

What would you like help with? The DISTINCT has to be processed first as that's what you're selecting the top 100 of.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 2

varun.dubey29 (4/13/2011)
Hi All,
I am using DISTINCT and TOP in same query as :
SELECT DISTINCT TOP 100 ValidationDate FROM ValidatorDetails.
ValidatorDetails table contains 2000 records.
While looking at the execution plan, I found that DISTINCT is executed first but it returned only 100 records to the TOP execution block and then it show that TOP is executed, which returns 100 records to SELECT.
Kindly help, Thanks in advance.
Regards
Varun

Help with what?

There's an alternative if you prefer:

SELECT TOP 100 ValidationDate

FROM ValidatorDetails

GROUP BY ValidationDate

ORDER BY ValidationDate

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

varun.dubey29 SSC Veteran Points: 212 More actions · Answer 3

That is what the expected behavior I am expecting but in the execution plan it show that Distinct block returns fewer records than the one in the table(it is returning the number of records that is mentioned with TOP) to the TOP Block.

I have recieved different record set from this query.

That occured only once, but in production.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 4

varun.dubey29 (4/13/2011)
That is what the expected behavior I am expecting but in the execution plan it show that Distinct block returns fewer records than the one in the table(it is returning the number of records that is mentioned with TOP) to the TOP Block.
I have recieved different record set from this query.
That occured only once, but in production.

So if your table had 20 million rows (and a suitable index on ValidationDate), you would expect it to return the distinct values from within all 20 million before selecting the top 100 from them?

The execution plan has to show TOP and DISTINCT as two separate operations but that doesn't mean they are performed separately.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 5

ChrisM@home (4/13/2011)
varun.dubey29 (4/13/2011)
That is what the expected behavior I am expecting but in the execution plan it show that Distinct block returns fewer records than the one in the table(it is returning the number of records that is mentioned with TOP) to the TOP Block.
I have recieved different record set from this query.
That occured only once, but in production.
So if your table had 20 million rows (and a suitable index on ValidationDate), you would expect it to return the distinct values from within all 20 million before selecting the top 100 from them?
The execution plan has to show TOP and DISTINCT as two separate operations but that doesn't mean they are performed separately.

Actually, yeah, they will be processed as two steps because they're logically separated.

It really depends on how the query is structured. If it's a straight SELECT TOP 100 DISTINCT..., then the DISTINCT must be processed, then the TOP 100, and yes, because of the DISTINCT operation, you may see less than 100 rows in the result set if there are fewer than 100 distinct values. You can start looking at derived tables where you can select the TOP 100 and then get the DISTINCT data set of that, but again, it may result in fewer than 100 rows.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

varun.dubey29 SSC Veteran Points: 212 More actions · Answer 6

Thanks to all for the reply.

I got the solution to the existing solution.

by Adding the Order by Clause to the existing query.

I am curious to know about if TOP gets executed before DISTINCT when we use them in same query.

If any one know about this or any link that explaains about the execution of the query, please do let me know .

Thanks in Advance. 🙂

HowardW One Orange Chip Points: 29109 More actions · Answer 7

The distinct operation must occur first because that's logically what you're asking from the query.

Reading between the lines of what you've said above, are you're saying that you sometimes receive a different set of 100 rows than at other times?

TOP doesn't discriminate against which 100 rows it has to return if you do not have an ORDER BY...

varun.dubey29 SSC Veteran Points: 212 More actions · Answer 8

Yeah I had recieved different record set then the usual but only once in prodution.

This scenario is not reproducible there after 🙁

HowardW One Orange Chip Points: 29109 More actions · Answer 9

Ok, the important thing to know is that the order of a select statement is never guaranteed unless you have an order by clause. If you want it to return in the order of the clustered index/primary key for example, you always need to specify this in an ORDER BY.

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 10

Grant Fritchey (4/13/2011)
ChrisM@home (4/13/2011)
varun.dubey29 (4/13/2011)
That is what the expected behavior I am expecting but in the execution plan it show that Distinct block returns fewer records than the one in the table(it is returning the number of records that is mentioned with TOP) to the TOP Block.
I have recieved different record set from this query.
That occured only once, but in production.
So if your table had 20 million rows (and a suitable index on ValidationDate), you would expect it to return the distinct values from within all 20 million before selecting the top 100 from them?
The execution plan has to show TOP and DISTINCT as two separate operations but that doesn't mean they are performed separately.
Actually, yeah, they will be processed as two steps because they're logically separated.
It really depends on how the query is structured. If it's a straight SELECT TOP 100 DISTINCT..., then the DISTINCT must be processed, then the TOP 100, and yes, because of the DISTINCT operation, you may see less than 100 rows in the result set if there are fewer than 100 distinct values. You can start looking at derived tables where you can select the TOP 100 and then get the DISTINCT data set of that, but again, it may result in fewer than 100 rows.

<<Kneels, lays head on block>>

I have a sample table which has 100,000 rows, of which 1 column [Sentence] has about 45k distinct values.

If I run

SELECT DISTINCT TOP 100000 Sentence

FROM TransactionDetail

ORDER BY Sentence -- 44,582 / 100,000

- the whole table is scanned and the Stream Aggregate operation outputs 44,582 rows to the TOP operator.

If I run

SELECT DISTINCT TOP 100 Sentence

FROM TransactionDetail

ORDER BY Sentence -- 100 / 1,605

- sufficient rows are output from the index scan operation (there's an index on [Sentence]) to output 100 rows from the Stream Aggregate operation.

To restrict the working set in this manner, surely the Stream Aggregate operation (i.e. the DISTINCT) works in parallel with the TOP operator? Rather than in serial?

Here's the sproc which generates the sample data - credit goes to Jeff Moden and others for the bulk of it:

IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL

DROP TABLE TempDB.dbo.TransactionDetail

--===== Create the test table (TransactionDetail) with a clustered PK

CREATE TABLE TempDB.dbo.TransactionDetail (

TransactionDetailID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,

[Date] DATETIME,

AccountID INT,

Amount MONEY,

SentenceVARCHAR(200),

AccountRunningTotal MONEY, --Running total across each account

AccountRunningCount INT, --Like "Rank" across each account

NCID INT)

INSERT INTO TempDB.dbo.TransactionDetail (Date, AccountID, Amount, Sentence)

SELECT TOP 100000

--10 years worth of dates with times from 1/1/2000 to 12/31/2009

CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,

--100 different account numbers

ABS(CHECKSUM(NEWID()))%100+1,

--Dollar amounts from -99.99 to + 99.99

CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),

--Randomised number of words

iTVF.Sentence

FROM Master.dbo.SysColumns sc1

CROSS JOIN Master.dbo.SysColumns sc2

CROSS APPLY (

SELECT Word AS 'data()'

FROM (

SELECT TOP ((sc1.colorder*sc2.colorder)%9+1) word

FROM (

SELECT word = CAST('the' AS VARCHAR(200)) UNION ALL

SELECT 'quick' UNION ALL

SELECT 'brown' UNION ALL

SELECT 'fox' UNION ALL

SELECT 'jumped' UNION ALL

SELECT 'over' UNION ALL

SELECT 'the' UNION ALL

SELECT 'lazy' UNION ALL

SELECT 'dog'

) Words ORDER BY NEWID()) u2

FOR XML PATH('')

) iTVF(Sentence)

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Grant Fritchey SSC Guru Points: 399340 More actions · Answer 11

ChrisM@home (4/13/2011)
Grant Fritchey (4/13/2011)
ChrisM@home (4/13/2011)
varun.dubey29 (4/13/2011)
That is what the expected behavior I am expecting but in the execution plan it show that Distinct block returns fewer records than the one in the table(it is returning the number of records that is mentioned with TOP) to the TOP Block.
I have recieved different record set from this query.
That occured only once, but in production.
So if your table had 20 million rows (and a suitable index on ValidationDate), you would expect it to return the distinct values from within all 20 million before selecting the top 100 from them?
The execution plan has to show TOP and DISTINCT as two separate operations but that doesn't mean they are performed separately.
Actually, yeah, they will be processed as two steps because they're logically separated.
It really depends on how the query is structured. If it's a straight SELECT TOP 100 DISTINCT..., then the DISTINCT must be processed, then the TOP 100, and yes, because of the DISTINCT operation, you may see less than 100 rows in the result set if there are fewer than 100 distinct values. You can start looking at derived tables where you can select the TOP 100 and then get the DISTINCT data set of that, but again, it may result in fewer than 100 rows.
<<Kneels, lays head on block>>
I have a sample table which has 100,000 rows, of which 1 column [Sentence] has about 45k distinct values.
If I run
SELECT DISTINCT TOP 100000 Sentence
FROM TransactionDetail
ORDER BY Sentence -- 44,582 / 100,000
- the whole table is scanned and the Stream Aggregate operation outputs 44,582 rows to the TOP operator.
If I run
SELECT DISTINCT TOP 100 Sentence
FROM TransactionDetail
ORDER BY Sentence -- 100 / 1,605
- sufficient rows are output from the index scan operation (there's an index on [Sentence]) to output 100 rows from the Stream Aggregate operation.
To restrict the working set in this manner, surely the Stream Aggregate operation (i.e. the DISTINCT) works in parallel with the TOP operator? Rather than in serial?
Here's the sproc which generates the sample data - credit goes to Jeff Moden and others for the bulk of it:
IF OBJECT_ID('TempDB.dbo.TransactionDetail') IS NOT NULL
DROP TABLE TempDB.dbo.TransactionDetail
--===== Create the test table (TransactionDetail) with a clustered PK
CREATE TABLE TempDB.dbo.TransactionDetail (
TransactionDetailID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
[Date] DATETIME,
AccountID INT,
Amount MONEY,
SentenceVARCHAR(200),
AccountRunningTotal MONEY, --Running total across each account
AccountRunningCount INT, --Like "Rank" across each account
NCID INT)
INSERT INTO TempDB.dbo.TransactionDetail (Date, AccountID, Amount, Sentence)
SELECT TOP 100000
--10 years worth of dates with times from 1/1/2000 to 12/31/2009
CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME) AS Date,
--100 different account numbers
ABS(CHECKSUM(NEWID()))%100+1,
--Dollar amounts from -99.99 to + 99.99
CAST(CHECKSUM(NEWID())%10000 /100.0 AS MONEY),
--Randomised number of words
iTVF.Sentence
FROM Master.dbo.SysColumns sc1
CROSS JOIN Master.dbo.SysColumns sc2
CROSS APPLY (
SELECT Word AS 'data()'
FROM (
SELECT TOP ((sc1.colorder*sc2.colorder)%9+1) word
FROM (
SELECT word = CAST('the' AS VARCHAR(200)) UNION ALL
SELECT 'quick' UNION ALL
SELECT 'brown' UNION ALL
SELECT 'fox' UNION ALL
SELECT 'jumped' UNION ALL
SELECT 'over' UNION ALL
SELECT 'the' UNION ALL
SELECT 'lazy' UNION ALL
SELECT 'dog'
) Words ORDER BY NEWID()) u2
FOR XML PATH('')
) iTVF(Sentence)

You can keep your head. I'm not the guy to take anyone's.

You're still seeing two operations, right? If it's showing two operations, at least based on what I've learned, it's performing two operations. I could certainly be wrong about that (wouldn't be the first time in the last 10 minutes).

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 12

Grant Fritchey (4/13/2011)
...
You're still seeing two operations, right? If it's showing two operations, at least based on what I've learned, it's performing two operations...

I'll live with that. Thanks Grant.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]