Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Correlated Subquery - Stream Aggregate


Correlated Subquery - Stream Aggregate

Author
Message
SQLSACT
SQLSACT
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1456 Visits: 2931
Hi All

Some DDL


--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])

--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])



And consider the following selects

select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'



Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?


Thanks
anthony.green
anthony.green
SSCertifiable
SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)SSCertifiable (6.1K reputation)

Group: General Forum Members
Points: 6106 Visits: 6078
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.



Want an answer fast? Try here
How to post data/code for the best help - Jeff Moden
When a question, really isn't a question - Jeff Smith
Need a string splitter, try this - Jeff Moden
How to post performance problems - Gail Shaw
CrossTabs-Part1 & Part2 - Jeff Moden
SQL Server Backup, Integrity Check, and Index and Statistics Maintenance - Ola Hallengren
Managing Transaction Logs - Gail Shaw
Troubleshooting SQL Server: A Guide for the Accidental DBA - Jonathan Kehayias and Ted Krueger


Dave Ballantyne
Dave Ballantyne
SSCommitted
SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)SSCommitted (1.8K reputation)

Group: General Forum Members
Points: 1820 Visits: 8370
Essentially, the 2 queries are the same.


Fraid not ,

if a value is used multiple time in Indexing for Col2 then you will get "duplicate" rows back


select col2,count(*)
from indexing
group by col2
having count(*) >1

Will tell you which values are used multiple times.

the exists clause only cares that there is at least one matching row.



Clear Sky SQL
My Blog
Eugene Elutin
Eugene Elutin
Hall of Fame
Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)Hall of Fame (3K reputation)

Group: General Forum Members
Points: 3046 Visits: 5478

...
And consider the following selects

select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'



Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?

Thanks


Your two query are essentially very different!
Having the same filter in the WHERE clause, still your first query returns Col4 from Indexing2 for every matching record from Indexing, but the second query only returns one row from Indexing2, regardless how many matching records found in Indexing.
If you want query one to behave same way as your query two you need to use DISTINCT in select.

_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
SQLSACT
SQLSACT
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1456 Visits: 2931
anthony.green (12/19/2012)
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.


Thanks

I have 30Mil rows in this table
SQLSACT
SQLSACT
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1456 Visits: 2931
anthony.green (12/19/2012)
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.


When I say duplicate, I mean that for Col4 there is more that one row with the same value. The inner join query is returning all those rows while the correlated query doesn't.
SQLSACT
SQLSACT
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1456 Visits: 2931
SQLSACT (12/19/2012)
Hi All

Some DDL


--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])

--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])



And consider the following selects

select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'



Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?


Thanks


Plan Attached
Attachments
Plan.sqlplan (2 views, 17.00 KB)
ScottPletcher
ScottPletcher
Hall of Fame
Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)Hall of Fame (4K reputation)

Group: General Forum Members
Points: 3953 Visits: 6689
SQLSACT (12/19/2012)

Why does the 1st query return duplicates and the 2nd query doesn't?

... the 2nd query having a stream aggregate. Why is this being added?

Thanks



The first query does a JOIN; joins return all matching rows, whether duplicate matches or not.

In the second query, you used EXISTS() rather than join. EXISTS() just verifies whether or not a row exists. Since SQL has chosen to use a JOIN to generate the final results, SQL uses the stream aggregate to eliminate duplicate join values, thus insuring that only a single match occurs for each join to the other table.

SQL DBA,SQL Server MVP('07, '08, '09)

Prosecutor James Blackburn, in closing argument in the "Fatal Vision" murders trial: "If in the future, you should cry a tear, cry one for them [the murder victims]. If in the future, you should say a prayer, say one for them. And if in the future, you should light a candle, light one for them."
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47396 Visits: 44399
You have two very different queries there (different in meaning), so it shouldn't be much of a surprise that they return different results.
See http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ (IN and EXISTS use the same mechanism)


Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search