Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Correlated Subquery - Stream Aggregate Expand / Collapse
Author
Message
Posted Wednesday, December 19, 2012 7:58 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, November 20, 2014 2:08 AM
Points: 1,380, Visits: 2,703
Hi All

Some DDL

--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])

--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])

And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'

Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?


Thanks
Post #1398426
Posted Wednesday, December 19, 2012 8:04 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Saturday, November 1, 2014 6:54 AM
Points: 5,221, Visits: 5,118
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.




Want an answer fast? Try here
How to post data/code for the best help - Jeff Moden
Need a string splitter, try this - Jeff Moden
How to post performance problems - Gail Shaw
CrossTabs-Part1 & Part2 - Jeff Moden
SQL Server Backup, Integrity Check, and Index and Statistics Maintenance - Ola Hallengren
Managing Transaction Logs - Gail Shaw
Troubleshooting SQL Server: A Guide for the Accidental DBA - Jonathan Kehayias and Ted Krueger

Post #1398429
Posted Wednesday, December 19, 2012 8:05 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Tuesday, September 9, 2014 2:06 AM
Points: 1,768, Visits: 8,318
Essentially, the 2 queries are the same.


Fraid not ,

if a value is used multiple time in Indexing for Col2 then you will get "duplicate" rows back


select col2,count(*)
from indexing
group by col2
having count(*) >1

Will tell you which values are used multiple times.

the exists clause only cares that there is at least one matching row.





Clear Sky SQL
My Blog
Kent user group
Post #1398431
Posted Wednesday, December 19, 2012 8:15 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 8:23 AM
Points: 2,873, Visits: 5,189

...
And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'

Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?

Thanks


Your two query are essentially very different!
Having the same filter in the WHERE clause, still your first query returns Col4 from Indexing2 for every matching record from Indexing, but the second query only returns one row from Indexing2, regardless how many matching records found in Indexing.
If you want query one to behave same way as your query two you need to use DISTINCT in select.




_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1398439
Posted Thursday, December 20, 2012 1:44 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, November 20, 2014 2:08 AM
Points: 1,380, Visits: 2,703
anthony.green (12/19/2012)
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.


Thanks

I have 30Mil rows in this table
Post #1398855
Posted Thursday, December 20, 2012 1:47 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, November 20, 2014 2:08 AM
Points: 1,380, Visits: 2,703
anthony.green (12/19/2012)
Sample data would be good as well

Using the following sample data

insert into Indexing values (199),(200),(210),(220),(230),(240),(250),(251)
insert into Indexing2 (Col4) values (199),(200),(210),(220),(230),(240),(250),(251)

I get no duplicates and now stream aggregate, the only difference is a left semi join on the exists vs a inner join.

Also post your execution plans in a SQLPLAN format so we can take a look.


When I say duplicate, I mean that for Col4 there is more that one row with the same value. The inner join query is returning all those rows while the correlated query doesn't.
Post #1398856
Posted Thursday, December 20, 2012 1:50 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, November 20, 2014 2:08 AM
Points: 1,380, Visits: 2,703
SQLSACT (12/19/2012)
Hi All

Some DDL

--Table 1
CREATE TABLE [dbo].[Indexing2](
[Col1] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL,
[Col3] [int] NULL,
[Col4] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing2] ([Col1])
CREATE NONCLUSTERED INDEX [NCX] ON [dbo].[Indexing2] ([Col4])

--Table2
CREATE TABLE [dbo].[Indexing](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Col2] [int] NULL
)

--Indexes
CREATE UNIQUE CLUSTERED INDEX [CX] ON [dbo].[Indexing] ([ID])
CREATE NONCLUSTERED INDEX [NCIX] ON [dbo].[Indexing] ([Col2])

And consider the following selects
select Col4 from Indexing2
inner join Indexing
on Indexing2.Col4 = Indexing.Col2
where Col4 between '200' and '250'

select Col4 from Indexing2
where exists (select * from Indexing where Indexing.Col2 = Indexing2.Col4)
and Col4 between '200' and '250'

Essentially, the 2 queries are the same. Why does the 1st query return duplicates and the 2nd query doesn't?

Also, the the execution plans are identical besides the 2nd query having a stream aggregate. Why is this being added?


Thanks


Plan Attached


  Post Attachments 
Plan.sqlplan (1 view, 17.49 KB)
Post #1398859
Posted Friday, December 21, 2012 2:52 PM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 2:49 PM
Points: 2,266, Visits: 3,421
SQLSACT (12/19/2012)

Why does the 1st query return duplicates and the 2nd query doesn't?

... the 2nd query having a stream aggregate. Why is this being added?

Thanks



The first query does a JOIN; joins return all matching rows, whether duplicate matches or not.

In the second query, you used EXISTS() rather than join. EXISTS() just verifies whether or not a row exists. Since SQL has chosen to use a JOIN to generate the final results, SQL uses the stream aggregate to eliminate duplicate join values, thus insuring that only a single match occurs for each join to the other table.


SQL DBA,SQL Server MVP('07, '08, '09)

Carl Sagan said: "There is no such thing as a dumb question." Sagan obviously never watched a congressional hearing!
Post #1399599
Posted Friday, December 21, 2012 3:14 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 2:01 PM
Points: 40,404, Visits: 36,849
You have two very different queries there (different in meaning), so it shouldn't be much of a surprise that they return different results.
See http://sqlinthewild.co.za/index.php/2010/01/12/in-vs-inner-join/ (IN and EXISTS use the same mechanism)



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1399606
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse