RE: Improving query performance to detect first duplicate

SSC Guru

Points: 442467

August 21, 2012 at 11:47 pm

aostanley (8/21/2012)
I want to express my sincere gratitude to all of you for the valuable help you have offered me here.
Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.
Thank you all for your contributions.
The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:
-- Sales that overlap on effective dates
select distinct S1.SaleID, S2.SaleID OverlapSaleID
into #SO
from Sale S1
join Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
intersect
-- Sales that overlap on Location
select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID
from SaleLocationXref SLX1
join SaleLocationXREF SLX2
on SLX1.SaleID <> SLX2.SaleID
-- and SLX1.LocationID = SLX2.LocationID
where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)
;with SOIO as (
select distinct #SO.SaleID, #SO.OverlapSaleID
from #SO
join SaleItemXref SIX1
on #SO.SaleID = SIX1.SaleID
join SaleItemXref SIX2
on #SO.OverlapSaleID = SIX2.SaleID
-- and SIX1.ItemID = SIX2.ItemID
where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)
)
select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID
from SOIO
group by SaleID
order by SaleID;
As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.
A couple of other notes about the solution:
1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).
2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.

I'm glad you have a solution, but I think you may be missing a different solution.

You should take a close look at the execution plans for each of the solutions provided below (in the attached zip file). You should try downloading SQL Sentry's Plan Explorer and compare the options. You may be surprised at what you find.

The attached file has the code for setting up the tables, updated code that can be run from SQL Sentry's Plan Explorer after setting up the tables.

I have also included the sqlplans from Plan Explorer.

These sqlplans can also be opened and viewed in SSMS.