Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12

Improving query performance to detect first duplicate Expand / Collapse
Author
Message
Posted Monday, August 20, 2012 7:17 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Yesterday @ 5:45 PM
Points: 3,617, Visits: 5,237
SQL Padawan (8/20/2012)
The code below ran pretty fast on my desktop. Give it a try:
SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID
FROM #Sale s1
INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID
INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID
WHERE EXISTS(
SELECT 1
FROM #Sale s2
INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID
INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID
WHERE s1.SaleID <> s2.SaleID
AND s1.SaleStart <= s2.SaleEnd
AND s2.SaleStart <= s1.SaleEnd
AND SLX2.LocationID = SLX1.LocationID
AND SIX.ItemID = SIX2.ItemID)
GROUP BY s1.SaleID,LocationID



+1 SQL Padawan. The force was with you!

---- Original query

SQL Server Execution Times:
CPU time = 3433 ms, elapsed time = 942 ms.
---- Dwain's query

SQL Server Execution Times:
CPU time = 3230 ms, elapsed time = 1789 ms.
---- SQL Padawan

SQL Server Execution Times:
CPU time = 124 ms, elapsed time = 123 ms.




My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
Post #1347533
Posted Tuesday, August 21, 2012 7:39 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 3:32 PM
Points: 44, Visits: 1,066
^^^ lol. I need to change the moniker.

Here is a revised version eliminating the GROUP BY and/or DISTINCT clauses. I especially dislike using either of those to return non-aggregated data. Most of the time there is always a better and more efficient way to write the query.
SELECT s1.SaleID,LocationID
FROM #Sale s1
INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID --Find location
WHERE EXISTS(
SELECT 1
FROM #Sale s2
INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID --Find location
INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID --Find item
WHERE s1.SaleID <> s2.SaleID --Make sure that we are comparing two different SaleIDs
AND s1.SaleStart <= s2.SaleEnd --Make sure that they fall within date ranges of each other
AND s2.SaleStart <= s1.SaleEnd --^^
AND SLX2.LocationID = SLX1.LocationID --Where s1 and s2 have the same location id
AND EXISTS(
SELECT 1
FROM #SaleItemXref SIX
WHERE SIX.SaleID = s1.SaleID --Find sale item for record s1
AND SIX.ItemID = SIX2.ItemID --Make sure sale item from s1 has a matching record in s2
)
)

Post #1347789
Posted Tuesday, August 21, 2012 9:03 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 8:56 AM
Points: 7,137, Visits: 13,527
Sales 1 and 2 overlap on date range and have a common item & location.
Same with 2 and 5. But 1 and 5 don't overlap: 1, 2 & 5 aren't a set with overlapping date ranges and location in common.
It makes more sense to me to do this as sales pairs, as follows:

SELECT 
s1_SaleID = s1.SaleID,
s2_SaleID = s2.SaleID,
l.*, i.*
FROM Sale s1
INNER JOIN Sale s2
ON s1.SaleID < s2.SaleID
AND S1.SaleStart <= S2.SaleEnd
AND S2.SaleStart <= S1.SaleEnd
CROSS APPLY (
SELECT li.LocationID
FROM SaleLocationXref li
WHERE li.SaleID = s1.SaleID
OR li.SaleID = s2.SaleID
GROUP BY li.LocationID
HAVING COUNT(*) = 2
) l
CROSS APPLY (
SELECT ii.ItemID
FROM SaleItemXref ii
WHERE ii.SaleID = s1.SaleID
OR ii.SaleID = s2.SaleID
GROUP BY ii.ItemID
HAVING COUNT(*) = 2
) i



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1347849
Posted Tuesday, August 21, 2012 11:54 AM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 5:09 PM
Points: 23,089, Visits: 31,635
Here is another potential solution. Would like to have others give it a test.

Against the small original test set it worked. Unfortunately, I think it may be wrong in other ways. This would probably work better if there were a fixed test suite of values with a validated results set to test against.


with BaseData as (
select
s.SaleID,
s.SaleStart,
s.SaleEnd,
slx.LocationID,
six.ItemID
from
Sale s
inner join SaleLocationXref slx
on (s.SaleID = slx.SaleID)
inner join SaleItemXref six
on (s.SaleID = six.SaleID)
)
select
bd.SaleID,
bd.SaleStart,
bd.SaleEnd,
bd.LocationID,
bd.ItemID
from
BaseData bd
where
exists(select 1
from BaseData bd1
where
bd.SaleID <> bd1.SaleID and
bd.LocationID = bd1.LocationID and
bd.ItemID = bd1.ItemID and
bd.SaleStart <= bd1.SaleEnd and
bd.SaleEnd >= bd1.SaleStart)
order by
bd.ItemID,
bd.LocationID,
bd.SaleID;


Modified to work using the test suite above (I commented 3 lines in the final select statement and added the holder variables) and include a run using MAXDOP(1) as well:


set nocount on;
go
create table #SaleLocationXref (
SaleID int not null,
LocationID int not null,
constraint SaleLocationXref_PK primary key (SaleID,LocationID)
)
create table #Sale (
SaleID int not null,
SaleStart datetime not null,
SaleEnd datetime not null,
constraint Sale_PK primary key (SaleID)
)
create table #SaleItemXref (
SaleID int not null,
ItemID int not null,
constraint SaleItemXref_PK primary key (SaleID,ItemID)
)

insert into #SaleLocationXref values (1,1),(2,1),(4,1),(5,1),
(3,2)

insert into #Sale values (1,'01/01/2012','01/07/2012'),
(2,'01/06/2012','01/10/2012'),
(3,'01/01/2012','02/07/2012'),
(4,'02/08/2012','02/15/2012'),
(5,'01/09/2012','01/16/2012')

insert into #SaleItemXref values (1,1),(1,2),(1,3),
(2,1),(2,4),(2,5),
(3,5),(3,6),(3,7),
(4,5),(4,6),(4,7),
(5,7),(5,8),(5,1)

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #SaleLocationXref
SELECT DISTINCT SaleID * n, LocationID * n
FROM #SaleLocationXref
CROSS APPLY Tally

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #Sale
SELECT SaleID, SaleStart, SaleEnd
FROM (
SELECT SaleID=SaleID * n, SaleStart, SaleEnd
,n=ROW_NUMBER() OVER (PARTITION BY SaleID * n ORDER BY (SELECT NULL))
FROM #Sale
CROSS APPLY Tally) a
WHERE n = 1

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #SaleItemXref
SELECT DISTINCT SaleID * n, ItemID * n
FROM #SaleItemXref
CROSS APPLY Tally

DECLARE @Holder1 INT, @Holder2 INT

PRINT '';
PRINT '---- Revised query'
SET STATISTICS TIME ON
;WITH Sale AS (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID
FROM #Sale a
JOIN #SaleLocationXref b ON a.SaleID = b.SaleID
JOIN #SaleItemXref c ON c.SaleID = a.SaleID
)
SELECT @Holder1=SaleID, @Holder2=LocationID
FROM (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID
, SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID
,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))
FROM Sale a
CROSS APPLY (
SELECT SaleID, LocationID, ItemID
FROM Sale b
WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR
b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b
WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c
WHERE n=1
SET STATISTICS TIME OFF

PRINT '';
PRINT '---- Original query'
SET STATISTICS TIME ON
select distinct @Holder1=SaleID, @Holder2=LocationID
from (
select SIX.SaleID, LocationID, ItemID,
Count(*) over (partition by LocationID, ItemID) as Cnt
from #SaleItemXref SIX
join (
select distinct S1.SaleID, SLX1.LocationID
from #Sale S1
join #Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
join #SaleLocationXref SLX1
on S1.SaleID = SLX1.SaleID
join #SaleLocationXREF SLX2
on S2.SaleID = SLX2.SaleID
and SLX1.LocationID = SLX2.LocationID -- Compare location overlap
) #SaleOverlap
on #SaleOverlap.SaleID = SIX.SaleID
) ItemOverlap
where ItemOverlap.Cnt > 1 -- Compare item overlap
SET STATISTICS TIME OFF

PRINT '';
PRINT '---- Original query with MAXDOP 1'
SET STATISTICS TIME ON
select distinct @Holder1=SaleID, @Holder2=LocationID
from (
select SIX.SaleID, LocationID, ItemID,
Count(*) over (partition by LocationID, ItemID) as Cnt
from #SaleItemXref SIX
join (
select distinct S1.SaleID, SLX1.LocationID
from #Sale S1
join #Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
join #SaleLocationXref SLX1
on S1.SaleID = SLX1.SaleID
join #SaleLocationXREF SLX2
on S2.SaleID = SLX2.SaleID
and SLX1.LocationID = SLX2.LocationID -- Compare location overlap
) #SaleOverlap
on #SaleOverlap.SaleID = SIX.SaleID
) ItemOverlap
where ItemOverlap.Cnt > 1 -- Compare item overlap
OPTION (MAXDOP 1)
SET STATISTICS TIME OFF

PRINT '';
PRINT '---- Lynn''s Query';
SET STATISTICS TIME ON;
with BaseData as (
select
s.SaleID,
s.SaleStart,
s.SaleEnd,
slx.LocationID,
six.ItemID
from
#Sale s
inner join #SaleLocationXref slx
on (s.SaleID = slx.SaleID)
inner join #SaleItemXref six
on (s.SaleID = six.SaleID)
)
select
--bd.SaleStart,
--bd.SaleEnd,
--bd.ItemID,
@Holder1 = bd.SaleID,
@Holder2 = bd.LocationID
from
BaseData bd
where
exists(select 1
from BaseData bd1
where
bd.SaleID <> bd1.SaleID and
bd.LocationID = bd1.LocationID and
bd.ItemID = bd1.ItemID and
bd.SaleStart <= bd1.SaleEnd and
bd.SaleEnd >= bd1.SaleStart)
order by
bd.ItemID,
bd.LocationID,
bd.SaleID
;
SET STATISTICS TIME OFF



PRINT '';
PRINT '---- Lynn''s Query with MAXDOP(1)';
SET STATISTICS TIME ON;
with BaseData as (
select
s.SaleID,
s.SaleStart,
s.SaleEnd,
slx.LocationID,
six.ItemID
from
#Sale s
inner join #SaleLocationXref slx
on (s.SaleID = slx.SaleID)
inner join #SaleItemXref six
on (s.SaleID = six.SaleID)
)
select
--bd.SaleStart,
--bd.SaleEnd,
--bd.ItemID,
@Holder1 = bd.SaleID,
@Holder2 = bd.LocationID
from
BaseData bd
where
exists(select 1
from BaseData bd1
where
bd.SaleID <> bd1.SaleID and
bd.LocationID = bd1.LocationID and
bd.ItemID = bd1.ItemID and
bd.SaleStart <= bd1.SaleEnd and
bd.SaleEnd >= bd1.SaleStart)
order by
bd.ItemID,
bd.LocationID,
bd.SaleID
OPTION (MAXDOP 1);

SET STATISTICS TIME OFF

go


drop table #SaleLocationXref
drop table #Sale
drop table #SaleItemXref




And the results when run on a test VM:


---- Revised query

SQL Server Execution Times:
CPU time = 2768 ms, elapsed time = 1646 ms.

---- Original query

SQL Server Execution Times:
CPU time = 2609 ms, elapsed time = 1372 ms.

---- Original query with MAXDOP 1

SQL Server Execution Times:
CPU time = 2312 ms, elapsed time = 2337 ms.

---- Lynn's Query

SQL Server Execution Times:
CPU time = 1141 ms, elapsed time = 641 ms.

---- Lynn's Query with MAXDOP(1)

SQL Server Execution Times:
CPU time = 969 ms, elapsed time = 969 ms.




Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1348002
Posted Tuesday, August 21, 2012 5:20 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 27, 2013 10:00 AM
Points: 12, Visits: 71
I want to express my sincere gratitude to all of you for the valuable help you have offered me here.

Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.

Thank you all for your contributions.

The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:

-- Sales that overlap on effective dates
select distinct S1.SaleID, S2.SaleID OverlapSaleID
into #SO
from Sale S1
join Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
intersect
-- Sales that overlap on Location
select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID
from SaleLocationXref SLX1
join SaleLocationXREF SLX2
on SLX1.SaleID <> SLX2.SaleID
-- and SLX1.LocationID = SLX2.LocationID
where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)
;with SOIO as (
select distinct #SO.SaleID, #SO.OverlapSaleID
from #SO
join SaleItemXref SIX1
on #SO.SaleID = SIX1.SaleID
join SaleItemXref SIX2
on #SO.OverlapSaleID = SIX2.SaleID
-- and SIX1.ItemID = SIX2.ItemID
where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)
)
select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID
from SOIO
group by SaleID
order by SaleID;

As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.

A couple of other notes about the solution:

1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).

2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.
Post #1348137
Posted Tuesday, August 21, 2012 11:47 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Today @ 5:09 PM
Points: 23,089, Visits: 31,635
aostanley (8/21/2012)
I want to express my sincere gratitude to all of you for the valuable help you have offered me here.

Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.

Thank you all for your contributions.

The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:

-- Sales that overlap on effective dates
select distinct S1.SaleID, S2.SaleID OverlapSaleID
into #SO
from Sale S1
join Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
intersect
-- Sales that overlap on Location
select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID
from SaleLocationXref SLX1
join SaleLocationXREF SLX2
on SLX1.SaleID <> SLX2.SaleID
-- and SLX1.LocationID = SLX2.LocationID
where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)
;with SOIO as (
select distinct #SO.SaleID, #SO.OverlapSaleID
from #SO
join SaleItemXref SIX1
on #SO.SaleID = SIX1.SaleID
join SaleItemXref SIX2
on #SO.OverlapSaleID = SIX2.SaleID
-- and SIX1.ItemID = SIX2.ItemID
where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)
)
select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID
from SOIO
group by SaleID
order by SaleID;

As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.

A couple of other notes about the solution:

1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).

2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.



I'm glad you have a solution, but I think you may be missing a different solution.

You should take a close look at the execution plans for each of the solutions provided below (in the attached zip file). You should try downloading SQL Sentry's Plan Explorer and compare the options. You may be surprised at what you find.

The attached file has the code for setting up the tables, updated code that can be run from SQL Sentry's Plan Explorer after setting up the tables.

I have also included the sqlplans from Plan Explorer.

These sqlplans can also be opened and viewed in SSMS.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)


  Post Attachments 
SQLCode.zip (3 views, 17.95 KB)
Post #1348190
Posted Tuesday, September 18, 2012 3:08 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Wednesday, July 23, 2014 3:35 PM
Points: 2,393, Visits: 3,399
Removed: Didn't test properly.


N 56°04'39.16"
E 12°55'05.25"
Post #1360633
« Prev Topic | Next Topic »

Add to briefcase ««12

Permissions Expand / Collapse