Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Improving query performance to detect first duplicate Expand / Collapse
Author
Message
Posted Friday, August 17, 2012 11:25 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 27, 2013 10:00 AM
Points: 12, Visits: 71
I need to query some tables looking for All Sales that are active at the same time and have at least one Location in common and at least one Item in common. I have a query that does this, but it actually counts all the common Items on Sale at each Location and this is taking too much time in the large tables that I have to query. I feel like their should be a way to restructure my query approach so that only the first common item would need to be found rather than all common items.

I'm looking for ways to improve the performance of my query approach.

Here is a simple representation of my data and existing query:


drop table SaleLocationXref
drop table Sale
drop table SaleItemXref

create table SaleLocationXref (
SaleID int not null,
LocationID int not null,
constraint SaleLocationXref_PK primary key (SaleID,LocationID)
)
create table Sale (
SaleID int not null,
SaleStart datetime not null,
SaleEnd datetime not null,
constraint Sale_PK primary key (SaleID)
)
create table SaleItemXref (
SaleID int not null,
ItemID int not null,
constraint SaleItemXref_PK primary key (SaleID,ItemID)
)

insert into SaleLocationXref values (1,1),(2,1),(4,1),(5,1),
(3,2)

insert into Sale values (1,'01/01/2012','01/07/2012'),
(2,'01/06/2012','01/10/2012'),
(3,'01/01/2012','02/07/2012'),
(4,'02/08/2012','02/15/2012'),
(5,'01/09/2012','01/16/2012')

insert into SaleItemXref values (1,1),(1,2),(1,3),
(2,1),(2,4),(2,5),
(3,5),(3,6),(3,7),
(4,5),(4,6),(4,7),
(5,7),(5,8),(5,1)

select distinct SaleID, LocationID
from (
select SIX.SaleID, LocationID, ItemID,
Count(*) over (partition by LocationID, ItemID) as Cnt
from SaleItemXref SIX
join (
select distinct S1.SaleID, SLX1.LocationID
from Sale S1
join Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
join SaleLocationXref SLX1
on S1.SaleID = SLX1.SaleID
join SaleLocationXREF SLX2
on S2.SaleID = SLX2.SaleID
and SLX1.LocationID = SLX2.LocationID -- Compare location overlap
) #SaleOverlap
on #SaleOverlap.SaleID = SIX.SaleID
) ItemOverlap
where ItemOverlap.Cnt > 1 -- Compare item overlap
Post #1346684
Posted Friday, August 17, 2012 12:31 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 11:51 PM
Points: 21,773, Visits: 34,451
If there are 3 SalesID's with 1 item in common but the overlap is for only 2 of the SalesID's, does that need to be reported or does the over lap have to occur for all SlaesID with that item?

In your sample data look at SaleID 2,3, and 4 and ItemID 5. The SaleID 2 and 3 overlap each other but 4 doesn't.



Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1346708
Posted Friday, August 17, 2012 2:49 PM


SSC-Insane

SSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-InsaneSSC-Insane

Group: General Forum Members
Last Login: Yesterday @ 11:51 PM
Points: 21,773, Visits: 34,451
Using what you provided, I actually came up with the following.


create table SaleLocationXref (
SaleID int not null,
LocationID int not null,
constraint SaleLocationXref_PK primary key (SaleID,LocationID)
);
create table Sale (
SaleID int not null,
SaleStart datetime not null,
SaleEnd datetime not null,
constraint Sale_PK primary key (SaleID)
);
create table SaleItemXref (
SaleID int not null,
ItemID int not null,
constraint SaleItemXref_PK primary key (SaleID,ItemID)
);

insert into SaleLocationXref
values (1,1),(2,1),(4,1),(5,1),(3,2);

insert into Sale
values (1,'01/01/2012','01/07/2012'),
(2,'01/06/2012','01/10/2012'),
(3,'01/01/2012','02/07/2012'),
(4,'02/08/2012','02/15/2012'),
(5,'01/09/2012','01/16/2012');

insert into SaleItemXref
values
(1,1),(1,2),(1,3),
(2,1),(2,4),(2,5),
(3,5),(3,6),(3,7),
(4,5),(4,6),(4,7),
(5,7),(5,8),(5,1);
GO

with BaseData as (
select
six.ItemID,
six.SaleID,
s.SaleStart,
s.SaleEnd,
count(*) over (partition by six.ItemID) as ItemCnt
from
Sale s inner join SaleItemXref six
on (s.SaleID = six.SaleID)
)
select distinct
bd1.ItemID,
bd1.SaleID,
slx.LocationID,
bd1.SaleStart,
bd1.SaleEnd
from
BaseData bd1
inner join BaseData bd2
on (bd1.ItemID = bd2.ItemID
and bd1.SaleID <> bd2.SaleID)
inner join SaleLocationXref slx
on (slx.SaleID = bd1.SaleID)
where
bd1.ItemCnt > 1
and (bd1.SaleStart between bd2.SaleStart and bd2.SaleEnd or
bd2.SaleStart between bd1.SaleStart and bd1.SaleEnd)
order by
bd1.ItemID,
bd1.SaleID;
GO

drop table SaleLocationXref;
drop table Sale;
drop table SaleItemXref;
GO





Lynn Pettis

For better assistance in answering your questions, click here
For tips to get better help with Performance Problems, click here
For Running Totals and its variations, click here or when working with partitioned tables
For more about Tally Tables, click here
For more about Cross Tabs and Pivots, click here and here
Managing Transaction Logs

SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
Post #1346758
Posted Friday, August 17, 2012 4:46 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 27, 2013 10:00 AM
Points: 12, Visits: 71
Celko,

Thank you for your reply. I'm familiar with your work and am honored that you took the time to reply to this post. If you have a way to utilize EXISTS() (or TOP) to solve the query problem, then I hope you will share it. I would be very interested in that type of solution.

Lynn,

Thank you for your reply. It made me realize that my description of the result I was looking for was not accurate. I should have said:
I need a query to find "All Sales that are active at the same time and have at least one Location in common and at least one Item in common."

Forgive me for that. I'll try to edit my original post.
Post #1346784
Posted Sunday, August 19, 2012 12:11 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 27, 2013 10:00 AM
Points: 12, Visits: 71
Celko,

I really appreciate the idea. I tried it:

select ...
where EXISTS (select f1.item_id INTERSECT select f2.item_id)

It sounded like it could potentially offer much better performance, but it appeared to perform exactly the same as an inner join

select ...
join ...
on f1.item_id = f2.item_id

I'm still working to try and formulate some solution with accepatable performacne, so if you or anyone else might have another idea, I would really appreciate hearing about it.
Post #1346944
Posted Sunday, August 19, 2012 7:30 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 9:50 PM
Points: 3,963, Visits: 6,292
I took a shot at it with this:

;WITH Sale AS (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID
FROM #Sale a
JOIN #SaleLocationXref b ON a.SaleID = b.SaleID
JOIN #SaleItemXref c ON c.SaleID = a.SaleID
)
SELECT SaleID, LocationID
FROM (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID
, SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID
,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))
FROM Sale a
CROSS APPLY (
SELECT SaleID, LocationID, ItemID
FROM Sale b
WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR
b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b
WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c
WHERE n=1


Unfortunately mine is significantly slower than yours though. You can test with this test harness:

create table #SaleLocationXref (
SaleID int not null,
LocationID int not null,
constraint SaleLocationXref_PK primary key (SaleID,LocationID)
)
create table #Sale (
SaleID int not null,
SaleStart datetime not null,
SaleEnd datetime not null,
constraint Sale_PK primary key (SaleID)
)
create table #SaleItemXref (
SaleID int not null,
ItemID int not null,
constraint SaleItemXref_PK primary key (SaleID,ItemID)
)

insert into #SaleLocationXref values (1,1),(2,1),(4,1),(5,1),
(3,2)

insert into #Sale values (1,'01/01/2012','01/07/2012'),
(2,'01/06/2012','01/10/2012'),
(3,'01/01/2012','02/07/2012'),
(4,'02/08/2012','02/15/2012'),
(5,'01/09/2012','01/16/2012')

insert into #SaleItemXref values (1,1),(1,2),(1,3),
(2,1),(2,4),(2,5),
(3,5),(3,6),(3,7),
(4,5),(4,6),(4,7),
(5,7),(5,8),(5,1)

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #SaleLocationXref
SELECT DISTINCT SaleID * n, LocationID * n
FROM #SaleLocationXref
CROSS APPLY Tally

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #Sale
SELECT SaleID, SaleStart, SaleEnd
FROM (
SELECT SaleID=SaleID * n, SaleStart, SaleEnd
,n=ROW_NUMBER() OVER (PARTITION BY SaleID * n ORDER BY (SELECT NULL))
FROM #Sale
CROSS APPLY Tally) a
WHERE n = 1

;WITH Tally (n) AS (
SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)
INSERT INTO #SaleItemXref
SELECT DISTINCT SaleID * n, ItemID * n
FROM #SaleItemXref
CROSS APPLY Tally

DECLARE @Holder1 INT, @Holder2 INT

PRINT '---- Revised query'
SET STATISTICS TIME ON
;WITH Sale AS (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID
FROM #Sale a
JOIN #SaleLocationXref b ON a.SaleID = b.SaleID
JOIN #SaleItemXref c ON c.SaleID = a.SaleID
)
SELECT @Holder1=SaleID, @Holder2=LocationID
FROM (
SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID
, SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID
,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))
FROM Sale a
CROSS APPLY (
SELECT SaleID, LocationID, ItemID
FROM Sale b
WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR
b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b
WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c
WHERE n=1
SET STATISTICS TIME OFF

PRINT '---- Original query'
SET STATISTICS TIME ON
select distinct @Holder1=SaleID, @Holder2=LocationID
from (
select SIX.SaleID, LocationID, ItemID,
Count(*) over (partition by LocationID, ItemID) as Cnt
from #SaleItemXref SIX
join (
select distinct S1.SaleID, SLX1.LocationID
from #Sale S1
join #Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
join #SaleLocationXref SLX1
on S1.SaleID = SLX1.SaleID
join #SaleLocationXREF SLX2
on S2.SaleID = SLX2.SaleID
and SLX1.LocationID = SLX2.LocationID -- Compare location overlap
) #SaleOverlap
on #SaleOverlap.SaleID = SIX.SaleID
) ItemOverlap
where ItemOverlap.Cnt > 1 -- Compare item overlap
SET STATISTICS TIME OFF

PRINT '---- Original query with MAXDOP 1'
SET STATISTICS TIME ON
select distinct @Holder1=SaleID, @Holder2=LocationID
from (
select SIX.SaleID, LocationID, ItemID,
Count(*) over (partition by LocationID, ItemID) as Cnt
from #SaleItemXref SIX
join (
select distinct S1.SaleID, SLX1.LocationID
from #Sale S1
join #Sale S2 -- Compare sales
on S1.SaleID <> S2.SaleID
and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap
and S2.SaleStart <= S1.SaleEnd)
join #SaleLocationXref SLX1
on S1.SaleID = SLX1.SaleID
join #SaleLocationXREF SLX2
on S2.SaleID = SLX2.SaleID
and SLX1.LocationID = SLX2.LocationID -- Compare location overlap
) #SaleOverlap
on #SaleOverlap.SaleID = SIX.SaleID
) ItemOverlap
where ItemOverlap.Cnt > 1 -- Compare item overlap
OPTION (MAXDOP 1)
SET STATISTICS TIME OFF

drop table #SaleLocationXref
drop table #Sale
drop table #SaleItemXref


The interesting thing about this (at least to me) is the high degree of parallelism in your query. Note how CPU is greater than elapsed time for "Original query" below. I ran it also with OPTION (MAXDOP 1) to see what it would look like without parallelism. These results are here:

---- Revised query

SQL Server Execution Times:
CPU time = 2791 ms, elapsed time = 5481 ms.
---- Original query

SQL Server Execution Times:
CPU time = 3403 ms, elapsed time = 974 ms.
---- Original query with MAXDOP 1

SQL Server Execution Times:
CPU time = 1840 ms, elapsed time = 1883 ms.




My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Post #1346974
Posted Monday, August 20, 2012 11:32 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, August 27, 2013 10:00 AM
Points: 12, Visits: 71
Dwain.c,

Thank you for all the effort you put into this.

I'm still searching for some way to limit the searching for common rows (common sale items or common sale locations) to just the first match, eliminating the need to have to compare all items/locations to determine that a pair of Sales have a common Item at a common Location at the same time (SaleStart/SaleEnd).

I'm finding that Celko's idea of using EXISTS (select f1.item_id INTERSECT select f2.item_id), can sometimes work better that I initially thought, so I'm currently experimenting more with that technique, but I'm still looking for some other good ideas if anyone has any.

I'm grateful to you all for helping me with ideas to solve this problem.

Post #1347355
Posted Monday, August 20, 2012 5:46 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, June 25, 2015 1:34 PM
Points: 44, Visits: 1,107
The code below ran pretty fast on my desktop. Give it a try:
SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID
FROM #Sale s1
INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID
INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID
WHERE EXISTS(
SELECT 1
FROM #Sale s2
INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID
INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID
WHERE s1.SaleID <> s2.SaleID
AND s1.SaleStart <= s2.SaleEnd
AND s2.SaleStart <= s1.SaleEnd
AND SLX2.LocationID = SLX1.LocationID
AND SIX.ItemID = SIX2.ItemID)
GROUP BY s1.SaleID,LocationID

Post #1347516
Posted Monday, August 20, 2012 7:17 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: 2 days ago @ 9:50 PM
Points: 3,963, Visits: 6,292
SQL Padawan (8/20/2012)
The code below ran pretty fast on my desktop. Give it a try:
SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID
FROM #Sale s1
INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID
INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID
WHERE EXISTS(
SELECT 1
FROM #Sale s2
INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID
INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID
WHERE s1.SaleID <> s2.SaleID
AND s1.SaleStart <= s2.SaleEnd
AND s2.SaleStart <= s1.SaleEnd
AND SLX2.LocationID = SLX1.LocationID
AND SIX.ItemID = SIX2.ItemID)
GROUP BY s1.SaleID,LocationID



+1 SQL Padawan. The force was with you!

---- Original query

SQL Server Execution Times:
CPU time = 3433 ms, elapsed time = 942 ms.
---- Dwain's query

SQL Server Execution Times:
CPU time = 3230 ms, elapsed time = 1789 ms.
---- SQL Padawan

SQL Server Execution Times:
CPU time = 124 ms, elapsed time = 123 ms.




My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.


Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
Since random numbers are too important to be left to chance, let's generate some!
Learn to understand recursive CTEs by example.
Splitting strings based on patterns can be fast!
My temporal SQL musings: Calendar Tables, an Easter SQL, Time Slots and Self-maintaining, Contiguous Effective Dates in Temporal Tables
Post #1347533
Posted Tuesday, August 21, 2012 7:39 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Thursday, June 25, 2015 1:34 PM
Points: 44, Visits: 1,107
^^^ lol. I need to change the moniker.

Here is a revised version eliminating the GROUP BY and/or DISTINCT clauses. I especially dislike using either of those to return non-aggregated data. Most of the time there is always a better and more efficient way to write the query.
SELECT s1.SaleID,LocationID
FROM #Sale s1
INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID --Find location
WHERE EXISTS(
SELECT 1
FROM #Sale s2
INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID --Find location
INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID --Find item
WHERE s1.SaleID <> s2.SaleID --Make sure that we are comparing two different SaleIDs
AND s1.SaleStart <= s2.SaleEnd --Make sure that they fall within date ranges of each other
AND s2.SaleStart <= s1.SaleEnd --^^
AND SLX2.LocationID = SLX1.LocationID --Where s1 and s2 have the same location id
AND EXISTS(
SELECT 1
FROM #SaleItemXref SIX
WHERE SIX.SaleID = s1.SaleID --Find sale item for record s1
AND SIX.ItemID = SIX2.ItemID --Make sure sale item from s1 has a matching record in s2
)
)

Post #1347789
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse