|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 12:01 PM
Points: 12,
Visits: 69
|
|
I need to query some tables looking for All Sales that are active at the same time and have at least one Location in common and at least one Item in common. I have a query that does this, but it actually counts all the common Items on Sale at each Location and this is taking too much time in the large tables that I have to query. I feel like their should be a way to restructure my query approach so that only the first common item would need to be found rather than all common items.
I'm looking for ways to improve the performance of my query approach.
Here is a simple representation of my data and existing query:
drop table SaleLocationXref drop table Sale drop table SaleItemXref
create table SaleLocationXref ( SaleID int not null, LocationID int not null, constraint SaleLocationXref_PK primary key (SaleID,LocationID) ) create table Sale ( SaleID int not null, SaleStart datetime not null, SaleEnd datetime not null, constraint Sale_PK primary key (SaleID) ) create table SaleItemXref ( SaleID int not null, ItemID int not null, constraint SaleItemXref_PK primary key (SaleID,ItemID) )
insert into SaleLocationXref values (1,1),(2,1),(4,1),(5,1), (3,2)
insert into Sale values (1,'01/01/2012','01/07/2012'), (2,'01/06/2012','01/10/2012'), (3,'01/01/2012','02/07/2012'), (4,'02/08/2012','02/15/2012'), (5,'01/09/2012','01/16/2012')
insert into SaleItemXref values (1,1),(1,2),(1,3), (2,1),(2,4),(2,5), (3,5),(3,6),(3,7), (4,5),(4,6),(4,7), (5,7),(5,8),(5,1)
select distinct SaleID, LocationID from ( select SIX.SaleID, LocationID, ItemID, Count(*) over (partition by LocationID, ItemID) as Cnt from SaleItemXref SIX join ( select distinct S1.SaleID, SLX1.LocationID from Sale S1 join Sale S2 -- Compare sales on S1.SaleID <> S2.SaleID and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap and S2.SaleStart <= S1.SaleEnd) join SaleLocationXref SLX1 on S1.SaleID = SLX1.SaleID join SaleLocationXREF SLX2 on S2.SaleID = SLX2.SaleID and SLX1.LocationID = SLX2.LocationID -- Compare location overlap ) #SaleOverlap on #SaleOverlap.SaleID = SIX.SaleID ) ItemOverlap where ItemOverlap.Cnt > 1 -- Compare item overlap
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 4:33 PM
Points: 21,832,
Visits: 27,860
|
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
>> I need to query some tables looking for Sale's at different Location's that are active at the same time and have at least one Item in common. <<
That is a clear spec! Thank you.
>> I feel like their should be a way to restructure my query approach so that only the first common item would need to be found rather than all common items. <<
No. SQL is a set-oriented language and it is designed to return a set. An EXISTS() predicate will stop when it has a rows that establishes it is true.
The term XREF is from the old Network Database and was done with double link pointer chains; in SQL you use REFERENCES instead. Your sales table has attribute splits and needs to be re-done. Time and space are dimensions of the sales; they belong in the same table! Google up “attribute Splitting” and read the details. Her is a corrected table; smaller, faster, and it has dat integrity:
CREATE TABLE Sales (sale_id INTEGER NOT NULL REFERENCES Sales(sale_id) ON DELETE CASCADE, location_id INTEGER NOT NULL --- SAN codes or DUNS? REFERENCES Locations (location_id ON DELETE CASCADE, sale_start_date DATE NOT NULL, sale_end_date DATE NOT NULL, CONSTRAINT Date_Order CHECK (sale_start_date <= sale_end_date), PRIMARY KEY (sale_id, location_id, sale_start_date) );
The Date_Ordering constraint will give the optimizer a boost. We have a date data type now. The correct format is yyyy-mm-dd in Standard SQL and it is the default with the new temporal types. It is also used in other ISO Standards. Can I assume that you have an inventory from which you draw sale items?
CREATE TABLE Sales_Items (sale_id INTEGER NOT NULL REFERENCES Sales(sale_id) ON DELETE CASCADE, item_id INTEGER NOT NULL REFERENCES Inventory*(item_id) ON DELETE CASCADE, PRIMARY KEY (sale_id, item_id) );
Do you have a Calendar table? It makes life much easier. Pick a date range for the query and save some work. Let's build flyers to advertise the sales in a CTE, then findb concurent slaes, etc
WITH Sales_Flyers AS (SELECT S.sale_id, S.sale_start_date, S.sale_end_date, F.item_id FROM Sales AS S Sale_items AS SI WHERE S.sale_id = SI.sale_id)
SELECT C.cal_date, F1.sale_id AS sale_1, F2.sale_id AS sale_2, COUNT(*) AS common_item_cnt FROM Calendar AS C, Sales_Flyers AS F1, Sales_Flyers AS F2 WHERE C.cal_date BETWEEN @in_report_start_date AND @in_report_end_date AND C.cal_date BETWEEN F1.sale_start_date AND F1.sale_end_date AND C.cal_date BETWEEN F2.sale_start_date AND F2.sale_end_date AND F1.sale_id < F2.sale_id ;
untested. AND F1.item_id = F2.item_id GROUP BY C.cal_date, F1.sale_id AS sale_1, F2.sale_id AS sale_2;
This gives you the count of common items by date between pairs of sales. The grouping and counting could be replaced with an EXISTS() on the item_id and dates. This seewsmt obe extra information at litle cost.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
SSC-Insane
         
Group: General Forum Members
Last Login: Today @ 4:33 PM
Points: 21,832,
Visits: 27,860
|
|
Using what you provided, I actually came up with the following.
create table SaleLocationXref ( SaleID int not null, LocationID int not null, constraint SaleLocationXref_PK primary key (SaleID,LocationID) ); create table Sale ( SaleID int not null, SaleStart datetime not null, SaleEnd datetime not null, constraint Sale_PK primary key (SaleID) ); create table SaleItemXref ( SaleID int not null, ItemID int not null, constraint SaleItemXref_PK primary key (SaleID,ItemID) );
insert into SaleLocationXref values (1,1),(2,1),(4,1),(5,1),(3,2);
insert into Sale values (1,'01/01/2012','01/07/2012'), (2,'01/06/2012','01/10/2012'), (3,'01/01/2012','02/07/2012'), (4,'02/08/2012','02/15/2012'), (5,'01/09/2012','01/16/2012');
insert into SaleItemXref values (1,1),(1,2),(1,3), (2,1),(2,4),(2,5), (3,5),(3,6),(3,7), (4,5),(4,6),(4,7), (5,7),(5,8),(5,1); GO
with BaseData as ( select six.ItemID, six.SaleID, s.SaleStart, s.SaleEnd, count(*) over (partition by six.ItemID) as ItemCnt from Sale s inner join SaleItemXref six on (s.SaleID = six.SaleID) ) select distinct bd1.ItemID, bd1.SaleID, slx.LocationID, bd1.SaleStart, bd1.SaleEnd from BaseData bd1 inner join BaseData bd2 on (bd1.ItemID = bd2.ItemID and bd1.SaleID <> bd2.SaleID) inner join SaleLocationXref slx on (slx.SaleID = bd1.SaleID) where bd1.ItemCnt > 1 and (bd1.SaleStart between bd2.SaleStart and bd2.SaleEnd or bd2.SaleStart between bd1.SaleStart and bd1.SaleEnd) order by bd1.ItemID, bd1.SaleID; GO
drop table SaleLocationXref; drop table Sale; drop table SaleItemXref; GO
 Lynn Pettis
For better assistance in answering your questions, click here For tips to get better help with Performance Problems, click here For Running Totals and its variations, click here or when working with partitioned tables For more about Tally Tables, click here For more about Cross Tabs and Pivots, click here and here Managing Transaction Logs
SQL Musings from the Desert Fountain Valley SQL (My Mirror Blog)
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 12:01 PM
Points: 12,
Visits: 69
|
|
Celko,
Thank you for your reply. I'm familiar with your work and am honored that you took the time to reply to this post. If you have a way to utilize EXISTS() (or TOP) to solve the query problem, then I hope you will share it. I would be very interested in that type of solution.
Lynn,
Thank you for your reply. It made me realize that my description of the result I was looking for was not accurate. I should have said: I need a query to find "All Sales that are active at the same time and have at least one Location in common and at least one Item in common."
Forgive me for that. I'll try to edit my original post.
|
|
|
|
|
SSCommitted
      
Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:11 AM
Points: 1,945,
Visits: 2,782
|
|
If you have a way to utilize EXISTS() (or TOP) to solve the query problem, then I hope you will share it. I would be very interested in that type of solution
I have to get ready for my second eye surgery, so it is a bitch to type, but think about using this skeleton: EXISTS (F1.item_id INTERSECT F2.item_id).
I just have a feeling that there is some way to use just set operators to get the answer. J don't know if anyone has timed them yet in T-SQL but they are scary fast in Oracle and DB2.
Books in Celko Series for Morgan-Kaufmann Publishing Analytics and OLAP in SQL Data and Databases: Concepts in Practice Data, Measurements and Standards in SQL SQL for Smarties SQL Programming Style SQL Puzzles and Answers Thinking in Sets Trees and Hierarchies in SQL
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 12:01 PM
Points: 12,
Visits: 69
|
|
Celko,
I really appreciate the idea. I tried it:
select ... where EXISTS (select f1.item_id INTERSECT select f2.item_id)
It sounded like it could potentially offer much better performance, but it appeared to perform exactly the same as an inner join
select ... join ... on f1.item_id = f2.item_id
I'm still working to try and formulate some solution with accepatable performacne, so if you or anyone else might have another idea, I would really appreciate hearing about it.
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 1:40 PM
Points: 2,370,
Visits: 3,251
|
|
I took a shot at it with this:
;WITH Sale AS ( SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID FROM #Sale a JOIN #SaleLocationXref b ON a.SaleID = b.SaleID JOIN #SaleItemXref c ON c.SaleID = a.SaleID ) SELECT SaleID, LocationID FROM ( SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID , SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID ,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL)) FROM Sale a CROSS APPLY ( SELECT SaleID, LocationID, ItemID FROM Sale b WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c WHERE n=1
Unfortunately mine is significantly slower than yours though. You can test with this test harness:
create table #SaleLocationXref ( SaleID int not null, LocationID int not null, constraint SaleLocationXref_PK primary key (SaleID,LocationID) ) create table #Sale ( SaleID int not null, SaleStart datetime not null, SaleEnd datetime not null, constraint Sale_PK primary key (SaleID) ) create table #SaleItemXref ( SaleID int not null, ItemID int not null, constraint SaleItemXref_PK primary key (SaleID,ItemID) )
insert into #SaleLocationXref values (1,1),(2,1),(4,1),(5,1), (3,2)
insert into #Sale values (1,'01/01/2012','01/07/2012'), (2,'01/06/2012','01/10/2012'), (3,'01/01/2012','02/07/2012'), (4,'02/08/2012','02/15/2012'), (5,'01/09/2012','01/16/2012')
insert into #SaleItemXref values (1,1),(1,2),(1,3), (2,1),(2,4),(2,5), (3,5),(3,6),(3,7), (4,5),(4,6),(4,7), (5,7),(5,8),(5,1)
;WITH Tally (n) AS ( SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2) INSERT INTO #SaleLocationXref SELECT DISTINCT SaleID * n, LocationID * n FROM #SaleLocationXref CROSS APPLY Tally
;WITH Tally (n) AS ( SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2) INSERT INTO #Sale SELECT SaleID, SaleStart, SaleEnd FROM ( SELECT SaleID=SaleID * n, SaleStart, SaleEnd ,n=ROW_NUMBER() OVER (PARTITION BY SaleID * n ORDER BY (SELECT NULL)) FROM #Sale CROSS APPLY Tally) a WHERE n = 1
;WITH Tally (n) AS ( SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2) INSERT INTO #SaleItemXref SELECT DISTINCT SaleID * n, ItemID * n FROM #SaleItemXref CROSS APPLY Tally
DECLARE @Holder1 INT, @Holder2 INT
PRINT '---- Revised query' SET STATISTICS TIME ON ;WITH Sale AS ( SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID FROM #Sale a JOIN #SaleLocationXref b ON a.SaleID = b.SaleID JOIN #SaleItemXref c ON c.SaleID = a.SaleID ) SELECT @Holder1=SaleID, @Holder2=LocationID FROM ( SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID , SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID ,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL)) FROM Sale a CROSS APPLY ( SELECT SaleID, LocationID, ItemID FROM Sale b WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c WHERE n=1 SET STATISTICS TIME OFF
PRINT '---- Original query' SET STATISTICS TIME ON select distinct @Holder1=SaleID, @Holder2=LocationID from ( select SIX.SaleID, LocationID, ItemID, Count(*) over (partition by LocationID, ItemID) as Cnt from #SaleItemXref SIX join ( select distinct S1.SaleID, SLX1.LocationID from #Sale S1 join #Sale S2 -- Compare sales on S1.SaleID <> S2.SaleID and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap and S2.SaleStart <= S1.SaleEnd) join #SaleLocationXref SLX1 on S1.SaleID = SLX1.SaleID join #SaleLocationXREF SLX2 on S2.SaleID = SLX2.SaleID and SLX1.LocationID = SLX2.LocationID -- Compare location overlap ) #SaleOverlap on #SaleOverlap.SaleID = SIX.SaleID ) ItemOverlap where ItemOverlap.Cnt > 1 -- Compare item overlap SET STATISTICS TIME OFF
PRINT '---- Original query with MAXDOP 1' SET STATISTICS TIME ON select distinct @Holder1=SaleID, @Holder2=LocationID from ( select SIX.SaleID, LocationID, ItemID, Count(*) over (partition by LocationID, ItemID) as Cnt from #SaleItemXref SIX join ( select distinct S1.SaleID, SLX1.LocationID from #Sale S1 join #Sale S2 -- Compare sales on S1.SaleID <> S2.SaleID and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap and S2.SaleStart <= S1.SaleEnd) join #SaleLocationXref SLX1 on S1.SaleID = SLX1.SaleID join #SaleLocationXREF SLX2 on S2.SaleID = SLX2.SaleID and SLX1.LocationID = SLX2.LocationID -- Compare location overlap ) #SaleOverlap on #SaleOverlap.SaleID = SIX.SaleID ) ItemOverlap where ItemOverlap.Cnt > 1 -- Compare item overlap OPTION (MAXDOP 1) SET STATISTICS TIME OFF
drop table #SaleLocationXref drop table #Sale drop table #SaleItemXref
The interesting thing about this (at least to me) is the high degree of parallelism in your query. Note how CPU is greater than elapsed time for "Original query" below. I ran it also with OPTION (MAXDOP 1) to see what it would look like without parallelism. These results are here:
---- Revised query
SQL Server Execution Times: CPU time = 2791 ms, elapsed time = 5481 ms. ---- Original query
SQL Server Execution Times: CPU time = 3403 ms, elapsed time = 974 ms. ---- Original query with MAXDOP 1
SQL Server Execution Times: CPU time = 1840 ms, elapsed time = 1883 ms.
No loops! No CURSORs! No RBAR! Hoo-uh!
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
Need to UNPIVOT? Why not CROSS APPLY VALUES instead? Since random numbers are too important to be left to chance, let's generate some! Are you too recursively challenged? Splitting strings based on patterns can be fast!
|
|
|
|
|
Grasshopper
      
Group: General Forum Members
Last Login: Thursday, May 30, 2013 12:01 PM
Points: 12,
Visits: 69
|
|
Dwain.c,
Thank you for all the effort you put into this.
I'm still searching for some way to limit the searching for common rows (common sale items or common sale locations) to just the first match, eliminating the need to have to compare all items/locations to determine that a pair of Sales have a common Item at a common Location at the same time (SaleStart/SaleEnd).
I'm finding that Celko's idea of using EXISTS (select f1.item_id INTERSECT select f2.item_id), can sometimes work better that I initially thought, so I'm currently experimenting more with that technique, but I'm still looking for some other good ideas if anyone has any.
I'm grateful to you all for helping me with ideas to solve this problem.
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Today @ 3:33 PM
Points: 42,
Visits: 962
|
|
The code below ran pretty fast on my desktop. Give it a try:
SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID FROM #Sale s1 INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID WHERE EXISTS( SELECT 1 FROM #Sale s2 INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID WHERE s1.SaleID <> s2.SaleID AND s1.SaleStart <= s2.SaleEnd AND s2.SaleStart <= s1.SaleEnd AND SLX2.LocationID = SLX1.LocationID AND SIX.ItemID = SIX2.ItemID) GROUP BY s1.SaleID,LocationID
|
|
|
|