Improving query performance to detect first duplicate

  • I need to query some tables looking for All Sales that are active at the same time and have at least one Location in common and at least one Item in common. I have a query that does this, but it actually counts all the common Items on Sale at each Location and this is taking too much time in the large tables that I have to query. I feel like their should be a way to restructure my query approach so that only the first common item would need to be found rather than all common items.

    I'm looking for ways to improve the performance of my query approach.

    Here is a simple representation of my data and existing query:

    [font="Courier New"]

    drop table SaleLocationXref

    drop table Sale

    drop table SaleItemXref

    create table SaleLocationXref (

    SaleID int not null,

    LocationID int not null,

    constraint SaleLocationXref_PK primary key (SaleID,LocationID)

    )

    create table Sale (

    SaleID int not null,

    SaleStart datetime not null,

    SaleEnd datetime not null,

    constraint Sale_PK primary key (SaleID)

    )

    create table SaleItemXref (

    SaleID int not null,

    ItemID int not null,

    constraint SaleItemXref_PK primary key (SaleID,ItemID)

    )

    insert into SaleLocationXref values (1,1),(2,1),(4,1),(5,1),

    (3,2)

    insert into Sale values (1,'01/01/2012','01/07/2012'),

    (2,'01/06/2012','01/10/2012'),

    (3,'01/01/2012','02/07/2012'),

    (4,'02/08/2012','02/15/2012'),

    (5,'01/09/2012','01/16/2012')

    insert into SaleItemXref values (1,1),(1,2),(1,3),

    (2,1),(2,4),(2,5),

    (3,5),(3,6),(3,7),

    (4,5),(4,6),(4,7),

    (5,7),(5,8),(5,1)

    select distinct SaleID, LocationID

    from (

    select SIX.SaleID, LocationID, ItemID,

    Count(*) over (partition by LocationID, ItemID) as Cnt

    from SaleItemXref SIX

    join (

    select distinct S1.SaleID, SLX1.LocationID

    from Sale S1

    join Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    join SaleLocationXref SLX1

    on S1.SaleID = SLX1.SaleID

    join SaleLocationXREF SLX2

    on S2.SaleID = SLX2.SaleID

    and SLX1.LocationID = SLX2.LocationID -- Compare location overlap

    ) #SaleOverlap

    on #SaleOverlap.SaleID = SIX.SaleID

    ) ItemOverlap

    where ItemOverlap.Cnt > 1 -- Compare item overlap

    [/font]

  • If there are 3 SalesID's with 1 item in common but the overlap is for only 2 of the SalesID's, does that need to be reported or does the over lap have to occur for all SlaesID with that item?

    In your sample data look at SaleID 2,3, and 4 and ItemID 5. The SaleID 2 and 3 overlap each other but 4 doesn't.

  • Using what you provided, I actually came up with the following.

    create table SaleLocationXref (

    SaleID int not null,

    LocationID int not null,

    constraint SaleLocationXref_PK primary key (SaleID,LocationID)

    );

    create table Sale (

    SaleID int not null,

    SaleStart datetime not null,

    SaleEnd datetime not null,

    constraint Sale_PK primary key (SaleID)

    );

    create table SaleItemXref (

    SaleID int not null,

    ItemID int not null,

    constraint SaleItemXref_PK primary key (SaleID,ItemID)

    );

    insert into SaleLocationXref

    values (1,1),(2,1),(4,1),(5,1),(3,2);

    insert into Sale

    values (1,'01/01/2012','01/07/2012'),

    (2,'01/06/2012','01/10/2012'),

    (3,'01/01/2012','02/07/2012'),

    (4,'02/08/2012','02/15/2012'),

    (5,'01/09/2012','01/16/2012');

    insert into SaleItemXref

    values

    (1,1),(1,2),(1,3),

    (2,1),(2,4),(2,5),

    (3,5),(3,6),(3,7),

    (4,5),(4,6),(4,7),

    (5,7),(5,8),(5,1);

    GO

    with BaseData as (

    select

    six.ItemID,

    six.SaleID,

    s.SaleStart,

    s.SaleEnd,

    count(*) over (partition by six.ItemID) as ItemCnt

    from

    Sale s inner join SaleItemXref six

    on (s.SaleID = six.SaleID)

    )

    select distinct

    bd1.ItemID,

    bd1.SaleID,

    slx.LocationID,

    bd1.SaleStart,

    bd1.SaleEnd

    from

    BaseData bd1

    inner join BaseData bd2

    on (bd1.ItemID = bd2.ItemID

    and bd1.SaleID <> bd2.SaleID)

    inner join SaleLocationXref slx

    on (slx.SaleID = bd1.SaleID)

    where

    bd1.ItemCnt > 1

    and (bd1.SaleStart between bd2.SaleStart and bd2.SaleEnd or

    bd2.SaleStart between bd1.SaleStart and bd1.SaleEnd)

    order by

    bd1.ItemID,

    bd1.SaleID;

    GO

    drop table SaleLocationXref;

    drop table Sale;

    drop table SaleItemXref;

    GO

  • Celko,

    Thank you for your reply. I'm familiar with your work and am honored that you took the time to reply to this post. If you have a way to utilize EXISTS() (or TOP) to solve the query problem, then I hope you will share it. I would be very interested in that type of solution.

    Lynn,

    Thank you for your reply. It made me realize that my description of the result I was looking for was not accurate. I should have said:

    I need a query to find "All Sales that are active at the same time and have at least one Location in common and at least one Item in common."

    Forgive me for that. I'll try to edit my original post.

  • Celko,

    I really appreciate the idea. I tried it:

    select ...

    where EXISTS (select f1.item_id INTERSECT select f2.item_id)

    It sounded like it could potentially offer much better performance, but it appeared to perform exactly the same as an inner join

    select ...

    join ...

    on f1.item_id = f2.item_id

    I'm still working to try and formulate some solution with accepatable performacne, so if you or anyone else might have another idea, I would really appreciate hearing about it.

  • I took a shot at it with this:

    ;WITH Sale AS (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID

    FROM #Sale a

    JOIN #SaleLocationXref b ON a.SaleID = b.SaleID

    JOIN #SaleItemXref c ON c.SaleID = a.SaleID

    )

    SELECT SaleID, LocationID

    FROM (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID

    , SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID

    ,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))

    FROM Sale a

    CROSS APPLY (

    SELECT SaleID, LocationID, ItemID

    FROM Sale b

    WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR

    b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b

    WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c

    WHERE n=1

    Unfortunately mine is significantly slower than yours though. You can test with this test harness:

    create table #SaleLocationXref (

    SaleID int not null,

    LocationID int not null,

    constraint SaleLocationXref_PK primary key (SaleID,LocationID)

    )

    create table #Sale (

    SaleID int not null,

    SaleStart datetime not null,

    SaleEnd datetime not null,

    constraint Sale_PK primary key (SaleID)

    )

    create table #SaleItemXref (

    SaleID int not null,

    ItemID int not null,

    constraint SaleItemXref_PK primary key (SaleID,ItemID)

    )

    insert into #SaleLocationXref values (1,1),(2,1),(4,1),(5,1),

    (3,2)

    insert into #Sale values (1,'01/01/2012','01/07/2012'),

    (2,'01/06/2012','01/10/2012'),

    (3,'01/01/2012','02/07/2012'),

    (4,'02/08/2012','02/15/2012'),

    (5,'01/09/2012','01/16/2012')

    insert into #SaleItemXref values (1,1),(1,2),(1,3),

    (2,1),(2,4),(2,5),

    (3,5),(3,6),(3,7),

    (4,5),(4,6),(4,7),

    (5,7),(5,8),(5,1)

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #SaleLocationXref

    SELECT DISTINCT SaleID * n, LocationID * n

    FROM #SaleLocationXref

    CROSS APPLY Tally

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #Sale

    SELECT SaleID, SaleStart, SaleEnd

    FROM (

    SELECT SaleID=SaleID * n, SaleStart, SaleEnd

    ,n=ROW_NUMBER() OVER (PARTITION BY SaleID * n ORDER BY (SELECT NULL))

    FROM #Sale

    CROSS APPLY Tally) a

    WHERE n = 1

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #SaleItemXref

    SELECT DISTINCT SaleID * n, ItemID * n

    FROM #SaleItemXref

    CROSS APPLY Tally

    DECLARE @Holder1 INT, @Holder2 INT

    PRINT '---- Revised query'

    SET STATISTICS TIME ON

    ;WITH Sale AS (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID

    FROM #Sale a

    JOIN #SaleLocationXref b ON a.SaleID = b.SaleID

    JOIN #SaleItemXref c ON c.SaleID = a.SaleID

    )

    SELECT @Holder1=SaleID, @Holder2=LocationID

    FROM (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID

    , SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID

    ,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))

    FROM Sale a

    CROSS APPLY (

    SELECT SaleID, LocationID, ItemID

    FROM Sale b

    WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR

    b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b

    WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c

    WHERE n=1

    SET STATISTICS TIME OFF

    PRINT '---- Original query'

    SET STATISTICS TIME ON

    select distinct @Holder1=SaleID, @Holder2=LocationID

    from (

    select SIX.SaleID, LocationID, ItemID,

    Count(*) over (partition by LocationID, ItemID) as Cnt

    from #SaleItemXref SIX

    join (

    select distinct S1.SaleID, SLX1.LocationID

    from #Sale S1

    join #Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    join #SaleLocationXref SLX1

    on S1.SaleID = SLX1.SaleID

    join #SaleLocationXREF SLX2

    on S2.SaleID = SLX2.SaleID

    and SLX1.LocationID = SLX2.LocationID -- Compare location overlap

    ) #SaleOverlap

    on #SaleOverlap.SaleID = SIX.SaleID

    ) ItemOverlap

    where ItemOverlap.Cnt > 1 -- Compare item overlap

    SET STATISTICS TIME OFF

    PRINT '---- Original query with MAXDOP 1'

    SET STATISTICS TIME ON

    select distinct @Holder1=SaleID, @Holder2=LocationID

    from (

    select SIX.SaleID, LocationID, ItemID,

    Count(*) over (partition by LocationID, ItemID) as Cnt

    from #SaleItemXref SIX

    join (

    select distinct S1.SaleID, SLX1.LocationID

    from #Sale S1

    join #Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    join #SaleLocationXref SLX1

    on S1.SaleID = SLX1.SaleID

    join #SaleLocationXREF SLX2

    on S2.SaleID = SLX2.SaleID

    and SLX1.LocationID = SLX2.LocationID -- Compare location overlap

    ) #SaleOverlap

    on #SaleOverlap.SaleID = SIX.SaleID

    ) ItemOverlap

    where ItemOverlap.Cnt > 1 -- Compare item overlap

    OPTION (MAXDOP 1)

    SET STATISTICS TIME OFF

    drop table #SaleLocationXref

    drop table #Sale

    drop table #SaleItemXref

    The interesting thing about this (at least to me) is the high degree of parallelism in your query. Note how CPU is greater than elapsed time for "Original query" below. I ran it also with OPTION (MAXDOP 1) to see what it would look like without parallelism. These results are here:

    ---- Revised query

    SQL Server Execution Times:

    CPU time = 2791 ms, elapsed time = 5481 ms.

    ---- Original query

    SQL Server Execution Times:

    CPU time = 3403 ms, elapsed time = 974 ms.

    ---- Original query with MAXDOP 1

    SQL Server Execution Times:

    CPU time = 1840 ms, elapsed time = 1883 ms.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Dwain.c,

    Thank you for all the effort you put into this.

    I'm still searching for some way to limit the searching for common rows (common sale items or common sale locations) to just the first match, eliminating the need to have to compare all items/locations to determine that a pair of Sales have a common Item at a common Location at the same time (SaleStart/SaleEnd).

    I'm finding that Celko's idea of using EXISTS (select f1.item_id INTERSECT select f2.item_id), can sometimes work better that I initially thought, so I'm currently experimenting more with that technique, but I'm still looking for some other good ideas if anyone has any.

    I'm grateful to you all for helping me with ideas to solve this problem.

  • The code below ran pretty fast on my desktop. Give it a try:

    SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID

    FROM #Sale s1

    INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID

    INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID

    WHERE EXISTS(

    SELECT 1

    FROM #Sale s2

    INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID

    INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID

    WHERE s1.SaleID <> s2.SaleID

    AND s1.SaleStart <= s2.SaleEnd

    AND s2.SaleStart <= s1.SaleEnd

    AND SLX2.LocationID = SLX1.LocationID

    AND SIX.ItemID = SIX2.ItemID)

    GROUP BY s1.SaleID,LocationID

  • SQL Padawan (8/20/2012)


    The code below ran pretty fast on my desktop. Give it a try:

    SELECT DISTINCT @holder1 = s1.SaleID,@holder2 = LocationID

    FROM #Sale s1

    INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID

    INNER JOIN #SaleItemXref SIX ON SIX.SaleID = s1.SaleID

    WHERE EXISTS(

    SELECT 1

    FROM #Sale s2

    INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID

    INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID

    WHERE s1.SaleID <> s2.SaleID

    AND s1.SaleStart <= s2.SaleEnd

    AND s2.SaleStart <= s1.SaleEnd

    AND SLX2.LocationID = SLX1.LocationID

    AND SIX.ItemID = SIX2.ItemID)

    GROUP BY s1.SaleID,LocationID

    +1 SQL Padawan. The force was with you!

    ---- Original query

    SQL Server Execution Times:

    CPU time = 3433 ms, elapsed time = 942 ms.

    ---- Dwain's query

    SQL Server Execution Times:

    CPU time = 3230 ms, elapsed time = 1789 ms.

    ---- SQL Padawan

    SQL Server Execution Times:

    CPU time = 124 ms, elapsed time = 123 ms.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • ^^^ lol. I need to change the moniker.

    Here is a revised version eliminating the GROUP BY and/or DISTINCT clauses. I especially dislike using either of those to return non-aggregated data. Most of the time there is always a better and more efficient way to write the query.

    SELECT s1.SaleID,LocationID

    FROM #Sale s1

    INNER JOIN #SaleLocationXref SLX1 ON S1.SaleID = SLX1.SaleID --Find location

    WHERE EXISTS(

    SELECT 1

    FROM #Sale s2

    INNER JOIN #SaleLocationXref SLX2 ON s2.SaleID = SLX2.SaleID --Find location

    INNER JOIN #SaleItemXref SIX2 ON SIX2.SaleID = s2.SaleID --Find item

    WHERE s1.SaleID <> s2.SaleID --Make sure that we are comparing two different SaleIDs

    AND s1.SaleStart <= s2.SaleEnd --Make sure that they fall within date ranges of each other

    AND s2.SaleStart <= s1.SaleEnd --^^

    AND SLX2.LocationID = SLX1.LocationID --Where s1 and s2 have the same location id

    AND EXISTS(

    SELECT 1

    FROM #SaleItemXref SIX

    WHERE SIX.SaleID = s1.SaleID --Find sale item for record s1

    AND SIX.ItemID = SIX2.ItemID --Make sure sale item from s1 has a matching record in s2

    )

    )

  • Sales 1 and 2 overlap on date range and have a common item & location.

    Same with 2 and 5. But 1 and 5 don't overlap: 1, 2 & 5 aren't a set with overlapping date ranges and location in common.

    It makes more sense to me to do this as sales pairs, as follows:

    SELECT

    s1_SaleID = s1.SaleID,

    s2_SaleID = s2.SaleID,

    l.*, i.*

    FROM Sale s1

    INNER JOIN Sale s2

    ON s1.SaleID < s2.SaleID

    AND S1.SaleStart <= S2.SaleEnd

    AND S2.SaleStart <= S1.SaleEnd

    CROSS APPLY (

    SELECT li.LocationID

    FROM SaleLocationXref li

    WHERE li.SaleID = s1.SaleID

    OR li.SaleID = s2.SaleID

    GROUP BY li.LocationID

    HAVING COUNT(*) = 2

    ) l

    CROSS APPLY (

    SELECT ii.ItemID

    FROM SaleItemXref ii

    WHERE ii.SaleID = s1.SaleID

    OR ii.SaleID = s2.SaleID

    GROUP BY ii.ItemID

    HAVING COUNT(*) = 2

    ) i

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here is another potential solution. Would like to have others give it a test.

    Against the small original test set it worked. Unfortunately, I think it may be wrong in other ways. This would probably work better if there were a fixed test suite of values with a validated results set to test against.

    with BaseData as (

    select

    s.SaleID,

    s.SaleStart,

    s.SaleEnd,

    slx.LocationID,

    six.ItemID

    from

    Sale s

    inner join SaleLocationXref slx

    on (s.SaleID = slx.SaleID)

    inner join SaleItemXref six

    on (s.SaleID = six.SaleID)

    )

    select

    bd.SaleID,

    bd.SaleStart,

    bd.SaleEnd,

    bd.LocationID,

    bd.ItemID

    from

    BaseData bd

    where

    exists(select 1

    from BaseData bd1

    where

    bd.SaleID <> bd1.SaleID and

    bd.LocationID = bd1.LocationID and

    bd.ItemID = bd1.ItemID and

    bd.SaleStart <= bd1.SaleEnd and

    bd.SaleEnd >= bd1.SaleStart)

    order by

    bd.ItemID,

    bd.LocationID,

    bd.SaleID;

    Modified to work using the test suite above (I commented 3 lines in the final select statement and added the holder variables) and include a run using MAXDOP(1) as well:

    set nocount on;

    go

    create table #SaleLocationXref (

    SaleID int not null,

    LocationID int not null,

    constraint SaleLocationXref_PK primary key (SaleID,LocationID)

    )

    create table #Sale (

    SaleID int not null,

    SaleStart datetime not null,

    SaleEnd datetime not null,

    constraint Sale_PK primary key (SaleID)

    )

    create table #SaleItemXref (

    SaleID int not null,

    ItemID int not null,

    constraint SaleItemXref_PK primary key (SaleID,ItemID)

    )

    insert into #SaleLocationXref values (1,1),(2,1),(4,1),(5,1),

    (3,2)

    insert into #Sale values (1,'01/01/2012','01/07/2012'),

    (2,'01/06/2012','01/10/2012'),

    (3,'01/01/2012','02/07/2012'),

    (4,'02/08/2012','02/15/2012'),

    (5,'01/09/2012','01/16/2012')

    insert into #SaleItemXref values (1,1),(1,2),(1,3),

    (2,1),(2,4),(2,5),

    (3,5),(3,6),(3,7),

    (4,5),(4,6),(4,7),

    (5,7),(5,8),(5,1)

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #SaleLocationXref

    SELECT DISTINCT SaleID * n, LocationID * n

    FROM #SaleLocationXref

    CROSS APPLY Tally

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #Sale

    SELECT SaleID, SaleStart, SaleEnd

    FROM (

    SELECT SaleID=SaleID * n, SaleStart, SaleEnd

    ,n=ROW_NUMBER() OVER (PARTITION BY SaleID * n ORDER BY (SELECT NULL))

    FROM #Sale

    CROSS APPLY Tally) a

    WHERE n = 1

    ;WITH Tally (n) AS (

    SELECT TOP 10000 10+ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns t1 CROSS JOIN sys.all_columns t2)

    INSERT INTO #SaleItemXref

    SELECT DISTINCT SaleID * n, ItemID * n

    FROM #SaleItemXref

    CROSS APPLY Tally

    DECLARE @Holder1 INT, @Holder2 INT

    PRINT '';

    PRINT '---- Revised query'

    SET STATISTICS TIME ON

    ;WITH Sale AS (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, b.LocationID, c.ItemID

    FROM #Sale a

    JOIN #SaleLocationXref b ON a.SaleID = b.SaleID

    JOIN #SaleItemXref c ON c.SaleID = a.SaleID

    )

    SELECT @Holder1=SaleID, @Holder2=LocationID

    FROM (

    SELECT a.SaleID, a.SaleStart, a.SaleEnd, a.LocationID, a.ItemID

    , SaleID2=b.SaleID, LocationID2=b.LocationID, ItemID2=b.ItemID

    ,n=ROW_NUMBER() OVER (PARTITION BY a.SaleID, a.LocationID ORDER BY (SELECT NULL))

    FROM Sale a

    CROSS APPLY (

    SELECT SaleID, LocationID, ItemID

    FROM Sale b

    WHERE (b.SaleStart BETWEEN a.SaleStart AND b.SaleEnd OR

    b.SaleEnd BETWEEN a.SaleStart AND b.SaleEnd) AND a.SaleID <> b.SaleID) b

    WHERE a.LocationID = b.LocationID AND a.ItemID = b.ItemID) c

    WHERE n=1

    SET STATISTICS TIME OFF

    PRINT '';

    PRINT '---- Original query'

    SET STATISTICS TIME ON

    select distinct @Holder1=SaleID, @Holder2=LocationID

    from (

    select SIX.SaleID, LocationID, ItemID,

    Count(*) over (partition by LocationID, ItemID) as Cnt

    from #SaleItemXref SIX

    join (

    select distinct S1.SaleID, SLX1.LocationID

    from #Sale S1

    join #Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    join #SaleLocationXref SLX1

    on S1.SaleID = SLX1.SaleID

    join #SaleLocationXREF SLX2

    on S2.SaleID = SLX2.SaleID

    and SLX1.LocationID = SLX2.LocationID -- Compare location overlap

    ) #SaleOverlap

    on #SaleOverlap.SaleID = SIX.SaleID

    ) ItemOverlap

    where ItemOverlap.Cnt > 1 -- Compare item overlap

    SET STATISTICS TIME OFF

    PRINT '';

    PRINT '---- Original query with MAXDOP 1'

    SET STATISTICS TIME ON

    select distinct @Holder1=SaleID, @Holder2=LocationID

    from (

    select SIX.SaleID, LocationID, ItemID,

    Count(*) over (partition by LocationID, ItemID) as Cnt

    from #SaleItemXref SIX

    join (

    select distinct S1.SaleID, SLX1.LocationID

    from #Sale S1

    join #Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    join #SaleLocationXref SLX1

    on S1.SaleID = SLX1.SaleID

    join #SaleLocationXREF SLX2

    on S2.SaleID = SLX2.SaleID

    and SLX1.LocationID = SLX2.LocationID -- Compare location overlap

    ) #SaleOverlap

    on #SaleOverlap.SaleID = SIX.SaleID

    ) ItemOverlap

    where ItemOverlap.Cnt > 1 -- Compare item overlap

    OPTION (MAXDOP 1)

    SET STATISTICS TIME OFF

    PRINT '';

    PRINT '---- Lynn''s Query';

    SET STATISTICS TIME ON;

    with BaseData as (

    select

    s.SaleID,

    s.SaleStart,

    s.SaleEnd,

    slx.LocationID,

    six.ItemID

    from

    #Sale s

    inner join #SaleLocationXref slx

    on (s.SaleID = slx.SaleID)

    inner join #SaleItemXref six

    on (s.SaleID = six.SaleID)

    )

    select

    --bd.SaleStart,

    --bd.SaleEnd,

    --bd.ItemID,

    @Holder1 = bd.SaleID,

    @Holder2 = bd.LocationID

    from

    BaseData bd

    where

    exists(select 1

    from BaseData bd1

    where

    bd.SaleID <> bd1.SaleID and

    bd.LocationID = bd1.LocationID and

    bd.ItemID = bd1.ItemID and

    bd.SaleStart <= bd1.SaleEnd and

    bd.SaleEnd >= bd1.SaleStart)

    order by

    bd.ItemID,

    bd.LocationID,

    bd.SaleID

    ;

    SET STATISTICS TIME OFF

    PRINT '';

    PRINT '---- Lynn''s Query with MAXDOP(1)';

    SET STATISTICS TIME ON;

    with BaseData as (

    select

    s.SaleID,

    s.SaleStart,

    s.SaleEnd,

    slx.LocationID,

    six.ItemID

    from

    #Sale s

    inner join #SaleLocationXref slx

    on (s.SaleID = slx.SaleID)

    inner join #SaleItemXref six

    on (s.SaleID = six.SaleID)

    )

    select

    --bd.SaleStart,

    --bd.SaleEnd,

    --bd.ItemID,

    @Holder1 = bd.SaleID,

    @Holder2 = bd.LocationID

    from

    BaseData bd

    where

    exists(select 1

    from BaseData bd1

    where

    bd.SaleID <> bd1.SaleID and

    bd.LocationID = bd1.LocationID and

    bd.ItemID = bd1.ItemID and

    bd.SaleStart <= bd1.SaleEnd and

    bd.SaleEnd >= bd1.SaleStart)

    order by

    bd.ItemID,

    bd.LocationID,

    bd.SaleID

    OPTION (MAXDOP 1);

    SET STATISTICS TIME OFF

    go

    drop table #SaleLocationXref

    drop table #Sale

    drop table #SaleItemXref

    And the results when run on a test VM:

    ---- Revised query

    SQL Server Execution Times:

    CPU time = 2768 ms, elapsed time = 1646 ms.

    ---- Original query

    SQL Server Execution Times:

    CPU time = 2609 ms, elapsed time = 1372 ms.

    ---- Original query with MAXDOP 1

    SQL Server Execution Times:

    CPU time = 2312 ms, elapsed time = 2337 ms.

    ---- Lynn's Query

    SQL Server Execution Times:

    CPU time = 1141 ms, elapsed time = 641 ms.

    ---- Lynn's Query with MAXDOP(1)

    SQL Server Execution Times:

    CPU time = 969 ms, elapsed time = 969 ms.

  • I want to express my sincere gratitude to all of you for the valuable help you have offered me here.

    Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.

    Thank you all for your contributions.

    The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:

    -- Sales that overlap on effective dates

    select distinct S1.SaleID, S2.SaleID OverlapSaleID

    into #SO

    from Sale S1

    join Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    intersect

    -- Sales that overlap on Location

    select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID

    from SaleLocationXref SLX1

    join SaleLocationXREF SLX2

    on SLX1.SaleID <> SLX2.SaleID

    -- and SLX1.LocationID = SLX2.LocationID

    where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)

    ;with SOIO as (

    select distinct #SO.SaleID, #SO.OverlapSaleID

    from #SO

    join SaleItemXref SIX1

    on #SO.SaleID = SIX1.SaleID

    join SaleItemXref SIX2

    on #SO.OverlapSaleID = SIX2.SaleID

    -- and SIX1.ItemID = SIX2.ItemID

    where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)

    )

    select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID

    from SOIO

    group by SaleID

    order by SaleID;

    As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.

    A couple of other notes about the solution:

    1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).

    2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.

  • aostanley (8/21/2012)


    I want to express my sincere gratitude to all of you for the valuable help you have offered me here.

    Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.

    Thank you all for your contributions.

    The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:

    -- Sales that overlap on effective dates

    select distinct S1.SaleID, S2.SaleID OverlapSaleID

    into #SO

    from Sale S1

    join Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    intersect

    -- Sales that overlap on Location

    select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID

    from SaleLocationXref SLX1

    join SaleLocationXREF SLX2

    on SLX1.SaleID <> SLX2.SaleID

    -- and SLX1.LocationID = SLX2.LocationID

    where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)

    ;with SOIO as (

    select distinct #SO.SaleID, #SO.OverlapSaleID

    from #SO

    join SaleItemXref SIX1

    on #SO.SaleID = SIX1.SaleID

    join SaleItemXref SIX2

    on #SO.OverlapSaleID = SIX2.SaleID

    -- and SIX1.ItemID = SIX2.ItemID

    where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)

    )

    select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID

    from SOIO

    group by SaleID

    order by SaleID;

    As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.

    A couple of other notes about the solution:

    1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).

    2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.

    I'm glad you have a solution, but I think you may be missing a different solution.

    You should take a close look at the execution plans for each of the solutions provided below (in the attached zip file). You should try downloading SQL Sentry's Plan Explorer and compare the options. You may be surprised at what you find.

    The attached file has the code for setting up the tables, updated code that can be run from SQL Sentry's Plan Explorer after setting up the tables.

    I have also included the sqlplans from Plan Explorer.

    These sqlplans can also be opened and viewed in SSMS.

  • Removed: Didn't test properly.


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply