Home Forums SQL Server 2008 T-SQL (SS2K8) Improving query performance to detect first duplicate RE: Improving query performance to detect first duplicate

  • aostanley (8/21/2012)


    I want to express my sincere gratitude to all of you for the valuable help you have offered me here.

    Thanks to your help, I have a solution now that I think is going to work for me. There is something that still puzzles me related to this problem and my solution to it, but I need a little time to boil that down into a reasonably specific question. If I end up posting a new question relating to this problem, I'll add a reference to it on this thread in case anyone of you might be interested.

    Thank you all for your contributions.

    The DDL that I gave originally doesn't match my actual DDL, it was only meant to represent a subset of the query problem I faced. The solution I have settled on for now as applied to the DDL in my original post would look like this:

    -- Sales that overlap on effective dates

    select distinct S1.SaleID, S2.SaleID OverlapSaleID

    into #SO

    from Sale S1

    join Sale S2 -- Compare sales

    on S1.SaleID <> S2.SaleID

    and (S1.SaleStart <= S2.SaleEnd -- Compare dates overlap

    and S2.SaleStart <= S1.SaleEnd)

    intersect

    -- Sales that overlap on Location

    select distinct SLX1.SaleID, SLX2.SaleID OverlapSaleID

    from SaleLocationXref SLX1

    join SaleLocationXREF SLX2

    on SLX1.SaleID <> SLX2.SaleID

    -- and SLX1.LocationID = SLX2.LocationID

    where EXISTS(select SLX1.LocationID INTERSECT select SLX2.LocationID)

    ;with SOIO as (

    select distinct #SO.SaleID, #SO.OverlapSaleID

    from #SO

    join SaleItemXref SIX1

    on #SO.SaleID = SIX1.SaleID

    join SaleItemXref SIX2

    on #SO.OverlapSaleID = SIX2.SaleID

    -- and SIX1.ItemID = SIX2.ItemID

    where EXISTS(select SIX1.ItemID intersect select SIX2.ItemID)

    )

    select distinct SaleID, Count(OverlapSaleID) OVERLAP_CNT -- OverlapSaleID

    from SOIO

    group by SaleID

    order by SaleID;

    As ChrisM@Work pointed out, it was more useful to know that 1 overlaped with 2 and 2 overlapped with 5 that it was to know that 1, 2 and 5 were all involved in an overlap. So, I change the schema of the result set returned in the solution above from my original post.

    A couple of other notes about the solution:

    1) I ended up storing intermediate results into a #SO temp table instead of including it as a named result set in the final query, because the query optimizer created a much more performant execution plan with the use of the temp table (I don't understand why).

    2) When comparing overlapping LocationID and ItemID, using a join seemed to produce the same query plan and execution times as using the EXISTS(...intersect...) technique in the where clause. I believe that the EXISTS (...intersect...) technique should have been able to deliver better performance than it actually does, so I'm going to go with it and hope to pickup some performance in future updates to SQL Server.

    I'm glad you have a solution, but I think you may be missing a different solution.

    You should take a close look at the execution plans for each of the solutions provided below (in the attached zip file). You should try downloading SQL Sentry's Plan Explorer and compare the options. You may be surprised at what you find.

    The attached file has the code for setting up the tables, updated code that can be run from SQL Sentry's Plan Explorer after setting up the tables.

    I have also included the sqlplans from Plan Explorer.

    These sqlplans can also be opened and viewed in SSMS.