Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

  • Jeff Moden

    SSC Guru

    Points: 995132

    Comments posted to this topic are about the item Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems
    Create a Tally Function (fnTally)

  • mark hutchinson

    SSC Eights!

    Points: 878

    @jeff

    Did you compare the speed of this query against one where your first (Group By) From clause is a Select Distinct (sub) query?

    Nice article. It builds nicely for the reader.

    I love seeing the Except clause being introduced to the reader who, like me, cut our teeth on SQL92 and need to learn newer language features to be more productive.

  • Arjun S

    Old Hand

    Points: 371

    Another way to get the result would be to use Intersect and then combine it with Except. The distinct part is handled implicitly.

    --===== Find Customers that bought both "A" AND "B"

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A')

    INTERSECT

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('B')

    EXCEPT

    --===== Find Customers that bought "C".

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('C')

    ;

  • tommyh

    SSCertifiable

    Points: 6252

    Wouldnt something like this work just as well?

    SELECT

    CustomerId

    FROM #Purchase

    WHERE ProductCode IN ('A','B', 'C')

    GROUP BY CustomerID

    having sum(case when ProductCode = 'A' then 1 else 0 end) > 0

    and sum(case when ProductCode = 'B' then 1 else 0 end) > 0

    and sum(case when ProductCode = 'C' then 1 else 0 end) = 0

    /T

  • ColdCoffee

    SSC-Dedicated

    Points: 39971

    tommyh (3/28/2012)


    Wouldnt something like this work just as well?

    SELECT

    CustomerId

    FROM #Purchase

    WHERE ProductCode IN ('A','B', 'C')

    GROUP BY CustomerID

    having sum(case when ProductCode = 'A' then 1 else 0 end) > 0

    and sum(case when ProductCode = 'B' then 1 else 0 end) > 0

    and sum(case when ProductCode = 'C' then 1 else 0 end) = 0

    /T

    Yep, this is how i once implemented it! Cant remember the thread now.

    Here it is: http://www.sqlservercentral.com/Forums/FindPost1267224.aspx

    Proved to elimate the EXCEPT part; also does a single scan on the table.

  • Koen Verbeeck

    SSC Guru

    Points: 258955

    Nice spackle Jeff, and great alternatives in the discussion!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • ColdCoffee

    SSC-Dedicated

    Points: 39971

    OH yeah, Nice one Jeff. As always! Thanks a lot for taking time to teach us some of the tricks of the game. Thaks a lot 🙂

  • Toby Harman

    SSCarpal Tunnel

    Points: 4128

    Had a play with this and was very impressed - nice job.

    I played around and added this index

    CREATE INDEX IX_#Purchase_ProductCode

    ON #Purchase (

    ProductCode)

    INCLUDE (CustomerID)

    One concern I had was looking at the query plan, the EXCEPT does a clustered index scan. That may be a function of the small data set, but I am not sure.

    My preference for these types of queries has always been to do an OUTER JOIN on the one we don't want selected and then say that we only want the ones that didn't match returned.

    SELECT a.CustomerID

    FROM #Purchase a

    LEFT OUTER JOIN #Purchase b

    ON b.CustomerID = a.CustomerID

    AND b.ProductCode IN ('C')

    WHERE a.ProductCode IN ('A','B')

    AND b.CustomerID IS NULL

    GROUP BY a.CustomerID

    HAVING COUNT(DISTINCT a.ProductCode) = 2

    Any thoughts?

  • James Dingle-651585

    SSC Veteran

    Points: 212

    "Left Join Where Is Null" gets complicated to read/understand and is subject to cardinalities issues that will impact performance if the right set has many records for 1 record in the left hand side.

    There is also this way, probably the closest to natural language.

    Select Distinct

    CustomerID

    From #Purchase

    Where ProductCode = 'A'

    And CustomerID In

    (

    Select CustomerID

    From #Purchase

    Where ProductCode = 'B'

    )

    And CustomerID Not In

    (

    Select CustomerID

    From #Purchase

    Where ProductCode = 'C'

    )

    I would be interested to have this table filled by 100,000+ rows of sample data and then compare execution plans and statistics of the proposed approaches.

    Comparing syntax without an idea of the performance signature is a bit useless IMHO.

    So here is another version of the initialization script from which the difference will be more blatant:

    --===== Conditionally drop the test table to make

    -- reruns in SSMS easier.

    IF OBJECT_ID('tempdb..#Purchase','U') IS NOT NULL

    DROP TABLE #Purchase

    ;

    --===== Create the test table

    CREATE TABLE #Purchase

    (

    PurchaseID INT IDENTITY(1,1),

    CustomerID INT,

    ProductCode VARCHAR(1000)

    PRIMARY KEY CLUSTERED (PurchaseID)

    )

    ;

    --===== Populate the test table with known data.

    INSERT INTO #Purchase

    (CustomerID, ProductCode)

    ------- Customer #1 precisely meets the criteria.

    -- Bought 'A' and 'B' but not 'C'.

    SELECT 1, 'A' UNION ALL

    SELECT 1, 'B' UNION ALL

    ------- Customer #2 also meets the criteria.

    -- Bought 'A' and 'B' and somthing else,

    -- but not 'C'.

    SELECT 2, 'A' UNION ALL

    SELECT 2, 'B' UNION ALL

    SELECT 2, 'D' UNION ALL

    ------- Customer #3 also meets the criteria.

    -- Bought 'A' and 'B' and something else,

    -- but not 'C'.

    SELECT 3, 'A' UNION ALL

    SELECT 3, 'B' UNION ALL

    SELECT 3, 'D' UNION ALL

    SELECT 3, 'A' UNION ALL

    SELECT 3, 'D' UNION ALL

    ------- Customer #4 doesn't meet the criteria.

    -- Bought 'A' and 'B' but also bought 'C'.

    SELECT 4, 'A' UNION ALL

    SELECT 4, 'B' UNION ALL

    SELECT 4, 'C' UNION ALL

    ------- Customer #5 doesn't meet the criteria.

    -- Bought 'A' and 'B' and something else,

    -- but also bought 'C'.

    SELECT 5, 'A' UNION ALL

    SELECT 5, 'B' UNION ALL

    SELECT 5, 'A' UNION ALL

    SELECT 5, 'B' UNION ALL

    SELECT 5, 'C' UNION ALL

    SELECT 5, 'D' UNION ALL

    ------- Customer #6 doesn't meet the criteria.

    -- Bought more than 1 of 'A' and something else

    -- but not 'B'.

    SELECT 6, 'A' UNION ALL

    SELECT 6, 'A' UNION ALL

    SELECT 6, 'D' UNION ALL

    SELECT 6, 'E' UNION ALL

    ------- Customer #7 doesn't meet the criteria.

    -- Bought more than 1 of 'B' and something else

    -- but not 'A'.

    SELECT 7, 'B' UNION ALL

    SELECT 7, 'B' UNION ALL

    SELECT 7, 'D' UNION ALL

    SELECT 7, 'E'

    go

    declare @i int = 10;

    while @i < 100000

    begin

    insert into #Purchase (CustomerID, ProductCode)

    select CustomerID + @i, ProductCode + Left(ProductCode, 1)

    From #Purchase

    set @i = @i * 2

    end

    Then turn on statistics or open your SQL profiler

    SET STATISTICS IO ON

    And now you're ready.

    Then you will also need an index on ProductCode.

    Create Index IX_ProductCode On #Purchase (ProductCode) Include (CustomerID);

  • chintan.j.gandhi

    SSC Enthusiast

    Points: 127

    How about this? Although i agree query might get longer when there would be more than 3 items

    SELECT DISTINCT CustomerID

    FROM #Purchase P

    WHERE EXISTS (SELECT 1 FROM #Purchase P1

    WHERE P.customerid = P1.customerid

    AND P1.productcode = 'A'

    )

    AND EXISTS (SELECT 1 FROM #Purchase P2

    WHERE P.customerid = P2.customerid

    AND P2.productcode = 'B'

    )

    AND NOT EXISTS (SELECT 1 FROM #Purchase P3

    WHERE P.customerid = P3.customerid

    AND P3.productcode = 'C'

    )

  • ColdCoffee

    SSC-Dedicated

    Points: 39971

    chintan.j.gandhi (3/29/2012)


    How about this? Although i agree query might get longer when there would be more than 3 items

    SELECT DISTINCT CustomerID

    FROM #Purchase P

    WHERE EXISTS (SELECT 1 FROM #Purchase P1

    WHERE P.customerid = P1.customerid

    AND P1.productcode = 'A'

    )

    AND EXISTS (SELECT 1 FROM #Purchase P2

    WHERE P.customerid = P2.customerid

    AND P2.productcode = 'B'

    )

    AND NOT EXISTS (SELECT 1 FROM #Purchase P3

    WHERE P.customerid = P3.customerid

    AND P3.productcode = 'C'

    )

    YOu can combine A and B code chunks and leave the NOT EXISTS as it, reducing it tp 2x slower than the others.

  • Michael Lysons

    SSCertifiable

    Points: 6472

    Great article and discussion. Learned some new stuff, got some "food for thought" - nice one, peeps.

  • Dalibor Margotic

    Mr or Mrs. 500

    Points: 521

    You need to eliminate hash join and Index scan.

    With this query you can do that using INTERSECT AND EXCEPT

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'A'

    INTERSECT

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'B'

    EXCEPT

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'C'

    Or, this one using CTE AND JOINS:

    ;WITH CTE AS

    (

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'A'

    GROUP BY CustomerID

    )

    , CTE2 AS

    (

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'B'

    GROUP BY CustomerID

    )

    , CTE3 AS

    (

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode = 'C'

    GROUP BY CustomerID

    )

    select CTE.CustomerID from CTE

    left join CTE3 ON

    CTE.CustomerID = CTE3.CustomerID

    where CTE.CustomerID in (SELECT CustomerID FROM CTE2)

    AND CTE3.CustomerID IS NULL

    GROUP BY CTE.CustomerID

  • Koen Verbeeck

    SSC Guru

    Points: 258955

    Dalibor Margotic (3/29/2012)


    You need to eliminate hash join and Index scan.

    With this query you can do that using INTERSECT AND EXCEPT

    ...

    Or, this one using CTE AND JOINS:

    ...

    In the CTE and JOINS query, I would replace the IN clause with an EXISTS clause.

    But that could be personal preference.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Johan Bijnens

    SSC Guru

    Points: 134265

    Another great SQL Spackle, Jeff :w00t:

    Koen Verbeeck (3/29/2012)


    Dalibor Margotic (3/29/2012)


    You need to eliminate hash join and Index scan.

    With this query you can do that using INTERSECT AND EXCEPT

    ...

    Or, this one using CTE AND JOINS:

    ...

    In the CTE and JOINS query, I would replace the IN clause with an EXISTS clause.

    But that could be personal preference.

    One should certainly replace the IN by EXISTS or a left join adding a where clause containing the "outer-joined object key " is null

    or at least compare their performance 😉

    interesting series to read on this matter:

    - http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/

    Johan


    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt ?

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me[/url] :alien: but most of the time this is me :hehe:

Viewing 15 posts - 1 through 15 (of 167 total)

You must be logged in to reply to this topic. Login to reply