Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

  • This one gives you even more flexibility to filter on/output counts of each product:

    SELECT CustomerID, cnt_A, cnt_B, cnt_C

    FROM (

    SELECT

    CustomerID

    , SUM(CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END) AS cnt_A

    , SUM(CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END) AS cnt_B

    , SUM(CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END) AS cnt_C

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    GROUP BY CustomerID

    ) t

    WHERE cnt_A > 0 AND cnt_B > 0 AND cnt_C = 0

  • @jeff

    Thanks for testing.


    @spam 17205

    I think a simpler version might be

    SELECT

    CustomerID

    , SUM(CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END) AS cnt_A

    , SUM(CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END) AS cnt_B

    , SUM(CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END) AS cnt_C

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    GROUP BY CustomerID

    Having cnt_A > 0 AND cnt_B > 0 AND cnt_C = 0

    Or to let the inner/initial query just do the string-to-number translation.

    SELECT CustomerID

    , SUM(Case_A) AS cnt_A

    , SUM(Case_B) AS cnt_B

    , SUM(Case_C) AS cnt_C

    FROM (

    SELECT

    CustomerID

    , CASE ProductCode WHEN 'A' THEN 1 ELSE 0 END AS Case_A

    , CASE ProductCode WHEN 'B' THEN 1 ELSE 0 END AS Case_B

    , CASE ProductCode WHEN 'C' THEN 1 ELSE 0 END AS Case_C

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    ) t

    WHERE cnt_A <> 0 AND cnt_B <> 0 AND cnt_C = 0

    GROUP BY CustomerID

  • select customer,

    sum(case where product = 'A' then 1 else 0 end) as A,

    sum(case where product = 'B' then 1 else 0 end) as B,

    sum(case where product = 'C' then 1 else 0 end) as C

    from

    group by customer

    having sum(case where product = 'A' then 1 else 0 end) > 1

    and sum(case where product = 'B' then 1 else 0 end) > 1

    and sum(case where product = 'C' then 1 else 0 end) = 0

  • Might I suggest another, old school, alternative that seems to run a little bit faster, which uses a sub-query to eliminate those customers who all ready purchased item "C".

    Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A','B')

    AND CustomerID NOT IN (SELECT DISTINCT CustomerID FROM #Purchase AS P WHERE ProductCode IN ('C'))

    GROUP BY CustomerID

    HAVING COUNT(DISTINCT ProductCode) = 2

    Here are the statistics:

    Find Customers that did not buy "C" from article

    (6673 row(s) affected)

    SQL Server Execution Times:

    CPU time = 94 ms, elapsed time = 265 ms.

    Find Customers that did not buy "C" using subquery

    (6673 row(s) affected)

    SQL Server Execution Times:

    CPU time = 93 ms, elapsed time = 202 ms.

  • Berl (3/30/2012)


    Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

    Elapsed time includes wait time and is not a good measure of performance.

  • Dean Cochrane (3/30/2012)


    Berl (3/30/2012)


    Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

    Elapsed time includes wait time and is not a good measure of performance.

    Actually, it is. The wait time you speak of also includes I/O time and that can be a major part of a performance problem. I do agree, though, that you really need to take "time to display" out of the picture because it will severly mask problems. It takes a certain amount of time to display a million rows and can make differences between two methods seem much more trivial than they might be.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thank you Jeff.

  • Imho

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Regards

    Mike

  • michal.lisinski (4/2/2012)


    Imho

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Regards

    Mike

    This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

  • Jeff Moden (3/31/2012)


    Dean Cochrane (3/30/2012)


    Berl (3/30/2012)


    Although CPU time difference is not very great, the difference is much bigger for the elapsed time.

    Elapsed time includes wait time and is not a good measure of performance.

    Actually, it is. The wait time you speak of also includes I/O time and that can be a major part of a performance problem. I do agree, though, that you really need to take "time to display" out of the picture because it will severly mask problems. It takes a certain amount of time to display a million rows and can make differences between two methods seem much more trivial than they might be.

    Well depends, the elapsed time can an ok performance indicator, however when you run the queries for testing back to back in the same batch without clearing the buffers you may get a false sense of performance from elapsed time and many other stats. I have a local test SQL Server which I always test by stopping the instance then starting so I can be sure everything is clear for testing. But the query plans, so I would be carefull in saying one scenario over another is truely best if you compare in a single batch. Now all that said, a huge difference in elapsed time shows a performance change and when I say huge took 30minutes to run first time then 30 seconds with second query.

  • Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.

    Regards

    Mike

  • Jonathan AC Roberts (4/2/2012)


    michal.lisinski (4/2/2012)


    Imho

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Regards

    Mike

    This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

  • Jonathan AC Roberts (4/2/2012)


    michal.lisinski (4/2/2012)


    Imho

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Regards

    Mike

    This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

    Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.

    Regards

    Mike

  • michal.lisinski (4/2/2012)


    Jonathan AC Roberts (4/2/2012)


    michal.lisinski (4/2/2012)


    Imho

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Regards

    Mike

    This also will select customers who have purchased just one of 'A' or 'B' and not purchased 'C'. the requirement was that they should have purchased both 'A' and 'B' and not purchased 'C'.

    Hmm, r u sure? Customer no 6 took product A and no B, customer 7 took product B and no A, both didn't take C, but query doesn't return them.

    Regards

    Mike

    Hi Mike,

    I tried the following on your code:

    ;WITH [#Purchase] AS

    (

    SELECT 'A' ProductCode, 1 CustomerId UNION ALL

    SELECT 'B' ProductCode, 1 CustomerId UNION ALL

    SELECT 'A' ProductCode, 2 CustomerId UNION ALL

    SELECT 'B' ProductCode, 3 CustomerId UNION ALL

    SELECT 'A' ProductCode, 4 CustomerId UNION ALL

    SELECT 'A' ProductCode, 4 CustomerId UNION ALL

    SELECT 'A' ProductCode, 5 CustomerId UNION ALL

    SELECT 'B' ProductCode, 5 CustomerId UNION ALL

    SELECT 'C' ProductCode, 5 CustomerId

    )

    SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode IN ('A','B')) AND (CustomerID NOT IN

    (SELECT DISTINCT CustomerID

    FROM [#Purchase]

    WHERE (ProductCode = 'C')))

    GROUP BY CustomerID, ProductCode

    HAVING (COUNT(CustomerID) = 1)

    Customer 1 has bought 'A' and 'B' and not 'C' and he gets found correctly

    Customer 2 has bought 'A' and not 'B' and not 'C' and he gets found incorrectly

    Customer 3 has bought not 'A' and 'B' and not 'C' and he gets found incorrectly

    Customer 4 has bought two 'A' and not 'B' and not 'C' and he doesn't get found correctly

    Customer 5 has bought 'A' and 'B' and 'C' and he doesn't get found correctly

  • select * from

    (

    select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a

    --where cd in ('A','B','C')

    group by id

    ) dta

    where

    1=1

    --and status = 3 -- a,b but not c

    --and status = 5--does have a,c but not b

    --and status = 6--does have b,c but not a

    --and status = 7 --all 3

    --and status = 0 --none of them

    --and status = 1 -- just A

    --and status = 2 -- just b

    and status = 4 -- just C

    Replace Id with CustomerId and cd with ProductCode. You can generalize the code the way you want..You can use analytic function instead of group by as well.

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

Viewing 15 posts - 76 through 90 (of 166 total)

You must be logged in to reply to this topic. Login to reply