Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

  • ben.norris

    Valued Member

    Points: 58

    The Wizard Of Oz (6/6/2014)


    ben.norris (6/6/2014)


    The Wizard Of Oz (6/6/2014)


    There's no dependance on ordering, so no cheating going on. It's a natural part of MIN() and MAX() functions.

    It would be strange if someone caught SwePeso pulling a fast one, but he's "quite good with his SQL" to put it lightly 😀

    Are you sure? It looks to me that if C was between B and A then it wouldn't exclude customers who bought C

    If we are ever unsure about how some code will behave, we can always test out assumptions on an actual server:

    Below I have 3 customers with A, B and C "ordered" differently, but the MIN() and MAX() functions pull out the correct results irrespective of the orderings.

    WITH TestTable

    AS (

    SELECT 1 AS CustomerID, ProductID

    FROM (

    VALUES ('A'), ('C'), ('B')

    ) x(ProductID)

    UNION ALL

    SELECT 2 AS CustomerID, ProductID

    FROM (

    VALUES ('A'), ('B'), ('C')

    ) x(ProductID)

    UNION ALL

    SELECT 3 AS CustomerID, ProductID

    FROM (

    VALUES ('C'), ('A'), ('B')

    ) x(ProductID)

    )

    SELECT CustomerID, MIN(ProductID) AS MinProductID, MAX(ProductID) AS MaxProductID

    FROM TestTable

    GROUP BY CustomerID

    No you misunderstand, I don't mean the ordering of values within the table. Check the code for example to find customers who bought 'A' and 'C' but not 'B', it doesn't work for that.

  • valeryk2000

    SSCarpal Tunnel

    Points: 4237

    To craig 81366: Thank you, 81366. Certainly the problem can be easily solved with cte. I just wanted to make sure that order by does not work with intersect - or something wrong with my app

    2000

  • Jonathan AC Roberts

    SSCoach

    Points: 16928

    Alexander-449406 (6/6/2014)


    I have generated test data with 1 000 000 000 000 000 000 000 rows and according to my tests all the queries proposed run within 1 second. You guys can choose the query that has fewer characters!

    That is some sort of joke, right? :ermm: I'm struggling to find what's humorous about it. I take it writing jokes isn't your day job?

  • The Wizard Of Oz

    SSC Eights!

    Points: 845

    ben.norris (6/6/2014)

    No you misunderstand, I don't mean the ordering of values within the table. Check the code for example to find customers who bought 'A' and 'C' but not 'B', it doesn't work for that.

    Ohhhhh, I get what you mean now!

    Well, the scope of the question was clearly "A and B but not C", and yes SwePeso's code depends on C coming after B, but I don't think that's cheating, considering it gives the right answer to the question 🙂

    If you wanted "A and C but not B", you could change the HAVING filters to look for MIN(A), MAX(C) and COUNT(*) = 2 😉

  • craig 81366

    SSC Eights!

    Points: 808

    valeryk2000 (6/6/2014)


    To craig 81366: Thank you, 81366. Certainly the problem can be easily solved with cte. I just wanted to make sure that order by does not work with intersect - or something wrong with my app

    2000

    Do you have an implicit type conversion?

    The following makes mention of the same kind of problem you're experiencing.

    http://connect.microsoft.com/SQLServer/feedback/details/630087/intersect-with-order-by-abends-when-data-types-are-implicitly-converted-on-right-hand-side

    The good news is maybe you just need to install a patch.

  • craig 81366

    SSC Eights!

    Points: 808

    The Wizard Of Oz (6/6/2014)


    ben.norris (6/6/2014)

    No you misunderstand, I don't mean the ordering of values within the table. Check the code for example to find customers who bought 'A' and 'C' but not 'B', it doesn't work for that.

    Ohhhhh, I get what you mean now!

    Well, the scope of the question was clearly "A and B but not C", and yes SwePeso's code depends on C coming after B, but I don't think that's cheating, considering it gives the right answer to the question 🙂

    If you wanted "A and C but not B", you could change the HAVING filters to look for MIN(A), MAX(C) and COUNT(*) = 2 😉

    Not quite. This would exclude customers who also bought product D, or bought multiple of A.

  • Naomi N

    Say Hey Kid

    Points: 695

    I would suspect this code to run faster than count distinct. Can you perform tests on a big table?

  • The Wizard Of Oz

    SSC Eights!

    Points: 845

    craig 81366 (6/6/2014)


    The Wizard Of Oz (6/6/2014)


    ben.norris (6/6/2014)

    No you misunderstand, I don't mean the ordering of values within the table. Check the code for example to find customers who bought 'A' and 'C' but not 'B', it doesn't work for that.

    Ohhhhh, I get what you mean now!

    Well, the scope of the question was clearly "A and B but not C", and yes SwePeso's code depends on C coming after B, but I don't think that's cheating, considering it gives the right answer to the question 🙂

    If you wanted "A and C but not B", you could change the HAVING filters to look for MIN(A), MAX(C) and COUNT(*) = 2 😉

    Not quite. This would exclude customers who also bought product D, or bought multiple of A.

    D gets filtered out before the HAVING, but you're right about the multiple A's.

    Code should actually be:

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    GROUP BY CustomerID

    HAVING MIN(ProductCode) = 'A'

    AND MAX(ProductCode) = 'C'

    AND COUNT(DISTINCT ProductCode) = 2;

  • Sean Lange

    SSC Guru

    Points: 286469

    valeryk2000 (6/6/2014)


    I have two tables with patient ID - one contains several records for one ID (tblLetterFlag), in the second it is a primary key.

    Please next time start your own thread instead of tossing a completely different issue into the middle of a discussion about something completely different.

    And NEVER sort by ordinal position, use the column name. 😎

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Naomi N

    Say Hey Kid

    Points: 695

    Also, I have an article on a similar topic which may be of interest and it's related to this topic

    http://social.technet.microsoft.com/wiki/contents/articles/22165.t-sql-relational-division.aspx

  • craig 81366

    SSC Eights!

    Points: 808

    A few answers have suggested first pivoting the data into a column per product, counting how many of each product were bought.

    This makes it rather trivial to write a succint WHERE clause to obtain data matching the desired combination of rules. E.g.

    WHERE [A] > 0 AND > 0 AND [C] = 0

    More complex rules are also easy to achieve.

    E.g. Customers that bought:

    * 5 to 9 of A

    * Any number of B or C, but not both.

    * And did not buy D

    This would use the following where clause:

    WHERE (A BETWEEN 5 AND 9)

    AND (B > 0 OR C > 0)

    AND (B = 0 OR C = 0)

    AND (D = 0)

    As for performance, I tested the following query against the list Jeff tested 2 years ago:

    ;WITH PivotTable AS (

    SELECT CustomerID, [A], , [C]

    FROM (

    SELECT CustomerID, ProductCode

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    ) AS src

    PIVOT

    (

    COUNT(ProductCode)

    FOR ProductCode IN ([A], , [C])

    ) AS pvt

    )

    SELECT CustomerID

    FROM PivotTable

    WHERE [A] > 0 AND > 0 AND [C] = 0

    It performed in the same ballpark as the majority of the existing queries, but lagged behind the top performers.

    Its main benefit is the ability to stipulate complex rules very succinctly.

  • danielk1

    SSC-Addicted

    Points: 462

    This is the way I would have done it and it feels a lot cleaner and easier to read than the example. IMHO

    Arjun S (3/28/2012)


    Another way to get the result would be to use Intersect and then combine it with Except. The distinct part is handled implicitly.

    --===== Find Customers that bought both "A" AND "B"

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A')

    INTERSECT

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('B')

    EXCEPT

    --===== Find Customers that bought "C".

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('C')

    ;

  • valeryk2000

    SSCarpal Tunnel

    Points: 4237

    Ordering by column name does not help

  • Sean Lange

    SSC Guru

    Points: 286469

    valeryk2000 (6/6/2014)


    Ordering by column name does not help

    It isn't a logic thing. The reason you sort by column name is because if you order by ordinal position and your query changes you have to change your order by also.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • SwePeso

    SSC-Dedicated

    Points: 39693

    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    GROUP BY CustomerID

    HAVING MIN(ProductCode) = 'A'

    AND MAX(ProductCode) = 'C'

    AND SUM(CASE WHEN ProductCode = 'B' THEN 1 ELSE 0 END) = 0;

    Avoid DISTINCT. It is a huge performance killer.


    N 56°04'39.16"
    E 12°55'05.25"

Viewing 15 posts - 106 through 120 (of 167 total)

You must be logged in to reply to this topic. Login to reply