Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle)

  • SwePeso (6/6/2014)


    SELECT CustomerID

    FROM #Purchase

    WHERE ProductCode IN ('A', 'B', 'C')

    GROUP BY CustomerID

    HAVING MIN(ProductCode) = 'A'

    AND MAX(ProductCode) = 'C'

    AND SUM(CASE WHEN ProductCode = 'B' THEN 1 ELSE 0 END) = 0;

    Avoid DISTINCT. It is a huge performance killer.

    Nifty trick, tack PeSo πŸ™‚

  • Thank you for this filler Jeff, good read as always, and amazing thread! Only ran your initial code, any thoughts on how to eliminate the second scan?

    😎

  • Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

  • Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    What if we had a product 'D'?

    Or if we wanted customers who bought A and C but not B?

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (6/6/2014)


    Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    What if we had a product 'D'?

    Or if we wanted customers who bought A and C but not B?

    SUM(case ) solution will always work and was already proposed. Basically, if we want A & B and not C, then

    select CustomerID

    from Orders

    WHERE ProductID IN ('A','B','C)

    GROUP BY CustomerID

    HAVING MIN(ProductID) = 'A' and MAX(ProductID) = 'B'

    ---------------

    If we want A & C, but not B, then add SUM(case ...) condition

    In any case, using these simple algorithms will be more efficient than using COUNT(distinct)

    --------------

    If you check my article T-SQL Relational Division it solves more complex problem. In my case I was trying to find customers who made same (or almost the same) orders regardless of what products they bought.

  • Naomi N (6/6/2014)


    Luis Cazares (6/6/2014)


    Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    What if we had a product 'D'?

    Or if we wanted customers who bought A and C but not B?

    SUM(case ) solution will always work and was already proposed. Basically, if we want A & B and not C, then

    select CustomerID

    from Orders

    WHERE ProductID IN ('A','B','C)

    GROUP BY CustomerID

    HAVING MIN(ProductID) = 'A' and MAX(ProductID) = 'B'

    ---------------

    If we want A & C, but not B, then add SUM(case ...) condition

    In any case, using these simple algorithms will be more efficient than using COUNT(distinct)

    --------------

    If you check my article T-SQL Relational Division it solves more complex problem. In my case I was trying to find customers who made same (or almost the same) orders regardless of what products they bought.

    Yes, but not more efficient than the intersect/except clauses proposed previously on this discussion. πŸ˜‰

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    See page 10.


    N 56Β°04'39.16"
    E 12Β°55'05.25"

  • SwePeso (6/6/2014)


    Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    See page 10.

    Page 10? I only have 3 pages. πŸ˜‰

  • Lynn Pettis (6/6/2014)


    SwePeso (6/6/2014)


    Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    See page 10.

    Page 10? I only have 3 pages. πŸ˜‰

    I have 5. :w00t:

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Sean Lange (6/6/2014)


    Lynn Pettis (6/6/2014)


    SwePeso (6/6/2014)


    Naomi N (6/6/2014)


    Peter,

    I think we wanted customers who bought A and B but not C, so your query will be even simpler,

    MIN(productID) = 'A' and MAX(productID) = 'B'

    See page 10.

    Page 10? I only have 3 pages. πŸ˜‰

    I have 5. :w00t:

    I have 13 πŸ™‚ BTW, is there a way to receive only 'interesting' replies notifications, not every silly reply? πŸ™‚ Or include the actual text of the message in the notification?

  • How about using bitwise checking?

    select customerid,

    sum(distinct case

    when productcode='A' then 1

    when productcode='B' then 2

    when productcode='C' then 4

    else 0 end)

    from #purchase

    group by customerid

    having sum(distinct case

    when productcode='A' then 1

    when productcode='B' then 2

    when productcode='C' then 4

    else 0 end) = 3

    πŸ™‚

  • I think a lot of people seem to focus purely on performance, a good solution to a given issue considers many factors, NOT just performance, people who focus on just the performance aspect tend to write vastly more code and it eventually gets messy, I've seen this many times, stored procedures with vast swathes of if/then/else statements separating complete SQL statements and every time someone requests something even slightly different a new if/then/else and query is added, eventually leading to exceedingly messy code. If you need to add another column to the output you’re changing many SQL statements.

    The truth is sometimes a little bit of performance needs to be sacrificed for cleaner and more flexible code, it will make your life easier and unless there is a vast difference the user/consumer won't care. What they will care about is how long it takes to change the query to return the data with slightly different criteria. I have worked as a contractor for many years and when I see these huge if/then/else structures, as I have many times, I do not think the developer was all that clever at all, unless there was absolutely no choice, they have just made it harder for themselves and the next guy who looks at the code. And before anyone has the idea of building these queries dynamically to cater for different input parameters, just be aware this is the absolute last resort because it is inefficient and the query will be compiled when executed and dependencies cannot be considered by SQL Server, not when you deploy the procedure, so add that compile time and the building of the query to your performance and see where it stands.

    All of the solutions here are valid in specific situations. If this query were behind a report there is a distinct possibility that just returning the CustomerID would not be enough and they would want some figures and some customer information returned, never forget the end goal nor the people who will be consuming the data, who will ALWAYS ask for the same thing but sliced slightly differently tomorrow. How long is it going to be before someone asks for a filter on product "E" or where β€œC” is greater than 5 but less than 10? They would/should expect these things to be parameterised, not have to ask for a new report. Who wants to write 10 reports when they could write 1 with parameters which allow some flexibility, I like my life and the life of the guy who looks at the code after me to be easy not hard.

    This problem is obviously one which deals with sets of data so the most obvious solution is to use INTERSECT/EXCEPT/UNION/ETC..., the set operators but is this the optimal solution, well maybe, depends what you are going to do with the results and how flexible the criteria needs to be.

    The trick is to weigh up the pros/cons of all the possible solutions and pick the best one for the given situation, NOT to just say this one is faster or this one has less characters to type so I'm going to use that. You should always come up with more than one solution to a given problem if possible, this problem has many solutions, as we have seen. The right one is the one which considers all the factors of good coding and serviceability, which is always completely dependent on the situation/usage.

    Many times I have had to pull up recent graduates and ask them what they are going to do when the user asks for something slightly different, the response is always "but the spec says they want this", you need look past specifications and add some real world knowledge to the solution, give them what they want but also consider what they are likely to want tomorrow. That's why I threw in the PIVOT example, which is quite out of the square for the given problem and the other solutions which don’t use the set operators are excellent, the obvious solution isn’t always the right solution, under certain circumstances, not all circumstances, these could prove to be the right solution. The PIVOT solution would be vastly easier to alter the criteria for and a hell of a lot easier to read than say 10 queries separated by set operators. The downside is the number of columns for the pivot are set and the performance is not quite as good. Like I said, weigh up the pros/cons and choose the right solution which will work best in the long run for you considering all the factors, NEVER just performance.

    Sorry for being so verbose πŸ™‚

  • SimonC, I have to disagree with you here.

    Even if the PIVOT solution seem to work well, it doesn't scale well. And that is one of the cardinal sins junior developers tend to make. If you make a PIVOT solution in your test environment (which may have 100 or 1000 rows), you will halt the server when deploying your solution to the production server which may have a million or more rows.

    I have seen this over and over again. And then some more times again.

    It's a pity you despise performance written code, in favor of "simpler to understand code". With that mindset and attitude, junior developers will never be good developers, and some of them have potential to be excellent developers.

    Why don't give them a chance to evolve and actually write code that has a chance in the real world?

    The PIVOT solution you gave is also specific for this problem statement. If you want a "one solution rule them all", you would need to learn relational division (which happens to be one of my specialities). The problem with the PIVOT solution is that you introduce the three-tiered logic with NULL, since PIVOT aggregates NULL for unknown values, not 0. If you write the PIVOT solution with the good old-fashioned SUM(CASE ... WHEN ... THEN ... END), you are safe for that to, but then the solution is so close to the other solutions.

    My $0.02


    N 56Β°04'39.16"
    E 12Β°55'05.25"

  • Hi SSCrazy,

    Thanks for your input.

    Actually, I didn't say that I don't like performance written code what I am getting at is that performance is not the only consideration, sacrificing a few milli-seconds of performance for code that's readable and more flexible is an acceptable trade-off. And I don't despise junior developers, don't know where you got that from that's a bit extreme, I just want them, and everyone else, to consider more than one solution to a given problem and that they should look at other alternatives as well, not just the most obvious and choose the best for the given situation taking into account not just performance. Performance is a major consideration if there is a vast difference.

    eg) using 200 lines of code that runs 10 milliseconds faster than 10 lines of readable and flexible code behind a report that is run maybe twice a day is not a good thing, I'll take the 10 lines of code any day. Like I said it all depends on the situation.

    As for the PIVOT example, I didn't say it was the β€œone solution to rule them all”, there isn’t one, just another alternative, like Relational Division and Set Operators all should be considered and the most appropriate chosen for the given situation.

    I did do the test on a million rows, however no query is an island and in a production situation has to contend with other process/queries running at the same time, so in a given situation you may be correct, this would need to be tested. I ran this on my home computer with nothing else running at the same time but I'm thinking so did everyone else, I doubt anyone ran their examples on their production server.

  • Hello All

    I'm just wondering, is there anything wrong with doing it this way:

    SELECT DISTINCT

    P.CustomerID

    FROM #Purchase P

    LEFT JOIN (

    SELECT

    P.CustomerID

    FROM Purchase P

    WHERE

    P.ProductCode = 'C'

    ) NP ON NP.CustomerID = P.CustomerID

    WHERE

    P.ProductCode IN ( 'A', 'B' )

    AND NP.CustomerID IS NULL

    This seems a lot simpler to me and is the way I would probably have done it but are the performance implications in comparison with other methods demonstrated?

    Thanks

    Robin

Viewing 15 posts - 121 through 135 (of 166 total)

You must be logged in to reply to this topic. Login to reply