Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle) Rate Topic Display Mode Topic Options
Author
 Message
 Posted Friday, May 11, 2012 3:24 AM
 SSC Veteran Group: General Forum Members Last Login: Sunday, June 23, 2013 10:39 PM Points: 231, Visits: 506
 select * from (select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a--where cd in ('A','B','C')group by id) dtawhere 1=1--and status = 3 -- a,b but not c--and status = 5--does have a,c but not b--and status = 6--does have b,c but not a--and status = 7 --all 3--and status = 0 --none of them --and status = 1 -- just A--and status = 2 -- just band status = 4 -- just CChange the id with CustomerId ,cd with ProductCode. You are done.You can use anaytical function using parttion instead of group if you need all the columns from the table. You can use the same concept and could get the results for any combination of product codes.
Post #1298429
 Posted Friday, May 11, 2012 8:37 AM
 SSC-Dedicated Group: General Forum Members Last Login: Today @ 10:05 AM Points: 35,529, Visits: 32,112
 Gullimeel (5/11/2012)select * from (select id,max(case when cd = 'A' then 1 else 0 end) + max(case when cd = 'B' then 2 else 0 end) + max(case when cd = 'C' then 4 else 0 end) status from a--where cd in ('A','B','C')group by id) dtawhere 1=1--and status = 3 -- a,b but not c--and status = 5--does have a,c but not b--and status = 6--does have b,c but not a--and status = 7 --all 3--and status = 0 --none of them --and status = 1 -- just A--and status = 2 -- just band status = 4 -- just CChange the id with CustomerId ,cd with ProductCode. You are done.You can use anaytical function using parttion instead of group if you need all the columns from the table. You can use the same concept and could get the results for any combination of product codes.Ummm... that's real nice but what builds the status column? Some magical analytical function that you haven't shown us? --Jeff Moden"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column." (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013Helpful Links:How to post code problemsHow to post performance problems
Post #1298629
 Posted Friday, May 11, 2012 9:42 AM
 SSC Veteran Group: General Forum Members Last Login: Sunday, June 23, 2013 10:39 PM Points: 231, Visits: 506
 Hi JeffIt is just sum of 2 power (0 for A, 1 for B ,2 for C,3 for D ..and so on)..Thus if you have all 3 codes then 2* 0 + 2* 1 + 2*2 = 1 + 2 + 4. = 7if any of the code is missing the value for that will be 0. That way you know which code is there and which one are missing... Consider these as bits and bits are set to 1for the codes which are there for a given customer id and for others these are set to 0.Hopefully, this will help..
Post #1298708
 Posted Friday, May 11, 2012 10:23 AM
 SSC Veteran Group: General Forum Members Last Login: Sunday, June 23, 2013 10:39 PM Points: 231, Visits: 506
 Below is the sql to show how much cpu time and logical IO's it takes to find the customers who bought A,B but not C.Also, I have done a performance comparison when you want to find the details from Purchase tablefor the customer who bought A, B but not C.The method using parttion by which eliminates join with purchase column is way too expensive...Thus before using any new method benchmark it against existing best solution... Just to make sure which method is better..Follow one rule..Benchmark benchmark Benchmark.....`use tempdbgo/* --uncomment this to create purchase table.. This script to generate the 1 million rows is --created by Jeff Moden on sql server central . com--===== This code takes about 23 seconds to run on a 10 year old, -- single CPU desktop machine.--===== Conditionally drop the test table to make reruns in SSMS easier. -- This is NOT a part of the solution. IF OBJECT_ID('tempdb..Purchase','U') IS NOT NULL DROP TABLE Purchase;--===== Create and populate the test table. -- This is NOT a part of the solution. SELECT TOP (1000000) PurchaseID = IDENTITY(INT,1,1), CustomerID = ABS(CHECKSUM(NEWID())) % 50000 + 1, ProductCode = CHAR(ABS(CHECKSUM(NEWID())) % 26 + 65) INTO Purchase FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2;--===== Add the expected PK ALTER TABLE Purchase ADD PRIMARY KEY CLUSTERED (PurchaseID);--===== Can you guess which index will be used? CREATE INDEX IX_Purchase_CustomerID_ProductCode ON Purchase (CustomerID, ProductCode); CREATE INDEX IX_Purchase_ProductCode_CustomerID ON Purchase (ProductCode, CustomerID) update statistics Purchase with fullscan*/ --- find only the customer ids who bought a,b but not cwith cte1as(select CustomerId,MAX(case when ProductCode = 'A' then 1 else 0 end) +max(case when ProductCode = 'B' then 2 else 0 end) + max(case when ProductCode = 'C' then 4 else 0 end ) as statusfrom Purchasegroup by CustomerID)select * from cte1 where status = 3--Above uses a non clustered index scan and clocked 1867 logical IO and around 800 ms cpu and elapsed time.--query to get the details of customers along with product code and using join back to the original table..with cte1as(select CustomerId,MAX(case when ProductCode = 'A' then 1 else 0 end) +max(case when ProductCode = 'B' then 2 else 0 end) + max(case when ProductCode = 'C' then 4 else 0 end ) as statusfrom Purchasegroup by CustomerID)select p.CustomerId,p.ProductCode,cte1.status from cte1 inner join purchase p on cte1.CustomerId = p.CustomerIdwhere status = 3and p.ProductCode not in ('C'); --as we --It clocked 3734 Io's and 1200 ms CPU time and 1500 ms of elapsed time for 137 K rows..This is quite fast--This is using without join but using parttion by ..This is performaing badly.. --Thus do not try the new methods just to make the statement simple..--Always benchmark with the best method you have with any new method which is introduced in the new versions..--It is clocking 2.2 million IO's approx...Which is way too high 7533 ms CPU time... and 2853 ms elapsed time for 137 k rowswith cte1as(select CustomerID,ProductCode, case when ProductCode = 'A' then 1 else 0 end as IsAExist, case when ProductCode = 'B' then 2 else 0 end as IsBExist, case when ProductCode = 'C' then 4 else 0 end as IsCExistfrom Purchase),cte2as(select CustomerID,ProductCode, MAX(IsAExist) over(PARTITION by CustomerID) + MAX(IsBExist)over(PARTITION by CustomerID) + MAX(IsCExist)over(PARTITION by CustomerID) as status from cte1)select * from cte2where status = 3 --option(maxdop 1); --a and b but not Cgo--/*To compare the above...without using parttitionLogical IO CPU Time Elapsed Time3734 800ms 800ms2201871 7500ms 2850ms*/`
Post #1298749
 Posted Sunday, July 1, 2012 12:50 PM
 SSCertifiable Group: General Forum Members Last Login: 2 days ago @ 8:48 AM Points: 5,366, Visits: 8,977
 Great article Jeff.Awesome discussion by all! Thanks! WayneMicrosoft Certified Master: SQL Server 2008If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1323574
 Posted Tuesday, April 16, 2013 3:48 AM
 SSC-Addicted Group: General Forum Members Last Login: Saturday, November 8, 2014 4:46 AM Points: 479, Visits: 309
 Another way to skin the cat? (results are for the million-row table provided in the article)(apologies if someone already posted this method)`--SET STATISTICS IO ON--SET STATISTICS TIME ONSELECT CustomerIDFROM #PurchaseWHERE ProductCode IN('A','B','C') -- only A,B,C, AB,AC,BC and ABCGROUP BY CustomerIDHAVING COUNT(DISTINCT ProductCode) = 2 -- only AB,AC and BCAND MAX(ProductCode) = 'B' -- only AB! :)/*(6763 row(s) affected)Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table '#Purchase'. Scan count 3, logical reads 226, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 187 ms, elapsed time = 187 ms.*/`
Post #1442664
 Posted Friday, June 6, 2014 12:27 AM
 Grasshopper Group: General Forum Members Last Login: Thursday, October 2, 2014 11:36 AM Points: 23, Visits: 187
 I have generated test data with 1 000 000 000 000 000 000 000 rows and according to my tests all the queries proposed run within 1 second. You guys can choose the query that has fewer characters!
Post #1578166
 Posted Friday, June 6, 2014 12:31 AM
 Forum Newbie Group: General Forum Members Last Login: Sunday, June 29, 2014 6:46 AM Points: 8, Visits: 53
 Yet another way to skin the cat, using a PIVOT this time, not as quick as the others but returns the counts of each product, sometimes you want to know this too `with results(CustomerID, A, B, C) as(SELECT CustomerID, A, B, C FROM(SELECT CustomerID, ProductCode FROM #Purchase WHERE ProductCode IN ('A','B','C')) AS t1 PIVOT (COUNT(ProductCode) FOR ProductCode IN (A,B,C) ) AS t2)SELECT * from resultswhere A > 0 and B > 0 and C = 0`Original query: CPU time = 94 ms, elapsed time = 183 ms.This Query: CPU time = 109 ms, elapsed time = 221 ms.
Post #1578167
 Posted Friday, June 6, 2014 4:00 AM
 SSCrazy Group: General Forum Members Last Login: Tuesday, October 28, 2014 11:02 AM Points: 2,397, Visits: 3,419
 You know me Jeff Can't help myself since this is a Relational Division Problem.`SELECT CustomerIDFROM #PurchaseWHERE ProductCode IN ('A', 'B', 'C')GROUP BY CustomerIDHAVING MIN(ProductCode) = 'A' AND MAX(ProductCode) = 'B' --AND COUNT(*) = 2;` N 56°04'39.16"E 12°55'05.25"
Post #1578219
 Posted Friday, June 6, 2014 8:00 AM
 Grasshopper Group: General Forum Members Last Login: Thursday, October 9, 2014 7:51 AM Points: 22, Visits: 83
 `SELECT CustomerIDFROM #PurchaseWHERE ProductCode IN ('A', 'B', 'C')GROUP BY CustomerIDHAVING MIN(ProductCode) = 'A' AND MAX(ProductCode) = 'B' --AND COUNT(*) = 2;`This relies on A, B and C being ordered rather than some arbitary values that happen to be A, B and C in this instance so it is kind of a cheat surely?
Post #1578321

 Permissions