Recent PostsRecent Posts Popular TopicsPopular Topics
 Home Search Members Calendar Who's On

 Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle) Rate Topic Display Mode Topic Options
Author
 Message
 Posted Friday, June 13, 2014 4:52 AM
 SSC-Addicted Group: General Forum Members Last Login: Wednesday, May 4, 2016 4:11 PM Points: 479, Visits: 314
 Jonathan AC Roberts (6/12/2014)Normally when an example is given using values A, B and C they should be taken as algebraic variables who's contents can change.Agreed, it gives a solution that can be used in many situations.I tested Jeff's original solution and 2 other solutions on a 10-million-row random table (using the code attached in Jeff's article):`DECLARE @A CHAR(1), @B CHAR(1), @C CHAR(1)SELECT @A = 'X', @B = 'F', @C = 'M'SET NOCOUNT ONSET STATISTICS IO ONSET STATISTICS TIME ONPRINT '----- Jeff''s Original -----'SELECT CustomerIDFROM #PurchaseWHERE ProductCode IN (@A, @B)GROUP BY CustomerIDHAVING COUNT(DISTINCT ProductCode) = 2EXCEPTSELECT CustomerIDFROM #PurchaseWHERE ProductCode = @C---------------------------------PRINT '----- NOT EXISTS -----'SELECT CustomerIDFROM #Purchase P1WHERE ProductCode IN (@A, @B) AND NOT EXISTS ( SELECT 1 FROM #Purchase P2 WHERE ProductCode = @C AND P1.CustomerID = P2.CustomerID )GROUP BY CustomerIDHAVING COUNT(DISTINCT ProductCode) = 2---------------------------------PRINT '----- SUM(CASE...) -----'SELECT CustomerIDFROM #Purchase P1WHERE ProductCode IN (@A, @B, @C)GROUP BY CustomerIDHAVING SUM(CASE ProductCode WHEN @A THEN 1 ELSE 0 END) > 0 AND SUM(CASE ProductCode WHEN @B THEN 1 ELSE 0 END) > 0 AND SUM(CASE ProductCode WHEN @C THEN 1 ELSE 0 END) = 0SET STATISTICS TIME OFFSET STATISTICS IO OFFSET NOCOUNT OFF`Results (cleaned for readability):`----- Jeff's Original -----Table '#Purchase'. Scan count 49960, logical reads 151394.SQL Server Execution Times: CPU time = 406 ms, elapsed time = 412 ms.----- NOT EXISTS -----Table '#Purchase'. Scan count 3, logical reads 2160.SQL Server Execution Times: CPU time = 296 ms, elapsed time = 288 ms.----- SUM(CASE...) -----Table '#Purchase'. Scan count 3, logical reads 2160.SQL Server Execution Times: CPU time = 655 ms, elapsed time = 661 ms.`
Post #1580447
 Posted Friday, June 13, 2014 5:20 AM
 SSCrazy Group: General Forum Members Last Login: Tuesday, April 14, 2015 6:45 AM Points: 2,403, Visits: 3,431
 The Wizard Of Oz (6/13/2014)I tested Jeff's original solution and 2 other solutions on a 10-million-row random table (using the code attached in Jeff's article):They are not equivalent and return different result.COUNT(DISTINCT ...) = 2andSUM(CASE WHEN ... THEN ... ELSE ... END) > 0Will return different results. N 56°04'39.16"E 12°55'05.25"
Post #1580459
 Posted Friday, June 13, 2014 5:56 AM
 SSC-Addicted Group: General Forum Members Last Login: Wednesday, May 4, 2016 4:11 PM Points: 479, Visits: 314
 SwePeso (6/13/2014)They are not equivalent and return different result.Hmm, I'm going to be sceptical for now because I haven't found a case where any of the 3 solutions brought back different results What settings did you use for @A, @B and @C so I can replicate?
Post #1580473
 Posted Friday, June 13, 2014 6:16 AM
 SSCrazy Group: General Forum Members Last Login: Tuesday, April 14, 2015 6:45 AM Points: 2,403, Visits: 3,431
 You don't need to.The COUNT(DISTINCT ...) approach will only return the groups that has exactly one A and one B.The SUM(CASE ...) approach will return all groups having at least one A and at least one B. It will also return the groups having 20 A's and 14 B's. N 56°04'39.16"E 12°55'05.25"
Post #1580480
 Posted Friday, June 13, 2014 6:28 AM
 SSC-Addicted Group: General Forum Members Last Login: Wednesday, May 4, 2016 4:11 PM Points: 479, Visits: 314
 SwePeso (6/13/2014)The COUNT(DISTINCT ...) approach will only return the groups that has exactly one A and one B.Not sure about this, I thought the whole point of the DISTINCT is that it collapses any and all duplicates in the group, so it returns all groups having at least one A and at least one B, just like the SUM(CASE ...) approach.SwePeso (6/13/2014)The SUM(CASE ...) approach will return all groups having at least one A and at least one B. It will also return the groups having 20 A's and 14 B's.Yes, I thought this was exactly what we needed?
Post #1580485
 Posted Friday, June 13, 2014 6:30 AM
 SSCrazy Group: General Forum Members Last Login: Tuesday, April 14, 2015 6:45 AM Points: 2,403, Visits: 3,431
 So, depending on the distribution of the sample data, the two different queries will return different results.The COUNT(DISTINCT ... ) will return a fewer number of rows, than the SUM(CASE ...) will. N 56°04'39.16"E 12°55'05.25"
Post #1580486
 Posted Friday, June 13, 2014 6:54 AM
 SSC-Addicted Group: General Forum Members Last Login: Wednesday, May 4, 2016 4:11 PM Points: 479, Visits: 314
 I don't think so, because before that point we've already done our GROUP BY CustomerID, so really the 2 methods we're comparing are:`...WHERE ProductCode IN (@A, @B)--people that bought A or B or AB AND NOT EXISTS ( SELECT 1 FROM #Purchase P2 WHERE ProductCode = @C AND P1.CustomerID = P2.CustomerID )--people that bought (A or B or AB) and (not C)GROUP BY CustomerIDHAVING COUNT(DISTINCT ProductCode) = 2--people that bought (AB) and (not C)...WHERE ProductCode IN (@A, @B, @C)--people that bought (A or B or C or AB or AC or BC or ABC)GROUP BY CustomerIDHAVING SUM(CASE ProductCode WHEN @A THEN 1 ELSE 0 END) > 0 AND SUM(CASE ProductCode WHEN @B THEN 1 ELSE 0 END) > 0 AND SUM(CASE ProductCode WHEN @C THEN 1 ELSE 0 END) = 0--people that bought (AB) and (not C)`Either I need more coffee today (99% likely), or you do Peso (1%).I'm eager to learn more about SQL either way
Post #1580498
 Posted Friday, June 13, 2014 7:00 AM
 Forum Newbie Group: General Forum Members Last Login: Sunday, June 29, 2014 6:46 AM Points: 8, Visits: 53
 I'm finding that the results are the same for all 3 queries regardless of the input parameters and, as far as I can see, logically they should be.SSCrazy, Can you supply some example data to highlight what you are saying here?I'm pretty sceptical as well and want some evidence that what you are saying is correct.
Post #1580502
 Posted Friday, June 13, 2014 7:16 AM
 SSCrazy Eights Group: General Forum Members Last Login: Today @ 3:55 PM Points: 9,972, Visits: 9,378
 As always, an excellent article, Jeff. Then again, we've come to expect nothing less.
Post #1580508
 Posted Friday, June 13, 2014 7:17 AM
 SSCrazy Group: General Forum Members Last Login: Tuesday, April 14, 2015 6:45 AM Points: 2,403, Visits: 3,431
 No. You are right.I am the one needing more coffee. Just a mind lapse and disregarding the DISTINCT for some reason. N 56°04'39.16"E 12°55'05.25"
Post #1580509

 Permissions