Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1314151617»»

Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle) Expand / Collapse
Author
Message
Posted Friday, June 13, 2014 4:52 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 2:37 AM
Points: 479, Visits: 297
Jonathan AC Roberts (6/12/2014)
Normally when an example is given using values A, B and C they should be taken as algebraic variables who's contents can change.


Agreed, it gives a solution that can be used in many situations.

I tested Jeff's original solution and 2 other solutions on a 10-million-row random table (using the code attached in Jeff's article):

DECLARE @A CHAR(1), @B CHAR(1), @C CHAR(1)

SELECT @A = 'X', @B = 'F', @C = 'M'

SET NOCOUNT ON
SET STATISTICS IO ON
SET STATISTICS TIME ON

PRINT '
----- Jeff''s Original -----'

SELECT CustomerID
FROM #Purchase
WHERE ProductCode IN (@A, @B)
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductCode) = 2

EXCEPT

SELECT CustomerID
FROM #Purchase
WHERE ProductCode = @C

---------------------------------

PRINT '
----- NOT EXISTS -----'

SELECT CustomerID
FROM #Purchase P1
WHERE ProductCode IN (@A, @B)
AND NOT EXISTS (
SELECT 1
FROM #Purchase P2
WHERE ProductCode = @C
AND P1.CustomerID = P2.CustomerID
)
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductCode) = 2

---------------------------------

PRINT '
----- SUM(CASE...) -----'

SELECT CustomerID
FROM #Purchase P1
WHERE ProductCode IN (@A, @B, @C)
GROUP BY CustomerID
HAVING SUM(CASE ProductCode WHEN @A THEN 1 ELSE 0 END) > 0
AND SUM(CASE ProductCode WHEN @B THEN 1 ELSE 0 END) > 0
AND SUM(CASE ProductCode WHEN @C THEN 1 ELSE 0 END) = 0

SET STATISTICS TIME OFF
SET STATISTICS IO OFF
SET NOCOUNT OFF

Results (cleaned for readability):

----- Jeff's Original -----

Table '#Purchase'. Scan count 49960, logical reads 151394.

SQL Server Execution Times: CPU time = 406 ms, elapsed time = 412 ms.

----- NOT EXISTS -----

Table '#Purchase'. Scan count 3, logical reads 2160.

SQL Server Execution Times: CPU time = 296 ms, elapsed time = 288 ms.

----- SUM(CASE...) -----

Table '#Purchase'. Scan count 3, logical reads 2160.

SQL Server Execution Times: CPU time = 655 ms, elapsed time = 661 ms.

Post #1580447
Posted Friday, June 13, 2014 5:20 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, September 11, 2014 1:58 AM
Points: 2,397, Visits: 3,407
The Wizard Of Oz (6/13/2014)
I tested Jeff's original solution and 2 other solutions on a 10-million-row random table (using the code attached in Jeff's article):
They are not equivalent and return different result.

COUNT(DISTINCT ...) = 2

and

SUM(CASE WHEN ... THEN ... ELSE ... END) > 0

Will return different results.




N 56°04'39.16"
E 12°55'05.25"
Post #1580459
Posted Friday, June 13, 2014 5:56 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 2:37 AM
Points: 479, Visits: 297
SwePeso (6/13/2014)
They are not equivalent and return different result.


Hmm, I'm going to be sceptical for now because I haven't found a case where any of the 3 solutions brought back different results
What settings did you use for @A, @B and @C so I can replicate?
Post #1580473
Posted Friday, June 13, 2014 6:16 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, September 11, 2014 1:58 AM
Points: 2,397, Visits: 3,407
You don't need to.

The COUNT(DISTINCT ...) approach will only return the groups that has exactly one A and one B.
The SUM(CASE ...) approach will return all groups having at least one A and at least one B. It will also return the groups having 20 A's and 14 B's.



N 56°04'39.16"
E 12°55'05.25"
Post #1580480
Posted Friday, June 13, 2014 6:28 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 2:37 AM
Points: 479, Visits: 297
SwePeso (6/13/2014)
The COUNT(DISTINCT ...) approach will only return the groups that has exactly one A and one B.

Not sure about this, I thought the whole point of the DISTINCT is that it collapses any and all duplicates in the group, so it returns all groups having at least one A and at least one B, just like the SUM(CASE ...) approach.

SwePeso (6/13/2014)
The SUM(CASE ...) approach will return all groups having at least one A and at least one B. It will also return the groups having 20 A's and 14 B's.

Yes, I thought this was exactly what we needed?
Post #1580485
Posted Friday, June 13, 2014 6:30 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, September 11, 2014 1:58 AM
Points: 2,397, Visits: 3,407
So, depending on the distribution of the sample data, the two different queries will return different results.
The COUNT(DISTINCT ... ) will return a fewer number of rows, than the SUM(CASE ...) will.



N 56°04'39.16"
E 12°55'05.25"
Post #1580486
Posted Friday, June 13, 2014 6:54 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, June 18, 2014 2:37 AM
Points: 479, Visits: 297
I don't think so, because before that point we've already done our GROUP BY CustomerID, so really the 2 methods we're comparing are:

...
WHERE ProductCode IN (@A, @B)
--people that bought A or B or AB
AND NOT EXISTS (
SELECT 1
FROM #Purchase P2
WHERE ProductCode = @C
AND P1.CustomerID = P2.CustomerID
)
--people that bought (A or B or AB) and (not C)
GROUP BY CustomerID
HAVING COUNT(DISTINCT ProductCode) = 2
--people that bought (AB) and (not C)

...
WHERE ProductCode IN (@A, @B, @C)
--people that bought (A or B or C or AB or AC or BC or ABC)
GROUP BY CustomerID
HAVING SUM(CASE ProductCode WHEN @A THEN 1 ELSE 0 END) > 0
AND SUM(CASE ProductCode WHEN @B THEN 1 ELSE 0 END) > 0
AND SUM(CASE ProductCode WHEN @C THEN 1 ELSE 0 END) = 0
--people that bought (AB) and (not C)

Either I need more coffee today (99% likely), or you do Peso (1%).
I'm eager to learn more about SQL either way
Post #1580498
Posted Friday, June 13, 2014 7:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, June 29, 2014 6:46 AM
Points: 8, Visits: 53
I'm finding that the results are the same for all 3 queries regardless of the input parameters and, as far as I can see, logically they should be.

SSCrazy, Can you supply some example data to highlight what you are saying here?

I'm pretty sceptical as well and want some evidence that what you are saying is correct.
Post #1580502
Posted Friday, June 13, 2014 7:16 AM
SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 1:54 PM
Points: 4,319, Visits: 3,362
As always, an excellent article, Jeff. Then again, we've come to expect nothing less.


Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions
Post #1580508
Posted Friday, June 13, 2014 7:17 AM


SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Thursday, September 11, 2014 1:58 AM
Points: 2,397, Visits: 3,407
No. You are right.
I am the one needing more coffee. Just a mind lapse and disregarding the DISTINCT for some reason.




N 56°04'39.16"
E 12°55'05.25"
Post #1580509
« Prev Topic | Next Topic »

Add to briefcase «««1314151617»»

Permissions Expand / Collapse