Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase ««12345»»»

Find Customers Who Bought "A" and "B" But Not "C" (SQL Spackle) Expand / Collapse
Author
Message
Posted Thursday, March 29, 2012 2:27 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, September 12, 2014 5:21 PM
Points: 2,262, Visits: 5,406
chintan.j.gandhi (3/29/2012)
How about this? Although i agree query might get longer when there would be more than 3 items

 SELECT DISTINCT CustomerID
FROM #Purchase P
WHERE EXISTS (SELECT 1 FROM #Purchase P1
WHERE P.customerid = P1.customerid
AND P1.productcode = 'A'
)
AND EXISTS (SELECT 1 FROM #Purchase P2
WHERE P.customerid = P2.customerid
AND P2.productcode = 'B'
)
AND NOT EXISTS (SELECT 1 FROM #Purchase P3
WHERE P.customerid = P3.customerid
AND P3.productcode = 'C'
)



YOu can combine A and B code chunks and leave the NOT EXISTS as it, reducing it tp 2x slower than the others.
Post #1274926
Posted Thursday, March 29, 2012 2:43 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Yesterday @ 2:34 AM
Points: 1,151, Visits: 1,104
Great article and discussion. Learned some new stuff, got some "food for thought" - nice one, peeps.
Post #1274931
Posted Thursday, March 29, 2012 3:15 AM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Today @ 2:29 AM
Points: 425, Visits: 301
You need to eliminate hash join and Index scan.

With this query you can do that using INTERSECT AND EXCEPT

	SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'A'
INTERSECT
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'B'
EXCEPT
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'C'

Or, this one using CTE AND JOINS:

;WITH CTE AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'A'
GROUP BY CustomerID
)
, CTE2 AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'B'
GROUP BY CustomerID
)
, CTE3 AS
(
SELECT CustomerID
FROM #Purchase
WHERE ProductCode = 'C'
GROUP BY CustomerID

)
select CTE.CustomerID from CTE
left join CTE3 ON
CTE.CustomerID = CTE3.CustomerID
where CTE.CustomerID in (SELECT CustomerID FROM CTE2)
AND CTE3.CustomerID IS NULL
GROUP BY CTE.CustomerID

Post #1274943
Posted Thursday, March 29, 2012 3:20 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 4:24 AM
Points: 13,275, Visits: 11,062
Dalibor Margotic (3/29/2012)
You need to eliminate hash join and Index scan.

With this query you can do that using INTERSECT AND EXCEPT
...

Or, this one using CTE AND JOINS:

...


In the CTE and JOINS query, I would replace the IN clause with an EXISTS clause.
But that could be personal preference.




How to post forum questions.
Need an answer? No, you need a question.
What’s the deal with Excel & SSIS?

Member of LinkedIn. My blog at LessThanDot.

MCSA SQL Server 2012 - MCSE Business Intelligence
Post #1274946
Posted Thursday, March 29, 2012 3:39 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Thursday, October 9, 2014 10:42 AM
Points: 6,731, Visits: 8,480
Another great SQL Spackle, Jeff

Koen Verbeeck (3/29/2012)
Dalibor Margotic (3/29/2012)
You need to eliminate hash join and Index scan.

With this query you can do that using INTERSECT AND EXCEPT
...

Or, this one using CTE AND JOINS:

...


In the CTE and JOINS query, I would replace the IN clause with an EXISTS clause.
But that could be personal preference.


One should certainly replace the IN by EXISTS or a left join adding a where clause containing the "outer-joined object key " is null

or at least compare their performance

interesting series to read on this matter:
- http://sqlinthewild.co.za/index.php/2010/03/23/left-outer-join-vs-not-exists/


Johan


Don't drive faster than your guardian angel can fly ...
but keeping both feet on the ground won't get you anywhere

- How to post Performance Problems
- How to post data/code to get the best help


- How to prevent a sore throat after hours of presenting ppt ?


"press F1 for solution", "press shift+F1 for urgent solution"


Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me
Post #1274958
Posted Thursday, March 29, 2012 4:07 AM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Yesterday @ 4:19 PM
Points: 1,796, Visits: 5,799
Hi Jeff,

Great Spackle, thanks!

In my tests, using your code to build a million row test, I found this method to be twice as fast for the same logical reads.


SELECT DISTINCT
CustomerID
FROM #Purchase p1
WHERE ProductCode = 'A'
AND EXISTS (SELECT
1
FROM #Purchase p2
WHERE p2.CustomerID = p1.CustomerID
AND p2.ProductCode = 'B')
AND NOT EXISTS (SELECT
1
FROM #Purchase p3
WHERE p3.CustomerID = p1.CustomerID
AND p3.ProductCode = 'C')


I tested this against 10,000,000 rows as well as 1,000,000 and found that it also scales better (IMHO)...

For 10,000,000 rows, the EXCEPT query had these stats:

Table 'Worktable'. Scan count 49955, logical reads 151393, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 655 ms, elapsed time = 739 ms.


The EXISTS query had these:

Table 'Worktable'. Scan count 3, logical reads 2156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 406 ms, elapsed time = 478 ms.


I wondered if you considered this method and if so, what it was that steered you away from it?


Thanks



MM


  • MMGrid Addin
  • MMNose Addin


  • Forum Etiquette: How to post Reporting Services problems
  • Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden
  • How to Post Performance Problems - by Gail Shaw

  • Post #1274970
    Posted Thursday, March 29, 2012 4:23 AM
    Forum Newbie

    Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

    Group: General Forum Members
    Last Login: Friday, June 1, 2012 2:02 AM
    Points: 5, Visits: 13
    Thanks for article and comments - I learnt a lot.

    In MySQL this is the fastest by far:

    SELECT 
    CustomerId
    FROM Purchase
    WHERE ProductCode IN ('A','B', 'C')
    GROUP BY CustomerID
    having sum(case when ProductCode = 'A' then 1 else 0 end) > 0
    and sum(case when ProductCode = 'B' then 1 else 0 end) > 0
    and sum(case when ProductCode = 'C' then 1 else 0 end) = 0

    Unfortunately MySQL does not have the EXCEPT operator.

    -jj
    Post #1274974
    Posted Thursday, March 29, 2012 6:34 AM


    SSC-Dedicated

    SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

    Group: General Forum Members
    Last Login: Yesterday @ 1:53 PM
    Points: 35,366, Visits: 31,905
    Hi folks,

    Thanks for the great discussion going on and the thoughtful feedback. I'd love to jump in this instant but I'm on my way to work. I'll be back tonight to try to answer some of the questions.

    Thanks, again.


    --Jeff Moden
    "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

    (play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

    Helpful Links:
    How to post code problems
    How to post performance problems
    Post #1275023
    Posted Thursday, March 29, 2012 6:48 AM
    Ten Centuries

    Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

    Group: General Forum Members
    Last Login: Yesterday @ 4:49 AM
    Points: 1,027, Visits: 1,750
    Here's another alternative. I haven't tested the performance as I don't have a setup to do that, and I don't know if it will be similar to Jeff's in query plan. It does avoid the COUNT(DISTINCT...) that I read a while back (can't find the linky) was sometimes bad (something about rewinds).

    EDIT: Eh, nevermind... you can ignore this one... Did a small test and it's between 5 and 10x slower than other methods with about 2x the reads.
    Second Edit: BUT, it's easier to adjust if you know that the situation is a random number of "Purchased these" with a single "but didn't purchase this". Most of the other methods rely on replicating a block of SQL for each item in the list of "Purchased". I think both Jeff's (and mine) work without that restriction.

    SELECT
    b.CustomerID
    FROM
    (
    SELECT
    a.CustomerID
    FROM
    (
    SELECT
    DISTINCT
    CustomerID,
    Product
    FROM
    Purchase
    WHERE
    Product IN ('A','B')) AS a
    GROUP BY
    a.CustomerID
    HAVING
    COUNT(Product) = 2
    ) AS b LEFT OUTER JOIN
    Purchase p ON b.CustomerID = p.CustomerID AND p.Product = 'C'
    WHERE
    p.Product IS NULL


    Post #1275034
    Posted Thursday, March 29, 2012 7:01 AM
    Old Hand

    Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

    Group: General Forum Members
    Last Login: Wednesday, October 15, 2014 6:42 AM
    Points: 390, Visits: 380
    As always, Jeff turns complex problems into simple solutions. I'm still looking for "SQL Spackles" by Jeff Moden at Barnes & Noble.
    Post #1275041
    « Prev Topic | Next Topic »

    Add to briefcase ««12345»»»

    Permissions Expand / Collapse