|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Wednesday, May 01, 2013 4:37 PM
Points: 2,248,
Visits: 5,352
|
|
chintan.j.gandhi (3/29/2012)
How about this? Although i agree query might get longer when there would be more than 3 items SELECT DISTINCT CustomerID FROM #Purchase P WHERE EXISTS (SELECT 1 FROM #Purchase P1 WHERE P.customerid = P1.customerid AND P1.productcode = 'A' ) AND EXISTS (SELECT 1 FROM #Purchase P2 WHERE P.customerid = P2.customerid AND P2.productcode = 'B' ) AND NOT EXISTS (SELECT 1 FROM #Purchase P3 WHERE P.customerid = P3.customerid AND P3.productcode = 'C' )
YOu can combine A and B code chunks and leave the NOT EXISTS as it, reducing it tp 2x slower than the others.
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Yesterday @ 4:05 AM
Points: 1,127,
Visits: 937
|
|
| Great article and discussion. Learned some new stuff, got some "food for thought" - nice one, peeps.
|
|
|
|
|
Old Hand
      
Group: General Forum Members
Last Login: Today @ 5:33 AM
Points: 398,
Visits: 233
|
|
You need to eliminate hash join and Index scan.
With this query you can do that using INTERSECT AND EXCEPT
SELECT CustomerID FROM #Purchase WHERE ProductCode = 'A' INTERSECT SELECT CustomerID FROM #Purchase WHERE ProductCode = 'B' EXCEPT SELECT CustomerID FROM #Purchase WHERE ProductCode = 'C'
Or, this one using CTE AND JOINS:
;WITH CTE AS ( SELECT CustomerID FROM #Purchase WHERE ProductCode = 'A' GROUP BY CustomerID ) , CTE2 AS ( SELECT CustomerID FROM #Purchase WHERE ProductCode = 'B' GROUP BY CustomerID ) , CTE3 AS ( SELECT CustomerID FROM #Purchase WHERE ProductCode = 'C' GROUP BY CustomerID ) select CTE.CustomerID from CTE left join CTE3 ON CTE.CustomerID = CTE3.CustomerID where CTE.CustomerID in (SELECT CustomerID FROM CTE2) AND CTE3.CustomerID IS NULL GROUP BY CTE.CustomerID
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 6:30 AM
Points: 9,410,
Visits: 6,495
|
|
|
|
|
|
SSCertifiable
       
Group: General Forum Members
Last Login: 2 days ago @ 2:13 PM
Points: 6,866,
Visits: 8,071
|
|
|
|
|
|
Ten Centuries
      
Group: General Forum Members
Last Login: Today @ 1:08 PM
Points: 1,334,
Visits: 4,020
|
|
Hi Jeff,
Great Spackle, thanks!
In my tests, using your code to build a million row test, I found this method to be twice as fast for the same logical reads.
SELECT DISTINCT CustomerID FROM #Purchase p1 WHERE ProductCode = 'A' AND EXISTS (SELECT 1 FROM #Purchase p2 WHERE p2.CustomerID = p1.CustomerID AND p2.ProductCode = 'B') AND NOT EXISTS (SELECT 1 FROM #Purchase p3 WHERE p3.CustomerID = p1.CustomerID AND p3.ProductCode = 'C')
I tested this against 10,000,000 rows as well as 1,000,000 and found that it also scales better (IMHO)...
For 10,000,000 rows, the EXCEPT query had these stats:
Table 'Worktable'. Scan count 49955, logical reads 151393, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 655 ms, elapsed time = 739 ms.
The EXISTS query had these:
Table 'Worktable'. Scan count 3, logical reads 2156, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times: CPU time = 406 ms, elapsed time = 478 ms.
I wondered if you considered this method and if so, what it was that steered you away from it?
Thanks
MM
|
|
|
|
|
Forum Newbie
      
Group: General Forum Members
Last Login: Friday, June 01, 2012 2:02 AM
Points: 5,
Visits: 13
|
|
Thanks for article and comments - I learnt a lot.
In MySQL this is the fastest by far:
SELECT CustomerId FROM Purchase WHERE ProductCode IN ('A','B', 'C') GROUP BY CustomerID having sum(case when ProductCode = 'A' then 1 else 0 end) > 0 and sum(case when ProductCode = 'B' then 1 else 0 end) > 0 and sum(case when ProductCode = 'C' then 1 else 0 end) = 0
Unfortunately MySQL does not have the EXCEPT operator.
-jj
|
|
|
|
|
SSC-Dedicated
           
Group: General Forum Members
Last Login: Today @ 7:18 AM
Points: 33,112,
Visits: 27,038
|
|
Hi folks,
Thanks for the great discussion going on and the thoughtful feedback. I'd love to jump in this instant but I'm on my way to work. I'll be back tonight to try to answer some of the questions.
Thanks, again.
--Jeff Moden "RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".
First step towards the paradigm shift of writing Set Based code: Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."
For better, quicker answers on T-SQL questions, click on the following... http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following... http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
|
|
|
|
|
Right there with Babe
      
Group: General Forum Members
Last Login: Today @ 5:56 AM
Points: 732,
Visits: 1,231
|
|
Here's another alternative. I haven't tested the performance as I don't have a setup to do that, and I don't know if it will be similar to Jeff's in query plan. It does avoid the COUNT(DISTINCT...) that I read a while back (can't find the linky) was sometimes bad (something about rewinds).
EDIT: Eh, nevermind... you can ignore this one... Did a small test and it's between 5 and 10x slower than other methods with about 2x the reads. Second Edit: BUT, it's easier to adjust if you know that the situation is a random number of "Purchased these" with a single "but didn't purchase this". Most of the other methods rely on replicating a block of SQL for each item in the list of "Purchased". I think both Jeff's (and mine) work without that restriction.
SELECT b.CustomerID FROM ( SELECT a.CustomerID FROM ( SELECT DISTINCT CustomerID, Product FROM Purchase WHERE Product IN ('A','B')) AS a GROUP BY a.CustomerID HAVING COUNT(Product) = 2 ) AS b LEFT OUTER JOIN Purchase p ON b.CustomerID = p.CustomerID AND p.Product = 'C' WHERE p.Product IS NULL
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Yesterday @ 6:36 AM
Points: 268,
Visits: 272
|
|
| As always, Jeff turns complex problems into simple solutions. I'm still looking for "SQL Spackles" by Jeff Moden at Barnes & Noble.
|
|
|
|