How to find products commonly purchased together

Question

Post reply

How to find products commonly purchased together

schleep

SSChampion

Points: 12582
More actions
November 20, 2018 at 9:40 am

#373080

Hi all,
I feel certain this should be a common query, just maybe not using the right search.
Here goes: I have an ordered items table. How can i determine items that are commonly sold together?
In the following example, i want to see that items 3 and 4 are commonly sold together, and 2, 3, and 4 are also sometimes ordered together.
There can be 2-n items in an order.
Any help will be appreciated!
Thanks,
Paul
CREATE TABLE #tmp_data (OrderNo INTEGER, OrderItem VARCHAR(8));
INSERT #tmp_data SELECT 1, 'Item1' UNION ALL SELECT 1, 'Item2' UNION ALL SELECT 1, 'Item3' UNION ALL SELECT 1, 'Item4' UNION ALL SELECT 2, 'Item2' UNION ALL SELECT 2, 'Item3' UNION ALL SELECT 2, 'Item4' UNION ALL SELECT 3, 'ItemA' UNION ALL SELECT 3, 'Item3' UNION ALL SELECT 3, 'Item4' UNION ALL SELECT 4, 'ItemB' UNION ALL SELECT 4, 'Item3' UNION ALL SELECT 4, 'Item4';

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

pietlinden SSC Guru Points: 63417 More actions · Answer 1

You're looking for something called Market Basket Analysis.
This article might be helpful

schleep SSChampion Points: 12582 More actions · Answer 2

Yup, sounds like what I want to do. Except in SQL Server 2008. I don't have access to DW.

pietlinden SSC Guru Points: 63417 More actions · Answer 3

A data warehouse is just another database with a star schema instead of a standard 3NF schema.

schleep SSChampion Points: 12582 More actions · Answer 4

Ah sorry, I thought thew WITH(DISTRIBUTION = xxx) was a required part of the tools for the analysis.

I'll have a go at this, thanks!

Alan Burstein SSC Guru Points: 61136 More actions · Answer 5

This is an fun and interesting problem. If my solution is what you are looking for (or close enough) then I would suggest this index:
CREATE CLUSTERED INDEX cl_nu__temp_data__OrderItem ON #tmp_data(OrderItem);

Solution:
SELECT getTop1.searchItem, getTop1.orderItem, getTop1.total, getTop1.countRank FROM ( SELECT searchItem = tx.OrderItem, orderItem = itvf_itempairs.OrderItem, total = itvf_itempairs.total, countRank = DENSE_RANK() OVER (PARTITION BY tx.OrderItem ORDER BY -itvf_itempairs.total) FROM ( SELECT t.OrderItem FROM #tmp_data AS t GROUP BY t.OrderItem ) AS tx CROSS APPLY ( SELECT t.OrderItem, total = COUNT(*) FROM #tmp_data AS t WHERE t.OrderNo IN ( SELECT t.OrderNo FROM #tmp_data AS t WHERE EXISTS ( SELECT * FROM #tmp_data AS t2 WHERE t2.OrderItem = tx.orderItem AND t2.OrderItem = t.orderItem ) ) AND t.OrderItem <> tx.orderItem GROUP BY t.OrderItem ) AS itvf_itempairs ) AS getTop1 WHERE getTop1.countRank = 1;

Returns:
searchItem orderItem total countRank ---------- --------- ----------- -------------------- Item1 Item2 1 1 Item1 Item3 1 1 Item1 Item4 1 1 Item2 Item3 2 1 Item2 Item4 2 1 Item3 Item4 4 1 Item4 Item3 4 1 ItemA Item3 1 1 ItemA Item4 1 1 ItemB Item3 1 1 ItemB Item4 1 1

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

schleep SSChampion Points: 12582 More actions · Answer 6

That's pretty impressive Alan, and applied to my real-world data, the results look something like expected.

Thank you very much.