Find The Baseball Players

,

Finding the Baseball Players

Or how to find out which customer only bought one type of product.

Introduction

I recently saw a question on one of our discussion lists that asked for a query that

would solve a query problem. I thought it was an interesting query and one that others

might benefit from, so I decided to jot some notes and describe the problem and my solution.

The Problem

Suppose you run a sporting goods store and you just got some new product and want to

market it to the pitchers who come into your store. You have lots of data on products

that you have sold, including the type of sport for which the sale is intended. You decide

(perhaps erroneously) that the true baseball players who would buy this product, only buy

other baseball products.

How can you find out which customers have only purchased baseball products?

Here is the schema of your database:

SteveProd (Holds sale information):

ProdID  ProdName			 ProdTyp
------- -------------------- --------------
1       Bat                  Baseball
2       Baseball             Baseball
3       Football             Football
4       Tee                  Baseball
5       Tee                  Football
6       Tee                  Golf
7       Basketball           Basketball
8       Mask                 Hockey

SteveSales (Holds sale information):

CustID  ProdID  Qty    Price
------- ------- ------ --------
1       1       3      4.00
2       1       1      5.00
2       2       1      99.00
3       4       1      50.00
4       1       2      3.00
4       2       1      4.00
4       3       2      4.00
4       5       3      9.99
5       1       3      4.00
5       3       3      14.00
6       5       3      24.00
7       7       1      54.00

If you run a query to see the sales, you will get the following:

Query:

select s.CustID, s.ProdId, p.prodname, p.ProdTyp
 from SteveSales s
	inner join SteveProd p
		on s.Prodid = p.prodid
 order by custid

Results:

CustID  ProdID  ProdName     ProdTyp
------- ------- -----------  --------
1   	1	    Bat          Baseball  
2	    1	    Bat          Baseball  
2	    2	    Baseball     Baseball  
3	    4	    Tee          Baseball  
4	    1	    Bat          Baseball  
4	    2	    Baseball     Baseball  
4	    3	    Football     Football  
4	    5	    Tee          Football  
5	    1	    Bat          Baseball  
5	    3	    Football     Football  
6	    5	    Tee          Football  
7	    7	    Basketball   Basketball

The Solution

Now, how can we find out who has purchased only baseball products? It turns out that there

are a few different ways. I will show you two and explain why I prefer one over the other.

The basic problem is to find the customers who have purchased baseball products and then

remove those customers who have also purchased other products. To do this, you have two basic

choices that I came up with.

The first solution involves taking the query above and qualifying it with the type of product

that one is searching for. Once you have this qualification, you add an additional qualification using

a correlated subquery to remove customers that have purchased other products. This query looks like:

declare @typ varchar( 20)
select @typ = 'Baseball'
select distinct s.CustID
 from SteveSales s
	inner join SteveProd p
		on s.Prodid = p.prodid
 where p.prodtyp = @typ
 and s.custid not in ( select s1.custid
						from SteveSales s1
							inner join SteveProd p1
								on s1.prodid = p1.prodid
						where s1.custid = s.custid
						and p1.prodtyp != @typ
					)
 order by custid

This query returns customers 1, 2, and 3. If you check the result above, you will find that

these customers have purchased only baseball products. If you want to check this, you can change

the type to football and you should receive only customer 6.

The problem that I see with this query, is the use of the Not in and != operators. These

operators in general cause performance issues on large result sets because the entire table

that is being checked must be scanned to verify that a row does not fit.

My second solution is to rewrite this query to remove the negative operators. To remove

these operators, I decided to check on which customers had not purchase other items, but checking

the quantity of other types of product purchased. To check this quanity, I created an aggregate

in the subquery and then performed a positive join on this sum with the value zero (0). If this

condition is true, then no other types of products were purchased. To remove the != operator, I

changed this to a simple OR operator. Here is the query:

declare @typ varchar( 20)
select @typ = 'Baseball'
select distinct s.CustID
 from SteveSales s
	inner join SteveProd p
		on s.Prodid = p.prodid
 where p.prodtyp = @typ
 and 0 = ( select count(s1.custid)
						from SteveSales s1
							inner join SteveProd p1
								on s1.prodid = p1.prodid
						where s1.custid = s.custid
						and (p1.prodtyp < @typ or p1.prodtyp > @typ)
					)
 order by custid

Conclusions

I am sure some of you out there may have better or more creative solutions, but I tried to

keep things simple and did not see a shorter way of performing this query. Of course, temp tables,

derived tables, cursors, etc. could be used, but many of these solutions may cause a performance or

resource hit that I would not want to take.

As always, I welcome feedback (click the Your Opinion) button below and hope you will rate this

article.

Steve Jones

©dkRanch.net August 2001


Return to Steve Jones Home

 

Rate

Share

Share

Rate