Need data relationship by rows problem to solve ?

Question

Need data relationship by rows problem to solve ?

Digs

SSCertifiable

Points: 7333
More actions
February 23, 2006 at 1:03 pm

#103096

My sales data table.

TransactionID      Product      Quantity           SalesDollars       StoreID
100101               A               1                      $10.00                40
100101               B                2                      $15.00                40
100101               C               1                      $50.00                40
100101               D               3                      $18.00                40
100102               B                3                      $21.50                40
100102               D               6                      $36.00                40
100103               A               1                      $10.00                40
100103               D               10                    $70.50                40
100103               B                2                      $15.00                40
100105               A               1                      $10.00                53
100105               B                2                      $15.00                53
100105               C               1                      $50.00                53
100105               D               3                      $18.00                53
100106               B                3                      $21.50                53
100106               D               6                      $36.00                53
100107               A               1                      $10.00                53
100107               D               10                    $70.50                53
100107               B                2                      $15.00                53

This is the character of the dataset I am dealing with. However in real life I have 400 products, thousands of transactions, and 20 million records for a year sales. Transactions are from 40 stores and hundreds of customers.

I have my own ideas how to solve the following data problem, but I am after differing points of view.

I need to determine a data relationship query or matrix between table rows. To answer questions like when Product A is sold how often does product B in the same transaction in same store? Then the same question has the following permutations and beyond.

Store, Customer, Product
Store, Customer, Product Category
Customer, Product Category
Customer Group, Product
Etc.

I thought I build a Transaction summary table, that turns the data information from a row to a column, for SQL searches, So I can search field Product LIKE ‘B’

Like

TransactionID      Product      Quantity           SalesDollars       StoreID
100107               A,B,D        13                    $95.50                53
100106               B,D            9                      $57.50                53
etc

So how would you tackle this issue ?
Is this an OLAP solve, ( dont wanna do that)...

Please advise...

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Muaddib Old Hand Points: 350 More actions · Answer 1

I'm not 100% sure what you are asking here, but I don't like your summary table idea at the bottom. Merging the rows, by transaction will be much harder to retrieve. Using a LIKE instead of = is also much slower. I think you want to normalize your tables like this with these relationships: I put Primary Keys in Bold and Foreign Keys in Italics:

Create Table PRODUCT_CATEGORY (CatId int not null, Description ....)

Create Table CUSTOMER_GROUP (GroupId int not null, Description ...)

Create Table STORE (StoreId int not null, other_store_col ...)

Create Table CUSTOMER (CustomerId int not null, GroupId int not null, ...)

Create Table PRODUCT (ProductId int not null, CatId int not null, Name ...)

Create Table TRANSACTION (TransId int not null, CustomerId int not null, StoreId int not null, Date datetime not null)

Create Table TRANS_ITEM(TransId int not null, ProductId int not null, Quantity int not null, SalesDollars money not null)

Now create the Primary and Foreign Keys that I've mentioned and you can run all the queries you want !! For example if you want "if Product A is sold how often does product B in the same transaction in same store? "

select *

from TRANS_ITEM i inner join TRANS_ITEM ii on i.TransId = ii.TransId

inner join TRANSACTION t on i.TransId = t.TransId

inner join TRANSACTION tt on ii.TransId = tt.TransId

where i.ProductId = 'A'

and ii.ProductId = 'B'

and tt.StoreId = 'Mystore'

and t.StoreId = 'Mystore'

If you can also use the GROUPBY functions to groups your data by CUSTOMER_GROUP or PRODUCT_CATEGORY or lots of things. I put a date in the TRANSACTION table so you can ask questions like "how many PRODUCT A did we sell on mondays?" "How many did we sell this month"? THis quarter? etc etc

The trick here is normalizing your data so that the tables have ONLY the columns that apply to that table. Once the data is in the right place, you can run all the queries you want, with great speed.

Hope this helps,

Muaddib Old Hand Points: 350 More actions · Answer 2

OOPs my BOLD and Italics got lost !! ugggh

See if my above post makes sense anyway.

Digs SSCertifiable Points: 7333 More actions · Answer 3

All the tables except TRANS_ITEM have a unique ID, however the TRANS_ITEM table have many TransID to ProductID, thus TransID is not unique in the table. And thats my point.

So if I wanted to add up transactions with product A and B in same transaction. I would get in the scase of TransID 100107

100107 A 1 $10.00 53

100107 B 2 $15.00 53

So a question like how many transactions have sold A and B products, this resultset answers a count of 2 ( ie 2 rows), when the TransID count is really 1 of products A and B in same transaction.. I am trying to avoid DISTINCT as it slow on a big table. So how to get a result set of only ONE row for ONE transaction that has multiple rows of data per transaction.

Any ideas on this ?

Muaddib Old Hand Points: 350 More actions · Answer 4

Sorry for the slow reply, I don't surf on the weekends!

If you want to count how many transactions where are that contain both Product A and B, I'd try

select count( distinct t.transid) from TRANSACTION t inner join TRANS_ITEM a on a.transid = t.transid

inner join TRANS_ITEM b on b.transid = t.transid

where a.ProductId = 'A' and b.ProductId = 'B'

but this uses a distinct, which you said you didn't want. I don't see why this would be slow though if you have primary keys defined with indexs. You could also try:

select count(*) from TRANSACTION where transid in

(select a.TransId from TRANS_ITEM a inner join TRANS_ITEM b on a.transid = b.transid

where a.ProductId = 'A' and b.ProductId = 'B')

The slow thing here is the "in" clause. I don't know which is faster. I don't have a million row table to test it. If you have indexes on the PKs and FKs, these should be ok.

Hope this helps!