Need data relationship by rows problem to solve ?

  • My sales data table.

     

    TransactionID      Product      Quantity           SalesDollars        StoreID

    100101               A               1                      $10.00                40

    100101               B                2                      $15.00                40

    100101               C               1                      $50.00                40

    100101               D               3                      $18.00                40

    100102               B                3                      $21.50                40

    100102               D               6                      $36.00                40

    100103               A               1                      $10.00                40                   

    100103               D               10                    $70.50                40

    100103               B                2                      $15.00                40                   

    100105               A               1                      $10.00                53

    100105               B                2                      $15.00                53

    100105               C               1                      $50.00                53

    100105               D               3                      $18.00                53

    100106               B                3                      $21.50                53

    100106               D               6                      $36.00                53

    100107               A               1                      $10.00                53                   

    100107               D               10                    $70.50                53

    100107               B                2                      $15.00                53                   

     

     

    This is the character of the dataset I am dealing with. However in real life I have 400 products, thousands of transactions, and 20 million records for a year sales. Transactions are from 40 stores and hundreds of customers.

     

    I have my own ideas how to solve the following data problem, but I am after differing points of view.

     

    I need to determine a data relationship query or matrix between table rows. To answer questions like when Product A is sold how often does product B in the same transaction in same store?  Then the same question has the following permutations and beyond.

     

    Store, Customer, Product

    Store, Customer, Product Category

    Customer, Product Category

    Customer Group, Product

    Etc.

     

    I thought I build a Transaction summary table,  that turns the data information from a row to a column, for SQL searches, So I can search field Product LIKE ‘B’

     

    Like

     

    TransactionID      Product      Quantity           SalesDollars        StoreID

    100107               A,B,D        13                    $95.50                53

    100106               B,D            9                      $57.50                53

    etc

     

     

    So how would you tackle this issue ?

    Is this an OLAP solve, ( dont wanna do that)...

     

    Please advise...

     

     

     

     

  • I'm not 100% sure what you are asking here, but I don't like your summary table idea at the bottom.  Merging the rows, by transaction will be much harder to retrieve.  Using a LIKE instead of = is also much slower.  I think you want to normalize your tables like this with these relationships:  I put Primary Keys in Bold and Foreign Keys in Italics:

    Create Table PRODUCT_CATEGORY (CatId int not null, Description ....)

    Create Table CUSTOMER_GROUP (GroupId int not null, Description ...)

    Create Table STORE (StoreId int not null, other_store_col ...)

    Create Table CUSTOMER (CustomerId int not null, GroupId int not null, ...)

    Create Table  PRODUCT (ProductId int not null, CatId int not null, Name ...)

    Create Table TRANSACTION (TransId int not null, CustomerId int not null,  StoreId int not null, Date datetime not null)

    Create Table TRANS_ITEM(TransId int not null, ProductId int not null, Quantity int not null, SalesDollars money not null)

    Now create the Primary and Foreign Keys that I've mentioned and you can run all the queries you want !! For example if you want  "if Product A is sold how often does product B in the same transaction in same store? "

    select *

    from   TRANS_ITEM i inner join TRANS_ITEM ii on i.TransId = ii.TransId

         inner join TRANSACTION t  on i.TransId = t.TransId

     inner join TRANSACTION tt on ii.TransId = tt.TransId

    where        i.ProductId = 'A' 

     and ii.ProductId = 'B'

     and tt.StoreId = 'Mystore'

     and  t.StoreId = 'Mystore'

    If you can also use the GROUPBY functions to groups your data by CUSTOMER_GROUP or PRODUCT_CATEGORY or lots of things.  I put a date in the TRANSACTION table so you can ask questions like "how many PRODUCT A did we sell on mondays?"  "How many did we sell this month"? THis quarter? etc etc

    The trick here is normalizing your data so that the tables have ONLY the columns that apply to that table.  Once the data is in the right place, you can run all the queries you want, with great speed. 

    Hope this helps,

     

  • OOPs my BOLD and Italics got lost !! ugggh

    See if my above post makes sense anyway.

  • All the tables except TRANS_ITEM have a unique ID, however the TRANS_ITEM table have many TransID to ProductID, thus TransID is not unique in the table. And thats my point.

    So if I wanted to add up transactions with product A and B in same transaction. I would get in the scase of TransID 100107

    100107               A               1                      $10.00                53                   

    100107               B                2                      $15.00                53   

     

    So a question like how many transactions have sold A and B products, this resultset answers a count of 2 ( ie 2 rows), when  the TransID count is really 1 of products A and B in same transaction.. I am trying to avoid DISTINCT as it slow on a big table. So how to get a result set of only ONE row for ONE transaction that has multiple rows of data per transaction.

    Any ideas on this ?

  • Sorry for the slow reply, I don't surf on the weekends!

    If you want to count how many transactions where are that contain both Product A and B, I'd try

    select count( distinct t.transid) from TRANSACTION t inner join TRANS_ITEM a on a.transid = t.transid

     inner join TRANS_ITEM b on b.transid = t.transid

     where a.ProductId = 'A' and b.ProductId = 'B'

    but this uses a distinct, which you said you didn't want.  I don't see why this would be slow though if you have primary keys defined with indexs.  You could also try:

    select count(*) from TRANSACTION  where transid in

    (select a.TransId from TRANS_ITEM a inner join TRANS_ITEM b on a.transid = b.transid

    where a.ProductId = 'A' and b.ProductId = 'B')

    The slow thing here is the "in" clause.  I don't know which is faster.  I don't have a million row table to test it.  If you have indexes on the PKs and FKs, these should be ok.

    Hope this helps!

     

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply