Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Data mining on a single table Expand / Collapse
Author
Message
Posted Friday, November 30, 2012 8:46 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, December 7, 2012 4:56 AM
Points: 4, Visits: 11
Hello everyone,

I am wondering, is it possible to have data mining on a single table with a large amount of data?
The given table is about sales and contains all the needed info (SellerType, ShopLocation, ItemCategory, ItemCondition, Sold, Price, ShippingCost).

I tried to create a data cube but I can only get one dimension.

What I want to exam is the affect shipping cost has on sales based on category, or the price based on condition and category, etc.

Any thoughts?

Thank you in advance.
Post #1391380
Posted Friday, November 30, 2012 8:50 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, March 6, 2014 1:35 AM
Points: 178, Visits: 547
Check out the link.



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

"Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort
Post #1391391
Posted Friday, November 30, 2012 9:10 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, December 7, 2012 4:56 AM
Points: 4, Visits: 11
I'm sorry but I don't understand your post.

I didn't add code or table data, I'm just asking if data mining can be done successfully on a single table.
Since English is not my native language I tried to describe the table, by giving the name of the fields, in case someone was wondering why the dataset consists only one table.

Thank you for your time though.
Post #1391408
Posted Saturday, December 1, 2012 3:22 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, December 7, 2012 4:56 AM
Points: 4, Visits: 11
I found the solution and I'm posting it in case someone else have the same problem.
The answer is to create relational tables as following, instead of having one table with all the data in it.

Table Sales (SalesID, SellerTypeID, ShopLocationID, ItemCategoryID,ItemConditionID, Sold, Price, ShippingCost)
Table SellerType (SellerTypeID,SellerTypeDescription)
Table ShopLocation (ShopLocationID,ShopLocationDescription)
Table ItemCondition (ItemConditionID, ItemConditionDescription)
Table ItemCategory (ItemCategoryID, ItemCategoryDescription)

Regards
Post #1391637
Posted Sunday, December 2, 2012 8:40 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:36 AM
Points: 36,995, Visits: 31,524
Chriz L. (12/1/2012)
I found the solution and I'm posting it in case someone else have the same problem.
The answer is to create relational tables as following, instead of having one table with all the data in it.

Table Sales (SalesID, SellerTypeID, ShopLocationID, ItemCategoryID,ItemConditionID, Sold, Price, ShippingCost)
Table SellerType (SellerTypeID,SellerTypeDescription)
Table ShopLocation (ShopLocationID,ShopLocationDescription)
Table ItemCondition (ItemConditionID, ItemConditionDescription)
Table ItemCategory (ItemCategoryID, ItemCategoryDescription)

Regards


I have to say that "It Depends". What you have above will work fine until a shop changes location (and that does happen) and then you'll have "lost" historical data. Descriptions can also change at the drop of a hat. The only way you could preserve the historical nature of the information is if you instantiate something like "Type 2 Slowly Changing Dimensions" in the "finger tables" of this "Star Schema".

You're also missing some key information in your "hub" table (even without the "finger" tables). You're missing the date of the sale which is why the other people asked you to post the CREATE TABLE statement for your table if you had one. They wanted to check on all the stuff you might be missing.

To wit, this hasn't answered your question of "is it possible to have data mining on a single table with a large amount of data", at all, because you still have a large, single table as a hub and you'll need to join it to several other tables and, to keep the historical information intact, you'll also need to instantiate the Type 2 SCD's I previously spoke of.

Sales tables, like this one, are historical archives. There's very few reasonable explanations as to why any normalization, like you're done above, should even be attempted. All it does is severely complicate things and, without the Type 2 SCD's (one of the few reasons to try such normalization on such a static reporting table), it won't allow you to mine for things like "did sales go up or down when we changed the name of an item".

To answer your question, "It Depends". Do you want to be able to mine for things like if description changes have affected sales? If so, this is going to get a whole lot more complicated. If not, then go back to your flat historical table and save yourself the trouble. Don't forget to add a date.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1391731
Posted Tuesday, December 4, 2012 3:26 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, December 7, 2012 4:56 AM
Points: 4, Visits: 11
Thank you for the post, it's very helpful.
I'm just getting started with data mining so I can use all the help I can get.

Post #1392328
Posted Tuesday, December 4, 2012 7:27 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 12:36 AM
Points: 36,995, Visits: 31,524
To add to what I previously stated. If you're using an analysis tool, such as SSAS, you may have to "normalize" to a star or snowflake to take advantage of the things that "pivots" and the like can do. Just be aware that since it is for sales and sales are, by their very nature, "historical", how you handle changes to locations and product names will play a key role in what you can actually report. Type 1 or Type 2 SCDs or other historical reporting methods can play a very important role in that.

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1392479
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse