SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Data mining on a single table


Data mining on a single table

Author
Message
Chriz L.
Chriz L.
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 11
Hello everyone,

I am wondering, is it possible to have data mining on a single table with a large amount of data?
The given table is about sales and contains all the needed info (SellerType, ShopLocation, ItemCategory, ItemCondition, Sold, Price, ShippingCost).

I tried to create a data cube but I can only get one dimension.

What I want to exam is the affect shipping cost has on sales based on category, or the price based on condition and category, etc.

Any thoughts?

Thank you in advance.
Dennis Post
Dennis Post
Ten Centuries
Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)Ten Centuries (1K reputation)

Group: General Forum Members
Points: 1007 Visits: 547
Check out the link.



For better, quicker answers on T-SQL questions, read Jeff Moden's suggestions.

"Million-to-one chances crop up nine times out of ten." ― Terry Pratchett, Mort
Chriz L.
Chriz L.
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 11
I'm sorry but I don't understand your post.

I didn't add code or table data, I'm just asking if data mining can be done successfully on a single table.
Since English is not my native language I tried to describe the table, by giving the name of the fields, in case someone was wondering why the dataset consists only one table.

Thank you for your time though.
Chriz L.
Chriz L.
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 11
I found the solution and I'm posting it in case someone else have the same problem.
The answer is to create relational tables as following, instead of having one table with all the data in it.

Table Sales (SalesID, SellerTypeID, ShopLocationID, ItemCategoryID,ItemConditionID, Sold, Price, ShippingCost)
Table SellerType (SellerTypeID,SellerTypeDescription)
Table ShopLocation (ShopLocationID,ShopLocationDescription)
Table ItemCondition (ItemConditionID, ItemConditionDescription)
Table ItemCategory (ItemCategoryID, ItemCategoryDescription)

Regards :-)
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218301 Visits: 41995
Chriz L. (12/1/2012)
I found the solution and I'm posting it in case someone else have the same problem.
The answer is to create relational tables as following, instead of having one table with all the data in it.

Table Sales (SalesID, SellerTypeID, ShopLocationID, ItemCategoryID,ItemConditionID, Sold, Price, ShippingCost)
Table SellerType (SellerTypeID,SellerTypeDescription)
Table ShopLocation (ShopLocationID,ShopLocationDescription)
Table ItemCondition (ItemConditionID, ItemConditionDescription)
Table ItemCategory (ItemCategoryID, ItemCategoryDescription)

Regards :-)


I have to say that "It Depends". What you have above will work fine until a shop changes location (and that does happen) and then you'll have "lost" historical data. Descriptions can also change at the drop of a hat. The only way you could preserve the historical nature of the information is if you instantiate something like "Type 2 Slowly Changing Dimensions" in the "finger tables" of this "Star Schema".

You're also missing some key information in your "hub" table (even without the "finger" tables). You're missing the date of the sale which is why the other people asked you to post the CREATE TABLE statement for your table if you had one. They wanted to check on all the stuff you might be missing.

To wit, this hasn't answered your question of "is it possible to have data mining on a single table with a large amount of data", at all, because you still have a large, single table as a hub and you'll need to join it to several other tables and, to keep the historical information intact, you'll also need to instantiate the Type 2 SCD's I previously spoke of.

Sales tables, like this one, are historical archives. There's very few reasonable explanations as to why any normalization, like you're done above, should even be attempted. All it does is severely complicate things and, without the Type 2 SCD's (one of the few reasons to try such normalization on such a static reporting table), it won't allow you to mine for things like "did sales go up or down when we changed the name of an item".

To answer your question, "It Depends". Do you want to be able to mine for things like if description changes have affected sales? If so, this is going to get a whole lot more complicated. If not, then go back to your flat historical table and save yourself the trouble. Don't forget to add a date. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Chriz L.
Chriz L.
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 11
Thank you for the post, it's very helpful.
I'm just getting started with data mining so I can use all the help I can get.

:-)
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)SSC Guru (218K reputation)

Group: General Forum Members
Points: 218301 Visits: 41995
To add to what I previously stated. If you're using an analysis tool, such as SSAS, you may have to "normalize" to a star or snowflake to take advantage of the things that "pivots" and the like can do. Just be aware that since it is for sales and sales are, by their very nature, "historical", how you handle changes to locations and product names will play a key role in what you can actually report. Type 1 or Type 2 SCDs or other historical reporting methods can play a very important role in that.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search