SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Using SQL Server and R Services for analyzing Sales data (Part 3)


Using SQL Server and R Services for analyzing Sales data (Part 3)

Author
Message
Tomaz Kastrun
Tomaz Kastrun
Old Hand
Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)

Group: General Forum Members
Points: 363 Visits: 754
Comments posted to this topic are about the item Using SQL Server and R Services for analyzing Sales data (Part 3)

Tomaž Kaštrun | twitter: @tomaz_tsql | blog: https://tomaztsql.wordpress.com/
RonKyle
RonKyle
Hall of Fame
Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)

Group: General Forum Members
Points: 3825 Visits: 3552
This looks like a great article. I'm going to try this over the weekend on my home SQL Server 2016 sandbox. If all works as you've laid out, I'll come back and give you a top rating. Thanks for diving into this.



Tomaz Kastrun
Tomaz Kastrun
Old Hand
Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)

Group: General Forum Members
Points: 363 Visits: 754
RonKyle - Tuesday, January 17, 2017 7:53 AM
This looks like a great article. I'm going to try this over the weekend on my home SQL Server 2016 sandbox. If all works as you've laid out, I'll come back and give you a top rating. Thanks for diving into this.

Much appreciated. Especially your opinion and how you will apply this.
Code is working fine with WideWorldImporters/WideWorldImportersDW database.

Best, Tomaž


Tomaž Kaštrun | twitter: @tomaz_tsql | blog: https://tomaztsql.wordpress.com/
RonKyle
RonKyle
Hall of Fame
Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)Hall of Fame (3.8K reputation)

Group: General Forum Members
Points: 3825 Visits: 3552
I'll be testing against that database. I have it set up, but have only started some R work. This gives me something concrete to follow and try.



Tomaz Kastrun
Tomaz Kastrun
Old Hand
Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)

Group: General Forum Members
Points: 363 Visits: 754
RonKyle - Tuesday, January 17, 2017 12:43 PM
I'll be testing against that database. I have it set up, but have only started some R work. This gives me something concrete to follow and try.




Great.
If you have any other questions, just post them here so we can discuss them!

Tomaž Kaštrun | twitter: @tomaz_tsql | blog: https://tomaztsql.wordpress.com/
Jonathan Mallia
Jonathan Mallia
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1337 Visits: 1291
Amazing article!

Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?

Thanks a lot!
Jon
Jonathan Mallia
Jonathan Mallia
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1337 Visits: 1291
Hi,

I was wondering why you did not use the CustomerKey when you created the cluster in:
dist(Sales[,c(1,3,5)])


Wouldn't it have been more effective to cluster by customer rather than by productgroup only?

Thanks in advance for the explanation!
Tomaz Kastrun
Tomaz Kastrun
Old Hand
Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)

Group: General Forum Members
Points: 363 Visits: 754
Jonathan Mallia - Saturday, January 21, 2017 6:58 AM
Hi,

I was wondering why you did not use the CustomerKey when you created the cluster in:
dist(Sales[,c(1,3,5)])


Wouldn't it have been more effective to cluster by customer rather than by productgroup only?

Thanks in advance for the explanation!



Hi,

customerkey is just a running ID for each of the customers in the database. In this case, Clustering is done on the attributes of the customers (observation), and CustomerKey is not an attribute that would describe or unveil any information about the customer. If it would be included, it can only create dis-information in relation to other real/natural attributes.
Attribute for customer can be: business information: number of transactions created, value of invoices, basket values, business type; demographic information: area, city, country, age, etc. All these attributes describe customers. CustomerKey on the other hand, does not describe customer, nor is anyhow related to customer. it is just a database identifier.

ProductGroup can be added, because it describes products customer is buying/selling. But if you have all the customers buying all the products, it might also be a good to rethink if you want to include it / how you want to include such attribute.

Hope I made it more understanding.
Best, Toamaž


Tomaž Kaštrun | twitter: @tomaz_tsql | blog: https://tomaztsql.wordpress.com/
Tomaz Kastrun
Tomaz Kastrun
Old Hand
Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)Old Hand (363 reputation)

Group: General Forum Members
Points: 363 Visits: 754
Jonathan Mallia - Saturday, January 21, 2017 6:49 AM
Amazing article!

Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?

Thanks a lot!
Jon



Hi,
Both support and confidence are important to identify and find relevant relationship between left hand side (LHS) and right hand side (RHS). Left hand side is interpreted as IF item A.... and Right hand side as THEN item B and item C.
Or shown graphically {A} => {B,C}. Imagine, this is our rule. To this rule, support represents, how many times this rules was found in the dataset. If support is 0.123, this means that this rules appeared 12,3% out of all the rules in dataset.
Confidence will tell you, how many times this rules has been proven as True. If confidence for our rules is 0.99, this means that in 99% of the dataset containing all the rules, customers that bought item A will in 99% times also buy item B and C.

Best, Tomaž


Tomaž Kaštrun | twitter: @tomaz_tsql | blog: https://tomaztsql.wordpress.com/
Jonathan Mallia
Jonathan Mallia
Ten Centuries
Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)Ten Centuries (1.3K reputation)

Group: General Forum Members
Points: 1337 Visits: 1291
tomaz.kastrun - Saturday, January 21, 2017 9:02 AM
Jonathan Mallia - Saturday, January 21, 2017 6:49 AM
Amazing article!

Can you please shed some light on confidence and support for the last part of the article, how they are related directly or indirectly, and what difference does it make when you adjust their values in arules ?

Thanks a lot!
Jon



Hi,
Both support and confidence are important to identify and find relevant relationship between left hand side (LHS) and right hand side (RHS). Left hand side is interpreted as IF item A.... and Right hand side as THEN item B and item C.
Or shown graphically {A} => {B,C}. Imagine, this is our rule. To this rule, support represents, how many times this rules was found in the dataset. If support is 0.123, this means that this rules appeared 12,3% out of all the rules in dataset.
Confidence will tell you, how many times this rules has been proven as True. If confidence for our rules is 0.99, this means that in 99% of the dataset containing all the rules, customers that bought item A will in 99% times also buy item B and C.

Best, Tomaž


Thank you Tomaz,

it's a lot clearer with your explanation.

Many thanks.

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search