Using SQL Server and R Services for analyzing Sales data (Part 3)

  • tomaz.kastrun

    SSCrazy

    Points: 2085

    Comments posted to this topic are about the item Using SQL Server and R Services for analyzing Sales data (Part 3)

    Tomaž Kaštrun | twitter: @tomaz_tsql | blog:  https://tomaztsql.wordpress.com/

  • RonKyle

    SSC-Dedicated

    Points: 31462

    This looks like a great article.  I'm going to try this over the weekend on my home SQL Server 2016 sandbox.  If all works as you've laid out, I'll come back and give you a top rating.  Thanks for diving into this.

  • tomaz.kastrun

    SSCrazy

    Points: 2085

    RonKyle - Tuesday, January 17, 2017 7:53 AM

    This looks like a great article.  I'm going to try this over the weekend on my home SQL Server 2016 sandbox.  If all works as you've laid out, I'll come back and give you a top rating.  Thanks for diving into this.

    Much appreciated. I'd especially like to hear your opinion and how you end up applying this.
    The code works with the WideWorldImporters/WideWorldImportersDW databases.

    Best, Tomaž

  • RonKyle

    SSC-Dedicated

    Points: 31462

    I'll be testing against that database.  I have it set up, but have only started some R work.  This gives me something concrete to follow and try.

  • tomaz.kastrun

    SSCrazy

    Points: 2085

    RonKyle - Tuesday, January 17, 2017 12:43 PM

    I'll be testing against that database.  I have it set up, but have only started some R work.  This gives me something concrete to follow and try.

    Great.
    If you have any other questions, just post them here so we can discuss them!

  • Jonathan Mallia

    SSCertifiable

    Points: 5192

    Amazing article!

    Can you please shed some light on confidence and support in the last part of the article: how they are related, directly or indirectly, and what difference it makes when you adjust their values in arules?

    Thanks a lot!
    Jon

  • Jonathan Mallia

    SSCertifiable

    Points: 5192

    Hi,

    I was wondering why you did not use the CustomerKey when you created the cluster in:
    dist(Sales[,c(1,3,5)])

    Wouldn't it have been more effective to cluster by customer rather than by productgroup only?

    Thanks in advance for the explanation!

  • tomaz.kastrun

    SSCrazy

    Points: 2085

    Jonathan Mallia - Saturday, January 21, 2017 6:58 AM

    Hi,

    I was wondering why you did not use the CustomerKey when you created the cluster in:
    dist(Sales[,c(1,3,5)])

    Wouldn't it have been more effective to cluster by customer rather than by productgroup only?

    Thanks in advance for the explanation!

    Hi,

    CustomerKey is just a running ID for each customer in the database. Here, clustering is done on the attributes of the customers (the observations), and CustomerKey is not an attribute that describes or reveals anything about a customer. Including it would only add misleading information relative to the real, natural attributes.
    Customer attributes can be business information (number of transactions, invoice values, basket values, business type) or demographic information (area, city, country, age, etc.). All of these describe customers. CustomerKey, on the other hand, neither describes the customer nor relates to the customer in any meaningful way; it is just a database identifier.

    ProductGroup can be added because it describes the products a customer is buying or selling. But if all of your customers buy all of the products, it is worth rethinking whether, and how, you want to include such an attribute.
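
    To illustrate the point about identifiers, here is a minimal base R sketch. The data frame, its column names (Transactions, InvoiceValue) and its values are hypothetical toy data, not the Sales table from the article. It only shows how a running ID column can dominate the distances that dist() computes, so that customers end up grouped by ID rather than by behaviour.

```r
# Hypothetical toy data: four customers, two behavioural attributes and a running ID.
customers <- data.frame(
  CustomerKey  = c(1, 2, 9001, 9002),   # database identifier, carries no meaning
  Transactions = c(10, 52, 11, 50),     # assumed attribute: number of transactions
  InvoiceValue = c(100, 520, 110, 500)  # assumed attribute: total invoice value
)

# Cluster on the descriptive attributes only.
d_attrs <- dist(customers[, c("Transactions", "InvoiceValue")])
clusters_attrs <- cutree(hclust(d_attrs), k = 2)

# Cluster again with the identifier included.
d_with_key <- dist(customers)
clusters_with_key <- cutree(hclust(d_with_key), k = 2)

# Compare the two assignments side by side.
print(data.frame(customers,
                 by_attributes = clusters_attrs,
                 with_key      = clusters_with_key))
```

    On the attributes alone, the low-volume customers (rows 1 and 3) and the high-volume customers (rows 2 and 4) form the two groups. With CustomerKey included, the grouping simply follows the ID ranges.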

    I hope this makes it clearer.
    Best, Tomaž

  • tomaz.kastrun

    SSCrazy

    Points: 2085

    Jonathan Mallia - Saturday, January 21, 2017 6:49 AM

    Amazing article!

    Can you please shed some light on confidence and support in the last part of the article: how they are related, directly or indirectly, and what difference it makes when you adjust their values in arules?

    Thanks a lot!
    Jon

    Hi,
    Both support and confidence help identify relevant relationships between the left-hand side (LHS) and the right-hand side (RHS) of a rule. The left-hand side reads as "IF item A ..." and the right-hand side as "THEN item B and item C".
    Written as a rule: {A} => {B,C}. Support tells you how often this rule was found in the dataset. If support is 0.123, then 12.3% of all transactions in the dataset contain A, B and C together.
    Confidence tells you how often the rule holds when its left-hand side occurs. If confidence for our rule is 0.99, then 99% of the customers who bought item A also bought items B and C.
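
    As a rough illustration of how the two measures relate, the sketch below computes them by hand in base R for the rule {A} => {B,C}. The baskets are hypothetical toy data, and the helper names (contains_all, support) are made up for this example; arules computes these measures for you. Confidence is the support of the whole rule divided by the support of the left-hand side. In arules, the values you adjust are the minimum support and confidence thresholds passed to apriori(): raising minimum support drops rules built on rare item combinations, and raising minimum confidence keeps only rules that hold more often when the LHS occurs.

```r
# Hypothetical toy baskets: each element is one transaction.
baskets <- list(
  c("A", "B", "C"),
  c("A", "B", "C"),
  c("A", "B"),
  c("B", "C"),
  c("C")
)

# TRUE if a basket contains every item in `items`.
contains_all <- function(basket, items) all(items %in% basket)

# Support: share of all baskets that contain every item in `items`.
support <- function(items) {
  mean(sapply(baskets, contains_all, items = items))
}

lhs <- "A"
rhs <- c("B", "C")

# Support of the rule {A} => {B,C}: baskets containing A, B and C together.
rule_support <- support(c(lhs, rhs))

# Confidence: of the baskets containing A, the share that also contain B and C.
rule_confidence <- rule_support / support(lhs)

print(c(support = rule_support, confidence = rule_confidence))
```

    Here 2 of 5 baskets contain A, B and C (support 0.4), and 2 of the 3 baskets containing A also contain B and C (confidence about 0.67).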

    Best, Tomaž

  • Jonathan Mallia

    SSCertifiable

    Points: 5192

    tomaz.kastrun - Saturday, January 21, 2017 9:02 AM

    Thank you Tomaz,

    it's a lot clearer with your explanation.

    Many thanks.
