Analysing Sales Patterns: R + SQL Server

  • nick.dale.burns

    SSCrazy

    Points: 2226

    Comments posted to this topic are about the item Analysing Sales Patterns: R + SQL Server

  • Kyrilluk

    Ten Centuries

    Points: 1269

Excellent article, thank you.

For those of you who just hate creating views, or who would like to keep experimenting with different SQL queries and use Windows authentication rather than a SQL login, the following may help:

    library(RODBC)  # provides odbcDriverConnect, sqlQuery and odbcClose

    SQLconnection <- odbcDriverConnect("Driver=SQL Server; Server=MyDummyInstance\\BI_DEV; Database=AdventureWorksDW2012; trusted_connection=true;")

    salesData <- sqlQuery(SQLconnection, "
        select
            sales.ProductKey,
            p.ProductSubcategoryKey,
            ps.EnglishProductSubcategoryName,
            p.EnglishProductName,
            sales.SalesTerritoryKey,
            t.SalesTerritoryRegion,
            sales.CustomerKey
        from dbo.FactInternetSales as sales
            inner join dbo.DimProduct as p on p.ProductKey = sales.ProductKey
            inner join dbo.DimProductSubcategory as ps on ps.ProductSubcategoryKey = p.ProductSubcategoryKey
            inner join dbo.DimSalesTerritory as t on t.SalesTerritoryKey = sales.SalesTerritoryKey;
    ")

    odbcClose(SQLconnection)

There are some issues with the code in the article (some missing parentheses and some references to data frames or variables that don't exist, such as product_contributions).

  • nick.dale.burns

    SSCrazy

    Points: 2226

    Kyrilluk (12/15/2015)


Excellent article, thank you.

For those of you who just hate creating views, or who would like to keep experimenting with different SQL queries and use Windows authentication rather than a SQL login, the following may help:

    library(RODBC)  # provides odbcDriverConnect, sqlQuery and odbcClose

    SQLconnection <- odbcDriverConnect("Driver=SQL Server; Server=MyDummyInstance\\BI_DEV; Database=AdventureWorksDW2012; trusted_connection=true;")

    salesData <- sqlQuery(SQLconnection, "
        select
            sales.ProductKey,
            p.ProductSubcategoryKey,
            ps.EnglishProductSubcategoryName,
            p.EnglishProductName,
            sales.SalesTerritoryKey,
            t.SalesTerritoryRegion,
            sales.CustomerKey
        from dbo.FactInternetSales as sales
            inner join dbo.DimProduct as p on p.ProductKey = sales.ProductKey
            inner join dbo.DimProductSubcategory as ps on ps.ProductSubcategoryKey = p.ProductSubcategoryKey
            inner join dbo.DimSalesTerritory as t on t.SalesTerritoryKey = sales.SalesTerritoryKey;
    ")

    odbcClose(SQLconnection)

    Very nice - thank you for adding that 🙂

    Cheers,

    Nick

  • yogmangela

    Valued Member

    Points: 68

    Hi Nick,

    Thanks for the tutorial. Great to see someone putting effort to share their knowledge.

Just a little hiccup: the variables below are not assigned for the last graph.

    - product_contributions

    - by_region_trimmed

I am getting this error:

    > by_region_filter<- with(majorSalesData, table(SalesTerritoryRegion, ProductKey))

    > results_filters <- CA(by_region_trimmed, graph = FALSE)

    Error in is.table(X) : object 'by_region_trimmed' not found

    >

I look forward to more of these tutorials.

    Thanks,

    Yogs

  • chris.smith 3432

    SSC Rookie

    Points: 27

    Hi Nick,

    I enjoyed this article very much. I am an Analysis Services user, but I put together some (much simpler) R examples for our R users to access our Data Warehouse. This model is spot on for SQL Server 2012 and 2014.

    I am about to build some examples using 2016 CTP3.1 embedding the R calls in SQL scripts. Any thoughts about how you take advantage of the embedded model for the types of analyses in your examples?

    Thank you for sharing!

    Chris

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for the great article.

  • bteague

    SSC Eights!

    Points: 894

    Great article and timely as I am currently in the process of explaining the power of R relative to what is currently in the MSBI stack. I might add the built-in visualizations as a differentiator compared to the SSAS basket analysis solution. I've been challenged to investigate options in SSAS, but my intuition tells me that it would be futile given the current flexibility of R, the direction of 2016 and the inevitable granularity required to conduct such analyses.

  • Alan Burstein

    SSC Guru

    Points: 61087

    Great article!

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • yogmangela

    Valued Member

    Points: 68

Have you created an ODBC connection from a Data Source?

  • nick.dale.burns

    SSCrazy

    Points: 2226

    yogmangela (12/15/2015)


    Hi Nick,

    Thanks for the tutorial. Great to see someone putting effort to share their knowledge.

Just a little hiccup: the variables below are not assigned for the last graph.

    - product_contributions

    - by_region_trimmed

I am getting this error:

    > by_region_filter<- with(majorSalesData, table(SalesTerritoryRegion, ProductKey))

    > results_filters <- CA(by_region_trimmed, graph = FALSE)

    Error in is.table(X) : object 'by_region_trimmed' not found

    >

I look forward to more of these tutorials.

    Thanks,

    Yogs

    Hi Yogs, thanks for finding these 🙂

    I think that 'product_contributions' should be replaced by 'strength'. And 'by_region_trimmed' should be 'by_region_filter'.
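Putting that substitution together, the last-graph lines would read something like this (a sketch only: `majorSalesData` and the `CA` function from the FactoMineR package come from the article and are assumed to exist here):

```r
# Sketch of the corrected lines, assuming majorSalesData from the article
# and the FactoMineR package (which provides CA)
library(FactoMineR)

by_region_filter <- with(majorSalesData, table(SalesTerritoryRegion, ProductKey))
results_filter   <- CA(by_region_filter, graph = FALSE)  # was: CA(by_region_trimmed, ...)
```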

    Cheers,

    Nick

  • nick.dale.burns

    SSCrazy

    Points: 2226

    chris.smith 3432 (12/15/2015)


    Hi Nick,

    I enjoyed this article very much. I am an Analysis Services user, but I put together some (much simpler) R examples for our R users to access our Data Warehouse. This model is spot on for SQL Server 2012 and 2014.

    I am about to build some examples using 2016 CTP3.1 embedding the R calls in SQL scripts. Any thoughts about how you take advantage of the embedded model for the types of analyses in your examples?

    Thank you for sharing!

    Chris

    Hi Chris,

Thank you for your comments and questions 🙂 I am really interested at the moment in the use cases for 'embedded R' (to borrow your term). By 'embedded' I specifically mean including R in a stored procedure. I've been playing around with this for a couple of months now, and in general I have found that directly embedding R in stored procedures is really useful, but it is also less flexible than using R externally. Here are my thoughts; I would be keen to get yours as well:

    1. Embedding R in a stored procedure seems to be really useful when there is limited (i.e. no) required interaction. For example, a defined model that needs to be trained.

2. Embedded R returns a data frame, just like a query. This means that, just like a query, users still need an interface to visualise / communicate / consume the results. It isn't quite like having the results locally to play around with (as in this tutorial).

3. I haven't been able to find a way to subset data within an embedded R script! This is really important. For example, if you have logical groups in your database table and you want to perform an analysis (clustering, for example) on each group, I haven't been able to figure out how to do this. The only solution I have found at present is to have a separate table or a separate query / view for each group and then apply the analysis to each of these. Very frustrating!
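One way to sketch the per-group analysis on the R side, whether the script runs embedded or externally, is to split the data frame by the grouping column and apply the analysis to each piece with `lapply`. The data and column names below are made up for illustration; only base R is used:

```r
# Sketch: one analysis per logical group inside a single R script.
# 'sales' stands in for the input data frame (e.g. InputDataSet in an
# embedded script); Region is the grouping column.
set.seed(42)
sales <- data.frame(
  Region = rep(c("North", "South"), each = 50),
  Amount = c(rnorm(50, 100, 10), rnorm(50, 200, 10)),
  Qty    = c(rnorm(50,   5,  1), rnorm(50,  10,  1))
)

# One k-means model per group, without a separate table or view per group
models <- lapply(split(sales[, c("Amount", "Qty")], sales$Region),
                 function(grp) kmeans(grp, centers = 2))

names(models)  # one entry per group: "North", "South"
```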

    Regards,

    Nick

  • nick.dale.burns

    SSCrazy

    Points: 2226

    bteague (12/15/2015)


    Great article and timely as I am currently in the process of explaining the power of R relative to what is currently in the MSBI stack. I might add the built-in visualizations as a differentiator compared to the SSAS basket analysis solution. I've been challenged to investigate options in SSAS, but my intuition tells me that it would be futile given the current flexibility of R, the direction of 2016 and the inevitable granularity required to conduct such analyses.

    Hi - really interesting to hear about other people doing the same exploration of R / Microsoft's traditional BI stack.

I agree with you that R's flexibility is its major strength. R is far more suited to exploratory analysis than SSAS is. And for many analysts (especially younger analysts and graduates) R is likely to be a more natural fit for their skill set, which I think is really important as businesses look outside the IT department to actually make use of their data.

R's flexibility can also be a liability sometimes, I think. For me, if a model is stable, proven and requires little or no interaction from users, then SSAS is a perfectly suitable way of deploying it in production. I will be very interested to see whether my thoughts on this change as Azure ML grows in popularity...

    Many thanks,

    Nick

  • nick.dale.burns

    SSCrazy

    Points: 2226

    yogmangela (12/15/2015)


Have you created an ODBC connection from a Data Source?

    Hi Yogs,

Yes, I have an existing ODBC connection named "RSQLAnalytics2016" which I use to connect. Hope that answers your question?
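For anyone following along, connecting via a saved DSN like that is a one-liner with RODBC (assuming the DSN name matches your own setup; the query here is just an illustration):

```r
library(RODBC)                            # ODBC interface for R

conn <- odbcConnect("RSQLAnalytics2016")  # saved DSN; credentials handled by Windows
head(sqlQuery(conn, "select top 10 * from dbo.FactInternetSales"))
odbcClose(conn)
```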

    Cheers,

    Nick

  • bteague

    SSC Eights!

    Points: 894

Yeah. I guess the question is: if you have cubes but haven't invested the time in learning the requirements, constraints and limitations of the SSAS data mining models, are you better off starting with R, given the features and functionality of each... at this point in time, and anticipating where each will be in the next year or so? Will SSAS data mining persist or continue to evolve now that Revolution R is an integrated and supported option?

  • nick.dale.burns

    SSCrazy

    Points: 2226

    bteague (12/15/2015)


Yeah. I guess the question is: if you have cubes but haven't invested the time in learning the requirements, constraints and limitations of the SSAS data mining models, are you better off starting with R, given the features and functionality of each... at this point in time, and anticipating where each will be in the next year or so? Will SSAS data mining persist or continue to evolve now that Revolution R is an integrated and supported option?

Interesting question, and surely this comes down to personal preference. But if you have cubes in Analysis Services already, and you don't have R Services configured and ready to go (which no one does in production right now), then I would go with Analysis Services. There are some great tutorials on MSDN for using Analysis Services to analyse customer churn data and for market basket analysis. And ultimately, I believe the learning curve is shallower for Analysis Services than for R. But I am an R user... 🙂

    Nick
