Data Science for SQL Folks: Leveraging SQL and R

  • Comments posted to this topic are about the item Data Science for SQL Folks: Leveraging SQL and R

    Frank Banin
    BI and Advanced Analytics Professional.

  • Thanks for this, didn't know about rsqlserver. There's a lot of new information to digest here but all looks very useful.


  • Great article!

    I'd suggest the ARIMA and ensemble pieces could do with a bit more intro as they are a big jump from the time series stuff, but otherwise a neat explanation of dplyr, ggplot2 and time series analysis 😀

  • Here is a good start for R

    "R for Everyone Advanced Analytics and Graphics" by Jared P. Lander

  • one of the most interesting posts I've read recently, very cool!

  • Pluralsight just added an R course. I am presently enjoying it (and hopefully learning at the same time!). I especially liked the first graph showing increasing value as you learn to see the future.

  • My idea was to focus more on leveraging SQL in other languages. Did not want it to turn into a PA article. I can expatiate on that if there is a lot of interest.

    Frank Banin
    BI and Advanced Analytics Professional.

  • Help a nublet brother out. 😀

    I failed to see how ada arrives at the correct formula when you simply pass "model.formula" as the formula parameter. I was expecting to see the dependent response over the explanatory variables. Somehow, from what you've got there, it figured out that the last vector in the frame was the response? Unless you declared model.formula somewhere outside the code you posted?

    I also don't understand why two empty data frames are created, which go terminally unused, between two identical runs of the ada model on the same training data. I suspect this is a mis-paste, especially since one of the data frames starts, uncharacteristically, with a capital letter, which falls outside the naming convention used elsewhere, and we know R is strictly case sensitive. So confused.

    On another note, I regularly use RODBC to get to my SQL Server data. Is there any particular benefit to using the rsqlserver package vs the other? If it's anything like other interfaces, there may be some syntactical benefits?

    Was thrilled to see this article on here today. We need more merging of the SQL Server and R communities.

  • Hi Incredible,

    The section is missing the code below for model.formula. The wrong version of my write-up made the cut for publication, hence the few other typos.

    model.formula <- as.formula(BikeBuyer ~ Maritalstatus + gender + YearlyIncome + TotalChildren + NumberChildrenAtHome + englisheducation + englishoccupation + houseownerflag + NumberCarsOwned + countrycode + commutedistance + region + age + Amount)
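
A minimal sketch of how a formula object like this tells a modelling function which column is the response and which are the explanatory variables. It uses only base R and the column names from the post, with every term joined by `+`. The commented call to ada at the end is illustrative only, and the data frame name `train` is a hypothetical placeholder, not something from the article.

```r
# Build the formula: response on the left of ~, explanatory variables on the right.
model.formula <- as.formula(
  BikeBuyer ~ Maritalstatus + gender + YearlyIncome + TotalChildren +
    NumberChildrenAtHome + englisheducation + englishoccupation +
    houseownerflag + NumberCarsOwned + countrycode + commutedistance +
    region + age + Amount
)

# The response is the first variable named in the formula.
response <- all.vars(model.formula)[1]

# The explanatory variables are the term labels on the right-hand side.
predictors <- attr(terms(model.formula), "term.labels")

print(response)
print(predictors)

# Illustrative only (requires the ada package and a training data frame,
# here given the hypothetical name `train`):
# fit <- ada::ada(model.formula, data = train)
```

Because the response is named explicitly on the left-hand side, its position among the data frame's columns does not matter.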

    When it comes to connecting to SQL Server, dplyr is just faster than ODBC, so you can still use ODBC to pull your data and then dplyr to query the data frames afterwards.
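
A rough sketch of that split, assuming the dplyr package is installed. The RODBC calls are left as comments because they need a live connection; the DSN name `AdventureWorksDW` and the table name `dbo.Customers` are hypothetical placeholders. A small in-memory data frame with made-up values stands in for the result of the pull so that the dplyr step can run on its own.

```r
library(dplyr)

# Step 1 (not run here): pull the data over ODBC with RODBC.
# The DSN and table names are hypothetical.
# channel   <- RODBC::odbcConnect("AdventureWorksDW")
# customers <- RODBC::sqlQuery(channel,
#   "SELECT region, YearlyIncome, BikeBuyer FROM dbo.Customers")
# RODBC::odbcClose(channel)

# Stand-in for the pulled data (made-up values, for illustration only).
customers <- data.frame(
  region       = c("Europe", "Europe", "Pacific"),
  YearlyIncome = c(40000, 60000, 90000),
  BikeBuyer    = c(1, 0, 1)
)

# Step 2: query the local data frame with dplyr.
by_region <- customers %>%
  group_by(region) %>%
  summarise(
    buyers     = sum(BikeBuyer),
    avg_income = mean(YearlyIncome)
  )

print(by_region)
```

The pipeline reads much like a SQL `GROUP BY` with aggregates, which is part of why dplyr tends to feel familiar to SQL users.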

    You can ignore or remove the two empty frames. I normally put model results in a frame, which I didn't do here.

    Thanks for the catch, and let me know if you have any more questions.

    Frank Banin
    BI and Advanced Analytics Professional.

  • Yes, that formula makes a lot more sense now. Will they let you send updates to the code, or do they lock it down?

    Would love to see this get expanded as a series on the convergence of SQL Server and statistical analytics, perhaps even going as far as showing techniques for automating the flow from SQL Server to model processing, to the disjointed extent that it can be done at this time.

    Thanks a bunch,

  • There is an EdX course introducing R, but using biology statistics as its background. Still, could be useful/interesting.

    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • Excellent article. Thank you.

    I have used R with Postgres, where R acted as the server to the DB (the PL/R extension for Postgres), as opposed to the approach outlined here, where R is the client and the DB is the server. There is also a project which is accessible through CLR functions in SQL Server. It's on my list of things to do. I think there are great possibilities in using the powerful set-based properties of the DB and invoking analytical value-add through an R service.


    Liam Caffrey

  • Frank,

    Can you email the R code, please? The rClr library is no longer available in version 3.2.3.

    Also, I'm getting: unable to find an inherited method for function ‘dbConnect’ for signature ‘"character"’

    I'm using RStudio.


    Oded Dror

  • Thank you. I was waiting for the day when R and SQL Server play nice.

  • Thanks for this article.
