Data Science for SQL Folks: Leveraging SQL and R

  • Frank Banin

    Ten Centuries

    Points: 1323

    Comments posted to this topic are about the item Data Science for SQL Folks: Leveraging SQL and R

    Frank Banin
    BI and Advanced Analytics Professional.

  • matthew.marini 59735

    Old Hand

    Points: 318

    Thanks for this, didn't know about rsqlserver. There's a lot of new information to digest here but all looks very useful.

    Matt

  • Steph Locke

    SSCrazy

    Points: 2857

    Great article!

    I'd suggest the ARIMA and ensemble pieces could do with a bit more intro, as they are a big jump from the time series stuff, but otherwise a neat explanation of dplyr, ggplot2 and time series analysis 😀

  • piotrka

    SSCrazy

    Points: 2023

    Here is a good start for R

    "R for Everyone: Advanced Analytics and Graphics" by Jared P. Lander

  • boriskey

    Mr or Mrs. 500

    Points: 511

    One of the most interesting posts I've read recently, very cool!

  • John Hanrahan

    Hall of Fame

    Points: 3825

    Pluralsight just added an R course. I am presently enjoying it (and hopefully learning at the same time!). I especially liked the first graph showing increasing value as you learn to see the future.

  • Frank Banin

    Ten Centuries

    Points: 1323

    My idea was to focus more on leveraging SQL from other languages. I did not want it to turn into a PA article, but I can expand on that if there is a lot of interest.


  • INCREDIBLEmouse

    SSC Eights!

    Points: 814

    Help a nublet brother out. 😀

    I failed to see where and how ada comes up with the correct formula when you simply passed "model.formula" as its formula argument. I was expecting to see the dependent response over the explanatory variables. Somehow, from what you've got there, it figured out that the last vector in the frame was the response? Unless you declared model.formula in code somewhere outside of what you posted? I also don't understand why two empty data frames are created, and then go unused, between two identical runs of the ada model on the same training data. I suspect this is a mis-paste, especially since one of the data frames starts with a capital letter, which breaks the naming convention used elsewhere, and we know R is strictly case sensitive. So confused. :w00t:

    On another note, I regularly use RODBC to get to my SQL Server data. Is there any particular benefit to using the rsqlserver package instead? If it's like other interfaces, there may be some syntactical benefits.

    Was thrilled to see this article on here today. We need more merging of the SQL Server and R communities.

  • Frank Banin

    Ten Centuries

    Points: 1323

    Hi Incredible,

    The section is missing the following definition of model.formula. An incorrect draft of my write-up was published, hence the few other typos.

    model.formula <- as.formula(BikeBuyer ~ Maritalstatus + gender + YearlyIncome + TotalChildren + NumberChildrenAtHome + englisheducation + englishoccupation + houseownerflag + NumberCarsOwned + countrycode + commutedistance + region + age + Amount)
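One way to avoid a stray comma slipping into a long formula like the one above is to build it from a character vector of predictor names with base R's reformulate(), which joins the names with "+" and puts the response on the left-hand side. This is a minimal sketch, not the author's code; the column names are copied from the formula above and are assumed to match the columns of the training data frame.

```r
# Sketch (not the article's code): build the model formula from a vector of
# predictor names. Column names are assumed to match the training data frame.
predictors <- c("Maritalstatus", "gender", "YearlyIncome", "TotalChildren",
                "NumberChildrenAtHome", "englisheducation", "englishoccupation",
                "houseownerflag", "NumberCarsOwned", "countrycode",
                "commutedistance", "region", "age", "Amount")

# reformulate() (package stats) joins the term labels with "+" and sets the
# response, giving BikeBuyer ~ Maritalstatus + gender + ... + Amount
model.formula <- reformulate(predictors, response = "BikeBuyer")

print(model.formula)
```

The resulting object can be passed as the formula argument wherever the hand-written version was used, and adding or dropping a predictor only means editing the vector.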

    When it comes to connecting to SQL Server, dplyr is just faster than ODBC, so you can still use ODBC to pull your data and then use dplyr to query the data frames after you have pulled it.

    You can ignore or remove the two empty data frames. I normally store model results in them, which I didn't do here.

    Thanks for the catch, and let me know if you have any more questions.


  • INCREDIBLEmouse

    SSC Eights!

    Points: 814

    Yes, that formula makes a lot more sense now. Will they let you send updates to the code, or do they lock it down?

    Would love to see this expanded into a series on the convergence of SQL Server and statistical analytics, perhaps even going as far as showing techniques for automating the path from SQL Server to model processing, to the (somewhat disjointed) extent that it can be done at this time.

    Thanks a bunch,

  • Wayne West

    SSC-Insane

    Points: 22586

    There is an edX course introducing R, though it uses biology statistics as its background. Still, it could be useful or interesting.

    https://www.edx.org/course/kix/kix-kiexplorx-explore-statistics-r-1524

    -----
    Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson

  • LiamCaffrey

    SSC Rookie

    Points: 42

    Excellent article. Thank you.

    I have used R with Postgres, where R ran as the server to the DB (the PL/R extension for Postgres), as opposed to the approach outlined here, where R is the client and the DB is the server. There is also an R.NET project, which is accessible through CLR functions in SQL Server; it is on my list of things to do. I think there are great possibilities in using the powerful set-based properties of the DB and invoking analytical value-add through an R service.

    Regards

    Liam Caffrey

  • odeddror

    SSC Eights!

    Points: 878

    Frank,

    Can you email the R code, please? The rClr library no longer exists in version 3.2.3.

    Also, I'm getting: unable to find an inherited method for function ‘dbConnect’ for signature ‘"character"’

    I'm using R-Studio

    Thanks

    Oded Dror

  • tinausa

    SSCrazy

    Points: 2657

    Thank you. I was waiting for the day when R and SQL Server play nice.

  • akljfhnlaflkj

    SSC Guru

    Points: 76202

    Thanks for this article.
