The Data Scientist

  • Comments posted to this topic are about the item The Data Scientist

    Best wishes,
    Phil Factor

  • It would be nice to see how others have integrated R with SQL Server. The RODBC package is great for initiating database interactions from R, but there didn't seem to be any way to call R from SQL Server (2008).

    We needed to call R scripts from within SQL Server, but didn't want the effort and overhead of an xp_cmdshell approach, which has security implications and a performance cost because it starts a new process for each request.

    What we came up with as a prototype was a subscribe/publish approach: request and response tables in SQL Server, with one or more continuously running R processes that service the requests and write the results to other tables. This would also allow us to run distributed farms of R processes as needed. It's not real-time, but it's responsive enough for our needs.

  • Thank you for an interesting topic. I have only recently heard people use the term "data scientist", and yet I think it applies well to what I do. I have always called it "knowing the data". When I "pull" data from the database, I always run histogram frequencies or other stats to see if the data make sense and to look for outliers. Are the outliers real, or caused by typos? (Very frequently they are typos!)

    Thanks for pointing out that this is a real skill and not just an obsession on my part!
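    The "histogram frequencies" and outlier check described above can be done in many ways. As one illustration, here is a small Python sketch that bins values into a frequency table and flags outliers with Tukey's fences (values more than 1.5 times the interquartile range beyond the quartiles). The choice of Tukey's rule, the bin width, and the sample ages (with 340 standing in for a mistyped 34) are assumptions for the example, not the poster's method.

```python
from collections import Counter
import statistics


def frequency_table(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; needs Python 3.8+."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: 340 is a plausible typo for 34.
ages = [34, 36, 35, 41, 38, 37, 39, 340]
print(frequency_table(ages, 10))  # {30: 6, 40: 1, 340: 1}
print(tukey_outliers(ages))       # [340]
```

    A flagged value is only a prompt to go and look: whether 340 is a typo or a genuine record is exactly the judgement the post describes.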

  • "Stairway to R" interactive lesson, great place to start & totally free.

  • One major part of the role of an aggregate-data report writer (i.e. a "Data Scientist") was that of Business Analyst. You can run all the tests you want, but suppose the users are deliberately using data item A, which means B (and, if you have any documentation at all, is documented as meaning B), to record Q or T instead, depending on the situation. Your tests may well show A to be about as valid as the rest of your data, and without knowing that it actually holds sometimes Q and sometimes T, using A to report on B produces incorrect results.

  • How dare you besmirch the good name of "actionable insights", sir! Take it back! Take back what you said!


    Dog & Pony Shows


  • Thanks, davoscollective, for the link.

  • Is this not just a fancy new name for what was already happening?

    Business Intelligence teams should have statisticians who will analyse and provide insight into the data. If you don't, how do you know the data you are pulling and the conclusions reached are relevant and meaningful?

  • You'd expect BI people to have a proper grounding in statistics, but they haven't. It is very rare for someone who gives themselves a fancy title like this to be able to explain what a normal distribution is, or whether a finding is statistically significant.

    You'll get a BI person to come up with an 'actionable insight' (whatever the hell they think that is).

    'Oh? Is that statistically significant?' I ask.

    'Yeah,' they say.

    'Interesting, can I see the calculations, please?'


    'Yes. What is the probability level?' (blank look from BI person) 'So what do you mean by "significant"?'

    'Well, management will really go for it. It will get them really interested.'

    (Phil Factor storms off bad-temperedly)
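    To make the "probability level" question concrete, here is a sketch of one common test, a two-sided two-proportion z-test, in Python. The figures (say, 120 of 1,000 customers responding to one campaign versus 100 of 1,000 to another) are made up, and the 0.05 threshold is simply the usual convention. The test relies on the normal approximation, so it assumes reasonably large samples.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions (pooled variance)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


alpha = 0.05  # conventional significance level (an assumption, not a law)
z, p = two_proportion_z_test(120, 1000, 100, 1000)
print(f"z = {z:.2f}, p = {p:.3f}, significant at {alpha}: {p < alpha}")
```

    With these made-up numbers z comes out at about 1.43 and p at about 0.15, so a two-point lift that "management will really go for" would not be significant at the 0.05 level.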

    Best wishes,
    Phil Factor
