Democratic Data Science

  • Steve Jones - SSC Editor (6/28/2016)


    However now most major sports, at high amateur and pro levels, use all sorts of analytics.

    And, most of the time, they don't help at all... unless they're tracking the optimum dosage of steroids. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Steve Jones - SSC Editor (6/28/2016)


    Do we need DBAs to manage backups?

    Heh... I know you know better but it does bring up a point of observation. Why is it that so many people think that's all a DBA does? Doesn't anyone realize that a few of us have to know how to get the current date and time using T-SQL? 😛

    --Jeff Moden



  • I really feel we go off the deep end in understanding these articles. For example, data scientists can already automate what they are doing in Python. It's very easy to run through multiple models in a single multiple regression algorithm and find the best-fitting one. This can surely scale much further than that. Yet getting to that point (cleaning, prepping, etc.) is the hard part, not automating what they have in front of them when it's all said and done (which is why I and most of us exist).
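
As a rough illustration of how little code the automated part takes, here is a minimal sketch that fits one simple least-squares model per candidate predictor and keeps the one with the highest R². It is a simplified stand-in for a full multiple-regression search, written with the standard library only; the data and the names `features`, `sales`, and `best_single_predictor` are made up for the example.

```python
def r_squared(xs, ys):
    """R² of the ordinary least-squares line fitted to (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    # For a one-predictor fit, R² equals the squared correlation.
    return (s_xy ** 2) / (s_xx * s_yy)


def best_single_predictor(features, target):
    """Score every candidate predictor and return the best name plus all scores."""
    scores = {name: r_squared(xs, target) for name, xs in features.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical, already-cleaned data (the hard part is assumed done).
features = {
    "ad_spend": [1, 2, 3, 4, 5, 6],
    "store_visits": [3, 1, 4, 1, 5, 9],
}
sales = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

best, scores = best_single_predictor(features, sales)
print(best, scores)
```

The loop picks a winner mechanically; it says nothing about whether the inputs were prepared sensibly or whether the winning model means anything.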

    However, how far do you want these scientists to swim away from the dock before they drown themselves? These are not BI developers; these are very math-focused statisticians who are driven by a scientific approach and likely have domain experience, and who, if worth their salt, can validate what they are doing through proper evaluation and testing rather than just picking something because the p-value says these variables are significant and pushing forward.

    With machines doing the grunt work, citizen data scientists don’t need in-depth understandings of algorithms and coding parameters; the automation takes care of that.

    "Don't need in-depth understandings" of what a machine is giving you, how it got there, and whether it is right or wrong? I don't think I could make this fly with my higher-ups: hey, we want to move towards a model where we have less understanding so we can focus on other things we have yet to understand. But it's totally fine, it's only your money, not mine. 😉

  • For some reason "data homeopathy" is a phrase bouncing around my head at the moment.

    The saying goes that a statistician draws a mathematically precise line between an unwarranted assumption and a foregone conclusion.

    I've seen people carve themselves a lucrative niche producing artifacts in support of popular opinion and beautifying the obvious. The ones that say the data doesn't support your dearly held belief are much rarer.

  • David.Poole (6/29/2016)


    ...The ones that say the data doesn't support your dearly held belief are much rarer

    They are targets for big game hunters!!!

    Gaz

    -- Stop your grinnin' and drop your linen...they're everywhere!!!

  • Steve Jones - SSC Editor (6/28/2016)


    porter.james (6/28/2016)


    As much as I'd like to think our organization could use more data scientists, the reality is that the most accurate statistical models delivered by my colleagues were rejected by the business because they couldn't understand them. I guess if data science does become huge, it will take some time for data consumers to adjust to the trend--they still seem to be more interested in the straightforward aggregation that SQL-based reporting has always provided.

    This is going to be something that takes time. Telecommuting couldn't catch on for a long time because too many managers couldn't understand how people would work. It is now.

    Analytics in sports were dismissed by many, because they couldn't understand them and wanted an "eye test". However now most major sports, at high amateur and pro levels, use all sorts of analytics. There are a few holdouts that may lose their jobs in the next couple years for not including data analysis as part of their coaching. Not all, but part.

    I think businesses will adjust, though it will be years for some, months for others.

    Upper management buy-in to more sophisticated data analytics? Heck, I'm still trying to find some place to work where I can telecommute.

    Kindest Regards, Rod

  • Rod at work (6/29/2016)


    Steve Jones - SSC Editor (6/28/2016)


    porter.james (6/28/2016)


    As much as I'd like to think our organization could use more data scientists, the reality is that the most accurate statistical models delivered by my colleagues were rejected by the business because they couldn't understand them. I guess if data science does become huge, it will take some time for data consumers to adjust to the trend--they still seem to be more interested in the straightforward aggregation that SQL-based reporting has always provided.

    This is going to be something that takes time. Telecommuting couldn't catch on for a long time because too many managers couldn't understand how people would work. It is now.

    Analytics in sports were dismissed by many, because they couldn't understand them and wanted an "eye test". However now most major sports, at high amateur and pro levels, use all sorts of analytics. There are a few holdouts that may lose their jobs in the next couple years for not including data analysis as part of their coaching. Not all, but part.

    I think businesses will adjust, though it will be years for some, months for others.

    Upper management buy-in to more sophisticated data analytics? Heck, I'm still trying to find some place to work where I can telecommute.

    +1 yep 😀

  • Eric M Russell (6/28/2016)


    While we're on the topic of organizations who leverage technology to automate "grunt work", this is a funny read.

    How to Snare Millions of Men with Web Bots

    http://gizmodo.com/ashley-madison-code-shows-more-women-and-more-bots-1727613924

    Very interesting article, Eric.

    Kindest Regards, Rod

  • Rod at work (6/29/2016)


    Eric M Russell (6/28/2016)


    While we're on the topic of organizations who leverage technology to automate "grunt work", this is a funny read.

    How to Snare Millions of Men with Web Bots

    http://gizmodo.com/ashley-madison-code-shows-more-women-and-more-bots-1727613924

    Very interesting article, Eric.

    We can laugh, yet at the same time it is deeply disturbing how the bulk contents of the Ashley Madison data breach quickly became a sort of de facto "sample database" for use in the public domain as a basis for analytical experiments. The inner workings of the company have been laid bare, and the identities included in the dataset (more or less) belong to real people who never imagined they'd be exposed in this way. It happened to Sony and the US State Department too. It's a nightmare scenario and a cautionary tale for anyone responsible for protecting their organization's sensitive data.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • David.Poole (6/28/2016)


    The big challenge with statistics is in communicating what the results actually mean. Understanding that point 51.9 vs 48.1 has no significance in a small sample but becomes immensely significant in a larger population.
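
The sample-size point in the quote above can be made concrete with a rough sketch: a two-sided z-test of an observed 51.9% share against an even split, using the normal approximation. The sample sizes 100 and 1,000,000 and the function name `two_sided_p_value` are assumptions chosen for illustration, not figures from this thread, and the test only applies when the numbers come from a random sample rather than a full count.

```python
import math


def two_sided_p_value(observed_share, n, null_share=0.5):
    """Normal-approximation z-test of one proportion against null_share."""
    standard_error = math.sqrt(null_share * (1 - null_share) / n)
    z = (observed_share - null_share) / standard_error
    # P(|Z| >= |z|) for a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same observed split, two hypothetical sample sizes.
z_small, p_small = two_sided_p_value(0.519, 100)        # z is about 0.38
z_large, p_large = two_sided_p_value(0.519, 1_000_000)  # z is about 38

print(f"n=100:       z={z_small:.2f}, p={p_small:.3f}")
print(f"n=1,000,000: z={z_large:.2f}, p={p_large:.3g}")
```

With 100 respondents the gap is well within sampling noise; with a million it is far outside it, even though the percentages are identical.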

    Picking 51.9 and 48.1 and claiming that it's significant because it's from a large sample, just at present, is a rather political statement, isn't it?

    To see whether it's significant, it's better to use genuine numbers instead of fake ones - when you call it 37.37% YES, 34.63% NO, and 28.00% DON'T CARE you see a very different picture from 51.9 vs 48.1.

    Serious decisions to change require a serious majority; that can be enforced in a number of ways (USA constitution changes require a two to one majority; the 1979 Scottish devolution referendum stipulated that a vote for change required at least 40% of eligible voters, so a mere 37.37% wouldn't have cut it, and despite the 52% majority for yes, devolution didn't happen then). There is no intention at all to look for a serious majority with that 51.9 vs 48.1 case; the voice of a fraction more than a third of electors will be treated as significant and sufficient.
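
The arithmetic behind those figures can be sketched as follows. It assumes, as the numbers above imply, that the 28.00% DON'T CARE share is simply the part of the electorate that did not vote, and it reuses the 40%-of-eligible-voters rule mentioned for the 1979 referendum as the threshold; the function name `share_of_electorate` is made up for the example.

```python
def share_of_electorate(share_of_votes_pct, turnout_pct):
    """Convert a percentage of votes cast into a percentage of all eligible voters."""
    return share_of_votes_pct * turnout_pct / 100


did_not_vote_pct = 28.0
turnout_pct = 100 - did_not_vote_pct  # 72% of the electorate voted

yes_pct = share_of_electorate(51.9, turnout_pct)
no_pct = share_of_electorate(48.1, turnout_pct)

# Threshold taken from the 1979 rule described above: 40% of eligible voters.
meets_threshold = yes_pct >= 40.0

print(f"YES {yes_pct:.2f}%, NO {no_pct:.2f}%, DON'T CARE {did_not_vote_pct:.2f}%")
print("Meets 40% of electorate:", meets_threshold)
```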

    Tom

  • TomThomson (7/13/2016)


    David.Poole (6/28/2016)


    The big challenge with statistics is in communicating what the results actually mean. Understanding that point 51.9 vs 48.1 has no significance in a small sample but becomes immensely significant in a larger population.

    Picking 51.9 and 48.1 and claiming that it's significant because it's from a large sample, just at present, is a rather political statement, isn't it?

    To see whether it's significant, it's better to use genuine numbers instead of fake ones - when you call it 37.37% YES, 34.63% NO, and 28.00% DON'T CARE you see a very different picture from 51.9 vs 48.1.

    Serious decisions to change require a serious majority; that can be enforced in a number of ways (USA constitution changes require a two to one majority; the 1979 Scottish devolution referendum stipulated that a vote for change required at least 40% of eligible voters, so a mere 37.37% wouldn't have cut it, and despite the 52% majority for yes, devolution didn't happen then). There is no intention at all to look for a serious majority with that 51.9 vs 48.1 case; the voice of a fraction more than a third of electors will be treated as significant and sufficient.

    All depends on context. The 3.8 could represent 100 people, it could represent 1,000,000 people. It could also represent a few thousand dollars in lost revenue, it could represent 1,000,000 in lost revenue.

    Maybe it's a slow decay over the past week? While the impact is not as bad, if the trend is proven to stick on average or get worse, then you're talking about 15.2 in the next month.

    So, if you just take the number at face value with no context or anything, then you're basically fitting the stereotype that ignorance is bliss.

  • David.Poole (6/29/2016)


    I've seen people carve themselves a lucrative niche producing artifacts in support of popular opinion and beautifying the obvious. The ones that say the data doesn't support your dearly held belief are much rarer.

    That's how things have been ever since I can remember, and statistics isn't the only tool misused in that manner. I've occasionally used it to show that some other influence must be masking the effect that everyone "knows" is there, but I learned a very long time ago not to suggest that the figures indicate that the effect everyone "knows" is there is pure tommyrot.

    It's amazing how much utterly illogical stuff is called logic when it's used to "prove" things like "mathematics doesn't work", "the halting problem isn't a problem", "Goedel's incompleteness theorem is false", and "Cohen was wrong (GCH is not independent of ZFC)" - ideas that are very popular amongst crackpots. So the farce nest-feathering works with stuff which is popular only with crackpots, not just with stuff that is generally popular.

    Tom

  • Jeff Moden (6/28/2016)


    Steve Jones - SSC Editor (6/28/2016)


    However now most major sports, at high amateur and pro levels, use all sorts of analytics.

    And, most of the time, they don't help at all... unless they're tracking the optimum dosage of steroids. 😉

    I don't think that's true at all. They certainly help, and plenty of teams, especially in the NBA, have done well using data along with knowledge. It certainly has helped baseball, and the Red Sox would look at analytics as an important part of their championship runs.

    Data analytics don't guarantee championships, which is what plenty of people think. They do help reduce the risk of poor decisions and increase the chance that you will make better decisions on how to use players at points in the game, but players still have to perform.

