Becoming a Data Scientist

  • Comments posted to this topic are about the item Becoming a Data Scientist

  • I have a SQL/BI background and was intrigued by the hype around Data Science, so I enrolled on EdX and completed the MS Certificate in Data Science. I do NOT consider myself a Data Scientist after my 5 month effort, but as an individual who needed a base zero launchpad, I could not have asked for a more appropriate start.

    I found the lectures to be well structured in that each video presented at most two topics that I needed to grasp. If I lacked understanding I simply started over. I was surprised at how much information I retained when it came to exam time. Be warned! This is knowledge assimilation time. You cannot rely on your 15 years of experience. It will all be new, so take notes.

    I appreciated the ability to make course choices: R or Python, Excel or PowerBI, Advanced Machine Learning or Developing with Spark. The final Capstone Project had its difficulties: some students had a different data set to others and were able to train their models with a higher degree of accuracy. Ultimately it did not affect any individual's work. My very next challenge will be to become proficient in R. This will mean I need to improve my statistics knowledge at the same time as learning the nuances of a new language.

    That has been my start. Thank goodness I know T-SQL.

  • To be honest, I'm rather suspicious of the whole "data science" aspect.  In my humble opinion, much of what I've done over my career could be considered "data science" with advanced data analysis, result aggregations and projections, comparative analysis and even including "what-if" scenarios.  All of that was done with familiar tools such as T-SQL, SSRS and occasionally Excel.

    Lately, I've seen multiple examples of queries from "data scientists" that were simply abysmal.  Either the queries were written by someone who clearly does not have the most basic concept of writing moderately efficient queries with a minimal understanding of relational concepts and a basic knowledge of the data organization, or the queries were written by some sort of tool that produced queries based upon obscure criteria given by someone who still lacks an understanding of how the data is organized.  Not only were the queries horribly inefficient (for some queries in particular I could write a whole paper on everything that was done wrong and why), but they were executed on a production server while the primary daily job was running.  This impacted not only the system in a general sense but specifically impacted the very tables the primary job required for updates.  As expected, this had a severe negative impact on the Service Level Agreement (SLA) the customer expected of the primary job.  The fact that the SLA is without regard for whatever other customer activity may occur on their production server is a topic for another discussion.

    I'm sure the "data scientist" had no inkling of the impact that he/she had on the system and merely wanted the data sought so it could be placed into a location for further analysis.  Yet, it seems prudent to hold a "data scientist" to a higher standard than a junior level database developer.  While statistics are often employed for data science, it is much more than merely applying statistical models against data to see what the outcome may be.  Much of it is truly understanding the data, reviewing it to look for patterns (or the lack thereof) to identify trends and to determine how to leverage that knowledge into useful activity to support the business endeavor.  In that very sense, many of us are already data scientists.

    Data scientist or not, we all have a responsibility to be prudent in how we perform our duties such as performing data analysis on a copy of production, not production itself, to avoid an adverse impact on production operations.  We need to be mindful of others and not operate in isolation.  In truth, we need to always consider the impact that we have on each other and the business.  After all, if we're not making a positive impact, why are we there?

  • I'll be honest... my first role has had many "data science" aspects or projects (data mining, reporting, cleansing data, machine learning projects, etc.). And my next role is a full-time BI role... but at what point would you be a "data scientist"? To me it just seems like a typical buzzword. I'm more interested in helping businesses and organizations make more prudent decisions based on the wealth of data they have; the buzzwords don't really interest me too much.

  • There is no profession on earth that is immune from pretenders or from people who misunderstand its purpose.
    I've worked with a few people worthy of the data scientist nomenclature.  What they can derive and predict with frightening accuracy from data is incredible.  What they don't do is produce production-grade code or develop tests to ensure rigour in that code.  That is why a "data engineer" and a "data scientist" are two things rather than one.

    Where I am sceptical is that huge value can be derived from what is effectively the fly tipping of data.  If an organisation wants to get value from its data it has to treat the capturing of that data and the husbandry of that data as a first class citizen.  Getting value from data is as much about the architecture of the enterprise as it is about mechanical assets and skills.

    You also have to consider which insights are useful and which are merely interesting.  Then you need to overcome the HiPPO (highest paid person's opinion) principle.  Your data might tell you something that is counter-intuitive and kills a few sacred cows.  You need an environment that expects and supports that outcome; otherwise you are paying expensive people to confirm what you are doing today.

  • I am currently enrolled in the Microsoft Data Science program.  I have undergrad and graduate degrees in Computer Science and over 20 years of professional experience, but other than a few training classes, I haven't earned any certifications, etc. since I graduated 20 years ago.  When I first heard about the MS Data Science program, I was intrigued because it appeared to be more "college-like" rather than memorizing a bunch of terms, etc. to take a multiple-guess test to get a certification (I totally love that there's a programming project at the end).  I started in January and am currently working on the Statistics course.  I know I'm not working through the program as fast as many others (I have a life outside of work :-) but I've spent time researching books to supplement the classes and am really looking forward to learning R.  I've learned more about ways to use Excel than I ever would have and now look at it as a respectable tool versus a pain to work with.

    Just the few months working through the courses has enabled me to help others at my work place and given the types of problems people here are working on, I know the skills I will gain will be helpful for the future.  Do I think I will be a "data scientist" upon completion?  No.  But my overall skill set and knowledge will be enhanced and I'll be able to develop more efficient solutions for the company.

  • robin 27944 - Wednesday, April 26, 2017 4:34 AM

    I have a SQL/BI background and was intrigued by the hype around Data Science, so I enrolled on EdX and completed the MS Certificate in Data Science. I do NOT consider myself a Data Scientist after my 5 month effort, but as an individual who needed a base zero launchpad, I could not have asked for a more appropriate start.

    I found the lectures to be well structured in that each video presented at most two topics that I needed to grasp. If I lacked understanding I simply started over. I was surprised at how much information I retained when it came to exam time. Be warned! This is knowledge assimilation time. You cannot rely on your 15 years of experience. It will all be new, so take notes.

    I appreciated the ability to make course choices: R or Python, Excel or PowerBI, Advanced Machine Learning or Developing with Spark. The final Capstone Project had its difficulties: some students had a different data set to others and were able to train their models with a higher degree of accuracy. Ultimately it did not affect any individual's work. My very next challenge will be to become proficient in R. This will mean I need to improve my statistics knowledge at the same time as learning the nuances of a new language.

    That has been my start. Thank goodness I know T-SQL.

    Hi Robin,

    Glad to see you wrote about your experience. I've been thinking some about Data Science. I've got a degree in math and would love to get back into that in my profession. Have you finished the EdX courses so that you're now certified through them? I wasn't sure, from what you wrote, whether you are.

    Steve,

    Thanks for this article.

    Kindest Regards, Rod

  • I hear that McDonalds corporation is recruiting a Data Scientist. The new hire will share a corner office with their Nutritionist.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • I finished the EDX program early this year.
    My experience has been that recruiters and companies don't care about the education, even with the Microsoft name behind it, and even with the completion certificate, since it isn't backed by on-the-job experience.
    They have told me I'd have to be junior (read: low pay) to qualify for a learning position.  At my senior experience and pay, they demand someone who has done the job before, who can get right to work.

    So if not getting me a new job as a data scientist, what's the value?
    * I have not been able to apply most of the ML & analysis technologies in my current job.  So no current value there.
    * But the value has come from new ways of thinking about problems, data, analysis.  I am challenging my business: where they see worthless data to be discarded, I can see current and potential value.
    * I have gone back and looked at previous analysis projects with fresh eyes, and came out with new insight to bring to the business.  I have amazed them and myself with what we found this time.
    * I am getting more project requests than before.  Not using the fun technologies, but I am still doing more work.  It's good to be the person the business wants to help them -- success brings more success.

    So this program has not won me a new job as a data scientist.  I'm not using the fun technologies.  But it did improve my thinking, and I'm doing the same job as before, but better.
    Better results in this job lead to great success stories, and success stories open doors for future jobs -- even if those jobs do not have the data scientist title.
    Success comes from what I do, not what I'm called.

    The arguable part of the EDX program is paying for the courses to get the useless (for me) certificate, when the courses can be audited for free.
    I have chosen to see it as paying EDX/MS for the delivery platform & content.

  • Aaron N. Cutshall - Wednesday, April 26, 2017 6:04 AM

    To be honest, I'm rather suspicious of the whole "data science" aspect.  In my humble opinion, much of what I've done over my career could be considered "data science" with advanced data analysis, result aggregations and projections, comparative analysis and even including "what-if" scenarios.  All of that was done with familiar tools such as T-SQL, SSRS and occasionally Excel.

    Lately, I've seen multiple examples of queries from "data scientists" that were simply abysmal.  Either the queries were written by someone who ...

    I'm sure the "data scientist" had no inkling of the impact that he/she had on the system and merely wanted the data sought so it could be placed into a location for further analysis. 

    A couple things. First, I think you're confusing science and analysis with efficiency. They are two different jobs. I've worked with plenty of business analysts that could do a great job analyzing data and extracting information from lots of rows, but they couldn't do it efficiently. They'd perform more work than necessary to prepare and organize data. However, that wasn't their job. Often the struggle with data scientists or hard core analysts has been bringing their work to a place where it can be done repeatedly and quickly.
    Note: this is one reason MS bought Revolution Analytics and incorporated R. There were plenty of people in banking and finance and medicine writing complex analysis scripts that were brittle, slow, single threaded, and not really ready for anyone else to run. Work was dependent on that one person.

    Whether a data scientist should be able to write queries, that's a separate issue.
    Think back, what is a DBA? Doesn't a sysadmin, the file server person, know how to run backups and check logs? Why do we need a separate person when those jobs have been handled? Doesn't a developer know how to create indexes? I was a developer and sysadmin, and I did those things for databases. I certainly created tables, and could schedule jobs, and more.

    Are there people that do a little report writing and call themselves data scientists? Sure. Is that valid? Sure, if someone wants to pay. There are also people that do much more, and perform more than simple aggregate analysis, or rolling averages. Just like being a "developer" or a "programmer" or a "network engineer", it is an amorphous job. It incorporates lots of skills, responsibilities, and deliverables that vary dramatically from position to position.

  • Steve Jones - SSC Editor - Wednesday, April 26, 2017 9:51 AM

    A couple things. First, I think you're confusing science and analysis with efficiency. They are two different jobs....
    Whether a data scientist should be able to write queries, that's a separate issue....

    Steve, I'll grant that not all data scientists are efficient query writers.  You're correct that they're primarily after the data, not the means to acquire it.  However, with all of the public expectations that a data scientist is much more than a data analyst or database developer, I guess I expect them to have a modicum of skills or at least the common sense to not execute their highly inefficient queries against production!

    I suppose to say that I'm rather unimpressed with the data science hype in general is perhaps an understatement.  As others have said, we've all done much of it already in our careers.  Are there new techniques and ways of looking at data? Of course there are and they are completely worthwhile to learn and implement where appropriate.  Let's just be mindful of the impact we have on others while we do that.

  • Tony++ - Wednesday, April 26, 2017 9:14 AM

    I finished the EDX program early this year.
    ...
    So this program has not won me a new job as a data scientist.  I'm not using the fun technologies.  But it did improve my thinking, and I'm doing the same job as before, but better.
    Better results in this job lead to great success stories, and success stories open doors for future jobs -- even if those jobs do not have the data scientist title.
    Success comes from what I do, not what I'm called.

    The arguable part of the EDX program is paying for the courses to get the useless (for me) certificate, when the courses can be audited for free.
    I have chosen to see it as paying EDX/MS for the delivery platform & content.

    Thanks for the notes. That's an interesting perspective and about what I'd expect.

  • Aaron N. Cutshall - Wednesday, April 26, 2017 10:04 AM

    Steve, I'll grant that not all data scientists are efficient query writers.  You're correct that they're primarily after the data, not the means to acquire it.  However, with all of the public expectations that a data scientist is much more than a data analyst or database developer, I guess I expect them to have a modicum of skills or at least the common sense to not execute their highly inefficient queries against production!

    I suppose to say that I'm rather unimpressed with the data science hype in general is perhaps an understatement.

    Who in your career has the common sense to never do anything silly in production? That's a constant issue, and one I've had with programmers, err, developers, constantly.

    There's always hype. Most of it isn't impressive, but some is. Because it impresses the people that write the checks. If they want to hire a data scientist instead of a report writer, I'm their guy. Who cares what they call me?

  • My thinking is that there is a role for data science, but for most small, medium, and even large IT organizations, it would probably make more economic and practical sense to outsource their scientific data analysis (not just the infrastructure but also the talent) to a 3rd party provider that specializes in this type of thing, rather than investing in an in-house full-time data scientist. I mean, most organizations could benefit from data science, but may not have room in their budget for a $200,000 / year Data Scientist any more than they have funding for $200,000 / year worth of hardware and SQL Server licenses.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Tony++ (love that "++", btw), your post makes me wonder, are certain places better for data scientists than others? For example, when I was unemployed a few years ago, I went to the unemployment office several times seeking help and direction. During one of those visits the counselor mentioned that I might have a harder time finding a new job in this state because most developer positions are for what he called "lunchbox programmers". I didn't fit in with that.

    So perhaps the same is true about data scientists? Are more jobs for them in places like Silicon Valley, NYC or other major places of commerce?

    Kindest Regards, Rod
