
Data Scientist versus Data Engineer #datascience #bigdata #analytics


One of the main differences between a data scientist and a data engineer has to do with ETL versus DAD:

ETL (Extract/Transform/Load) is for data engineers, or sometimes for data architects or database administrators (DBAs).

DAD (Discover/Access/Distill) is for data scientists.
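To make the ETL side concrete, here is a minimal sketch using only the Python standard library. It assumes a hypothetical CSV export (`RAW_CSV`) as the source and an in-memory SQLite table named `purchases` as the destination; the column names and the validation rule are illustrative, not taken from any real system.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system; columns are illustrative only.
RAW_CSV = """user_id,country,amount
1,US,19.99
2,FR,5.00
3,US,not_a_number
4,DE,42.50
"""

def extract(raw_text):
    """Extract: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """Transform: cast types and drop records that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed amounts instead of loading bad data
        clean.append((int(row["user_id"]), row["country"], amount))
    return clean

def load(records, conn):
    """Load: write the cleaned records into the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0])  # 3
```

Each stage is a separate function so the engineer can swap the source or destination without touching the others; in production these stages would typically run under a scheduler and target real databases rather than an in-memory one.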

Data engineers tend to focus on software engineering, database design, production code, and making sure data flows smoothly between the source (where it is collected) and the destination (where it is extracted and processed). At the destination, data science algorithms produce statistical summaries and other output, which are eventually moved back to the source or elsewhere. Data scientists need to understand this data flow and how it is optimized, especially when working with Hadoop, but they don’t optimize the data flow itself. Instead, they optimize the data processing step: extracting value from data. They also work with engineers and business people to define the metrics, design data collection schemes, and make sure data science processes integrate efficiently with the enterprise data systems (storage, data flow). This is especially true for data scientists working in small companies, and it is one reason why data scientists should be able to write code (more and more often, in Python) that engineers can reuse.
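As an illustration of what "code re-usable by engineers" might look like, the following sketch packages a processing step as a pure function with no I/O, so an engineer can call it from any stage of a pipeline. The function name `distill`, the `(user_id, country, amount)` record shape, and the chosen metrics are hypothetical assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean

def distill(records):
    """Hypothetical processing step: turn cleaned records into per-country metrics.

    Accepts an iterable of (user_id, country, amount) tuples and returns a dict
    mapping each country to its purchase count and mean amount (rounded to 2
    decimals). It performs no I/O, so it can be reused inside any pipeline.
    """
    by_country = defaultdict(list)
    for _user_id, country, amount in records:
        by_country[country].append(amount)
    # Sort by country so the output order is deterministic.
    return {
        country: {"count": len(amounts), "mean_amount": round(mean(amounts), 2)}
        for country, amounts in sorted(by_country.items())
    }

if __name__ == "__main__":
    sample = [(1, "US", 19.99), (2, "FR", 5.00), (4, "DE", 42.50), (5, "US", 10.01)]
    print(distill(sample))
```

Keeping the function free of database or file access is the design choice that matters here: the engineer owns how records arrive and where results go, while the data scientist owns the logic inside.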

via Data Scientist versus Data Engineer.
