• ebooklub (10/3/2015)


    Hi Daniel, thanks a lot for the article on the DBA career path.

    A few months ago, our company (an investment bank) came out with a plan to use Hadoop.

    One part is clear: they want to use the Hortonworks distribution.

    BUT the responsibilities are not defined: what exactly should the DBA team do?

    We are a team of Oracle and MSSQL/Sybase DBAs.

    Currently, if we have disk space issues or Windows/OS performance problems, we bring in the system administrators or delegate the task to them.

    Who is going to manage Hadoop in our company? My guess is that it will stay the same way:

    1. DBAs will add/remove cluster nodes and monitor alerts, failed jobs, and invalid code.

    2. Sysadmins will take care of OS security, performance, and disk space.

    When this is going to happen is a different story...

    Your article raises interesting questions for anyone looking for their next job/position:

    1. Would a company hire me/you as a Hadoop DBA if we have 15+ years of experience as SQL DBAs (and expect to be paid big $ for that knowledge), but our Hadoop experience is limited to “home” projects involving 3-6 node clusters and a working knowledge of HDFS, Flume, Sqoop, and Hive?

    2. I did a “Hadoop” job search in Eastern Canada and the US, and most of the Hadoop-related jobs are for Data Scientists with knowledge of Hadoop.

    So the question is: how do we position ourselves on the job market so that potential employers will notice us?

    Are we SQL DBAs with knowledge of Hadoop, Hadoop cluster administrators, or something else?

    3. Currently I am searching for opportunities to volunteer in Hadoop administration to gain practical troubleshooting experience. Has anyone succeeded in finding such opportunities?

    The few links below helped me better understand the role of a Hadoop DBA, but they date from 2013...

    The Hadoop market might have changed since then.

    http://www.pythian.com/blog/hadoop-faq-but-what-about-the-dbas/

    https://www.linkedin.com/pulse/hadoop-admin-job-responsibilities-sudhaa-gopinath

    Thank you

    Alex

    Hi Alex,

    I believe the following activities are still relevant in the Hadoop world:

    - working with complex SQL

    - data modeling

    - performance and tuning

    In my opinion, the SQL Server DBA/developer role loosely translates into the 'Data Engineer' position in the Hadoop ecosystem (and not into 'Data Scientist').

    Here is a link to how Cloudera (a competitor to Hortonworks, very popular in the San Francisco Bay Area) defines 'Data Engineer' duties:

    http://certification.cloudera.com/CCP-DE.html

    In the past 1-2 years, a new generation of SQL-capable engines over Hadoop has become popular, notably Apache Spark (through Spark SQL).

    It is very fast and supports SQL, but to use it effectively you also need to know a programming language such as Java, Scala, or Python.

    So, in the case of Apache Spark (or Impala, another fast engine), you cannot avoid a steep learning curve!

    Cheers,

    Daniel