Hadoop many flavors of SQL

  • Comments posted to this topic are about the item Hadoop many flavors of SQL

  • Hive is not necessarily slow: there is now Hive on Tez, running on YARN, from Hortonworks (HDP 2.0). http://hortonworks.com/hadoop/tez/#section_1

    Big data technologies keep being updated and new ones keep being invented. Yes, they all use something very like SQL, but behind the scenes they are all very different (implemented in an application language). To master them, you have to dig into how they work. That means understanding how SQL is translated into MapReduce jobs and so on, and perhaps how to optimize it. It only looks like SQL; behind the scenes these engines are very different, so forget about batch-based processing in T-SQL. I have learnt some big data tools (Hive, Pig), and the more I learn, the more I wonder which person is better suited to the job (assuming there is such a thing as a big data warehouse developer role): a Java developer who learns SQL, or a traditional database developer who learns Java? Both will be fine, IMO, but the latter path is much more difficult.
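    To make "how SQL is translated into MapReduce jobs" a bit more concrete, here is a minimal pure-Python sketch of the general pattern for a query like SELECT region, SUM(amount) FROM sales GROUP BY region. The sales table, its columns and the function names are hypothetical. The map phase emits (key, value) pairs keyed on the GROUP BY column, the shuffle groups values by key, and the reduce phase applies the aggregate. This only illustrates the idea; it is not how Hive actually generates its execution plans.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical toy table: each row is (region, amount).
Row = tuple[str, float]


def map_phase(rows: Iterable[Row]) -> Iterator[tuple[str, float]]:
    """Emit one (key, value) pair per row; the GROUP BY column becomes the key."""
    for region, amount in rows:
        yield region, amount


def shuffle_phase(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Collect all values for the same key, as the framework does between map and reduce."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[float]]) -> dict[str, float]:
    """Apply the aggregate (SUM here) to each key's list of values."""
    return {key: sum(values) for key, values in sorted(groups.items())}


def group_by_sum(rows: Iterable[Row]) -> dict[str, float]:
    """SELECT region, SUM(amount) FROM sales GROUP BY region, as map -> shuffle -> reduce."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    sales = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("north", 1.0)]
    print(group_by_sum(sales))
```

    On a real cluster each phase runs in parallel across many machines, and the shuffle moves data over the network, which is where much of the optimization work mentioned above tends to go.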

  • This just points out to me how much I really don't know.

  • This is not complete without mentioning Hawq from Pivotal, which is much faster and scales better than any of the SQL implementations mentioned.

    - Full fault tolerance (HDFS)

    - ANSI SQL-92, -99, and -2003 support

    Usage:

    - ETL/ELT using PXF to expose external files or websites as tables inside Hawq

    - Advanced analytics including MADlib integration

  • YesSQL!

    Apparently, coding MapReduce is OK for academics and geeks, but as Hadoop's usage expanded into the commercial realm (i.e., paying customers), folks started asking:

    "Where's the SQL?"

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (11/24/2015)

    "Where's the SQL?"

    Use Spark?

    Unfortunately, learning another language is going to be key with Hadoop. You cannot get by doing everything in SQL alone. Python, for one good example, is extremely easy to learn. Combine Python with SQL on Spark and you can do some magic without having to write MapReduce jobs in Java. I personally use a Python framework for MapReduce that works great without any Java.

    If you refuse to try anything because you don't want to learn another language, you're only keeping yourself from a really great tool.
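    For anyone wondering what MapReduce without Java can look like, here is a minimal word-count sketch in the Hadoop Streaming style: a mapper and a reducer written as plain Python functions over lines of text, exchanging tab-separated key/value pairs. This is not the unnamed framework mentioned above; the word-count task and all function names are illustrative assumptions. The `sorted()` call in `run_locally` stands in for the sort/shuffle Hadoop performs between the two phases, which is why the reducer can assume equal keys arrive together.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'word<TAB>1' for every whitespace-separated word (lowercased)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes input lines are already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines: Iterable[str]) -> list[str]:
    """Simulate map -> sort (shuffle) -> reduce in a single process."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    # With Hadoop Streaming, mapper and reducer would be separate scripts
    # reading sys.stdin and printing to stdout; here we use sample input.
    sample = ["Use Spark", "use Python with Spark"]
    for out in run_locally(sample):
        print(out)
```

    Keeping the mapper and reducer as functions over iterables of lines makes them easy to test locally before wrapping each in a small stdin/stdout script for the cluster.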
