• I wrote this article just over 18 months ago, and in that time the entire Hadoop ecosystem has evolved considerably. The whole Big Data space is moving so fast that if you were to buy a book and spend 1 hour per day working through the exercises, the book would be obsolete by the time you reached the end. In my opinion, textbooks in this arena are only useful for teaching concepts. They are not reference books in the way that a book written for SQL Server would be.

    YARN = Yet Another Resource Negotiator. It allows YARN-compatible Hadoop applications to be assigned a proportion of the cluster's available resources, which makes it good for running multiple, mixed workloads at once. It is so much more than MapReduce 2.
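To make "assigned proportions of the available resources" concrete, here is a toy sketch in Python. It is not YARN's scheduler or its API; the function name `split_cluster`, the queue names, and the 100,000 MB cluster size are all invented for illustration. It only shows the basic idea of dividing one pool of memory among workloads by percentage share.

```python
def split_cluster(total_memory_mb, queue_shares):
    """Divide a pool of memory among named queues by percentage share.

    Hypothetical helper for illustration only; a real scheduler also
    handles CPU, elasticity, preemption and per-user limits.
    """
    if abs(sum(queue_shares.values()) - 100) > 1e-9:
        raise ValueError("queue shares must add up to 100 percent")
    return {name: total_memory_mb * pct / 100
            for name, pct in queue_shares.items()}


# Assumed workload mix: half the cluster for batch jobs, the rest
# split between interactive queries and streaming.
shares = {"batch": 50, "interactive": 30, "streaming": 20}
allocation = split_cluster(100_000, shares)

for queue, memory_mb in allocation.items():
    print(f"{queue}: {memory_mb:.0f} MB")
```

Because every workload draws from the same pool under one set of shares, a long-running batch job cannot starve the interactive queries, which is the point of running mixed loads side by side.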

    Then there is Slider. Slider is a container technology that sits on top of YARN. This means that even if you have a Hadoop app that is not compatible with YARN, if it runs in a Slider container it will be resource managed by YARN.

    A huge amount of work has gone into Tez and the Stinger initiative, which boost the speed of Hive queries dramatically. The claimed improvement is 100x over the original Hive.

    Vendors such as IBM and Pivotal have decided that HDFS is not robust enough for the enterprise and have replaced it with their own technology.

    Hadoop 1.0 had the NameNode as a single point of failure. Hadoop 2.0 is much more tolerant of NameNode failure.

    Apache projects such as Knox and Ranger address a number of security concerns.

    The Apache Spark computational framework is an alternative to MapReduce and can use HDFS. Spark offers a layer of abstraction to make it much easier to write distributed compute code. A number of applications are being ported to make use of Apache Spark.

    The main vendors are no longer claiming that Hadoop replaces the traditional data warehouse. They are positioning it as a complementary technology: the genius of AND vs the tyranny of OR. Teradata have embraced AsterData and Hadoop in a very impressive way, so that each of the three capitalises on the strengths of the other two.

    All in all, Hadoop has gone through the hype cycle, and people are now gaining realistic expectations of what the technology can give them.