A Beginners Look at Hadoop

  • Eric, you've reminded me of a good point. There seem to be five key Hadoop distributions.

    • Hortonworks = 100% open source
    • Cloudera = open source, with Cloudera Navigator, Manager and Director providing the security, management and governance capabilities
    • IBM = core of Hadoop, but uses the GPFS file system and an increasing number of IBM proprietary extensions. BigInsights is full-blown SQL over Hadoop
    • Pivotal = HDFS plus the GemFire and HAWQ query engines
    • MapR = POSIX-compliant file system and a number of proprietary components aimed at HA/DR and enterprise-grade usage

    The megavendors (Oracle, Microsoft, Teradata, etc.) seem to take one of the above distributions and replace various components with ones from their own particular field of expertise.

    There is no "best" distribution. Each one has a valid argument for why you should choose it, which makes choosing a distribution an interesting challenge. Ultimately it comes down to looking at the problem you are trying to solve, understanding the culture of your organisation, and choosing the distribution and vendor that best match those factors.

  • I think AWS is a major player as well. EMR can use traditional HDFS as well as EMRFS, their own implementation of HDFS sitting on top of S3.

  • From what I've read AWS EMR is an implementation of MapR

  • Great article, although technology moves so fast that a reposted editorial can end up slightly out of date.

    Hope you managed to get a new PC sorted in that time! 😀

    qh

    Who looks outside, dreams; who looks inside, awakes. – Carl Jung.
  • babu.manoharan-1113385 (6/10/2013)


    Nice work, but a couple of things are wrong here. Being a SQL Server guy, you should not focus on Cloudera, because Cloudera is blessed by Oracle and Hortonworks is blessed by Microsoft. The two ecosystems have some minor differences: Hortonworks works with Windows Server, so you have no need to learn Linux or Java. By learning Linux and going through the traditional big data approach you are trying to be a master of both worlds. My advice is that, as a SQL Server person, you will be fine if you try to master the Hortonworks big data offering.

    I realize this is a pretty old post. The author has given an excellent description of how to get your feet wet in Hadoop. You cannot dictate what he should or should not be using.

    To your point, PolyBase uses Linux for their Hadoop system. Do some research and learn some basic manners when you are commenting in a public forum.

  • So much has changed in the past year, let alone the three years since I wrote the original article.

    Some traditional vendors initially lined up behind a particular distribution but have now hedged their bets by supporting both Hortonworks and Cloudera.

    Having had more experience, I can now see that there are some factual errors in the original article.

    I did look at Hortonworks on Windows, but the vast majority of effort in the Big Data and NoSQL space was, and is, taking place on Linux. Even Microsoft are embracing Linux.

    As a private individual I find it a major plus that I can use Vagrant to spin up a Linux box and run a scripted install of whatever technology I want on that box.

    Am I trying to be the master of two worlds? I'd settle for being competent, with an aspiration to being above average. These days I think it pays to have a broad spectrum of capabilities. SQL Server is far from being a niche, but the ability to see and use SQL Server in a broader context is important. SQL Server becomes A tool, not THE tool, although the more I use other DB technology the more I appreciate Microsoft.
