A Beginners Look at Hadoop

  • David.Poole

    SSC Guru

    Points: 75199

    Comments posted to this topic are about the item A Beginners Look at Hadoop

  • paul s-306273

    SSChampion

    Points: 10602

    Nice article, but I doubt I have the confidence to try this at home...

  • David.Poole

    SSC Guru

    Points: 75199

    paul s-306273 (6/5/2013)


    Nice article, but I doubt I have the confidence to try this at home...

    With VirtualBox and Ubuntu it's pretty easy to get started with Linux.

    Using the Cloudera image is also very easy.

    If you cock it up, all you will lose is a bit of disk space until such time as you delete the virtual image. Well, maybe a couple of evenings as well!

    The only reason I haven't got it running on my home PC is that I've got an ancient 32-bit PC with 3GB RAM running Windows XP. It cannot run the Cloudera versions after CDH3, and CDH4.1 onwards are the ones where you have a GUI to play with.

    Have a look at the SQLBits website for Justin Langford's presentation on HDInsight. Again, it's very easy to install this on a Windows box and get started. The idea is that you have a local instance even though it is just one node. It's good enough to try the mechanical bits and bobs even if you haven't got the benefit of the full map-reduce.
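
    For anyone wondering what the "mechanical bits" of map-reduce amount to, here is a minimal sketch in plain Python, assuming nothing about Hadoop's own APIs. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only mimics the three steps on one machine; a real cluster spreads the same steps across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group the emitted values by key, which the framework
    would normally do between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    sample = ["big data big hype", "small data"]
    print(word_count(sample))
```

    On a single node all three steps run in one process, which is roughly why a one-node install is enough for learning the mechanics but shows none of the parallel speed-up.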

  • paul s-306273

    SSChampion

    Points: 10602

    Okay - I'll try that when I have some free evenings.

    Thanks David.

  • Ross McMicken

    SSCarpal Tunnel

    Points: 4373

    You might also want to take a look at Splunk. I've heard good things about its ability to analyze gigabytes of data quickly. http://www.splunk.com/?r=header

  • Paul Brewer

    SSCrazy

    Points: 2827

    Great article, the first I've read that clearly and simply explains what Hadoop is. Thanks

  • gclausen

    SSC Enthusiast

    Points: 112

    Great article!! Do you think it is worth learning a little bit of Java for this?

  • alen teplitsky

    SSC-Dedicated

    Points: 30014

    i'm looking at this as well. i've had it with full text indexing for a security log analysis solution i built up over the years. looking at analysis services and hadoop.

    playing with SSAS for now and will try hadoop later.

  • Greg_Della-Croce

    SSC Rookie

    Points: 44

    I enjoyed the article. Like others I have been in the Windows world for a very long time, and Linux is something of a new adventure for me. I just finished, with a lot of help from my friends, a RedHat 17 based NAS for my home network.

    I am interested in the "WHAT" of Hadoop. As in what would it be used for in business? Would you or someone give some hard examples of uses?

    GregDC

  • alex.d.garland

    Right there with Babe

    Points: 749

    David.Poole (6/5/2013)


    Have a look at the SQLBits website for Justin Langford's presentation on HDInsight. Again, it's very easy to install this on a Windows box and get started. The idea is that you have a local instance even though it is just one node. It's good enough to try the mechanical bits and bobs even if you haven't got the benefit of the full map-reduce.

    Hi David, I attended Justin's talk at SQLBits, don't know if you were also there in person (a good introduction I thought).

    I had a quick word with him at the end and asked if there were any good walkthroughs or practical exercises for HDInsight newbies and he recommended this by Cindy Gross: http://blogs.msdn.com/b/cindygross/archive/2013/01/31/mash-up-hive-sql-server-data-in-powerpivot-amp-power-view-hurricane-sandy-2012.aspx

    I haven't had a chance to work through it yet but may well do soon. Like yourself, I've recently got a Linux installation up and running (Fedora), but I suspect that the MS version will be a gentler learning curve to start off with.

  • David.Poole

    SSC Guru

    Points: 75199

    Hi David, I attended Justin's talk at SQLBits, don't know if you were also there in person (a good introduction I thought).

    I was the guy at the front to whom Justin passed on a question.

  • Paul Hernández

    SSCarpal Tunnel

    Points: 4880

    Hi,

    Thanks so much for this article.

    I recently got involved with Hadoop too. I also faced the RAM and 64-Bit CPU limitations on my laptop, so I decided to buy a new one 😀 (finally I found a good reason for a brand new computer). I tried the Hortonworks Sandbox and I’m pretty happy with the experience and also with the learning material.

    You forgot to mention HCatalog, which is a metadata and table management system for Hadoop. It is very useful because it provides a shared schema and data type mechanism, and a table abstraction, so users need not be concerned with where or how their data is stored. HCatalog also interoperates with Hive, Pig and other tools.

    Kind Regards,

    Paul Hernández

  • David.Poole

    SSC Guru

    Points: 75199

    I am interested in the "WHAT" of Hadoop. As in what would it be used for in business? Would you or someone give some hard examples of uses?

    This is something that I struggled with when I ran my technical spike. It is A solution, not necessarily THE solution to a number of problems.

    The case study that Thoughtworks quote for Autotrader is where millions of PDF documents had to be scanned and relevant facts extracted for the Autotrader web site.

    For mining web log files, a dedicated product such as Tibco LogLogic or Splunk is probably a more targeted solution.

    Unless something changes dramatically in the Hadoop architecture I don't see it as being a serious data warehouse alternative. It is at its heart a file scanning tool. It isn't designed for high concurrency, random access, security etc.
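
    To make the "file scanning" point concrete, here is a rough sketch in the style of a Hadoop Streaming job, where the mapper and reducer are ordinary programs exchanging tab-separated key/value lines. The names mapper and reducer and the sample text are illustrative only, and the functions take lists of lines rather than reading stdin so they can be run anywhere. Every record is read from start to finish; nothing is looked up by key.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word1 word2<TAB>1' record per adjacent word pair."""
    for line in lines:
        words = line.lower().split()
        for first, second in zip(words, words[1:]):
            yield f"{first} {second}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share a key.

    Assumes the records arrive sorted by key, as they would after the
    sort that sits between the map and reduce steps.
    """
    parsed = (record.rsplit("\t", 1) for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield key, sum(int(count) for _, count in group)


if __name__ == "__main__":
    text = ["to be or not to be", "to be is to do"]
    # sorted() stands in for the framework's sort between the two steps
    for bigram, total in reducer(sorted(mapper(text))):
        print(bigram, total)
```

    Answering even a narrow question means re-scanning the whole input, which is fine for batch jobs and a poor fit for the many small concurrent queries a data warehouse has to serve.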

    I think there is a danger that people think they have to jump onto the Big Data bandwagon when the reality is that few people actually have genuine Big Data problems. They probably have people, process and politics problems causing technical problems.

  • erin.north

    Old Hand

    Points: 357

    Hortonworks has a tutorial on their own VM where Hadoop is already installed and you can play with Hive and Pig (not much admin stuff though).

    http://hortonworks.com/products/hortonworks-sandbox/

  • Greg_Della-Croce

    SSC Rookie

    Points: 44

    Thanks for clearing that up for me. Since Hadoop is a great scanning tool for when you have data over a TB, are there tools that can reach in and do discovery data mining on the results? I am not familiar with PIG and the other tools you talked about.

    My interest is in taking linguistic works, running them into a database of some sort, and doing discovery mining on them. The base is large enough for Hadoop to be a candidate to help, but it is discovery tools that I am missing the idea for right now.
