A Beginners Look at Hadoop
Dave Poole
stephen.lloyd 63174 (6/6/2013)
I agree that Hadoop should not be used as a replacement for a data warehouse. It does seem to be getting some use as a staging environment for data warehouses. The idea is that it is a good batch processor and can scale well as data grows.

Thoughts?



One thing that has bitten me very hard on the bum in the data warehouse arena is not having a long-term source of non-transformed data.
We can provide high availability by keeping multiple copies of the data.
We can provide disaster recovery with a robust and tested backup strategy.

What happens if you find out that a transformation in the data warehouse load missed something crucial? It could be an aggregation from a union query, a data quality dedupe or anything. The resulting data looks legitimate and doesn't cause errors, but it doesn't reconcile.
If you don't have access to the original source data you are stuck with a fix going forward and having to ignore the poor data and anything related to it.

I think, if you are careful, you can use Hadoop as a glorified BCP repository (loaded via Sqoop) of your source data, acting as a staging area. Hadoop was built to scale, but you are going to need quite a bit of hardware to get the balance between performance and scale right.
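To make that concrete, here is a hedged sketch of the Sqoop-as-glorified-BCP idea: invoking Sqoop 1.4.x programmatically from Java to land a source table in HDFS untransformed. The connection string, credentials, table name and target directory are all hypothetical.

```java
// A minimal sketch only: pull a SQL Server table into HDFS as raw staging
// data. Assumes Sqoop 1.4.x and a JDBC driver on the classpath.
import org.apache.sqoop.Sqoop;

public class StageSourceTable {
    public static void main(String[] args) {
        String[] sqoopArgs = {
            "import",
            "--connect", "jdbc:sqlserver://dbserver:1433;databaseName=Sales",
            "--username", "etl_reader",
            "--password-file", "/user/etl/.sqoop_pwd",  // password kept in HDFS
            "--table", "Orders",                        // hypothetical source table
            "--target-dir", "/staging/sales/orders",    // raw landing area
            "--num-mappers", "4"                        // parallel extract streams
        };
        // runTool parses the argument array exactly as the CLI would
        int exitCode = Sqoop.runTool(sqoopArgs);
        System.exit(exitCode);
    }
}
```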

I'm curious about Impala, which seems to offer MPP-like capability on Hadoop. One very important point to remember is that Hadoop is great with big files. It is not so great with lots of small files. Mechanically the wheels go around, but they don't do so with any urgency.

With that in mind I'm not sure how Impala/Hadoop would handle slowly changing dimensions.

LinkedIn Profile
www.simple-talk.com
Paul Hernández
One very important point to remember is that Hadoop is great with big files. It is not so great with lots of small files.


Hi,

Even if you have a lot of small files, you can still obtain good performance from your Hadoop system.

Background: a characteristic of Hadoop is that computing performance degrades significantly when data is stored as many small files in HDFS. That occurs because the MapReduce job launches multiple tasks, one for each separate file, and every task incurs some overhead (execution planning and coordination).

To overcome this drawback you simply need to consolidate the small files. There are several options for accomplishing this; the first could be the low-level HDFS API, which is in my opinion the hardest way, although some Java devotees may feel comfortable with it (see the sketch below). Another option is the Pail library, which is basically a Java library that handles the low-level filesystem interaction.
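For illustration, here is a minimal sketch of that low-level HDFS API approach: concatenating the small files in a directory into one large file, so a subsequent MapReduce job launches one task instead of hundreds. It assumes the Hadoop 2.x client libraries; the paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

import java.io.InputStream;
import java.io.OutputStream;

public class SmallFileConsolidator {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path inputDir = new Path("/staging/incoming");        // many small files
        Path merged   = new Path("/staging/merged/part-0000"); // one big file

        try (OutputStream out = fs.create(merged)) {
            for (FileStatus status : fs.listStatus(inputDir)) {
                if (status.isFile()) {
                    try (InputStream in = fs.open(status.getPath())) {
                        // append this file's bytes; 'false' leaves streams open
                        IOUtils.copyBytes(in, out, conf, false);
                    }
                }
            }
        }
    }
}
```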

Kind Regards,

Paul Hernández
erin.north
Evil Kraig F (6/6/2013)
David,

Out of curiosity, did you also evaluate Mongo and are you planning one of these for that as well?


Funny you mention MongoDB; we are a Microsoft shop but we are currently evaluating MongoDB as a possible solution for storing blobs. Our production system generates a lot of blobs (PDF, HTML and images), which are currently stored as varbinary(max) or on the file system. We are looking for a better solution, and one of the products we are looking into is MongoDB.
babu.manoharan-1113385
Nice work, but a couple of things are wrong here. Being a SQL Server guy you should not focus on Cloudera, because Cloudera is blessed by Oracle while Hortonworks is blessed by Microsoft. The two ecosystems have some minor differences; Hortonworks works with Windows Server, so you have no need to learn Linux or Java. By learning Linux and going down the traditional big data route you are trying to be a master of both worlds, but my advice is that as a SQL Server person you will be fine if you master the Hortonworks big data offering.
trabun
Let me start by saying "Nice article!". Your descriptions were clear and concise and you did not assume knowledge on the part of the reader.
One statement may be a bit too generalized, however:
Up to 1 tonne the BMW wins hands down in all scenarios. From 1 to 5 tonnes it depends on the load and distance, and from 5 tonnes up to 40 tonnes the truck wins every time.


If the 1-ton truck is SQL Server, what version of SQL Server? I would suggest that the Parallel Data Warehouse (PDW) is also a 40-ton truck, with high compression (supposedly up to 15x, but I have seen 25x on purely numeric data) and unbelievable speed (far faster than Hadoop at accessing structured data). In addition, it can query Hadoop clusters using standard SQL syntax via a new technology called PolyBase.
I am not saying PDW replaces Hadoop; Hadoop is designed for unstructured data and PDW for structured. The two work together. But even Hadoop users aggregate Hadoop data into a structured RDBMS for analysis.

So perhaps the analogy would be better explained thus: SQL Server is a brand of trucks with capacities from 1 to 40 tons. These trucks are designed to carry optimal payloads in an ordered way. Hadoop is a train with infinite capacity but not infinite performance. It must stay on its tracks to be effective.

Well, perhaps someone more prosaic can address the analogy as I think this one is not much better than yours, but it is closer to the mark.


It is a privilege to see so much confusion. -- Marianne Moore, The Steeplejack
hennie7863
Very nice article; it confirms my thoughts. Thank you.
prince.rastogi
A good article for getting to know what Hadoop is.
Thanks to David Poole.
Dave Poole
I wrote this article just over 18 months ago, and in that time the entire Hadoop ecosystem has evolved considerably. The whole big data space is moving so fast that if you were to buy a book and spend one hour per day working through the exercises, the book would be obsolete by the time you reached the end. In my opinion, textbooks in this arena are only useful for teaching concepts. They are not reference books in the way they would be if written for SQL Server.

YARN = Yet Another Resource Negotiator. It allows YARN-compatible Hadoop plugins to be assigned proportions of the available resources, which is good for running multiple, mixed workloads at once. It is so much more than MapReduce 2.

Then there is Slider. Slider is a container technology that sits on top of YARN. This means that even if you have a Hadoop app that is not compatible with YARN, running it in a Slider container lets it be resource-managed by YARN.

A huge amount of work has gone into Tez and the Stinger initiative, which boost the speed of Hive queries dramatically. Claims are of a 100x improvement over the original Hive.

Vendors such as IBM and Pivotal have decided that HDFS is not robust enough for the enterprise so have replaced it with their own tech.

Hadoop 1.0 had the NameNode as a weak point. Hadoop 2.0 is much more tolerant of NameNode failure.

Apache projects such as Knox and Ranger address a number of security concerns.

The Apache Spark computational framework is an alternative to MapReduce and can use HDFS. Spark offers a layer of abstraction to make it much easier to write distributed compute code. A number of applications are being ported to make use of Apache Spark.
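As an illustration of that layer of abstraction, here is the classic word count in Spark's Java API; the equivalent raw MapReduce job needs several classes and far more boilerplate. A minimal sketch, assuming the Spark 2.x Java API; the HDFS paths are hypothetical.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class SparkWordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> lines = sc.textFile("hdfs:///data/logs/big.txt");
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
            .mapToPair(word -> new Tuple2<>(word, 1))
            .reduceByKey((a, b) -> a + b);   // distributed aggregation

        counts.saveAsTextFile("hdfs:///data/logs/wordcounts");
        sc.stop();
    }
}
```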

The main vendors are now no longer claiming that Hadoop replaces the traditional data warehouse; they are positioning it as a complementary technology. The genius of AND vs the tyranny of OR. Teradata have embraced Aster Data and Hadoop in a very impressive way, so each of the three capitalises on the strengths of the others.

All in all Hadoop has gone through the hype cycle and people are now gaining realistic expectations as to what the technology can give them.

LinkedIn Profile
www.simple-talk.com
Misha_SQL
Great article! Thank you.



erin.north
Since the last time I read this article I have started working in a big data environment with AWS, Hadoop (EMR) and Vertica.

As you mentioned, Hadoop is a complementary technology. We found it can be used as a great ETL tool to aggregate 50 columns across hundreds of millions of records and then send the result to Tabular. Doing the same in SQL Server was not practical.
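For a flavour of what that sort of ETL aggregation looks like as a plain MapReduce job, here is a hedged sketch that sums one numeric column grouped by a key column from comma-separated rows. This is an illustration under stated assumptions, not erin's actual job: the column positions and paths are hypothetical, and a real 50-column aggregate would emit a composite value rather than a single sum.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class AggregateJob {
    public static class ParseMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] cols = value.toString().split(",");
            // col 0 = grouping key, col 7 = measure (hypothetical layout)
            ctx.write(new Text(cols[0]),
                      new DoubleWritable(Double.parseDouble(cols[7])));
        }
    }

    public static class SumReducer
            extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> vals, Context ctx)
                throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable v : vals) sum += v.get();
            ctx.write(key, new DoubleWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "aggregate");
        job.setJarByClass(AggregateJob.class);
        job.setMapperClass(ParseMapper.class);
        job.setCombinerClass(SumReducer.class);   // pre-aggregate on the mappers
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/facts"));
        FileOutputFormat.setOutputPath(job, new Path("/data/aggregated"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```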

We found out that for our purposes Hadoop is not a great data store, and while some tools like Tableau can connect to HDFS and run MapReduce queries against it, the performance is not that great (at least in our case).

Also, AWS and MapR do not use HDFS.

I also started working with Vertica and I was amazed. It takes many minutes in SQL Server to move half a billion records from one table to another (via BCP out, BCP in); it takes seconds to do it in Vertica. I did not believe it had finished, so I had to do a count to verify that all the records were really transferred!

Vertica does not have many features, but its LOAD and SELECT commands are blazing fast.