A Beginners Look at Hadoop
Posted Thursday, June 06, 2013 2:40 AM
SSCrazy



Hi David, I attended Justin's talk at SQLBits, don't know if you were also there in person (a good introduction I thought).

I was the guy at the front to whom Justin passed on a question.


LinkedIn Profile
Newbie on www.simple-talk.com
Post #1460577
Posted Thursday, June 06, 2013 2:43 AM
SSC-Enthusiastic

Hi,

Thanks so much for this article.
I recently got involved with Hadoop too. I also faced the RAM and 64-Bit CPU limitations on my laptop, so I decided to buy a new one (finally I found a good reason for a brand new computer). I tried the Hortonworks Sandbox and I’m pretty happy with the experience and also with the learning material.

You forgot to mention HCatalog, which is a metadata and table management system for Hadoop. It is very useful because it provides a shared schema and data type mechanism, plus a table abstraction, so users need not be concerned with where or how their data is stored. HCatalog also interoperates with Hive, Pig and other tools.

Kind Regards,


Paul Hernández
http://hernandezpaul.wordpress.com/
https://twitter.com/paul_eng
Post #1460578
Posted Thursday, June 06, 2013 2:55 AM
SSCrazy


I am interested in the "WHAT" of Hadoop. As in what would it be used for in business? Would you or someone give some hard examples of uses?


This is something that I struggled with when I ran my technical spike. It is A solution, not necessarily THE solution to a number of problems.

The case study that Thoughtworks quote for Autotrader is where millions of PDF documents had to be scanned and relevant facts extracted for the Autotrader web site.

For mining web log files, a dedicated product such as TIBCO LogLogic or Splunk is probably a more targeted solution.

Unless something changes dramatically in the Hadoop architecture, I don't see it as being a serious data warehouse alternative. It is at its heart a file scanning tool. It isn't designed for high concurrency, random access, security, etc.
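
To make the "file scanning" point concrete, here is a minimal sketch in plain Python of the map, shuffle and reduce steps that a MapReduce job goes through. It is not Hadoop's API; the function names and the sample lines are made up for illustration. What it shows is that every input line is read in full to answer the question, with no index or random access involved.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word; each input line is read exactly once."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework would between phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three steps in sequence over the whole input."""
    return reduce_phase(shuffle(map_phase(lines)))


# Hypothetical sample input standing in for the lines of a large file.
sample = [
    "Hadoop scans files",
    "files are scanned in full",
    "Hadoop scales out",
]
counts = word_count(sample)
print(counts["hadoop"], counts["files"])
```

On a real cluster the map and reduce steps run in parallel across many machines, but the access pattern is the same full scan, which is why it suits batch jobs better than interactive lookups.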

I think there is a danger that people think they have to jump onto the Big Data bandwagon when the reality is that few people actually have genuine Big Data problems. They probably have people, process and politics problems causing technical problems.


Post #1460587
Posted Thursday, June 06, 2013 11:51 AM
SSC Rookie

Hortonworks has a tutorial on their own VM where Hadoop is already installed and you can play with Hive and Pig (not much admin stuff though).

http://hortonworks.com/products/hortonworks-sandbox/

Post #1460830
Posted Thursday, June 06, 2013 12:56 PM
Forum Newbie

Thanks for clearing that up for me. Since Hadoop is a great scanning tool for when you have data over a TB, are there tools that can reach in and do discovery data mining on the results? I am not familiar with PIG and the other tools you talked about.

My interest is in taking linguistic works, running them into a database of some sort, and doing discovery mining on them. The base is large enough for Hadoop to be a candidate to help, but it is discovery tools that I am missing the idea for right now.
Post #1460852
Posted Thursday, June 06, 2013 1:03 PM
Old Hand

Greg_Della-Croce (6/6/2013)
Thanks for clearing that up for me. Since Hadoop is a great scanning tool for when you have data over a TB, are there tools that can reach in and do discovery data mining on the results? I am not familiar with PIG and the other tools you talked about.

My interest is in taking linguistic works, running them into a database of some sort, and doing discovery mining on them. The base is large enough for Hadoop to be a candidate to help, but it is discovery tools that I am missing the idea for right now.


If you are mining text, then Splunk might be a good alternative. I haven't used it yet myself, but I have a colleague who is testing it on corporate network logs that total 500GB-1TB per day. Performance is very good. I think there are options for free downloads and testing.
Post #1460856
Posted Thursday, June 06, 2013 1:09 PM


SSCertifiable

David,

I wanted to thank you for this well-laid-out walkthrough of your experiences with Hadoop. The article is rather informative and shows the difficulties in switching from a familiar environment into a less obvious one.

Out of curiosity, did you also evaluate Mongo and are you planning one of these for that as well?



- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.


Twitter: @AnyWayDBA
Post #1460857
Posted Thursday, June 06, 2013 4:06 PM
Forum Newbie

I agree that Hadoop should not be used as a replacement for a data warehouse. It does seem to be getting some use as a staging environment for data warehouses. The idea is that it is a good batch processor and can scale well as data grows.

Thoughts?

(I'm messing around with the Hortonworks VM and it seems to work quite well.)
Post #1460914
Posted Thursday, June 06, 2013 4:10 PM
SSCrazy


Out of curiosity, did you also evaluate Mongo and are you planning one of these for that as well?


I haven't evaluated MongoDB yet. I am more likely to look at Redis (for session state), Riak (for customer accounts) and Neo4j (potential CRM).

I'm curious about VoltDB, which is one of Michael Stonebraker's children. As he has Ingres, Postgres and Vertica to his name, VoltDB should be worth a look. I'm curious to know what similarities there are between VoltDB and the SQL Server 2014 Hekaton stuff.

It is quite hard to find the time, energy and resource to do a decent evaluation of such products. How much of my SQL Server knowledge can be leveraged in a comparison of NoSQL and is it even fair to attempt such a comparison? The most I can hope to do is to state precisely what the experiment involved so the methods and results are both up for scrutiny.


Post #1460916
Posted Friday, June 07, 2013 1:31 AM
SSCrazy

gclausen (6/5/2013)
Great article!! Do you think it is worth learning a little bit of Java for this?

I feel it is always useful to have at least one non-SQL development language under your belt. If you already have experience with a language then learning Java on top is a good move.

If you haven't got experience yet and you are primarily a SQL Server guy, I'd start by learning C#; you will probably use it more. Java and C# have a lot of similarities, which isn't surprising given their history. Once you've learnt C#, applying your knowledge to Java should be relatively straightforward.


Post #1460983