The Power of Hadoop


Steve Jones
Comments posted to this topic are about the item The Power of Hadoop

Follow me on Twitter: @way0utwest
Forum Etiquette: How to post data/code on a forum to get the best help
My Blog: www.voiceofthedba.com
Eric M Russell
"Where traditional databases hit their limits, Hadoop starts to emerge as a much better fit for solving unique analytics challenges," Lockner says. "Because data can be incorporated from multiple sources with varying types of data structures, Hadoop enables more analysis across multiple data feeds in a single platform -- solving some of the toughest data integration challenges commonly associated with relational data warehouse architecture."

The article doesn't provide details about what type of data they are working with; it's only described as being more or less unstructured, originating from multiple sources, and being analyzed for security purposes. If I had to guess, they are probably using Hadoop as a staging environment to ingest things like emails or web forum posts and search them for keywords and phrases that would indicate security threats. For example, there are websites where hackers go to trade or sell account numbers, logins, and other personal information.
It makes sense to do this sort of document archiving and semantic data crunching outside the relational database.


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
dshaddock
I was curious about Hadoop, and that led me to Cassandra and DataStax and tutorials and forums--and near the end of my search I still didn't really understand whether this would be useful for me. I saw a lot of big names using it, but I didn't grasp the non-relational database concept. Then I finally read a blog entry from Arin Sarkissian that (although rather crudely written) explained the concept. And I see how and why it works--for blogs and comments, or for social networking sites, or whatever. But it won't work efficiently for my job--providing a way to track huge amounts of store inventory data and generate very quick datamining reports for customers. If I were tracking data that was less structured but more inclined to be enormous in its scope, and I needed to ensure I could keep it all and keep it intact forever, Hadoop and all the things related to it would be, perhaps, ideal.
Steve Jones
dshaddock (2/28/2012)
I was curious about Hadoop, and that led me to Cassandra and DataStax and tutorials and forums--and near the end of my search I still didn't really understand whether this would be useful for me. I saw a lot of big names using it, but I didn't grasp the non-relational database concept. Then I finally read a blog entry from Arin Sarkissian that (although rather crudely written) explained the concept. And I see how and why it works--for blogs and comments, or for social networking sites, or whatever. But it won't work efficiently for my job--providing a way to track huge amounts of store inventory data and generate very quick datamining reports for customers. If I were tracking data that was less structured but more inclined to be enormous in its scope, and I needed to ensure I could keep it all and keep it intact forever, Hadoop and all the things related to it would be, perhaps, ideal.


That's exactly the point, and one that should be brought up to management when they see Hadoop or Cassandra as the latest fad. It works for some places, not for others.

SQLRNNR
I am curious to see more of how Hadoop will play in the MS BI stack with big data. Is it more of a fad, or does it really have staying power?



Jason AKA CirqueDeSQLeil
I have given a name to my pain...
MCM SQL Server, MVP


SQL RNNR

Posting Performance Based Questions - Gail Shaw

Eric M Russell
SQLRNNR (2/28/2012)
I am curious to see more of how Hadoop will play in the MS BI stack with big data. Is it more of a fad, or does it really have staying power?

There are many applications that have a need to store billions of entity-attribute-value type records, in addition to their more traditional relational data. An example of this would be Microsoft's own Entity Framework product. I think it's in Microsoft's best interest to demonstrate that, for this scenario, the best solution is for a relational database (like SQL Server) to co-exist side by side with a key-value optimized database solution. Attempts at implementing large scale key-value data models in SQL Server will result in frustrated customers.
Essentially it's the same concept as moving OLAP cube data outside SQL Server and into a separate database product, like Microsoft Analysis Services, or moving BLOBs into FILESTREAM.
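
As a rough illustration of the difference in shape between the two models, here is a minimal sketch in Python. It is a toy only, not SQL Server or any particular key-value product, and the names (eav_rows, pivot_entity, to_key_value, kv_store) and the sample data are all hypothetical.

```python
# Entity-attribute-value rows, as they might sit in a relational table:
# one row per (entity, attribute, value). The sample data is made up.
eav_rows = [
    (1, "name", "Widget"),
    (1, "colour", "red"),
    (2, "name", "Gadget"),
    (1, "weight_kg", "0.4"),
]

def pivot_entity(rows, entity_id):
    """Rebuild one entity by filtering every row and pivoting its attributes."""
    return {attr: value for eid, attr, value in rows if eid == entity_id}

def to_key_value(rows):
    """Group the same rows into a key-value layout: one record per entity."""
    store = {}
    for eid, attr, value in rows:
        store.setdefault(eid, {})[attr] = value
    return store

kv_store = to_key_value(eav_rows)

# EAV shape: reading one entity means touching many rows.
print(pivot_entity(eav_rows, 1))
# Key-value shape: reading one entity is a single lookup by key.
print(kv_store[1])
```

In the EAV shape every read of an entity is a filter-and-pivot over many rows, whereas the key-value shape returns the whole record in one lookup, which is the access pattern the key-value stores are built around.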


"The universe is complicated and for the most part beyond your control, but your life is only as complicated as you choose it to be."
dshaddock
You're right on track, I believe. Many years ago--before Windows and the Office suite--I constructed a roll-your-own database I called Megabase, because I wanted to store absolutely anything I felt like--pictures, code snippets, contact information, etc. It was hard to be all things to all people--even when the 'all people' was just me. I used InfoSelect for a while but was frustrated by its structured approach--although it was great to find obscure notes. And when I landed on OneNote I found I love it--but it's based on SQL Server, and I know when I store such a wide variety of junk in it that it has to be fairly bloated and less efficient than it might be if it were based on Hadoop. And perhaps the best architecture is to run something like OneNote with parts of it in SQL Server and parts in Hadoop...

David Shaddock
David.Poole
I've been playing with Hadoop both locally and in AWS and, although I'm a newbie to it, I've had a few reality checks.

Firstly, it is still in the early phase of the Gartner Hype Cycle. It has yet to go through the "Trough of Disillusionment", let alone reach the "Slope of Enlightenment" or the "Plateau of Productivity".

I had it grinding up a few billion records on 4 large AWS nodes and the answer I wanted came back in 14 seconds.

Out of curiosity I took the same recordset and imported it into a modest SQL Server 2008 R2 instance. The same text crunching took 10 seconds!

The conclusions I draw from this are as follows:

  • There is obviously a threshold that has to be reached before Hadoop delivers a clear advantage

  • That threshold is going to depend on the complexity of what you are trying to do to that data. I was simply extracting parts of a web log.

  • The big advantage of Hadoop is the fact that it runs on commodity kit and has been designed with the expectation that such kit will suffer failures.

  • Hadoop clusters underutilize their CPU resource; it's the disk I/O isolation they champion. RainStor has an interesting compression and data location awareness technology to boost the performance of Hadoop.

  • Apache subprojects such as Hive and Pig are essential for wider scale Hadoop adoption.
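
To make the web log case concrete, here is a minimal sketch in Python of the map and reduce steps for that kind of extraction. It is an illustration only, not the job I actually ran: the log layout (Apache common log format), the thing being counted (hits per request path), and the names LOG_PATTERN, map_line, reduce_counts and count_paths are all assumptions.

```python
import re
from collections import defaultdict

# Assumed Apache common log format; the real log layout is not given above.
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)'
)

def map_line(line):
    """Map step: emit (request_path, 1) for a parseable line, nothing otherwise."""
    match = LOG_PATTERN.match(line)
    if match:
        yield match.group(4), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def count_paths(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)

# Made-up sample lines, including one that does not parse.
sample = [
    '10.0.0.1 - - [28/Feb/2012:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [28/Feb/2012:10:00:01 +0000] "GET /about.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [28/Feb/2012:10:00:02 +0000] "GET /index.html HTTP/1.1" 304 0',
    'garbage line',
]
print(count_paths(sample))
```

On a single machine this is just a loop. The point of Hadoop is that the map step can run on many nodes at once, each against its own slice of the files, with the reduce step combining the partial results.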



Setting up Hadoop & Hive was a baptism of fire as I was, and still am, a Linux newbie.
These tools are 0.x releases so the instructions are of varying levels of completeness and accuracy.
There are loads of instructions out there, but they vary quite a bit.

You'll find that IF you have the prerequisites up and running, installing stuff on Linux is no worse than any other code deployment in your organisation.

If you don't have the prerequisites you will find yourself tracing through the dependencies or trying to work out what those dependencies might be. It isn't always clear, and the error messages are largely Java error reports that are just too long to fit in a scrolling window, so the important bit has already fallen out of the scroll window buffer!

A basic understanding of Linux is an absolute must.

I was relieved to find that the Linux community is no longer a training ground for the special forces squadrons of the troll army.

LinkedIn Profile

Newbie on www.simple-talk.com