I've been playing with Hadoop, both locally and in AWS, and although I'm a newbie to it I've already had a few reality checks.
Firstly, it is still in the early phases of the Gartner Hype Cycle: it has yet to go through the "Trough of Disillusionment", let alone reach the "Slope of Enlightenment" or the "Plateau of Productivity".
I had it grinding up a few billion records on 4 large AWS nodes and the answer I wanted came back in 14 seconds.
Out of curiosity I took the same recordset and imported it into a modest SQL Server 2008 R2 instance. The same text crunching took 10 seconds!
The conclusions I draw from this are as follows:-
Setting up Hadoop & Hive was a baptism of fire, as I was, and still am, a Linux newbie.
These tools are 0.x releases, so although there are loads of instructions out there, they vary quite a bit in completeness and accuracy.
You'll find that IF you have the prerequisites up and running, installing stuff on Linux is no worse than any other code deployment in your organisation. :hehe:
If you don't have the prerequisites you will find yourself tracing through the dependencies, or trying to work out what those dependencies might even be. It isn't always clear, and the error messages are largely Java stack traces: just too long to fit in a scrolling window, so the important bit has fallen out of the scroll buffer by the time the job finishes dying!
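One trick that saved me from the scroll-buffer problem: capture the job's output (stdout and stderr) to a file with `tee`, then search the file for the root cause instead of squinting at the terminal. A minimal sketch below, where `run_job` is a stand-in for whatever `hadoop jar ...` or `hive -f ...` command you are actually running:

```shell
# Hypothetical stand-in for a real "hadoop jar my-job.jar ..." invocation
# that spews a long Java stack trace to stderr.
run_job() {
    echo "starting job..."
    echo "Caused by: java.io.IOException: something went wrong" 1>&2
}

# Merge stderr into stdout and save everything to job.log,
# while still watching the output live on screen.
run_job 2>&1 | tee job.log

# Now the whole trace is on disk; the root cause is usually
# near the first "Caused by:" line.
grep -c "Caused by:" job.log
```

It's not clever, but it beats losing the one line that matters off the top of the window.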
A basic understanding of Linux is an absolute must.
I was relieved to find that the Linux community is no longer a training ground for the special forces squadrons of the troll army.