Navigating Hadoop Resources

Daniel Klionsky

Mr or Mrs. 500

Points: 555
September 30, 2015 at 10:31 pm

#399123

Comments posted to this topic are about the item Navigating Hadoop Resources

ianstirk Ten Centuries Points: 1310 · Answer 1

Hi, thanks for this very useful overview.

You might want to know that Impala is a popular tool for querying data in Hive quickly (it uses in-memory processing instead of slower MapReduce jobs), often 100 times faster.

In general, it seems like the world of Hadoop is moving away from MapReduce batch jobs, and towards in-memory processing (take a look at Spark).

Most of the technologies for querying data (Hive, Impala, Spark etc) have a version of SQL that most people here will readily pick up.

Enjoy

Ian

curious_sqldba SSC-Dedicated Points: 36502 · Answer 2

October 1, 2015 at 4:40 am

#1830690

Great post. Any thoughts on polybase in 2016?

ianstirk Ten Centuries Points: 1310 · Answer 3

RE PolyBase... Have a look at the job websites, book sites etc. Hadoop and its components are largely a unix/linux thing. I suspect the market for PolyBase, like HDInsight, is going to be limited.

You can perhaps see parallels with Microsoft's mobile OS when compared with Android etc.

Thanks

Ian

Pradeep Mohanta SSC Rookie Points: 25 · Answer 4

Hi Daniel, I am Pradeep Mohanta, working as a SQL Server Database Architect. Your feelings about Hadoop and your learning steps are the same as mine; I followed exactly the same steps to learn Hadoop. Now I am looking for a good institute that offers a short course on Hadoop. I think the books you recommend will be very helpful to me, and I am especially excited to read "Microsoft SQL Server 2012 with Hadoop".

Please advise me: is it a good decision to take training on Hadoop, given that I have been working with Microsoft technology for the last 15 years?

I rate this article 5.

akljfhnlaflkj SSC Guru Points: 76202 · Answer 5

Wow. It made me realize how little I know about Hadoop. Thanks.

rdwilliamsjr1 SSC Rookie Points: 29 · Answer 6

October 1, 2015 at 6:37 am

#1830721

Great article...a wake up call (for me).

Alan Burstein SSC Guru Points: 61152 · Answer 7

Great article. I agree that there's a need to learn Hadoop, NoSQL, etc., as that's the trend and where businesses are heading. The well-rounded data professional who knows multiple technologies will certainly have a wider variety of options, but I don't completely agree that SQL skills are not enough to land a good job. With big data, DBaaS, Hadoop, and NoSQL, the pie is just getting bigger and there are more jobs out there, but SQL Server is not going away. I live in Chicago, so that's my frame of reference, and the number of SQL DBA, developer, and BI jobs is endless.

The demand for a solid SQL developer is still growing, not shrinking, because there's more data in SQL Server databases this year than last year. I don't see that changing. And nothing exposes bad code like more data. The amount of data in Hadoop, MongoDB, etc. is growing too, which is why those skills are also relevant.

I find PolyBase quite interesting - I can't wait to see how that plays out and what impact it will have on SQL Server moving forward.

"I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Daniel Klionsky Mr or Mrs. 500 Points: 555 · Answer 8

Yes, you are right.

There are a variety of SQL frameworks on top of Hadoop (besides Hive).

That is the subject of the next article, which I'm currently working on.

Daniel Klionsky Mr or Mrs. 500 Points: 555 · Answer 9

Pradeep Mohanta (10/1/2015)
Hi Daniel, I am Pradeep Mohanta, working as a SQL Server Database Architect. Your feelings about Hadoop and your learning steps are the same as mine; I followed exactly the same steps to learn Hadoop. Now I am looking for a good institute that offers a short course on Hadoop. I think the books you recommend will be very helpful to me, and I am especially excited to read "Microsoft SQL Server 2012 with Hadoop".
Please advise me: is it a good decision to take training on Hadoop, given that I have been working with Microsoft technology for the last 15 years?
I rate this article 5.

Hi Pradeep,

I attended this live session and I liked it.

Skillspeed.com is an India-based training company. They conduct free webinars as well as paid training.

To get a taste, you can sign up for their virtual meetup.

Daniel

Daniel Klionsky Mr or Mrs. 500 Points: 555 · Answer 10

ianstirk (10/1/2015)
Hi, thanks for this very useful overview.
You might want to know that Impala is a popular tool for querying data in Hive quickly (it uses in-memory processing instead of slower MapReduce jobs), often 100 times faster.
In general, it seems like the world of Hadoop is moving away from MapReduce batch jobs, and towards in-memory processing (take a look at Spark).
Most of the technologies for querying data (Hive, Impala, Spark etc) have a version of SQL that most people here will readily pick up.
Enjoy
Ian

Ian,

I agree - MapReduce jobs are slower.

But the world is not abandoning the MapReduce framework.

The newer in-memory SQL-like processing engines (Impala, Presto) do make queries run much faster. But fundamentally, because they rely exclusively on memory, all of them face these two challenges:

1. Fault tolerance: if the query fails partway through, its work is lost and you have to start over. (That is unlike good old Hive on MapReduce, which stores intermediate results on disk and tries to restart automatically after a failure.)

2. Memory size: if the data does not fit into memory, the query will crash. So for very large data sets (hundreds of gigabytes, terabytes), in-memory processing may not work.
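
The first point can be illustrated with a toy sketch. This is not how Hive, Impala, or Presto are actually implemented; the stage functions, the `run_in_memory` and `run_with_checkpoints` helpers, and the JSON checkpoint files are all hypothetical, chosen only to show why writing intermediate results to disk lets a failed job resume instead of starting over.

```python
import json
import os
import tempfile

# Hypothetical three-stage job: each stage consumes the previous stage's output.
def stage_filter(rows):
    return [r for r in rows if r % 2 == 0]

def stage_square(rows):
    return [r * r for r in rows]

def stage_sum(rows):
    return [sum(rows)]

STAGES = [stage_filter, stage_square, stage_sum]


def run_in_memory(rows, fail_at=None):
    """Keep intermediate results only in memory: a failure loses all progress."""
    for i, stage in enumerate(STAGES):
        if i == fail_at:
            raise RuntimeError(f"stage {i} failed")  # simulated failure
        rows = stage(rows)
    return rows


def run_with_checkpoints(rows, workdir, fail_at=None, log=None):
    """Persist each stage's output: a rerun skips stages that already finished."""
    for i, stage in enumerate(STAGES):
        path = os.path.join(workdir, f"stage_{i}.json")
        if os.path.exists(path):
            # Stage finished in an earlier attempt; reload instead of recomputing.
            with open(path) as f:
                rows = json.load(f)
            continue
        if i == fail_at:
            raise RuntimeError(f"stage {i} failed")  # simulated failure
        rows = stage(rows)
        if log is not None:
            log.append(i)  # record which stages actually executed
        with open(path, "w") as f:
            json.dump(rows, f)
    return rows


def demo():
    data = list(range(10))
    executed = []
    with tempfile.TemporaryDirectory() as workdir:
        try:
            run_with_checkpoints(data, workdir, fail_at=2, log=executed)
        except RuntimeError:
            pass  # stages 0 and 1 are already saved on disk
        result = run_with_checkpoints(data, workdir, log=executed)
    return result, executed


if __name__ == "__main__":
    print(demo())  # ([120], [0, 1, 2])
```

In the checkpointed version each stage executes exactly once across both attempts, while rerunning `run_in_memory` after a failure repeats every stage. The price is the disk I/O after every stage, which is the slowness discussed above.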

Daniel

ianstirk Ten Centuries Points: 1310 · Answer 12

Hi Daniel,

Yes, you are right, of course. I was speaking in generalities...

The Hadoop world in general is moving towards in-memory processing instead of MapReduce batch processing due to performance. But of course it needs lots of memory, and it may have limited restart capabilities.

thanks

Ian

Daniel Klionsky Mr or Mrs. 500 Points: 555 · Answer 13

Pradeep Mohanta (10/1/2015)
Hi Daniel, I am Pradeep Mohanta, working as a SQL Server Database Architect. Your feelings about Hadoop and your learning steps are the same as mine; I followed exactly the same steps to learn Hadoop. Now I am looking for a good institute that offers a short course on Hadoop. I think the books you recommend will be very helpful to me, and I am especially excited to read "Microsoft SQL Server 2012 with Hadoop".
Please advise me: is it a good decision to take training on Hadoop, given that I have been working with Microsoft technology for the last 15 years?
I rate this article 5.

Pradeep,

I just got this email today (as a member of the `BIG-Data-Hadoop-Analytics-Learning-Group` meetup group, though you do not need to be a member):

One of our sponsors - Skillspeed - provides an amazing live project based course on BIG Data & Hadoop and for the first time they're opening it up to everyone. You can drop by to attend the first 2 modules - 6 Hours of Live Training, 4 Hours of Practicals - for no commitments whatsoever. It's a 100% free trial. 🙂

Please click here to get details & register.

Cheers!

chudman SSCrazy Points: 2453 · Answer 14

What is the need for web log files, and therefore Flume?

Thanks

Jeff

StLouisMO