Navigating Hadoop Resources

  • Comments posted to this topic are about the item Navigating Hadoop Resources

  • Hi, thanks for this very useful overview.

    You might want to know that Impala is a popular tool for querying data in Hive quickly (it uses in-memory processing instead of slower MapReduce jobs), often 100 times faster.

    In general, it seems like the world of Hadoop is moving away from MapReduce batch jobs, and towards in-memory processing (take a look at Spark).

    Most of the technologies for querying data (Hive, Impala, Spark, etc.) have a version of SQL that most people here will readily pick up.

    Enjoy

    Ian

  • Great post. Any thoughts on PolyBase in SQL Server 2016?

  • RE PolyBase... Have a look at the job websites, book sites, etc. Hadoop and its components are largely a Unix/Linux thing. I suspect the market for PolyBase, like HDInsight, is going to be limited.

    You can perhaps see parallels with Microsoft's mobile OS when compared with Android etc.

    Thanks

    Ian

  • Hi Daniel, I am Pradeep Mohanta, working as a SQL Server database architect. Your feelings about Hadoop and the steps you took to learn it match mine exactly; I followed the same steps to learn Hadoop. Now I am looking for a good institute where I can take a short course on Hadoop. I think the books you recommend will be very helpful to me; I am especially excited to read "Microsoft SQL Server 2012 with Hadoop".

    Please advise me: is it a good decision to take Hadoop training, given that I have worked with Microsoft technologies for the last 15 years?

    I rate this article 5.

  • Wow. It made me realize how little I know about Hadoop. Thanks.

  • Great article...a wake up call (for me).

  • Great article. I agree that there's a need to learn Hadoop, NoSQL, etc., as that's the trend and where businesses are heading. The well-rounded data professional who knows multiple technologies will certainly have a wider variety of options, but I don't completely agree that SQL skills are not enough to land a good job. With big data, DBaaS, Hadoop and NoSQL, the pie is just getting bigger and there are more jobs out there, but SQL Server is not going away. I live in Chicago, so that's my frame of reference, and the number of SQL DBA, developer and BI jobs is endless.

    The demand for solid SQL developers is still growing, not shrinking, because there's more data in SQL Server databases this year than last year. I don't see that changing. And nothing exposes bad code like more data. The amount of data in Hadoop, MongoDB, etc. is growing too, which is why those skills are also relevant.

    I find PolyBase quite interesting; I can't wait to see how that plays out and what impact it will have on SQL Server moving forward.

    "I can't stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Yes, you are right.

    There is a variety of SQL frameworks on top of Hadoop (besides Hive).

    That is the subject of the next article I'm currently working on.

  • Pradeep Mohanta (10/1/2015)

    Hi Pradeep,

    I attended this live session and I liked it.

    Skillspeed.com is an India-based training company. They conduct free webinars as well as paid training.

    To get a taste, you can sign up for their virtual meetup.

    Daniel

  • ianstirk (10/1/2015)

    Ian,

    I agree: MapReduce jobs are slower.

    But the world is not abandoning the MapReduce framework.

    The newer in-memory SQL-like processing engines (Impala, Presto) do make queries run much faster. But fundamentally, because they rely exclusively on memory, all of them face two challenges:

    1. Fault tolerance: if the query fails in the middle, it is gone and you have to start over. (That is unlike good old Hive on MapReduce, which stores intermediate results on disk and tries to restart automatically after a failure.)

    2. Memory size: if the data does not fit into memory, the query will crash. So for massively large data sets (hundreds of gigabytes or terabytes), in-memory processing may not work.
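    Point 1 can be illustrated with a toy Python model. This is a simplification, not how Hive, Impala or Presto are actually implemented: a "job" is a list of stage functions, the last stage fails once (a simulated node failure), and the job is retried. `run_in_memory`, `run_with_checkpoints`, `retry`, `make_flaky` and `demo` are hypothetical names made up for this sketch; the checkpoint files stand in for intermediate results written to disk.

```python
import json
import os
import tempfile

# Toy model only: each "stage" is a function from a list of ints to a list
# of ints. Real engines (Hive on MapReduce, Impala, Presto) are far more complex.


def run_in_memory(stages, data, calls):
    """Keep every intermediate result in memory only.

    If a stage raises, all earlier work is lost and a retry starts from stage 0.
    """
    for i, stage in enumerate(stages):
        calls[i] += 1  # count every execution attempt of stage i
        data = stage(data)
    return data


def run_with_checkpoints(stages, data, calls, ckpt_dir):
    """Write each stage's output to disk so a retry resumes after the
    last completed stage instead of starting over."""
    start = 0
    # Find the most recent checkpoint, if any, and reload its output.
    for i in reversed(range(len(stages))):
        path = os.path.join(ckpt_dir, f"stage_{i}.json")
        if os.path.exists(path):
            with open(path) as f:
                data = json.load(f)
            start = i + 1
            break
    for i in range(start, len(stages)):
        calls[i] += 1
        data = stages[i](data)
        with open(os.path.join(ckpt_dir, f"stage_{i}.json"), "w") as f:
            json.dump(data, f)  # persist the intermediate result
    return data


def retry(job, attempts=3):
    """Re-run a job after a (simulated) transient failure."""
    for _ in range(attempts):
        try:
            return job()
        except RuntimeError:
            continue
    raise RuntimeError("job failed after all retries")


def make_flaky(fn):
    """Wrap fn so that its first call fails, like a node dying mid-query."""
    state = {"failed": False}

    def stage(data):
        if not state["failed"]:
            state["failed"] = True
            raise RuntimeError("simulated node failure")
        return fn(data)

    return stage


def demo(use_checkpoints):
    stages = [
        lambda xs: [x * 2 for x in xs],       # stage 0: transform
        lambda xs: [x for x in xs if x > 2],  # stage 1: filter
        make_flaky(lambda xs: [sum(xs)]),     # stage 2: aggregate, fails once
    ]
    calls = [0] * len(stages)
    with tempfile.TemporaryDirectory() as ckpt_dir:
        if use_checkpoints:
            result = retry(lambda: run_with_checkpoints(stages, [1, 2, 3], calls, ckpt_dir))
        else:
            result = retry(lambda: run_in_memory(stages, [1, 2, 3], calls))
    return result, calls


print(demo(use_checkpoints=False))  # ([10], [2, 2, 2])
print(demo(use_checkpoints=True))   # ([10], [1, 1, 2])
```

    Both runs produce the same answer, but without checkpoints every stage is executed twice, while the checkpointing run reloads stage 1's saved output and repeats only the failed stage. The price is a disk write after every stage, which is one reason disk-based MapReduce jobs are slower.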

    Daniel

  • Hi Daniel,

    Yes, you are right, of course. I was talking in generalities...

    The Hadoop world in general is moving towards in-memory processing instead of MapReduce batch processing because of its performance. But of course it needs lots of memory, and it may have limited restart capabilities.
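    The memory trade-off can be sketched with a toy group-by-and-sum in Python. `aggregate_in_memory` holds every group in a dict and gives up once a hypothetical `max_keys` budget is exceeded (standing in for running out of RAM). `aggregate_with_spill` instead writes sorted partial sums to temporary files whenever the budget is reached and merges those sorted runs at the end, roughly the sort-and-spill idea that disk-based engines such as MapReduce rely on. All names and the budget are illustrative assumptions, not any engine's API.

```python
import heapq
import json
import os
import tempfile
from itertools import groupby
from operator import itemgetter


class BudgetExceeded(Exception):
    """Stands in for an out-of-memory query failure in this toy model."""


def aggregate_in_memory(pairs, max_keys):
    """Sum values per key, holding every group in memory; fail past max_keys."""
    totals = {}
    for key, value in pairs:
        totals[key] = totals.get(key, 0) + value
        if len(totals) > max_keys:
            raise BudgetExceeded(f"more than {max_keys} groups in memory")
    return sorted(totals.items())


def _spill(totals, spill_dir, runs):
    """Write the current partial sums to disk as a sorted run and free memory."""
    path = os.path.join(spill_dir, f"run_{len(runs)}.jsonl")
    with open(path, "w") as f:
        for key, value in sorted(totals.items()):
            f.write(json.dumps([key, value]) + "\n")
    runs.append(path)
    totals.clear()


def _read_run(path):
    """Stream (key, value) pairs back from one sorted run file."""
    with open(path) as f:
        for line in f:
            key, value = json.loads(line)
            yield key, value


def aggregate_with_spill(pairs, max_keys):
    """Same result as aggregate_in_memory, but never buffers more than
    max_keys groups: full buffers are spilled, then sorted runs are merged."""
    totals, runs = {}, []
    with tempfile.TemporaryDirectory() as spill_dir:
        for key, value in pairs:
            totals[key] = totals.get(key, 0) + value
            if len(totals) >= max_keys:
                _spill(totals, spill_dir, runs)
        if totals:
            _spill(totals, spill_dir, runs)
        # k-way merge of the sorted runs; equal keys end up adjacent.
        merged = heapq.merge(*(_read_run(p) for p in runs), key=itemgetter(0))
        return [(key, sum(v for _, v in group))
                for key, group in groupby(merged, key=itemgetter(0))]


pairs = [("b", 1), ("a", 2), ("c", 3), ("a", 4), ("d", 5), ("b", 6)]
print(aggregate_with_spill(pairs, max_keys=2))  # [('a', 6), ('b', 7), ('c', 3), ('d', 5)]
try:
    aggregate_in_memory(pairs, max_keys=2)
except BudgetExceeded as err:
    print("in-memory aggregation failed:", err)
```

    With a budget of two groups, the in-memory version fails on the third distinct key, while the spilling version finishes by trading extra disk writes and a merge pass for a bounded memory footprint.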

    thanks

    Ian

  • Pradeep Mohanta (10/1/2015)

    Pradeep,

    I just got this email today (as a member of the `BIG-Data-Hadoop-Analytics-Learning-Group` meetup group, though you do not need to be a member):

    One of our sponsors - Skillspeed - provides an amazing live project based course on BIG Data & Hadoop and for the first time they're opening it up to everyone. You can drop by to attend the first 2 modules - 6 Hours of Live Training, 4 Hours of Practicals - for no commitments whatsoever. It's a 100% free trial. 🙂

    Please click here to get details & register.

    Cheers!

  • Why would you need web log files, and therefore Flume?

    Thanks

    Jeff

    StLouisMO
