Hi, thanks for this very useful overview.
You might want to know that Impala is a popular tool for quickly querying data in Hive (it uses in-memory processing instead of slower MapReduce jobs), and is often cited as up to 100 times faster.
In general, it seems like the world of Hadoop is moving away from MapReduce batch jobs, and towards in-memory processing (take a look at Spark).
Most of the technologies for querying data (Hive, Impala, Spark, etc.) offer a dialect of SQL that most people here will readily pick up.
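Just to illustrate how familiar that SQL feels, here is a minimal PySpark sketch (the table and column names are made up for the example, not from any real dataset):

```python
from pyspark.sql import SparkSession

# Start a Spark session with Hive support so SQL runs against Hive tables
spark = (SparkSession.builder
         .appName("sql-on-hadoop-example")
         .enableHiveSupport()
         .getOrCreate())

# Plain, familiar SQL; "web_logs" and its columns are hypothetical
top_users = spark.sql("""
    SELECT user_id, COUNT(*) AS visits
    FROM web_logs
    GROUP BY user_id
    ORDER BY visits DESC
    LIMIT 10
""")

top_users.show()
```

Anyone who has written SQL against a regular database could read that query without learning anything new; only the session setup is Spark-specific.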
I agree - MapReduce jobs are slower.
But the world is not abandoning the MapReduce framework.
The newer in-memory SQL-like processing engines (Impala, Presto) have indeed made queries run much faster. But fundamentally, because they rely exclusively on memory, all of them face these two challenges:
1. The fault-tolerance issue: if the query fails in the middle, it is gone and you have to start over. (That is unlike good old Hive on MapReduce, which stores intermediate results on disk and tries to automatically restart failed tasks.)
2. The memory-size limit: if the data does not fit into memory, the query will crash. So, for massively large data sets (hundreds of gigabytes, or terabytes), in-memory processing may not work.
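A rough back-of-envelope check makes point 2 concrete. This little sketch uses made-up cluster numbers, purely for illustration:

```python
# Hypothetical cluster sizing; all numbers are illustrative
num_nodes = 20
mem_per_node_gb = 64      # RAM on each worker node
usable_fraction = 0.5     # rough share of RAM available to query execution

dataset_gb = 2_000        # a 2 TB working set

cluster_query_mem_gb = num_nodes * mem_per_node_gb * usable_fraction
print(f"Usable query memory: {cluster_query_mem_gb:.0f} GB")

# A memory-only engine fails once the working set exceeds this budget,
# while disk-backed MapReduce keeps grinding through the job.
if dataset_gb > cluster_query_mem_gb:
    print("Working set exceeds memory: expect the in-memory query to fail")
else:
    print("Working set fits in memory: an in-memory engine should cope")
```

With those example numbers, a 2 TB working set against roughly 640 GB of usable memory is exactly the situation where the old disk-based approach still earns its keep.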