SQL-On-Hadoop: Hive - Part I

Question

Frank A. Banin

SSCommitted

Points: 1747
July 26, 2017 at 12:00 am

#402361

Comments posted to this topic are about the item SQL-On-Hadoop: Hive - Part I
Frank Banin
BI and Advanced Analytics Professional.

stan.geiger Right there with Babe Points: 727 · Answer 1

I admire all the work put into accomplishing this. But now that PolyBase is part of SQL Server, why wouldn't you connect directly to Hadoop from SQL Server? We have found it much more efficient than Sqoop for transferring data, and it also lets you use most, if not all, of the existing T-SQL constructs. You don't have to know Linux, only where on the cluster the data is located. Another benefit is that PolyBase generates the MapReduce jobs needed to produce the result set and runs them on Hadoop, bringing back only the resulting data.
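
For readers who have not seen it, a PolyBase setup along the lines described above might look roughly like the following. This is only a sketch: the host names, ports, HDFS path, object names, and column list are all hypothetical, and it assumes a SQL Server version in which PolyBase supports Hadoop data sources and has already been installed and configured.

```sql
-- Hypothetical cluster addresses; replace with your own name node / resource manager.
CREATE EXTERNAL DATA SOURCE HadoopCluster
WITH (
    TYPE = HADOOP,
    LOCATION = 'hdfs://namenode.example.com:8020',
    -- Needed if predicates should be pushed down to Hadoop as MapReduce jobs.
    RESOURCE_MANAGER_LOCATION = 'namenode.example.com:8050'
);

-- Describes how the files in HDFS are laid out (assumed: comma-delimited text).
CREATE EXTERNAL FILE FORMAT CsvFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',')
);

-- The only Hadoop-side detail required here is where the data lives on the cluster.
CREATE EXTERNAL TABLE dbo.WebLogs (
    LogDate   DATE,
    Url       NVARCHAR(400),
    HitCount  INT
)
WITH (
    LOCATION = '/data/weblogs/',   -- hypothetical HDFS directory
    DATA_SOURCE = HadoopCluster,
    FILE_FORMAT = CsvFormat
);

-- Ordinary T-SQL against the external table; the hint asks PolyBase
-- to run the filter on Hadoop and return only the matching rows.
SELECT Url, SUM(HitCount) AS Hits
FROM dbo.WebLogs
WHERE LogDate >= '2017-01-01'
GROUP BY Url
OPTION (FORCE EXTERNALPUSHDOWN);
```

Without the hint, the optimizer decides on its own whether pushing the computation to Hadoop is worthwhile.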

rgagne99 SSC Veteran Points: 206 · Answer 2

Hadoop demystified for SQL Server Users. Well done and thanks!
...Ray

Frank A. Banin SSCommitted Points: 1747 · Answer 3

stan.geiger - Wednesday, July 26, 2017 7:29 AM
I admire all the work put into accomplishing this. But now that PolyBase is part of SQL Server, why wouldn't you connect directly to Hadoop from SQL Server? We have found it much more efficient than Sqoop for transferring data, and it also lets you use most, if not all, of the existing T-SQL constructs. You don't have to know Linux, only where on the cluster the data is located. Another benefit is that PolyBase generates the MapReduce jobs needed to produce the result set and runs them on Hadoop, bringing back only the resulting data.

As Ray put it, one objective is to demystify Hadoop for SQL Server users and to show them all the SQL-on-Hadoop options out there.
Besides that, depending on the processing objective, there may be benefits to using one over the other.
For instance, because Hive is built directly on top of Hadoop and is part of the Apache ecosystem, you can map an HBase table (a NoSQL database table) as an EXTERNAL Hive table.
Hive also integrates directly with Apache Spark and Spark SQL, an option we will look at later. Spark SQL uses a nested data model based on Hive for tables and DataFrames, which makes that option better suited for interactive and real-time processing.
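
As a rough illustration of the HBase mapping mentioned above, the HiveQL below sketches an EXTERNAL Hive table defined over an existing HBase table. The table name, column family (`cf`), and columns are hypothetical, and the sketch assumes the Hive HBase storage handler is available on the cluster.

```sql
-- Hypothetical example: expose an existing HBase table named 'orders' to Hive.
CREATE EXTERNAL TABLE hbase_orders (
    order_id  STRING,   -- mapped to the HBase row key
    customer  STRING,   -- mapped to column cf:customer
    amount    DOUBLE    -- mapped to column cf:amount
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
    -- One entry per Hive column, in order; ':key' denotes the row key.
    "hbase.columns.mapping" = ":key,cf:customer,cf:amount"
)
TBLPROPERTIES ("hbase.table.name" = "orders");

-- Once mapped, the HBase data can be queried like any other Hive table.
SELECT customer, SUM(amount) AS total
FROM hbase_orders
GROUP BY customer;
```

Because the table is EXTERNAL, dropping it in Hive removes only the Hive metadata and leaves the underlying HBase table in place.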