February 17, 2015 at 7:32 am
If I wanted to find training on how to integrate Hadoop with SQL Server, where would I look?
When I google "SQL Server Hadoop Integration", I get either high-level articles that talk about SQL Server Hadoop integration in general terms or generic Hadoop basics articles. I'm looking for a step-by-step document that shows how to set up a SQL Server instance that is integrated with Hadoop.
February 17, 2015 at 9:30 am
I don't think there's a way to query Hadoop from SQL Server yet. I know they're working on it. In the meantime, there is a connector for SSIS.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 17, 2015 at 10:01 am
I assume you want to load data from SQL Server so that it can then be map-reduced?
Hadoop is essentially a framework of applications that work together to let you utilise big data.
One application that is part of the Hadoop infrastructure is Sqoop, which can be used to import a table from SQL Server. That table can then be queried using HiveQL on a Hive cluster and map-reduced, with the results stored in HBase, which sits on top of the HDFS storage.
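As a rough illustration of the Sqoop step, here is a minimal Python sketch that only builds the command line for a Sqoop import into Hive. It does not run anything. The host name, database, table names, user, and password file path are all made-up placeholders, and the function name build_sqoop_import is just something I invented for the example. The flags are standard Sqoop 1 import arguments as far as I know, but check them against your Sqoop version.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, hive_table: str, mappers: int = 1) -> List[str]:
    """Build (but do not run) a Sqoop 1 command that imports one table into Hive."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source SQL Server database
        "--username", username,
        "--password-file", password_file,  # file holding the password (path is a placeholder)
        "--table", table,                  # source table in SQL Server
        "--hive-import",                   # load the imported data into Hive
        "--hive-table", hive_table,        # target Hive table
        "--num-mappers", str(mappers),     # number of parallel map tasks
    ]


# All values below are hypothetical placeholders.
cmd = build_sqoop_import(
    jdbc_url="jdbc:sqlserver://sqlhost:1433;databaseName=Sales",
    username="sqoop_user",
    password_file="/user/sqoop/sqlserver.password",
    table="Orders",
    hive_table="orders",
)

# Quote each argument so the ';' in the JDBC URL survives a shell.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once the table is in Hive you would query it with HiveQL as usual. The SQL Server JDBC driver jar also has to be available to Sqoop, which this sketch does not cover.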
If you have access to the CBT Nuggets videos, I would suggest following the series on Hadoop by Garth Schulte. It is worth doing.
MCITP SQL 2005, MCSA SQL 2012
February 17, 2015 at 11:59 am
So Hadoop is not meant to be a backend tool supporting SQL Server similar to how SQL Server can be used as a backend storage for Microsoft Access? That makes sense.
It sounds like Hadoop is used for storing massive amounts of data, and then if you need to analyze the data in SQL Server, you do a file dump of aggregated data from Hadoop.
February 17, 2015 at 12:20 pm
Hadoop is essentially an infinitely scalable data processing engine. You can query data from multiple sources, be that files, binaries, or databases, using utilities such as Sqoop and Hive to integrate and then interrogate the data.
For example, we use it to analyse vast amounts of click data from our web servers and integrate it with data from MySQL to produce aggregated result sets that feed into a Redshift parallel data warehouse.
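To give a feel for what "map-reduced" means for click data, here is a toy sketch in plain Python. It is not Hadoop code and not what we actually run. The log format (client IP followed by page path) and the sample lines are invented for the example. The map step emits a (page, 1) pair per click, and the reduce step sums the pairs per page, which is the same shape of work Hadoop spreads across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_clicks(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map step: emit (page, 1) for every well-formed click line."""
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:      # skip malformed lines
            yield parts[1], 1    # assumed format: "<client_ip> <page>"


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values for each page."""
    totals: Dict[str, int] = defaultdict(int)
    for page, count in pairs:
        totals[page] += count
    return dict(totals)


# Hypothetical sample data, not real logs.
sample_log = [
    "10.0.0.1 /home",
    "10.0.0.2 /pricing",
    "10.0.0.1 /home",
]

page_counts = reduce_counts(map_clicks(sample_log))
print(page_counts)
```

The aggregated counts are the kind of small result set you would then load into a warehouse or a relational database.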
MCITP SQL 2005, MCSA SQL 2012