Hadoop integration

  • If I wanted to find training on how to integrate Hadoop with SQL Server, where would I look?

    When I google "SQL Server Hadoop Integration", I get either high-level articles that talk about SQL Server Hadoop integration in general or generic Hadoop basics articles. I'm looking for a step-by-step document that shows how to set up a SQL Server instance that is integrated with Hadoop.

  • I don't think there's a way to query Hadoop from SQL Server yet. I know they're working on it. In the meantime, there is a connector for SSIS.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • I assume you want to load data from SQL Server so that it can then be map reduced?

    Hadoop is essentially a framework of applications that work together to let you make use of big data.

    One application that is part of the Hadoop infrastructure is Sqoop, which can be used to import a table from SQL Server. That table can then be queried using HiveQL on a Hive cluster and map reduced, with the results stored in HBase, which sits on top of the HDFS storage.

    If you have access to the CBT Nuggets videos, I would suggest following the series on Hadoop by Garth Schulte.

    MCITP SQL 2005, MCSA SQL 2012

  • So Hadoop is not meant to be a backend tool supporting SQL Server, similar to how SQL Server can be used as backend storage for Microsoft Access? That makes sense.

    It sounds like Hadoop is used for storing massive amounts of data and then if you need to analyze the data in SQL Server you do a file dump of aggregated data from Hadoop.

  • Hadoop is essentially an infinitely scalable data processing engine. You can query data from multiple sources, be they files, binaries, or databases, using utilities such as Sqoop and Hive to integrate and then interrogate the data.

    For example, we use it to analyse vast amounts of click data from our web servers and integrate it with data from MySQL to produce aggregated result sets that feed into a Redshift parallel data warehouse.

    MCITP SQL 2005, MCSA SQL 2012
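
For anyone looking for something more concrete than the description of Sqoop above, here is a rough Python sketch that assembles a `sqoop import` command for pulling one SQL Server table into Hive. It only builds and prints the command; it does not run it. The host name, database, table, user, and password file path are all made-up placeholders, and it assumes the Microsoft JDBC driver for SQL Server is already available to Sqoop on the cluster. Treat it as a starting point to adapt, not a tested recipe.

```python
import shlex


def build_sqoop_import(host, database, table, username, password_file,
                       hive_table=None, port=1433, num_mappers=1):
    """Return a `sqoop import` command as a list of arguments.

    All argument values are supplied by the caller; nothing here
    contacts SQL Server or Hadoop.
    """
    # JDBC URL format used by the Microsoft SQL Server JDBC driver.
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"

    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a file instead of putting it on the command line.
        "--password-file", password_file,
        "--table", table,
        # A single mapper avoids needing a split column on tables without a primary key.
        "--num-mappers", str(num_mappers),
    ]

    if hive_table:
        # Ask Sqoop to load the imported data into a Hive table as well.
        cmd += ["--hive-import", "--hive-table", hive_table]

    return cmd


# Hypothetical values for illustration only.
cmd = build_sqoop_import(
    host="sqlhost.example.com",
    database="Sales",
    table="Orders",
    username="sqoop_user",
    password_file="/user/sqoop/sqlserver.password",
    hive_table="orders",
)

# Print a shell-safe version of the command to copy onto a cluster edge node.
print(shlex.join(cmd))
```

Once the table is in Hive, it can be queried with HiveQL as described in the earlier reply. For larger tables you would normally raise the mapper count and tell Sqoop which column to split on.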
