Azure Data Factory and SSIS compared

I see a lot of confusion when it comes to Azure Data Factory (ADF) and how it compares to SSIS.  It is not simply “SSIS in the cloud”.  See What is Azure Data Factory? for an overview of ADF, and I’ll assume you know SSIS.  So how are they different?

SSIS is an Extract-Transform-Load (ETL) tool, but ADF is an Extract-Load tool: it does not do any transformations within the tool itself.  Instead, ADF calls out to something else that does the transformation, for example a stored procedure on a SQL Server, a Hive job, or a U-SQL job in Azure Data Lake Analytics.  Think of it more as an orchestration tool.  SSIS has the added benefit of doing transformations, but keep in mind that the performance of any transformation depends on the power of the server that SSIS is installed on, as the data to be transformed is pushed to that SSIS server.  Other major differences:

  • ADF is a cloud-based service (via the ADF editor in the Azure portal) and, since it is a PaaS tool, does not require hardware or any installation.  SSIS is a desktop tool (via SSDT) and requires a good-sized server that you have to manage, on which you have to install SQL Server with SSIS
  • ADF uses JSON scripts for its orchestration (coding), while SSIS uses drag-and-drop tasks (no coding)
  • ADF is pay-as-you-go via an Azure subscription, while SSIS is a license cost as part of SQL Server
  • ADF can fire up HDInsight clusters and run Pig and Hive scripts.  SSIS can also do this via the Azure Feature Pack for Integration Services (SSIS)
  • SSIS has a powerful GUI, IntelliSense, and debugging.  ADF has a basic editor and no IntelliSense or debugging
  • SSIS is administered via SSMS, while ADF is administered via the Azure portal
  • SSIS has a wider range of supported data sources and destinations
  • SSIS has a programming SDK, automation via BIML, and third-party components.  ADF does not have a programming SDK, has automation via PowerShell, and has no third-party components
  • SSIS has error handling.  ADF does not
  • ADF has “data lineage”, tagging and tracking the data from different sources.  SSIS does not have this
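
To give a feel for the "JSON scripts" point in the list above, here is a minimal Python sketch that builds a pipeline definition whose single activity calls a stored procedure.  The overall shape and field names are loosely modeled on ADF's pipeline JSON and should be treated as assumptions to check against the official schema.  The pipeline, stored procedure, and dataset names are hypothetical.

```python
import json


def build_pipeline(pipeline_name, sproc_name, output_dataset):
    """Return a dict shaped like an ADF-style pipeline definition.

    Field names are assumptions loosely modeled on ADF's pipeline JSON;
    verify them against the official schema before using them.
    """
    return {
        "name": pipeline_name,
        "properties": {
            "description": "Orchestrates a transformation that runs in the database",
            "activities": [
                {
                    "name": "RunTransformSproc",
                    # The activity only names the work; the database does it.
                    "type": "SqlServerStoredProcedure",
                    "typeProperties": {"storedProcedureName": sproc_name},
                    "outputs": [{"name": output_dataset}],
                }
            ],
        },
    }


# Hypothetical names, for illustration only.
pipeline_json = json.dumps(
    build_pipeline("NightlyLoadPipeline", "usp_TransformSales", "SalesTransformedDataset"),
    indent=2,
)

if __name__ == "__main__":
    print(pipeline_json)
```

Note that the definition contains no transformation logic at all, only a pointer to where the transformation lives.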

Think of ADF as a complementary service to SSIS, with its main use case confined to inexpensively dealing with big data in the cloud.
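
The difference between transforming inside the tool (the SSIS style) and loading raw data and then asking the data store to transform it (the ADF style) can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: an in-memory SQLite database stands in for SQL Server, a plain SQL statement stands in for the stored procedure (SQLite has none), and the table and function names are hypothetical.

```python
import sqlite3

RAW_ROWS = [("  alice ", 12.5), ("bob", 3.0)]  # made-up sample data


def etl_style(conn, rows):
    """SSIS-like: rows are pulled into this process, transformed here, then loaded."""
    transformed = [(name.strip().upper(), int(round(amount * 100))) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount_cents INTEGER)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", transformed)


def el_style(conn, rows):
    """ADF-like: load the raw rows, then tell the database to do the transformation."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    # Stand-in for "call a stored procedure": the work runs inside the database.
    conn.execute(
        """
        CREATE TABLE sales_el AS
        SELECT UPPER(TRIM(customer)) AS customer,
               CAST(ROUND(amount * 100) AS INTEGER) AS amount_cents
        FROM sales_raw
        """
    )


def run_demo():
    """Run both styles against one in-memory database and return both results."""
    conn = sqlite3.connect(":memory:")
    etl_style(conn, RAW_ROWS)
    el_style(conn, RAW_ROWS)
    etl = conn.execute("SELECT * FROM sales_etl ORDER BY customer").fetchall()
    el = conn.execute("SELECT * FROM sales_el ORDER BY customer").fetchall()
    conn.close()
    return etl, el


if __name__ == "__main__":
    print(run_demo())
```

Both paths end with the same rows.  What differs is where the compute happens: in the first, performance depends on the machine running the tool, and in the second, on the data store being called.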

Note that moving to the cloud requires you to think differently when it comes to loading a large amount of data, especially when using a product like SQL Data Warehouse (see Azure SQL Data Warehouse loading patterns and strategies).

More info:

Azure Data Factory vs SSIS

James Serra's Blog

James is a big data and data warehousing technology specialist at Microsoft. He is a thought leader in the use and application of Big Data technologies, including MPP solutions involving hybrid technologies of relational data, Hadoop, and private and public cloud. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 30 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker, having presented at dozens of PASS events including the PASS Business Analytics conference and the PASS Summit. He is the author of the book “Reporting with Microsoft SQL Server 2012”. He received a Bachelor of Science degree in Computer Engineering from the University of Nevada-Las Vegas.
