SQL Clone
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in

Azure Data Factory Data Flow

Azure Data Factory v2 (ADF) has a new feature in public preview called Data Flow.  I have usually described ADF as an orchestration tool instead of an Extract-Transform-Load (ETL) tool since it has the “E” and “L” in ETL but not the “T”.  But now it has the data transformation capability, making ADF the equivalent of “SSIS in the cloud” since it has the ability to mimic SSIS Data Flow business logic.

Data Flow works without you having to write any lines of code as you build the solution by using drag-and-drop features of the ADF interface to perform data transformations like data cleaning, aggregation, and preparation.  Behind the scenes, the ADF JSON code is converted to the appropriate code in the Scala programming language and will be prepared, compile and execute in Azure Databricks which will automatically scale-out as needed.

There is an excellent chart created by that shows the SSIS tasks and the equivalent ADF operation.  There are currently 13 different dataset manipulation operations with more to come:

  • New Branch
  • Join
  • Conditional Split
  • Union
  • Lookup
  • Derived Column
  • Aggregate
  • Surrogate Key
  • Exists
  • Select
  • Filter
  • Sort
  • Extend

The Data Flow feature in ADF is currently in limited public preview.  If you would like to try this out on your Data Factories, please fill out this form to request whitelisting your Azure Subscription for ADF Data Flows: http://aka.ms/dataflowpreview.  Once your subscription has been enabled, you will see “Data Factory V2 (with data flows)” as an option from the Azure Portal when creating Data Factories.

Follow Mark Kromer and the ADF team on Twitter to stay up to date on the rollout of the preview.  Also check out the ADF Data Flow’s documentation and these ADF Data Flow’s videos.

Don’t confuse this with Dataflows in Power BI, they have nothing to do with each other.

More info:

Azure Data Factory v2 and its available components in Data Flows




James Serra's Blog

James is a big data and data warehousing technology specialist at Microsoft. He is a thought leader in the use and application of Big Data technologies, including MPP solutions involving hybrid technologies of relational data, Hadoop, and private and public cloud. Previously he was an independent consultant working as a Data Warehouse/Business Intelligence architect and developer. He is a prior SQL Server MVP with over 30 years of IT experience. James is a popular blogger (JamesSerra.com) and speaker, having presented at dozens of PASS events including the PASS Business Analytics conference and the PASS Summit. He is the author of the book “Reporting with Microsoft SQL Server 2012”. He received a Bachelor of Science degree in Computer Engineering from the University of Nevada-Las Vegas.


Leave a comment on the original post [www.jamesserra.com, opens in a new window]

Loading comments...