
Azure Data Factory integration with GitHub

(2019-Feb-06) Working with Azure Data Factory (ADF) enables me to build and monitor my Extract Transform Load (ETL) workflows in Azure. My ADF pipelines are a cloud version of the ETL projects I previously built in SQL Server Integration Services (SSIS).

Up to this point, all my sample ADF pipelines were developed in the so-called "Live Data Factory Mode" using my personal workspace, i.e. all changes had to be published in order to be saved. This wasn't the best practice on my side, and I needed to start using a source control tool to preserve and version my development code.

Back in August of 2018, Microsoft introduced GitHub integration for Azure Data Factory objects - https://azure.microsoft.com/en-us/blog/azure-data-factory-visual-tools-now-supports-github-integration/. This was a great improvement from a team development perspective.


So now is the day to put all my ADF pipeline samples into my personal GitHub repository.
Each of my previous blog posts:
1) Setting Variables in Azure Data Factory Pipelines
2) Append Variable activity in Azure Data Factory: Story of combining things together
3) System Variables in Azure Data Factory: Your Everyday Toolbox
4) Email Notifications in Azure Data Factory: Failure is not an option

has a corresponding pipeline created in my Azure Data Factory:


And all of them are now publicly available in this GitHub repository:
https://github.com/NrgFly/Azure-DataFactory

Let me show you how I did this using my personal GitHub account; you can do this with enterprise GitHub accounts as well.

Step 1: Set up Code Repository
A) Open your existing Azure Data Factory and select the "Set up Code Repository" option from the top left "Data Factory" menu:


B) then choose "GitHub" as your Repository Type:


C) and make sure you authorize Azure Data Factory to access your GitHub account:
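For reference, the values entered in this dialog correspond to a small block of repository settings on the factory. Here is a minimal Python sketch of that shape, assuming the property names of the `FactoryGitHubConfiguration` resource type. The account and repository names come from my repository URL; the branch and root folder values are placeholders I chose for illustration, not necessarily what your factory uses.

```python
import json


def build_repo_configuration(account_name, repository_name,
                             collaboration_branch="master", root_folder="/"):
    """Return a dict shaped like the factory's GitHub repository settings.

    The branch and root folder defaults are illustrative assumptions.
    """
    return {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,
        "repositoryName": repository_name,
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }


# Account and repository taken from https://github.com/NrgFly/Azure-DataFactory
config = build_repo_configuration("NrgFly", "Azure-DataFactory")
print(json.dumps(config, indent=2))
```

This only builds and prints the settings; it does not call Azure or GitHub. The actual setup in this post is done entirely through the ADF user interface.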


Step 2: Saving your content to GitHub


After selecting an appropriate GitHub code repository for your ADF artifacts and pressing the Save button:


you can validate them all in GitHub itself. Source code integration allowed me to save all my ADF artifacts: pipelines, datasets, linked services, and triggers.
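If you prefer to check the saved artifacts outside the browser, a small script can summarize what landed in the repository. The sketch below assumes that ADF writes one JSON file per artifact into folders named `pipeline`, `dataset`, `linkedService` and `trigger` under the root folder. The file names in the sample list are hypothetical, not the actual names from my repository.

```python
from collections import defaultdict

# Assumed folder names used for each artifact type in the repository
ARTIFACT_FOLDERS = {
    "pipeline": "pipelines",
    "dataset": "datasets",
    "linkedService": "linked services",
    "trigger": "triggers",
}


def group_artifacts(paths):
    """Group repository file paths by ADF artifact type, skipping other files."""
    groups = defaultdict(list)
    for path in paths:
        folder, _, file_name = path.partition("/")
        if folder in ARTIFACT_FOLDERS and file_name.endswith(".json"):
            # Drop the ".json" extension to get the artifact name
            groups[ARTIFACT_FOLDERS[folder]].append(file_name[:-len(".json")])
    return dict(groups)


# Hypothetical file listing, for illustration only
sample_paths = [
    "pipeline/pl_example_one.json",
    "pipeline/pl_example_two.json",
    "dataset/ds_example.json",
    "linkedService/ls_example.json",
    "trigger/tr_example.json",
    "README.md",
]

summary = group_artifacts(sample_paths)
for kind, names in summary.items():
    print(kind, len(names))
```

In practice the path list could come from a local clone of the repository, for example by walking the folder with `os.walk`.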


And that's where I can see all my four ADF pipelines:



Step 3: Testing your further changes in ADF pipelines

Knowing that all my ADF objects are now stored in GitHub, let's see if a code change made in Azure Data Factory is synchronized there.

I add a new description to my pipeline with Email notifications:


After saving this change in ADF, I can see it synchronized to my GitHub repository:
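The same check can be done by reading the pipeline file from the repository. The sketch below assumes the usual pipeline JSON shape, where the description is stored under `properties.description`. The pipeline name and description text are hypothetical placeholders, not the content of my actual pipeline.

```python
import json

# Hypothetical pipeline file content, for illustration only
pipeline_json = """
{
  "name": "pl_example_email_notification",
  "properties": {
    "description": "Example description added in the ADF editor",
    "activities": []
  }
}
"""


def get_description(pipeline_text):
    """Return the pipeline description, or None if it has not been set."""
    pipeline = json.loads(pipeline_text)
    return pipeline.get("properties", {}).get("description")


description = get_description(pipeline_json)
print(description)
```

Running this against the file before and after the save in ADF is one simple way to confirm that the change reached GitHub.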


Summary:
1) GitHub integration with Azure Data Factory is possible.
2) And now I'm a bit closer to automating my deployment process and using Azure DevOps (formerly VSTS) to create my CI/CD pipelines! :)

Data Adventures

My personal journey in an intricate world of data and continuous effort to make it more structured and well understood can be found in this blog.

I live and work in Canada - see my profile on LinkedIn.
