Azure Data Factory Triggers

DP, 2020-02-21

Triggers in ADF run pipelines automatically, either on a wall-clock schedule or at a fixed periodic interval. Both trigger types are well documented, with examples. But for someone like me, coming from an SSIS background, the documentation lacked details on what happens when a run fails and what happens when a run's duration overlaps the next scheduled start. It took me some time experimenting with both triggers to understand how they behave.

Schedule Triggers:

Below is a screenshot of the Schedule Trigger properties to be specified when creating one. To me this looks very similar to creating a SQL Server Agent job, and more details are available here.

I created a new schedule trigger to execute one of my pipelines, which ingests data (100+ tables) from an on-premises SQL Server to Azure Data Lake Store (ADLS). The design was to ingest each source table into ADLS as a Parquet file named after the table, and the frequency was set to hourly (a specific requirement).

When the ingestion completed within an hour, everything appeared to work: the trigger started the next execution on schedule. But when the ingestion took longer than an hour, the ADF monitor showed that the next execution still started on schedule. As a result, two instances of the same pipeline ran in parallel, ingesting the same set of tables. Both executions failed because of resource-level locks, as they were trying to create a file with the same name at the same location. Having worked extensively with SSIS and SQL Agent jobs, where a job that is still running is not started a second time, I concluded that this was not the right trigger for this pipeline.
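The overlap described above can be illustrated with a small simulation. This is only a toy model of the observed behavior, not ADF code: the function names, the start date, and the run durations are made up for illustration. It assumes a schedule trigger starts each run on the clock, regardless of whether the previous run has finished.

```python
from datetime import datetime, timedelta

def schedule_runs(start, interval, durations):
    """Return (start, end) per run; each run starts on the clock,
    whether or not the previous run has finished (assumed behavior)."""
    runs = []
    for i, duration in enumerate(durations):
        run_start = start + i * interval
        runs.append((run_start, run_start + duration))
    return runs

def overlapping_pairs(runs):
    """Return index pairs of consecutive runs that execute at the same time."""
    return [(i, i + 1) for i in range(len(runs) - 1)
            if runs[i][1] > runs[i + 1][0]]

start = datetime(2020, 2, 1, 0, 0)   # hypothetical trigger start
hour = timedelta(hours=1)
# Hypothetical durations: the second ingestion takes 75 minutes.
durations = [timedelta(minutes=45), timedelta(minutes=75), timedelta(minutes=50)]

runs = schedule_runs(start, hour, durations)
print(overlapping_pairs(runs))  # -> [(1, 2)]
```

In this example the second run is still writing its files when the third run starts, which is the situation that caused both executions to fail.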

Tumbling Window:

Below is a screenshot of the Tumbling Window trigger properties to be specified when creating one. This looked different, and I am not going into the details as they are here.

At first glance, I felt that this trigger would also run into the same resource locks, but then the “Add dependencies” property caught my attention. Setting this property makes a trigger execution dependent on the status of another trigger or of the trigger itself. I added a new trigger to execute the same pipeline with a recurrence of once an hour. Below is a screenshot of the dependency filled out. The "TRIGGER" drop-down lists all available tumbling window triggers, including the one being edited.
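The effect of making a trigger depend on itself can be sketched with a toy model. This is not ADF code and the names are hypothetical; it assumes that every run succeeds and that a window's run starts at its scheduled time or when the previous window's run finishes, whichever is later.

```python
from datetime import datetime, timedelta

def self_dependent_runs(start, interval, durations):
    """Return (start, end) per run; a run waits for the previous
    window's run to finish (assumed self-dependency behavior)."""
    runs = []
    previous_end = start
    for i, duration in enumerate(durations):
        scheduled = start + i * interval
        run_start = max(scheduled, previous_end)
        previous_end = run_start + duration
        runs.append((run_start, previous_end))
    return runs

start = datetime(2020, 2, 1, 0, 0)   # hypothetical trigger start
hour = timedelta(hours=1)
# Hypothetical durations: the second ingestion takes 75 minutes.
durations = [timedelta(minutes=45), timedelta(minutes=75), timedelta(minutes=50)]

runs = self_dependent_runs(start, hour, durations)
print([run_start.strftime("%H:%M") for run_start, _ in runs])
# -> ['00:00', '01:00', '02:15']
```

With the same durations that would overlap under a clock-driven schedule, the third run is pushed back to 02:15, so no two runs write the same files at the same time.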

The trigger behaved as expected both when the pipeline succeeded within the hour and when its duration exceeded the hour: there was no parallel execution of this pipeline and no lock conflicts. But when one of the executions failed (for whatever reason), the next execution never started. Once I fixed the root cause of the failure, it was back to normal. This is clearly mentioned in the official documentation. But the next day, I noticed something more. As shown in the screenshot above, I had set an end date on this trigger and wasn't expecting it to execute beyond that date. I then understood that, because of the failure, the pending execution windows had accumulated in a queue, and their executions ran past the end date irrespective of whether the trigger was active or not. I believe this is the expected behavior of a tumbling window trigger and that it cannot be controlled.
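The backlog effect can be approximated with another toy model. It is a simplification of what I observed, not a description of ADF internals: the function name, dates, and durations are hypothetical, the failed attempt itself is not modeled, and the failed window is assumed to block every later window until it is rerun successfully at a given time.

```python
from datetime import datetime, timedelta

def drain_backlog(start, end, interval, duration, failed_window, fixed_at):
    """Return (window_start, run_start, run_end) per window.
    Windows exist only up to `end`, but a blocked window delays
    every later window until it is rerun at `fixed_at`."""
    runs = []
    previous_end = start
    window_start = start
    index = 0
    while window_start < end:
        run_start = max(window_start, previous_end)
        if index == failed_window:
            # The successful rerun cannot happen before the fix.
            run_start = max(run_start, fixed_at)
        previous_end = run_start + duration
        runs.append((window_start, run_start, previous_end))
        window_start += interval
        index += 1
    return runs

start = datetime(2020, 2, 1, 0, 0)
end = datetime(2020, 2, 1, 6, 0)     # hypothetical trigger end date
runs = drain_backlog(start, end, timedelta(hours=1), timedelta(minutes=45),
                     failed_window=2, fixed_at=datetime(2020, 2, 1, 5, 30))

# Runs that start at or after the end date of the trigger.
late = [run for run in runs if run[1] >= end]
print(len(runs), len(late))  # -> 6 3
```

Here six windows are created before the end date, but three of them only start after it, because they were queued behind the failed window.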
