Working with Arrays in Azure Data Factory

,

(2019-June-06) I remember I had a professor at my university who very often would encourage his students to learn and get more experienced with simple things first. Learn the basics in and out and then move forward to more complicated concepts and practices, that was his motto, and he really tried to share this idea with us.

Simple things sometimes can be overlooked as well. With the addition of Variables in Azure Data Factory Control Flow (there were not available there at the beginning), Arrays have become one of those simple things to me.

Currently, there are 3 data types supported in ADF variables: String, Boolean, and Array. The first two are pretty easy to use: Boolean for logical binary results and String for everything else, including the numbers (no wonder there are some many conversion functions in Azure Data Factory that we can use).

I've also blogged about using Variables in Azure Data Factory:

Setting Variables in Azure Data Factory Pipelines

Append Variable activity in Azure Data Factory: Story of combining things together

System Variables in Azure Data Factory: Your Everyday Toolbox

Azure Data Factory: Extracting array first element

Going back to my memory flashback of the professor guidance for learning and using simple things, I've finally realized that they are worth to get more experienced with! Why, because arrays are everywhere in the Control Flow of Azure Data Factory:

(1) JSON output most of the activity tasks in ADF can be treated as multiple level arrays

(2) Collections that are required for the "ForEach" activity can be outsourced from the preceding (1) activity outputs

(3)  "Set Variable" and "Append Variable" activity could be used to store receding (1) activity outputs for further data transformation

(4) You can create Arrays manually by transforming existing linear values or setting them with hard-coded values (fixed collections).

Don't forget about various functions and expressions to support your work with Arrays in Azure Data Factory (https://docs.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions):

intersection - Returns a single array or object with the common elements between the arrays or objects passed to it.

union - Returns a single array or object with all of the elements that are in either array or object passed to it.

first - Returns the first element in the array or string passed in.

last - Returns the last element in the array or string passed in.

skip - Returns the elements in the array starting at index Count.

length - Returns the number of elements in an array or string.

json - Convert the parameter to a JSON type value.

array - Convert the parameter to an array.

createArray - Creates an array from the parameters.

range - Generates an array of integers starting from a certain number, and you define the length of the returned array.

Just to show a quick example of some simple operations with arrays, I had created this ADF pipeline with 4 main components:

(1) Lookup task to read a CSV file with 2 columns of Syllabic music notes:

where the JSON output of this activity task contains 7 elements:

(2) Set Variable task converts a text string of "C-D-E-F-G-A-B"

into an array variable Notes_Alpabet using this expression:

@split(variables('Notes_Alphabet_String'),'-')

(3) Then looping through a collection of array elements of the (1) activity task output:

@activity('Lookup Notes_Syllabic').output.value

I then append a combination of Syllabic and Alphabet music notes into the Notes_Combined array variable

using this expression

@concat(item().ID,'-',item().Note,'-',variables('Notes_Alphabet')[add(int(item().ID),-1)])

important part is that I can locate the Notes_Alphabet variable element with the index of the Lookup Notes_Syllabic collection index:

variables('Notes_Alphabet')[add(int(item().ID),-1)] where item().ID value comes from the first column of my sourcing file.

(4) As a result, I'm copying the content of the array Notes_Combined variable into another array Notes_Combined_View variable for Debug purpose:

And this helped me to see how both Syllabic and Alphabet music notes correspond to each other.

The code of this ADF pipeline can be found here:

https://github.com/NrgFly/Azure-DataFactory/blob/master/Samples/pipeline/adf_arrays_sample_pl.json

And I hope you will find this blog post helpful in your journey to explore simple things of the Azure Data Factory!

Original post (opens in new tab)

Rate

4 (1)

You rated this post out of 5. Change rating

Share

Share

Rate

4 (1)

You rated this post out of 5. Change rating