Transforming JSON data with the help of Azure Data Factory - Part 3


(2020-Apr-06) Traditionally I would use data flows in Azure Data Factory (ADF) to flatten (transform) incoming JSON data for further processing. Recently I've found a very simple but very effective way to flatten incoming JSON data stream that may contain a flexible structure of data elements, and this won't require using data flow transformation steps.

Photo by Rodolfo Clix from Pexels

Part 1Transforming JSON to CSV with the help of Azure Data Factory - Mapping Data Flows
Part 2: Transforming JSON to CSV with the help of Azure Data Factory - Wrangling Data Flows

Here is my story 🙂

Let's say I have the following JSON file that I want to parse one element (event) at the time:

A simple ADF pipeline can be created to read the content of this file and a stored procedure to call by passing individual JSON data elements as parameters for this procedure.

The output of the "Get JSON data" Lookup activity task is passed to the "ForEach container" with the following expression: @activity('Get JSON data')

Within my "ForEach" container I have also placed a Stored Procedure task and set 4 data elements from my incoming data stream as values for corresponding parameters.

However this approach will not work for all my incoming JSON events, it actually failed for the last one, since it didn't have both "stop_time" and "last_update" data elements.

Error message:

An easy way to fix this problem is to add missing data elements with empty values for the last event record, however, when we don't have control over incoming data, we need to adjust our data processing steps.

We can check if "stop_time" and "last_update" data elements exist in the @item iteration dataset. If they don't exist, then we can replace them with other default values. 

I'm not aware if there are built-in operators in ADF to do this, however, it still can be done by converting @item output into a string and then do a simple text search within this converted text line.

My initial stop_time expression "@item().stop_time" can be replaced with "@if(contains(string(item()),'"stop_time":'),item().stop_time,null)",
and initial last_update expression "@item().stop_timecan be replaced with "@if(contains(string(item()),'"last_update":'),item().last_update,utcNow())"

After this quick fix, the whole ADF pipeline ran successfully.

So, using a simple data conversion with the help of String ADF function we can flatten JSON data into a string, and that explains the title of my blog post 🙂 

Original post (opens in new tab)