SQLServerCentral Article

Converting a File with Parent-Child Records in SSIS

,

Overview

A common requirement in ETL scenarios is to extract data from a single file that has multiple record types with those records having parent-child relationships or a single record that can have one or more detail records that relate to it. SQL Server Integration Services (SSIS) offers a number of solutions for this scenario.

In this article, I will demonstrate a data flow solution for this scenario through the use of conditional split, script component (using VB.NET) and merge join. The input file is a comma separated sales report. The parent records detail an item bought and it's attributes, while the child records contain information about customers who have purchased that item in a single field and information regarding their purchases (e.g. price, quantity) The methodology will be to identify "good records" and allow those through the data flow. Next, a script component will be created that will assign a parent-child key to each record and direct each row to it's respective output. After each row has been directed to a separate output, I will need to scrub the customer address field using another script component. Finally, both the customer and product records will be sorted by their parent-child key and merged together using a merge join transformation and output into a SQL table.

Implementation

First, let's take a look at the sample file to be loaded


Figure 1 - Sample file

As you can see, there are two header records that contain column names: product and customer respectively. The third row has product information, while the fourth and fifth rows have customer information. The sixth row contains summary information that is not needed for this transformation.
The next step will be setting up a connection manager to extract this file. Notice that I have put double quotes (") in as a text qualifier. This will ensure that all of the customer information is captured in a single field and not delimited by 'City, State'.


Figure 2 - Flat File Connection Manager Editor

To extract only non-blank records into the actual data flow, I will add a conditional split to the transformation. The condition will check the first column to see if there is any data contained in that column. If there is data in that column the transformation sends the record to the output 'GoodRecs'.

Figure 3 - Conditional Split Transformation- excludes blank records

Once the conditional split is complete, drag and drop a 'Script Component' transformation into the data flow and choose type 'Transformation'. Select all columns under Input Columns then choose the 'Inputs and Outputs' view. Here, add 3 Outputs (Product, Customer & Summary) and add an output column to both the Product (ProdKey) and Customer (CustKey) Outputs. For the common properties (right side of the editor) of each output, you will need to change the field 'ExclusionGroup' to match the exclusion group of the Input (typically 1) and choose 'input "YourInput" (001)' as the 'SynchronousInputID'.


Figure 4 - Script Transformation Editor - splitting records by type

Next, choose the 'Script' view and click 'Design Script' at the bottom right hand corner of this box. Copy the below script into the script task. I have created variables CustPattern and SumPattern that identify patterns unique to each record type. When directing a customer record to the customer output, this variable will be called to validate that the record is, in fact, a customer record.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
 Inherits UserComponent
Dim counter As Integer
 'Record Pattern for Customer Records
 Dim CustPattern As String = "*;*"
 'Record Pattern for Summary Records
 Dim SumPattern As String = "*(Entries*"
 'Record Pattern for Product Records left blank
Public Overrides Sub PreExecute()
counter = 0 'Initialize counter = 0
End Sub
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
 'Increment counter by 1 on each pass of blank second column
 If Row.Column1 <> "" Then
 counter += 1 
 End If
 counter = counter
'Set keys = counter value
 Row.CustKey = counter.ToString 
 Row.ProdKey = counter.ToString
'Output records based on record patterns
 If Row.Column0 Like CustPattern Then
 Row.DirectRowToCustomerRecord()
 ElseIf Row.Column0 Like SumPattern Then
 Row.DirectRowToSummaryRecord()
 Else : Row.DirectRowToProductRecord()
 End If
 End Sub
End Class

Now your data flow should look like Figure 5 below.


Figure 5 - Data Flow

Now, we need to scrub those nasty customer records with another script component. As before, create a 'Script Component' and choose type 'Transformation'. Select all columns under Input Columns then choose the 'Inputs and Outputs' view. Next, create an output called ScrubbedOuput and add output columns Zip, State, City, Address, CustomerName, CustomerNbr. For the common properties of each output, you will need to change the field 'ExclusionGroup' to match the exclusion group of the Input (typically 1) and choose 'input "YourInput" (001)' as the 'SynchronousInputID' just like before.


Figure 6 - Script Transformation Editor - scrub customer records

Next, choose the 'Script' view and click 'Design Script' at the bottom right hand corner. Copy the below script into your script task. Here, I have created the variable 'Scrub', just to make the script easier to read and modify.

Imports System
Imports System.Data
Imports System.Math
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
Public Class ScriptMain
 Inherits UserComponent
Dim Scrub As String
Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
 '
 Scrub = Row.Column0
Row.CustomerNbr = Scrub.Substring(0, 8)
 Row.CustomerName = Scrub.Substring(8, Scrub.IndexOf(";", 1) - 8)
 Row.Address = Scrub.Substring(Scrub.IndexOf(";", 1) + 2, (Scrub.LastIndexOf(";")) - (Scrub.IndexOf(";", 1)) - 2)
 Row.City = Scrub.Substring(Scrub.LastIndexOf(";") + 2, (Scrub.LastIndexOf(",")) - (Scrub.LastIndexOf(";")) - 2)
 Row.State = Scrub.Substring(Scrub.LastIndexOf(",") + 2, 2)
 Row.Zip = Scrub.Substring(Scrub.LastIndexOf(",") + 6, 5)
Row.DirectRowToScrubbedOutput()
 '
 End Sub
End Class

Now your data flow should look like Figure 7 below.


Figure 7 - Data Flow Task

Before we can merge these two record types together, we need to add 'Sort' transformations to both the ProductRecord flow and the CustomerRecord flow on the ProdKey and CustKey fields. Once that is complete, we can add a 'Merge Join' transformation joining the two record types on ProdKey = CustKey.


Figure 8 - Merge Join Transformation Editor - join on ProdKey = CustKey

Once the join is complete, add your destination. Now your data flow should look like Figure 9 below. With the package configuration completed, you can execute the package and watch the data flow to multiple record types and then back together to create a single flattened record.


Figure 9 - Final Data Flow Task

Conclusion

In this article, I've demonstrated how to use SQL Server Integration Services (SSIS) to create a data flow task that can handle multiple record types with a parent-child relationship. The transformation extracts non-blank records, creates a key for each record and splits that record by type and then merges the records back together to create a flattened record with a 1-to-1 relationship.

Rate

4.47 (15)

You rated this post out of 5. Change rating

Share

Share

Rate

4.47 (15)

You rated this post out of 5. Change rating