Developing Metadata Design Patterns in BIML

You may have already experienced ETL Hell, where you have a large number of similar of SSIS tasks, and a small change, such as an alteration to the network topology, means that they all need to be altered with the correct connection details. Perhaps you should consider creating design patterns for all the standard components of integration tasks in BIML, and generating the SSIS packages from these? Amarendra walks you through the process.

It is reasonably easy to develop an SSIS package, but it is harder to develop and maintain a consistent design pattern within the organisation that deals with a common problem. This is where Business Intelligence Markup Language (BIML) comes its own, because it allows for code reuse whereby templates and patterns can be shared by several ETL tasks.

In this article, I will introduce BIML and show how it can be used to build a SSIS design pattern.

What does BIML mean?

BIML is a markup language similar to HTML that is used to specify both Business Intelligence and Data Warehouse applications. By using BIML you can generate SSIS packages dynamically. SSIS packages that are generated by BIML are entirely compatible with Visual Studio, so you can subsequently alter the SSIS packages in Visual Studio if you need to do so.

BIML was developed by Varigence. There are two open-source Visual Studio plug-ins, BIDS Helper and BIML Express, that you can use to practise BIML. In addition, Varigence has developed a commercial product called MIST that has more features than the two Visual Studio plug-ins.

Both BIDS helper and BIML Express are ideal for small to medium-size project requirements. They are available to download for free under an open-source license.

Unfortunately, the BIDS helper plug-in doesn’t support command line compilation for BIML. This may be a bottleneck, if you are planning to use Continuous Integration or Continuous Delivery in your organisation. However, the two VS plugins can easily be supplemented by the commercial product called MIST from Varigence, which is designed to deal with the requirements of professional use, including command line compilation and XCopy deployment.

BIML allows distributed development teams to use a consistent design pattern for ETL, and is particularly suitable for metadata-driven data warehouse projects. The use of BIML encourages code reusability and productivity; it is reasonably easy to learn because it is based on XML mark-up language. You can reverse engineer existing ETL solutions with the help of the commercial MIST product. BIML supports SQL Server from version 2005 upwards. BIML’s functionality can be extended with the help of BIML script, which contains C# or VB.NET code inside a BIML file.

SSIS Package Generation Using BIML #1:

I will use BIDS helper to demonstrate the capabilities of BIML in this article. BIDS helper is a widely-used free Visual Studio plug-in. It is a CodePlex community project and can be downloaded from here (https://bidshelper.codeplex.com/). Once installed, this is available as a Visual Studio plug-in.

Once we’ve installed BIDS helper, we will now create a new SSIS project in Visual studio. After doing this, right-click on the project and select the menu item “Add new BIML File”. This screenshot shows how we add a new BIML file to an existing SSIS project by creating a new file in the “Miscellaneous” folder. C:\WorkArea\Nat\SQL\Blog\Amar\BIML_1.png

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_2.png

Now copy the BIML code shown below and paste it in the new BIML file.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_3.png

BIDS helper allows you to validate the BIML syntax via the menu item “Check BIML for Errors” as seen here.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_4.png

Hopefully you will see the output message in a Message Box (“No errors or warnings were found”). You will also see the status message in the output window as shown.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_5.png

After you have checked that the markup is valid, you can then generate the SSIS package using the menu item “Generate SSIS Packages”.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_6.png

Once the package has been generated successfully, you will see the output message “BIML expansion completed”.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_7.png

Now let’s change the BIML script by adding information for a connection and a SQL task in the BIML file.

BIML File content:

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_8.png

Once the BIML has been expanded, we can once more validate the BIML and generate the package. The screenshot below shows the generated SSIS package with a SQL task.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_9.png

The SQL task has been configured based on the settings in the BIML file.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_10.png

The generated SSIS package can now be executed manually in Visual Studio.

C:\WorkArea\Nat\SQL\Blog\Amar\BIML_11.png

SSIS Package Generation Using BIML #2:

In this BIML design pattern, we will be developing a SSIS package to load the data from a flat file to the “Currency” table in a SQL Server database.

In this example, I will create a SSIS package to truncate the destination table and to load the data from a flat CSV file. This package will have an execute SQL task to truncate the currency table. The Data flow task will have a Flat file source to extract data from the flat file. There will be an OLEDB destination to load the data into the currency table.

We have been supplied the currency data in a comma-separated CSV file. It has the currency code and the currency name.

Constraint Mode for Package:

The ‘constraint mode’ defines the way that tasks are sequenced within a SSIS package. The ‘Parallel’ mode specifies that there will be no precedence in the order of tasks. In the ‘Linear’ mode, every task is connected with a ‘precedence constraint’ to the previous task, as per the specification in the BIML file. In SSIS, tasks are linked by precedence constraints. Read about precedence constraints here

The screenshot shows how the value of the ConstraintMode is supported by intellisense.

Connection Managers in BIML:

In BIML, the Connection managers tag is defined outside of the package tags. Once the connection manager has been used inside the package, it will be added during the package generation.

From the above image it is clear the connection manager “OLEDB_Staging” has been defined within the connection tag and the same has been reused within the Package tag.

Developing a Simple SSIS Package:

The following code snippet will create a simple ‘Staging’ package to truncate the “Currency” table. An execute SQL task will be added to the package to truncate the currency table. This task will use the OLEDB connection manager.

The Execute SQL Task has two key attributes to be defined: name and the connection manager. The task has a child element “DirectInput”, to specify the sql query against the database.

BIML Script:

After a successful validation the SSIS package is then generated. This shows the result

Now we will enhance the BIML Script to add a flat file connection manager and a data flow task to refer the flat file connection as a source.

Now, with these additions, the SSIS package has a SQL task and a simple Data Flow task. We can see this in the design

The Data Flow Task has a flat file source to connect the Currency csv file.

The BIML script can be extended further to load data into the currency table.

After a successful validation the SSIS package has been generated successfully. Now the package can be executed to load the flat file into the currency table.

BIML Usability:

BIDS helper offers colour-coding for the BIML and some basic intellisense features.

When you start typing an XML tag, you’ll get a list of all the applicable child elements for the current element. In addition, BIDS Helper will add the corresponding closing tag when you complete an element.

When you are typing inside an element, you’ll get a list of all the applicable attributes.

In addition, the BIDS Helper will do some design-time validation of the BIML syntax. In the following example, I haven’t provided the connection string details for the connection. The BIDS helper has validated the script, found a problem, and is showing the error message in the tooltip.

Metadata based ETL development:

In conventional SSIS development practise, it is possible for a defect to be repeated across several SSIS packages. When this happens it becomes expensive in time and effort to fix it. The best way of avoiding this is to develop SSIS packages based on the ‘Technical metadata’. Technical metadata means data about data, its validation, sources and transformations, and is used to provide consistency in the way that data mappings and transformations are done within an enterprise. We can use it to store information about ETL processes. When an organisation maintains a database of technical metadata that contains information about Data sources, Destinations, Business transformation and Data quality rules, then BIML can access and make use of it to create connection managers, Data Sources, Data Destinations and Transformations in a SSIS package. With the help of the metadata, a complete ETL solution can be generated using BIML and BIML script.

In metadata-driven ETL development practise, new SSIS packages can be generated by adding records into the technical metadata database. The BIML Script will read the database and based on the configuration and it will generate the SSIS packages.

SSIS Package Generation Using BIML #3:

In this BIML design pattern, we will be developing a SSIS package to load the data from a flat file to the “Currency” table using the metadata-based approach.

BimlScript

BIML script is the programing language of BIML. BIMLScript is based on C# code snippets and it helps to dynamically generate BIML. It is very easy to learn and practise BimlScript.

A gentle knowledge on C# will help you to accelerate the BIML Script development. BIMLScript tags starts with “<#” and end with “#>” delimiters. BIMLScript will be embedded in the BIML file.

In BIMLScript , Usually the first line represents the directives. These directives command the BIML engine to process the BIML script. The directives starts with the tag “<#@” and end with “#>”.

The below mentioned directive, command the BIML engine to use C# to process the BIML script.

Directive to import .Net library

The C#/.NET equivalent of the Import functionality in BIML can be represented below.

Embedding in BIML File

The tag “<#=” evaluates the .NET code and return as a string. The resultant string will be embedded in the BIML file.

BIML Script SQL Task Example

It helps to understand the concept of BIMLScript by showing a simple package. This package has only one SQL Task to truncate the staging table. So instead of hard-coding the name of the table, we would like to use a variable. To take advantage of BIMLScript, we will be using a variable value to store the name of the table. The SQL statement will be prepared during BIML expansion and the package will be created with the SQL Task.

The following code snippet will declare and assign the value to the variable <# Var strTableName=”dbo.Currency”; #>

Now the variable has been referenced in the SQL query.

The tag “<#=” helps the BIML run-time engine to evaluate the variable and replace the value in the BIML file. The BIML engine will evaluate the variable and the value of the variable will be placed in the SQL query.

After processing, the SQL query will looks like the one below

The complete BIML Script to generate the SSIS package has been given below.

After the successful validation, the SSIS package has been generated.

Metadata driven solution

With these basics of BIMLScript, we can now apply the concept in a Metadata-driven solution to BIML. Our aim is to store all the technical Metadata in a database and to generate the SSIS package based on the stored Metadata. The BIML Script will read the metadata and, based on the BIML design pattern, it will generate the SSIS package accordingly.

We have the currency details in a csv file, and we need to develop a SSIS package to read the flat file and to load the data in the staging table currency.

The Currency file has two columns called Currency Code and Name.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\Currency_File.png

Database schema to store Metadata:

I have created two tables to store the Metadata about the flat file. These are FlatFileMaster and FlatFileColumnDetails

FlatFileMaster

The table FlatFileMaster will store high level details about the flat file such as Row Delimiter, File Path and Name. This details will help us to create a Flat File connection manager in the SSIS package. The below mentioned image represents the table structure with data.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\FlatFileMaster.pngFlatFileColumnDetails

The Column level details, such as Column Name, datatype, length and delimiter details, will be stored in the table FlatFileColumnDetails. These details will help us to generate column Metadata for the flat file connection manager.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\FlatFileColumns.png

We now need to create BIML Script to generate the package that has a code snippet to generate SSIS package to load currency csv file into the Currency staging table.

The BIML Script

SQL Code Snippet to create tables

BIML Script to generate an SSIS package:

There are two simple SQL queries which have been defined to list the metadata from FlatFileMaster and FlatFileColumn tables. The resultant dataset will be used by the BIMLScript to generate the code snippet for the flat file connection managers.

  • The first: For each loop in File format tag, iterate through the available header record in the FlatFileMaster table. As there is only one file, it is expected to return only one record now.
  • The second: For each loop will return the details of the columns for the currency flat file. As we have configured two columns (Code and Name), it will iterate twice to generate the column Metadata. After a successful validation, the SSIS package can be generated as below.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\BIML_Script_1.png

Adding dynamic transformation in the data flow

The above BIML script can be improved on a number of places. As a first step, we need to change the configuration for the dataflow. In the previous script, the flat file source has been hardcoded in the DataFlow.

The flat file connection information can be reused to derive the flat file source in the data flow. An additional for-each loop has been added after the transformation tag to iterate through the available flat file connections. As we have only one flat file connection, a single Flat file connection will appear here. The below mentioned BILM script will help us to generate the SSIS package dynamically.

BIML Script:

Dynamic Flat file connection manager

It is easy enough to identify that the flat file connection manager details are hard coded within the biml script. As these details are already stored in the database, this can be automated as well.

The Foreach loop before the flat file connection tag will iterate through the flat file connection result set and will create the flat file connection managers.

Testing the design pattern for multiple files

Now we have automated the generation of SSIS code snippets for Flat File connection managers, Flat File formats, Flat File column details and Flat File source. The above pattern should support multiple flat files. The only thing that remains to be done is to add the configuration details in the FlatFileMaster and FlatFileColumnDetails tables. Let us assume that now we have additional flat file “Products” and we need to load the flat file to a new staging table. Now the existing design pattern can be used to generate the SSIS package for the additional Products flat file.

The Products file has three columns (ProductID, Name & ProductNumber)

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\Products_File.png

This image shows the configuration of additional records in the FlatFileMaster and FlatFileColumnDetails tables.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\FileMasterEntry.png

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\FlatFileColumnConfig.png

Now let us validate and generate the SSIS package. The SSIS package has been generated with a dataflow and a connection manager for Products and Currency flat file. The Data flow has flat file source for both products and currency.

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\Flat_File_Currency_Script_2.png

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\Flat_File_Currency_Connection.png

C:\WorkArea\Nat\SQL\Blog\Amar\Article_3\Images\Products_Connection.png

Conclusion:

Learning BIML can be fun and easy. I recommend you to start learning BIML today with the help of BIDS helper. Once you are familiar with the basics, then upgrade your C# skills in BIML as BIML script and practise complex solutions.

Reference Links: