SQLServerCentral Article

Parameterize Connections in Microsoft Fabric Data Factory using Variable Libraries


When you create a data pipeline in Microsoft Fabric, you typically use one or more connections. A connection points to a certain data source (such as Azure Blob Storage, a relational database, a Cosmos DB instance, etc.) or to a destination in Fabric (OneLake, a Lakehouse, a Warehouse, etc.). After development is finished and you’ve tested your pipeline, you probably want to promote it to another workspace as part of your deployment and testing strategy (development – test – acceptance – production). However, you don’t want your connections to point to the same sources and destinations as in your development environment. A pipeline in production shouldn’t be updating a warehouse in the development environment.

When you deploy a pipeline to another workspace, Fabric doesn’t change the connection. For example, if the connection points to a warehouse named my_warehouse in the dev workspace, it will still point to the same warehouse after you’ve deployed it to your test workspace, even though there might be a warehouse with the same name in that workspace. Connections are absolute, not relative. In the following screenshot you can see Fabric appends the workspace name to the name of the warehouse, to indicate that there are multiple warehouses in your tenant with the same name, and to let you know which warehouse exactly you are connecting to:

[Screenshot: warehouse name shown with the workspace name appended]

To avoid issues with your connections after deployments, you need to parameterize the connections. At runtime, the parameters tell the connection which object to reference. In Azure Data Factory or Synapse Analytics, you use linked services to hold your connections. You can parameterize a linked service, or you can overwrite the connection values (such as the server or database name) in the CI/CD pipeline. In Fabric, the concept of linked services doesn’t exist. There’s only one place to manage connections:

[Screenshot: the connections management page]

At this point, it’s not possible to parameterize a connection itself. There’s also no support for deployment rules for data pipelines in deployment pipelines (you would use those, for example, to change the source connection of a Power BI semantic model). But recently Fabric introduced variable libraries, which are the solution for parameterizing your data pipeline connections. In this article, we’ll show you how you can use those variable libraries to create dynamic internal connections (to Fabric objects) and external connections (to sources outside of Fabric).

At the time of writing, variable libraries are in preview. Functionality or user interfaces might change until the feature is generally available (GA).

Fabric Variable Libraries

Set-up

To demonstrate the concept of a variable library, we’re going to create a data pipeline that copies data from an Azure SQL DB to a Fabric Warehouse. In a Fabric-enabled workspace, choose to create a new item and search for variable library.

[Screenshot: creating a new variable library item]

Give the library a name:

[Screenshot: naming the variable library]

To use variable libraries in your tenant, the feature needs to be enabled in the admin portal:

[Screenshot: the variable library setting in the admin portal]

If this is not the case, an error will be shown:

[Screenshot: error shown when the feature is disabled]

Also create a warehouse:

[Screenshot: creating a warehouse]

And a data pipeline:

[Screenshot: creating a data pipeline]

You should have the following objects in the dev workspace:

[Screenshot: the items in the dev workspace]

In my set-up, I have two different Azure SQL Databases as a source, so we can show the external connection actually changes. In reality, you might also have a different source database per environment. In the connections menu shown earlier, I created connections to those two databases:

[Screenshot: the two Azure SQL Database connections]

A connection definition looks like this:

[Screenshot: a connection definition]

Note that a connection has a unique connection ID, which we’ll use later. Edit the pipeline and add a Copy Data activity:

[Screenshot: the Copy Data activity in the pipeline]

I develop the pipeline first without parameterization, so it can be tested thoroughly before it’s parameterized (since variable libraries are in preview, some functionality like data preview doesn’t work yet). In the source, I connect to one of the two Azure SQL databases and read out some table metadata:

[Screenshot: the source configuration of the Copy Data activity]

The following query is used:

SELECT 
     dbname = TABLE_CATALOG 
    ,schemaname = TABLE_SCHEMA 
    ,tablename = TABLE_NAME 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_TYPE = 'BASE TABLE';

The warehouse in the same workspace as the pipeline is configured as the sink:

[Screenshot: the sink configuration pointing to the warehouse]

The data will be written to a table with the following schema:

CREATE TABLE dbo.TableMetadata( 
     dbname     VARCHAR(100) NOT NULL 
    ,schemaname VARCHAR(100) NOT NULL 
    ,tablename  VARCHAR(100) NOT NULL 
);

Don’t forget to truncate the table before the data is loaded. When the pipeline is run, data like this should be written to the table:
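One way to handle the truncation is a pre-copy script on the warehouse sink of the Copy Data activity (a minimal sketch; the table name comes from the schema above):

```sql
-- Empty the target table before each load, so reruns don't duplicate rows
TRUNCATE TABLE dbo.TableMetadata;
```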

[Screenshot: the data written to the dbo.TableMetadata table]

Create a deployment pipeline to deploy your items from the dev workspace to a test workspace:

[Screenshot: the deployment pipeline]

The deployment pipeline will create a warehouse with the table dbo.TableMetadata, the data pipeline (with connections still pointing to the original data sources) and an empty variable library.

[Screenshot: the deployed items in the test workspace]

Use the Variable Library to Parameterize Connections

Open up the variable library and add these variables:

  • mywarehouse. The object ID of the warehouse.
  • mysourceconnection. The connection ID of the connection pointing to the Azure SQL DB (as mentioned before).
  • warehouseconn. The SQL connection string of the warehouse.
  • sourcedb. The name of the database in the Azure SQL DB.

[Screenshot: the variables in the variable library]

You can find the object ID of an item by browsing to it; the object ID is contained in the URL. For example, the object ID of a warehouse:

[Screenshot: the object ID in the warehouse URL]
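In other words, the URL follows a pattern along these lines (the GUID segments are placeholders here):

```
https://app.fabric.microsoft.com/groups/<workspace-object-id>/warehouses/<warehouse-object-id>
```

The GUID after /warehouses/ is the value to store in the mywarehouse variable.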

The SQL connection string can be found in the settings (see screenshot above and below):

[Screenshot: the SQL connection string in the warehouse settings]

We’re also going to add a value set to the variable library. A value set is a set of alternative values for your variables.

[Screenshot: adding a value set]

The default value set will be used in development, while the other value set is used in the test workspace. For the new value set, specify the correct object IDs for the test workspace:

[Screenshot: the value set with object IDs for the test workspace]

You can obviously only find the object ID of the test warehouse after you’ve deployed it to that workspace. Also change the object ID of the source connection to the second Azure SQL database connection, and similarly for the database name. Deploy the changed variable library to the test workspace using the deployment pipeline. There we’re going to set the test value set as the active one:

[Screenshot: setting the test value set as the active one]

Back to the data pipeline. In the settings of the pipeline itself, a new tab has been added for library variables:

[Screenshot: the library variables tab in the pipeline settings]

For each variable of the library, add a variable in this pane and link it to the corresponding library variable. The result should be like this:

[Screenshot: pipeline variables linked to the library variables]

Now we can use those variables inside the pipeline in dynamic content expressions. In the source pane of the Copy Data activity, change the connection to Use dynamic content.

[Screenshot: choosing Use dynamic content for the connection]

The expression editor will pop up. It’s a bit hidden, but you can find the variable libraries when you click on the ellipsis:

[Screenshot: the ellipsis revealing the variable libraries in the expression editor]

There we can find the variables that we just added to the pipeline:

[Screenshot: the pipeline’s library variables in the expression editor]

Drag the variable holding the object ID of the source connection to the expression pane and click OK. Using the same method, we can parameterize the database name as well:

[Screenshot: the parameterized source connection and database name]
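Assuming the variable names used earlier, the resulting dynamic content expressions look along these lines (library variables are exposed through the pipeline() object):

```
@pipeline().libraryVariables.mysourceconnection
@pipeline().libraryVariables.sourcedb
```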

As mentioned before, some features don’t work yet while variable libraries are in preview. When you try to preview the data, an error will be shown:

[Screenshot: error shown when previewing data]

We can now parameterize the sink connection as well with the other variables:

[Screenshot: the parameterized sink connection]

For the workspace ID, the system variable DataFactory can be used. Test the pipeline by running it. The same data should be written to the dev warehouse.
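Put together, the sink is then driven entirely by expressions. A sketch using the variable names from this article (in Fabric, the DataFactory system variable resolves to the workspace ID):

```
Workspace ID:          @pipeline().DataFactory
Warehouse (object ID): @pipeline().libraryVariables.mywarehouse
Connection string:     @pipeline().libraryVariables.warehouseconn
```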

[Screenshot: the pipeline run writing data to the dev warehouse]

Deploy the parameterized data pipeline to the test workspace.

[Screenshot: deploying the parameterized pipeline]

When you open the data pipeline in the test workspace, it should look identical to the one in the dev workspace. However, when that pipeline is run, data from the other Azure SQL DB (named mycheapdwh in my case) should be written to the test warehouse:

[Screenshot: data from the second Azure SQL DB in the test warehouse]

This shows that the parameterization of the connections was successful.

Conclusion

In this article, we’ve shown how we can build data pipelines in Fabric Data Factory with parameterized connections. Using a variable library, we can pass different sets of values in different workspaces, so we can change a connection dynamically. This should be part of your CI/CD strategy when deploying artefacts from one workspace to another. At the time of writing, the feature is in preview and only data pipelines are supported. Be sure to check the considerations and limitations in the documentation.
