Another day, another blog post all about SSIS in my continuing journey through 31 Days of SSIS. A list of the other posts in this series can be found in the introductory post. If you haven’t had a chance to read through that, I would recommend taking a look at it after reading this post. Yup, I just copied this from yesterday’s post. Plagiarism!
Yesterday, I jumped back to looking at another use for the Script Component transformation. Today, I want to make another jump and look at another use for Expressions and ForEach loops. In this case, we will look at a package where one package will be used to make similar changes to multiple databases.
A number of years ago, I was working with a single tenant database application. That database had a state/province table that was “kinda” similar in each database. This “kinda”-bit had nothing to do with any specific business requirements. It was this way because nobody ever got around to updating all of the databases whenever a change was made in one database or another. As is probably obvious, new states don’t show up too often but the state of this data was terrible.
Management of the data in these tables fell down to me, as the DBA. When they were out of synch it was my problem so I needed to do something to resolve the situation. What I wanted was a method that I could manage the table from one location and have each of the databases updated in turn. I also needed a solution that was light-weight and low tech. The people that would manage the solution after it was pushed out were savvy with Access and a bit afraid of SQL Server.
To solve this issue, I opted for an SSIS solution. SSIS has capabilities to bring the data across to the databases and determine what needs to be inserted, updated, or deleted. Alternatively, I could have just used change management and DML scripts to manage the data. The downside to that was that, at that point, that process was what caused the problem. Also, I could have looked at using replication. I opted against that since I wanted to keep the implementation low tech. Replication does require a bit of know-how to keep it going.
Of course, there should be a package per database. That would be impossible to manage and put the solution back with the existing process. There also shouldn’t need to be one execution of the package per database. This needed to be a process that would just run and work itself out with one or one hundred databases.
To accomplish these needs, the package will utilize a ADO record set that contains a list of all of the databases that need to be managed. That record set will be used to iterate a loop that will make all the necessary changes to that databases that subscribe for the updates.
With the solution outline, we’ll begin to go through the package to detail how this is done and hopefully help you learn a couple new things today.
There are a few things that will be occurring in the Control Flow to take note of in this package. First, the we need a list of all of the servers and databases that contain the tables that we want to update. With that information the package will loop through each of the server and database combinations. Finally, for each iteration of the loop, the package will insert, update, and delete data as is required to keep the data in the client database tables identical to the data in the central table.
This logic will be implemented with the following Control Flow components:
There are a number of variables in the package to help control how everything is processed.
The variables from above were added for the following purposes:
Usually I don’t highlight the connections, but in this case it is necessary. The SSIS package contains two connections:
With the connection information configured, it’s time to look at the data flows of the SSIS package. There are two data flows in this SSIS package
This first Data Flow task will control the INSERT and UPDATE operations on the table that is being managed through this package. To accomplish these operations the following transformations are used:
The second Data Flow task controls the DELETE operations on the table. Since rows may be deleted on the central table, they should also be deleted on the subscribers. To accomplish these operations the following transformations are used:
Before you can run the package, you will need to create a couple of dummy databases. The script for these is below:
CREATE DATABASE [Client_Db1] GO USE [Client_Db1] GO CREATE TABLE [dbo].[StateProvince]( [StateProvinceID] [int] NULL, [StateProvinceCode] [nchar](3) NOT NULL, [CountryRegionCode] [nvarchar](3) NOT NULL, [IsOnlyStateProvinceFlag] [bit] NOT NULL, [Name] [nvarchar](50) NOT NULL, [TerritoryID] [int] NOT NULL, [rowguid] [uniqueidentifier] NOT NULL, [ModifiedDate] [datetime] NOT NULL ) ON [PRIMARY] GO CREATE DATABASE [Client_Db2] GO USE [Client_Db2] GO CREATE TABLE [dbo].[StateProvince]( [StateProvinceID] [int] NULL, [StateProvinceCode] [nchar](3) NOT NULL, [CountryRegionCode] [nvarchar](3) NOT NULL, [IsOnlyStateProvinceFlag] [bit] NOT NULL, [Name] [nvarchar](50) NOT NULL, [TerritoryID] [int] NOT NULL, [rowguid] [uniqueidentifier] NOT NULL, [ModifiedDate] [datetime] NOT NULL ) ON [PRIMARY] GO
Go ahead and execute the packages a couple times. The first time the results will be kind of boring since everything will be an insert. The second execution, though, there will be some rows that flow into the update and delete paths. This is due to the demo only T-SQL statement to mimic changes in the database.
The results for the second execution should look like the image below. Of course, I’ve cropped both data flows over each other into a single image.
One of the things that I really like about SSIS is it’s continued flexibility and ease of use. With just a record set, a couple of variables and a loop, an SSIS package can change from updating one database to updating as many as you need. Hopefully this provides some inspiration to your own processes where you can consolidate packages and processes.
I should mention that while I use the OLE DB Command in this package it isn’t necessarily overt endorsement of it’s use. Considerations for using it are many and enough for another blog post. For today, I’ll just say that you need to be careful of the fact that this transformation will execute one T-SQL statement per row that it processes in the data flow path.