Welcome to the twenty-sixth post in the 31 Days of SSIS series. We’re getting near the end now. If you have not read the other posts, they can be found through the introductory post.
This series has been a combination of my thoughts on practices for working with SSIS and proof of concept packages that represent solutions I’ve worked on in the past. We’ve done a few days of packages now, so I thought I would shift gears again and talk about one of the best practices that I try to follow.
This best practice is to keep the unit of work for each SSIS package small, focused, and limited in the task it is trying to accomplish. This means that when I write an SSIS package, one of the goals is to limit the number of Data Flows to as few as is logically possible.
For instance, if I have three files to process then I will usually build this as three SSIS packages. If all three of those packages have a shared process with identical inputs, then I may create a fourth SSIS package and pass data from the first three packages to the fourth with a Raw File.
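To make the shape of this concrete, here is a minimal Python sketch of the same idea. SSIS packages aren’t code, so the function and column names here are hypothetical stand-ins: each small, focused “package” normalizes one input and hands its output off through an intermediate file, playing the role the Raw File plays between real packages, and a single shared step consumes that common format.

```python
import csv

def load_file(rows, staging_path):
    # One small, focused "package": normalize one input file and hand
    # the results off through an intermediate file (the Raw File's role).
    with open(staging_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["address_id", "address"])  # hypothetical columns
        writer.writerows(rows)

def shared_process(staging_paths):
    # The "fourth package": a shared step that only understands the
    # intermediate format, not which loader produced each file.
    results = []
    for path in staging_paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            results.extend((r["address_id"], r["address"]) for r in reader)
    return results
```

The point of the design is in the second function: it has no knowledge of the three loaders, so adding a fourth input later means writing one more small loader, not touching the shared process.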
There are some good reasons for developing small SSIS packages, and they are tied to the impact this practice has on your environment. I could say that it eases development, but rather than rehash that, I want to look at the effect this practice had in a real situation.
Let’s examine a recent use of this best practice that really exemplifies its value. Let’s start with a small package.
In a recent engagement I had to add some new functionality to an existing SSIS process. To protect the innocent, let’s say that the requirement was to geo-code addresses after they arrived. To do so, I was going to use a web service. The geo-code information then needed to be placed in a special geo-code table. The one additional requirement was that this didn’t need to happen for all imports, and it sometimes would need to run some time after the original import had already executed. These aren’t the true requirements, but they are a close enough approximation.
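As a rough illustration of the decoupling those requirements call for, here is a hedged Python sketch. The names are hypothetical and the dict stands in for the geo-code table; the real work happened in an SSIS package calling a web service and writing to a database.

```python
def geocode_pending(addresses, geocode, geo_table):
    # addresses: rows still needing coordinates, however they arrived
    # geocode:   a stand-in for the web service call (hypothetical)
    # geo_table: the destination geo-code table (here, just a dict)
    #
    # Because this step only reads pending addresses and writes the
    # geo-code table, it neither knows nor cares which import produced
    # the rows -- it can run immediately or weeks after the import.
    for addr_id, address in addresses:
        lat, lon = geocode(address)
        geo_table[addr_id] = (lat, lon)
    return geo_table
```

The key design choice is that the step’s only inputs are the pending addresses themselves, not the import that created them, which is what makes the optional, run-it-later requirement cheap to satisfy.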
During design, I considered a couple of options:

1. Add the geo-coding logic directly into the existing import packages.
2. Build a new, stand-alone SSIS package dedicated to geo-coding.
Of course, I went with the second option. It was implemented fairly easily and, after a bit of development, was ready to deploy. The new SSIS package fit nicely into its place in the existing architecture.
But wait… of course, there is more. The part of this that really sells the concept of smaller, focused SSIS packages is what happened a couple weeks after deploying the new SSIS package.
On a snowy afternoon, I received an e-mail wondering if there was any way a few hundred addresses could be run through the new process. This wasn’t part of the initial requirements, but they thought to ask anyway. The problem was that these addresses had been loaded in a different way, and they needed geo-coding as well. The existing process allowed the user to request geo-coding for only one address at a time.
The win here is that with a query and an export of the results into the new SSIS package, the manual work was eliminated. Since the design focused on a single task that was independent of the import, the package could be leveraged for previously unexpected uses. We got mad props for the design of the package, and one person saved themselves from a serious case of sore wrists.
This practice isn’t absolute, and not every one of my packages is small and focused. Still, I prefer to develop in this direction and look for opportunities to do so as often as possible.
What are your thoughts on this practice? Anything that you would like to add? Feel free to leave a comment.