Control Flow Package Parts are a new feature in Microsoft SQL Server 2016 that attempts to enable code reuse within SQL Server Integration Services packages. The intent is great, but like with many first iterations, there are limited use cases, and rough edges. I'll provide an overview of those here, and in subsequent posts I'll detail how I'm currently using package parts in a data warehousing environment.
Why Were Control Flow Package Parts Created?
In Integration Services, there are pain points.
<This space left intentionally blank (for you to crack your own jokes about the volume and variety of SSIS pain points)>
One of those difficulties only really becomes apparent after you use SSIS extensively - as part of a non-trivial data warehousing project or synchronization system, for example. In those cases, you're typically going to have multiple packages, development time is going to span months, and you're going to learn things along the way. Demands for broad progress monitoring, standardizing error handling, or execution management are likely to arise. Most of those scenarios (and many others) typically result in extremely similar plumbing being added to all packages involved in the project - typically by copy and paste. And when the "standard" implementation changes - because it always does - it's a burden to re-copy and paste (and adjust) each package to follow the new and improved system.
Microsoft created Control Flow Package Parts to help this situation.
How Do Control Flow Package Parts Help Code Reuse?
The basic idea behind package parts makes complete sense to a coder - they're macros. You take code you've used in several places, put it in a separate file that you then include and "expand" in multiple other files.
If you have multiple packages with parts of the Control Flow that are identical - setting up a database in a certain way, sending emails, calling a set of stored procedures, ... - then Control Flow Package Parts can help.
The assistance isn't just limited to the initial coding, either. Yes - creating a new package with the "duplicate" code is much easier. But the real gain of Control Flow Package Parts is when your "standard" code needs changes. Instead of having to edit multiple packages to address the modifications - you only have to alter the package part. Deploying the project(s) that depend on this part automatically incorporates those improvements.
The Scorecard
Good!
To recap, the positives of Control Flow Package Parts are that you can now reuse code in SSIS. If you've used Integration Services at all in the past ten years, you should be happy about that. It's great that you can bundle up a series of tasks, preconfigured to operate in a specific way, and just drop that "algorithm" into multiple packages. Moreover, it's looking fairly robust in its implementation - what's there does work reliably.
The package parts aren't "just" macros - they step one level higher than that - you can (lightly) configure them by setting values on the exposed properties of the single top-level task in the part. This means you can (in limited fashion) have a tool encoded in a package part that you can apply to slightly different ends in different packages, or even multiple instances within the same package.
Bad...
Unsurprisingly for a "v1" feature, the initial glimmers of awesomeness are dulled by the relatively small set of scenarios that are enabled with 2016's Control Flow Package Parts. There are a few minor annoyances here:
Invisible Annotations
One of the first things I naturally did when making my first package part was to place an annotation in it so that "users" of the part could be informed about the purpose, side effects, and how to use the part. Unfortunately, when that part is placed inside a host package... annotations aren't visible.
Conditional Precedence Gets Reset
It wouldn't surprise me if I was told I was pushing this new feature too far... but one of my goals was to have Control Flow Package Parts participate fully in the host package's control flow. One of those necessary capabilities is for the package part to participate in precedence constraints with other tasks in the Control Flow. They do... except if you have a conditional precedence constraint from the package part to another task. When the package part is edited and saved (independently from the host package), and then the "macro" of the package part is "expanded" inside the host package... the condition on the precedence constraint disappears.
I've posted a Connect case on that as well. This bug has the potential to be extremely vicious - edits from one user (the package part editor) destroy code in a package they didn't touch... silently.
Data Flow Layouts and Annotations Are Discarded
Update 2016-10-26: Just found this out and
submitted a Connect bug: as soon as you add a package part to a package, you lose all manual control over the positioning of data flow elements, and all of your annotations disappear...
Ugly?
I have to confess that I'm probably taking the feature outside the design envelope - but I feel that it's the natural place to take it. I described Control Flow Package Parts as a "macro" earlier - but what I really want (and I'm sure you want it too) is "Control Flow Package Functions". I want those "parts" to be first-class citizens in the host package. Since they're essentially macros - the code from the package part appears to be literally automatically copied and pasted into the host - there seems to be no reason why not.
I want a managed way to interact with the part: an API if you will. And no, altering the list of top-level task properties isn't an API. Nice try. If one of the strengths of a package part is to have multiple tasks act as "one unit", then by definition, none of the settings of those included tasks is accessible to configure. Secondly, altering properties on a task is one of the cruder methods of "configuring" execution dynamically.
Shared Connection Managers
I want the Package Part to be able to "inherit" Connection Managers from the host, and/or be able to "map" Connection Managers from the host package to Connection Managers inside the package part. I've
filed a Connect item for that request, which should enable far more code re-use. Imagine a simple package part built to test the health of a database and record it in a table, perhaps as a first "validation" step in a standard ETL process you conduct in many packages. This part would need to use two Connection Managers - one for the database under test, and another for the logging table. The logging table could be static (bad idea, nothing should be "unconfigurable"), but the database under test would naturally want to be specified by the host package. Unfortunately, there's no documented* way to "pass in" the connection manager you want.
Shared Variables
Probably more problematic is the inability to communicate values "in" to the package part and get results "out." One of the best ways I can see using package parts is to distill a complex operation to a simple one - but that commonly requires input and output to be passed back and forth. Take our previous example about testing the health of a database and recording it in a table. In addition to wanting the Connection Manager be configurable, you could also want a table name to be passed in, and the results of the test (perhaps a "last updated date") returned.
There's
no documented* way to do this, but think of the code reuse it would enable (without going to the greater length of calling a child package).
What's Next?
It's a great start - but Control Flow Package Parts are of limited use. As-is (without hacks), they are VERY static and unconfigurable, making them not much better than simply dropping a regular task on the design surface.
I may be asking for too much with my vision for "Control Flow Package Functions"... maybe I should be simply using child packages... but there must be a happy middle ground that doesn't require full execution of a child package with all the coding, setup and teardown that entails.
As I said at the top, I really haven't shown you how to use Control Flow Package Parts in this article. I intend to do that next, so I can show you the strengths and weaknesses firsthand. I've also hinted* at "hacking" the Control Flow Package Parts system to accomplish more... and I have a series of how-to posts lined up to demonstrate exactly that... so stay tuned!
By the way - many thanks to my new environment and colleagues at
Absolute Software. Not to belittle the frequent positive feedback I get all the time about my past posts about earlier SQL releases, but it's really nice to get my hands on newer releases. Without direct practical usage of a product or technique, it's really hard to spot the strengths and issues, or hack away at interesting aspects. I'm really fortunate to be in such a great situation with Absolute! Of course, all the opinions I express here are my own, and not those of Absolute...