Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SSIS - Code Reuse and Complex Control Flows

By Kristian Wedberg,

If you're anything like me, you get a queasy feeling in your stomach when you're copying a piece of code over and over again, reusing it in different places - you just know that it will come back and bite you, usually in the shape of an ugly, hairy beast, growling "copies not synchronized, psycho." I sure felt it breathing down my neck when setting up a benchmark recently; this article is all about how to tame that beast - here's my original starting point:

The five tasks at the bottom (Simplex Benchmark to Scary Benchmark) all produce the same end result, but do it in five different ways (I'll leave the details of that to another article.) The five tasks at the top though (Read Tables 1 to Read Tables 5), they all do the exact same thing, namely reading the same two tables from start to end - their purpose is to give each benchmark a reasonably consistent initial environment.

Tip: To get reliable results, I need to control not only what happens in the database and on the host while the benchmark is running, but also put the whole system in a consistent state before starting each benchmark. More on that at a later date.

Copy&paste

Let's not forget, copy&paste is the first level of code reuse - the fact that you can so easily select a number of components and duplicate them somewhere else in your package, or even in a different package, is a real timesaver, both when creating exact duplicates like in the above screenshot, and when using it as a starting point for creating new functionality.

Right now though, I can't stop glancing over my shoulders, I really want to get rid of all but one of those Read Tables tasks; it's time to refactor the solution by extracting the duplicated functionality, and put it in a single spot, making it easy to maintain.

I remember that the precedence constraints between tasks in the control flow does not allow circular paths - a task can not be linked to itself, not even indirectly via another task, so that's out.

Let's check out the

Simple for loop

It's instructive to think about for loops and for each loops as not just being very useful iterators of numbers, files etc., but also as being very (common and) useful ways of reusing code:

If I configure my for loop to do five iterations, I will have reused the Read Tables and A Benchmark tasks five times, just perfect! Except that in my particular case, I need five different benchmarks, which leads us to create a

Not so simple for loop

My for loop is configured to set the variable Step to {1, 2, 3, 4, 5}, in that order. This will be the clock tick that makes our construct work, changing the control flow each time.

We use a single copy of Read Tables, and five different benchmark tasks. I've also numbered the benchmark tasks from 1 to 5 to match up with the for loop Step values - this visual aid makes it easier to see in which order tasks will execute, even after moving them around in the layout pane.

The final part is to drag the precedence constraints from Read Tables to each of the benchmark tasks, and then double click each link (or right click and "Edit...", or use the properties window to set the corresponding values) and configure them like this, changing the number to "1" through "5" for the five links:

When we run this, the execution order will be:

  • Read Tables
  • 1. Simplex Benchmark
  • Read Tables
  • 2. Duplex Benchmark
  • Read Tables
  • 3. Complex Benchmark
  • Read Tables
  • 4. Huge Benchmark
  • Read Tables
  • 5. Scary Benchmark

Now this seems like a perfect fit for my problem in terms of functionality and complexity, so as long as I only need this simple sequence, this is what I'll use.

Tip: In the Precedence Constraint Editor above we specified Evaluation operation = "Expression". This has the side effect of allowing the benchmark tasks to run even if the Read Tables task fails! An alternative is to set Evaluation operation = "Expression and Constraint" - this will stop the precedence from triggering unless Read Tables actually succeeds.

Tip: You can only set a precedence constraint if you have a preceding task! For instance, if you needed to run Read Tables after each benchmark, you would still need a preceding task to set the constraints against - simply use a no-op as a preceding task, such as a Sequence Container with nothing in it.

But requirements change! What if I need to run three benchmark tasks in a more complex sequence? Let's investigate the

Downright complex and best avoided for loop

Can it handle this?

  1. Complex Benchmark
  2. Duplex Benchmark
  3. Simplex Benchmark in parallel with Duplex Benchmark
  4. Simplex Benchmark in parallel with Complex Benchmark
  5. Duplex Benchmark in parallel with Complex Benchmark
  6. Complex Benchmark
  7. Duplex Benchmark

Here we have both repeating sequences of tasks, and tasks that sometimes run sequentially, and sometimes in parallel.

One way to do this is to start with the Not so simple for loop, and add more complex precedence expressions:

  • We have seven steps, so make the loop variable Step go from 1 to 7
  • For every benchmark, check which steps it should run in, i.e. Complex Benchmark runs in step 1, 4, 5, 6
  • For every benchmark, set the precedence expression to evaluate to true for the identified steps, i.e. "@Step==1 || @Step>=4 || @Step<=6"

Are we having fun yet??? Depending on your answer, feel free to impress me greatly (and get weird looks from your spouse I bet) by having that table driven Finite State Machine on my desk by Monday, 9am sharp! To implement it though, instead of Downright complex... I suggest you use the

Getting silly for loop

In the previous Downright complex... for loop we had our control logic spread out across many precedence expressions. That's much too easy to get wrong, so let's refactor again. Now we have centralized all the logic in the single script called Logic, where we can configure arbitrary control flows:

To handle the complex sequence, construct it thusly:

  1. Configure the for loop to iterate Step from 1 to 7, corresponding to our seven steps in the sequence. Also add an integer variable BitMask to the for loop.
  2. Name the benchmark tasks as 1, 2, 4 (continuing with 8, 16, 32, 64 etc. for any additional tasks)
  3. Add code to the Logic script to set the bits in BitMask according to which step in the sequence is executing - a set bit means the corresponding benchmark task will execute in the current step:
    Enum Bits
      SimplexBenchmark = 1
      DuplexBenchmark = 2
      ComplexBenchmark = 4
    End Enum
    Select Case CType(Dts.Variables("Step").Value, Integer)
      Case 1
        Dts.Variables("BitMask").Value = Bits.ComplexBenchmark
      Case 2
        Dts.Variables("BitMask").Value = Bits.DuplexBenchmark
      Case 3
        Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.DuplexBenchmark
      Case 4
        Dts.Variables("BitMask").Value = Bits.SimplexBenchmark + Bits.ComplexBenchmark
      Case 5
        Dts.Variables("BitMask").Value = Bits.DuplexBenchmark + Bits.ComplexBenchmark
      Case 6
        Dts.Variables("BitMask").Value = Bits.ComplexBenchmark
      Case 7
        Dts.Variables("BitMask").Value = Bits.DuplexBenchmark
    End Select
  4. Set the precedence expressions for the three benchmark tasks to trigger when their corresponding bit in BitMask is set:
    • (@BitMask & 1) != 0
    • (@BitMask & 2) != 0
    • (@BitMask & 4) != 0
Tip: Remember that we can have other tasks and containers running in parallel with our for loop - they can signal external events to our Logic script via variables for instance. Just don't go overboard with this unless it really is for a good cause...

Tip: Instead of using the for loop AssignExpression to change Step, you might want to use the Logic script to update it - that's especially useful when you want to 'jump around' in the sequence, say "Go from step 5 to step 2 if today is Saturday or Sunday", and when the sequence should 'run forever'.

No downloadable package today, I've left that as an exercise to you the reader (I really hate it when writers use cop outs like that. Which is exactly why I did say that, just now. Don't get mad, get even.-)

All in all, it is a bit silly, but in an irresistible kind of way - just like my wife actually, and we had Love At First Sight! So don't be surprised if this thingumajig pops up again.

Stop, I'm gagging!

I'll give that poor loop a rest for a while. As a heads up though, here are some more ways to implement reuse:

  • Using the loop counter to effect different behavior inside a task
  • Putting SQL code in variables and external files
  • Calling child packages
  • Using event handlers
  • Writing .NET assemblies
  • Creating custom components

These are all pretty hefty subjects in their own right; we'll save them for another day.

Ergo sum

  • Precedence constraints can not have circular paths
  • For loops are very useful for implementing reuse as well as for implementing simple and complex control flows
  • Understand the various ways of doing reuse - which technique and what level of complexity is appropriate in your case, taking the foreseeable future of your implementation into account?
  • The Getting silly for loop is one way to implement Finite State Machines in SSIS

There is truly an enormous amount of functionality packed inside Integration Services, to the extent that it feels more like a 2.7 version than the 1.0 it really is. As we've seen, there are many ways to accomplish code reuse, which is lucky indeed, 'cause as you know, I sure don't like the sound of that growling beast...


Kristian Wedberg is a Business Intelligence consultant living outside London, UK. He can be reached at this email address: trk2061 [at] wedberg.name


Disclaimer: Opinions and comments expressed herein are my own, and does not necessarily represent those of my employer.
Total article views: 15064 | Views in the last 30 days: 29
 
Related Articles
BLOG

Calculate Bitmasks In PowerShell

Have you ever had to calculate bitmasks on the fly?  I have and still do.  In this post, I’m going t...

ARTICLE

SSIS - Reusable Benchmark Harness

We all know testing is important, but face it, testing is not the highlight of anyone's daily work. ...

FORUM

Precedence Constraints

Precedence Constraints in data flow

ARTICLE

Virtual Benchmarks

This week Steve Jones looks at the new TPC benchmark, TPC-VMS.

BLOG

DB Benchmarking

As database professionals, we have a need to benchmark performance of the database, processes, and e...

 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones