Data Flow Script Transformation and Pre/PostExecute methods - when do they run

  • Maybe I'm not understanding this component properly, so I'd be grateful if someone could shed some light on the difference between what I'm expecting and what I'm experiencing with it.

    Basically, part way down my data pipeline I have placed a Script Component to act as a transformation. Its purpose is to transform the data on each row of the input according to specific rules based on package-level variables, and to add an additional column containing the transformed data. I've done this many times before and it works perfectly. In this case, however, I also need to gather some statistics on how many of each transformation are taking place. I've done this by defining some package-level variables that will ultimately hold the counts and specifying them in the ReadWriteVariables property of the script. As the transformations occur, internal C# variables are incremented accordingly, and it is in the PostExecute() method that I write the internal variables back to the package variables.
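    In outline the script looks something like the sketch below. The column and variable names are simplified stand-ins rather than the real ones, and the transformation rule is just a placeholder, but the shape is the same: count in ProcessInputRow(), write back to the package variable in PostExecute().

        // Simplified sketch of the Script Component code. UserComponent, Input0Buffer
        // and the Variables property are generated by the designer; "CustomerName",
        // "CustomerNameClean" and User::TransformCount are made-up names for this example.
        [Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
        public class ScriptMain : UserComponent
        {
            // Internal counter, incremented per row and written out once at the end.
            private int transformCount = 0;

            public override void Input0_ProcessInputRow(Input0Buffer Row)
            {
                // Placeholder rule: upper-case the name and count the rows that changed.
                if (!Row.CustomerName_IsNull && Row.CustomerName != Row.CustomerName.ToUpper())
                {
                    transformCount++;
                }

                // CustomerNameClean is the additional output column added in the designer.
                Row.CustomerNameClean = Row.CustomerName_IsNull ? "" : Row.CustomerName.ToUpper();
            }

            public override void PostExecute()
            {
                base.PostExecute();
                // User::TransformCount is listed in the ReadWriteVariables property;
                // ReadWrite variables can only be written here, not in ProcessInputRow.
                Variables.TransformCount = transformCount;
            }
        }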

    However, when I came to use the variables later on in the pipeline, they were still set to their default values of zero. So I did a bit of debugging, and this is where my understanding and reality started to drift apart.

    What I was expecting was that PreExecute() would run as soon as data started to flow down the pipe and into the transformation, that the ProcessInputRow() method would fire for each row, and that once all of the data had been processed and there was no more data flowing in, PostExecute() would run.

    But when I threw a few MessageBox.Show() calls into the transformation, this is what I experienced.

    PreExecute() ran before any other component did, even though the script sits halfway down the data pipeline, and before the data sources had even started to extract the data.

    The ProcessInputRow() did run as expected for each record, and the counts were incrementing as I had expected them to.

    The PostExecute() never ran at all.

    Am I missing something obvious here?

    Thanks in advance.


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • I don't know enough about SSIS to speak to the Pre and Post methods, and I'm at a loss as to why the post one isn't running. I would suspect that there may be things you just can't do in post, possibly due to scope problems, but I'd be taking a giant SWAG with that.

    I'm not sure why you need to have the package count the transforms, though, when it seems that just having a Derived Column detect the conditions that cause one, and output a column whose value is Y or N (transformed or not) for each kind you want to count, might be easier to document and very easy to query later, without having to depend on the package. You'll need to keep that data in the same table, or you can multicast it to a separate table with the same primary key. It also gives you a permanent record of the transforms in your database at the record level, and you just never know when that might turn out to be awfully darn handy.

    However, there may be other considerations that come into play that you have yet to detail. I'm also wondering what kind of transform you'd want or need a script task for. In other words, there may be other ways to "skin the cat", so to speak.
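    Just to illustrate the flag idea with a made-up column name (the condition obviously depends on whatever your rule actually is), the Derived Column expression for a new column called WasTransformed could be as simple as:

        UPPER(CustomerName) != CustomerName ? "Y" : "N"

    That gives you something you can COUNT or SUM against later without the package having to remember anything.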

    Other considerations could include things like data volume, performance, or even the very nature of the data. Wish I could speak to the "when" part of execution, though. I'd like to know that myself...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • I've run some more tests, and what I'm seeing is that PreExecute() executes as soon as the entire data flow starts, and PostExecute() executes once the data flow has finished and control passes back out to the Control Flow. I proved it by creating a data flow that extracted 5 records, used the transform to count them, and then cheated and put another script transform afterwards so I could use a message box. Sure enough, the message box in that second script showed that no records had been processed, but a message box in the Control Flow after the Data Flow Task showed the correct count.
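    If anyone wants to repeat the test without the message boxes, firing information events from the script shows the same timing in the Progress tab. Just a sketch, added inside the same counting component (which already has using System in the generated code):

        public override void PreExecute()
        {
            base.PreExecute();
            bool fireAgain = true;
            // Appears in the Progress / Execution Results tab with a timestamp.
            ComponentMetaData.FireInformation(0, "CountingScript",
                "PreExecute at " + DateTime.Now.ToString("HH:mm:ss.fff"), "", 0, ref fireAgain);
        }

        public override void PostExecute()
        {
            base.PostExecute();
            bool fireAgain = true;
            ComponentMetaData.FireInformation(0, "CountingScript",
                "PostExecute at " + DateTime.Now.ToString("HH:mm:ss.fff"), "", 0, ref fireAgain);
        }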

    The reason for wanting the count is that in this instance I don't have control of the destination table structure, so I can't add any additional fields to capture whether the transformation took place or not.

    Now that I know what is happening, it will stop me from falling into the same trap again and trying to use the transformation for something it is not actually designed to do. Having been to the pub for lunch, I have now realised my error and put a Row Count on the pipes to catch the number of changes being made, handling this back out in the Control Flow where it belongs.
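    In case it helps anyone else, the tidy version is: a Row Count transformation on the relevant path writes its total into a package variable, and something after the Data Flow Task in the Control Flow picks the value up. A rough sketch of a Script Task doing that is below; the variable name is made up for the example and just needs to be listed in the task's ReadOnlyVariables.

        // Script Task in the Control Flow, placed after the Data Flow Task.
        // User::TransformCount is a made-up name for the variable that the
        // Row Count transformation inside the data flow has been pointed at.
        public void Main()
        {
            int changed = (int)Dts.Variables["User::TransformCount"].Value;

            bool fireAgain = true;
            Dts.Events.FireInformation(0, "Stats",
                changed.ToString() + " rows were transformed.", "", 0, ref fireAgain);

            // ScriptResults is the enum generated by the Script Task template.
            Dts.TaskResult = (int)ScriptResults.Success;
        }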

    As usual with me, trying to fit a square peg into a round hole 🙂


    ---------------------------------------
    It is by caffeine alone I set my mind in motion.
    It is by the Beans of Java that thoughts acquire speed,
    the hands acquire shaking, the shaking becomes a warning.
    It is by caffeine alone I set my mind in motion.

  • Hey, "Round Hole" ... My name is "Square Peg" ... care to JOIN ???

    Yeah, that's SELECT, ain't it ??

    Last time I heard that I fell off my dinosaur WHERE SUNSHINE = 0

    I know... I just can't stop my bad self....

    Usually, I just start carving corners in the round hole until the peg fits... it's either that or sanding the peg...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)
