Blog Post

Validating in-line DataFlow records with a Stored Procedure

,

 

I have text file that contains data for a given fiscal period.  In this particular file there are around 2 million records.  The file needs to be split into two distinct output paths from the same source, which means this should all happen within a given DataFlow component.  One of the paths simply sums the data via a specific set of columns, nothing unusual.  The other path required that if the fiscal period was not valid, then no data was to be written.  This sounded simple enough, except the validation routine was a stored procedure.

Now you might be thinking, "Easy peasy, lemon squeezie!  Just put an OLE DB Command in-line and you can call that stored procedure to validate your data."  That would call the stored procedure once for each record (totalling 2 million) when I only need a single call, because although the fiscal period is a column in the file, it is the same for all records.  All I need to accomplish this is a Script Component and here's how I did it.

validating_inline_preexecute.png

After I set up my preexecute to create the stored procedure call, I need to pass in a few parameters, execute it and get a result back.  In this case the result will be a true/false value that we want to return back to the calling portion.  Here is that method:

validating_inline_checkTable.png

Remember that we will be processing millions of rows and I don't want to call this for every row, just once per file.  To accomplish this we use two variables: RunCheck and passThrough.  The variable RunCheck acts like a one way door, setting itself to false immediately after checking.  As you might have guessed from the name it is used to determine if the stored procedure needs to be called.  The variable passThrough is set to the result of the checkTable method and will determine if the data coming down the pipeline is valid or not.  If it is valid, then it is sent passed this component and on to the output.  If the data is invalid, then it is held.  Here is what the ProcessInputRow method looks like:

validating_inline_processinput.png

As you can see from the picture if passThrough is true then that means allow the data to pass through to the output.  If the call returned a false, then rows will never be added to the outputbuffer.

Original post (opens in new tab)

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating