SSIS Dynamic Conditional Split

Question

SSIS Dynamic Conditional Split

Dave R (UK)

SSChasing Mays

Points: 656
More actions
February 20, 2014 at 4:27 am

#283814

Hi all,
I've been looking for the last couple of days, but have been unable to find a solution.
Situation:
I need to load and split a text file, however the rules for the split are likely to change over time. I want to make a dynamic package where the end user can maintain the business logic for the split in an excel file with cols like
Rule# Condition NewColValue
1 EmpG=="GHD" GHD_1
2 EmpG=="BDS" BDS_435
My idea is that I would load this file into a for each loop populating two variables (CONDITION & NEWCOL), then have a data flow within it that would use a conditional split where the condition uses the CONDITION variable to split the data into valid and invalid rows with NEWCOL then added as a derived column for valid records. The data flow would then loop again using then invalid rows and then next condition until all records have been assessed, the last chunk being what ever is left over.
I am having trouble dynamically setting the condition for the split.
I know that I can probably accomplish these using a SQL table and updating the where clause of each select statement, but for this package was trying to avoid the necessity of using the table as it is not required for the output.
Does this make sense ??? Does anyone have any ideas as to whether this is possible or suggestions as to alternative ways that this could be achieved??
Dave

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 247172 More actions · Answer 1

Dave R (UK) (2/20/2014)
Hi all,
I've been looking for the last couple of days, but have been unable to find a solution.
Situation:
I need to load and split a text file, however the rules for the split are likely to change over time. I want to make a dynamic package where the end user can maintain the business logic for the split in an excel file with cols like
Rule# Condition NewColValue
1 EmpG=="GHD" GHD_1
2 EmpG=="BDS" BDS_435
My idea is that I would load this file into a for each loop populating two variables (CONDITION & NEWCOL), then have a data flow within it that would use a conditional split where the condition uses the CONDITION variable to split the data into valid and invalid rows with NEWCOL then added as a derived column for valid records. The data flow would then loop again using then invalid rows and then next condition until all records have been assessed, the last chunk being what ever is left over.
I am having trouble dynamically setting the condition for the split.
I know that I can probably accomplish these using a SQL table and updating the where clause of each select statement, but for this package was trying to avoid the necessity of using the table as it is not required for the output.
Does this make sense ??? Does anyone have any ideas as to whether this is possible or suggestions as to alternative ways that this could be achieved??
Dave

I see what you're trying to do. I doubt that it is possible without doing some coding, but see no reason at all why it could not be achieved in a Script Component.

Actually, now that I think of it, there is one reason. If the number of user-created conditions is completely dynamic, that would also mean that you do not know before execution how many additional columns are required. Or what their data types will be. Or where they map to. Solve these problems and the split itself is easy enough - just define suitable multiple outputs from your Script Component.

Dave R (UK) SSChasing Mays Points: 656 More actions · Answer 2

Thanks for the prompt response Phil,

I have looked at the script component, but couldn't get the condition to work...the number of splits will be totally dynamic and driven by business rules, each rule would need to be applied in sequence, hence the use of the for each container, and chunk off sections of the source file. The columns added will be the same each time, but with different values - so essentially the process is identical each time with differing conditions.

Do you have an example of the script (VB) that I could use? I am guessing it would always split into valid and invalid, valid building up to the final file and invalid becoming the source for the next iteration as it were...

D

Phil Parkin SSC Guru Points: 247172 More actions · Answer 3

Dave R (UK) (2/20/2014)
Thanks for the prompt response Phil,
I have looked at the script component, but couldn't get the condition to work...the number of splits will be totally dynamic and driven by business rules, each rule would need to be applied in sequence, hence the thought of the for each container, and chunk of section of the source file. The columns added will be the same each time, but with different values - so essentially the process is identical each time with differing conditions.
Do you have an example of the script (VB) that I could use? I am guessing it would always split into valid and invalid, valid building up to the final file and invalid becoming the source for the next iteration as it were...
D

Your logic is sound, but I'm afraid I don't have any example that does anything like that. However, the split itself would be achieved along the lines of the following horrid pseudo-code:

Valid = "Y"

for each condition

if (found) then

Valid = "N"

Save additional column information

Exit condition loop //(if you are happy to catch only the first match rather than all of them)

End If

next condition

if (Valid=="Y") then

Create new 'valid' output row

Assign output row variables

Else

Create new 'invalid' output row

Assign output row variables

End If

Note that before writing the above code you would have to define the two script component outputs manually.

Note also that a script component can run in two modes: synchronous and asynchronous. For the above to work, I think it would have to run in async mode.

Dave R (UK) SSChasing Mays Points: 656 More actions · Answer 4

Thanks again Phil,

that is sort of where I have got to, the problem I am having though is using a variable for the found condition....it si getting it to treat the variable as the whole condition instead of looking at the variable value and testing it against something....so

I have varaible CONDITION populated with "ColA = = 54" as an example....in the script I want to write

if CONDITION then

which would equate to

if ColA==54 then

but I am thinking that it just seems this as a string to compare ie really wants

if CONDITION == 1 then

does that makes sense??

Think I am going to end up having a table and dynamically setting the where clause of the SQL expression that pulls the dataset...

Phil Parkin SSC Guru Points: 247172 More actions · Answer 5

Dave R (UK) (2/20/2014)
Thanks again Phil,
that is sort of where I have got to, the problem I am having though is using a variable for the found condition....it si getting it to treat the variable as the whole condition instead of looking at the variable value and testing it against something....so
I have varaible CONDITION populated with "ColA = = 54" as an example....in the script I want to write
if CONDITION then
which would equate to
if ColA==54 then
but I am thinking that it just seems this as a string to compare ie really wants
if CONDITION == 1 then
does that makes sense??
Think I am going to end up having a table and dynamically setting the where clause of the SQL expression that pulls the dataset...

Yes, I don't know whether that is possible (maybe a .NET guru can help here?)

But even if it were possible, you could be opening yourself up to some sort of injection attack by passing through the conditions without parsing them.

If you added some logic to your code to break each condition into four parts ... something like:

(ColumnName, Operator, MatchValue, NewValue)

You might find it easier to code.

Phil Parkin SSC Guru Points: 247172 More actions · Answer 6

Or, rather than ColumnName, use ordinal position for direct access to it in subsequent code.

Dave R (UK) SSChasing Mays Points: 656 More actions · Answer 7

Thanks for the suggestions Phil - given me food for thought as it were :w00t:

Dave R (UK) SSChasing Mays Points: 656 More actions · Answer 8

Well - took a couple of hours tinkering and a lot of thought, but I have it working the way I want !!!

So, thought it worth while adding my solution below, I had to tweak my conditions file slightly - think the insipration came from Phil's post so thanks again Phil

If anyone else has any ideas or suggestions on how to improve this or streamline it I would be interested in hearing them.

My Solution

I have a file with the conditions in that holds all the columns that are in the source file, similar to the below, and holds the valid values for each test:

Condition# Col1 Col2 Col3 NewCol1

1 Any 56 Any A

2 454 56 Any B

3 Any 56 451 C

I load this into a recordset and loop through it populating variables Col1_Value, Col2_Value, Col3_Value and NewCol

The condition in my conditional split is

(Col1 == @[User::Col1_Value] || @[User::Col1_Value] == "Any") && (Col2 == @[User::Col2_Value] || @[User::Col2_Value] == "Any") && (Col3 == @[User::Col3_Value] || @[User::Col3_Value] == "Any")

to be sent to the valid record output.

On the first iteration records where Col2 == 56 are found, second where Col1 == 454 and last where Col3 == 451.

A derived column uses the last value to add the results to the dataset.

Working just as I wanted !!!

To complete it, the source file will be loaded to a holding file which is the source for the dataflow. InvalidRows on each iteration will be sent to an InvalidRecords file. After the dataflow within the loop I will then replace the holding/source file with the latest set of invalid records, thus removing from the source for the dataflow the records which passed the latest test.

Job Done :hehe:

Phil Parkin SSC Guru Points: 247172 More actions · Answer 9

Nice work, well done!

You asked for suggestions ...

I would have loaded the condition data into an array rather than named variables. It's much easier in code to loop round an array and there will be no worries about hitting limits imposed by the number of variables you have declared.

SSIS Dynamic Conditional Split

Cookies on SQLServerCentral