SSIS Dynamic Conditional Split

  • Hi all,

    I've been looking for the last couple of days, but have been unable to find a solution.

    Situation:

    I need to load and split a text file, however the rules for the split are likely to change over time. I want to make a dynamic package where the end user can maintain the business logic for the split in an excel file with cols like

    Rule# Condition NewColValue

    1 EmpG=="GHD" GHD_1

    2 EmpG=="BDS" BDS_435

    My idea is that I would load this file into a for each loop populating two variables (CONDITION & NEWCOL), then have a data flow within it that would use a conditional split where the condition uses the CONDITION variable to split the data into valid and invalid rows with NEWCOL then added as a derived column for valid records. The data flow would then loop again using then invalid rows and then next condition until all records have been assessed, the last chunk being what ever is left over.

    I am having trouble dynamically setting the condition for the split.

    I know that I can probably accomplish these using a SQL table and updating the where clause of each select statement, but for this package was trying to avoid the necessity of using the table as it is not required for the output.

    Does this make sense ??? Does anyone have any ideas as to whether this is possible or suggestions as to alternative ways that this could be achieved??

    Dave

  • Dave R (UK) (2/20/2014)


    Hi all,

    I've been looking for the last couple of days, but have been unable to find a solution.

    Situation:

    I need to load and split a text file, however the rules for the split are likely to change over time. I want to make a dynamic package where the end user can maintain the business logic for the split in an excel file with cols like

    Rule# Condition NewColValue

    1 EmpG=="GHD" GHD_1

    2 EmpG=="BDS" BDS_435

    My idea is that I would load this file into a for each loop populating two variables (CONDITION & NEWCOL), then have a data flow within it that would use a conditional split where the condition uses the CONDITION variable to split the data into valid and invalid rows with NEWCOL then added as a derived column for valid records. The data flow would then loop again using then invalid rows and then next condition until all records have been assessed, the last chunk being what ever is left over.

    I am having trouble dynamically setting the condition for the split.

    I know that I can probably accomplish these using a SQL table and updating the where clause of each select statement, but for this package was trying to avoid the necessity of using the table as it is not required for the output.

    Does this make sense ??? Does anyone have any ideas as to whether this is possible or suggestions as to alternative ways that this could be achieved??

    Dave

    I see what you're trying to do. I doubt that it is possible without doing some coding, but see no reason at all why it could not be achieved in a Script Component.

    Actually, now that I think of it, there is one reason. If the number of user-created conditions is completely dynamic, that would also mean that you do not know before execution how many additional columns are required. Or what their data types will be. Or where they map to. Solve these problems and the split itself is easy enough - just define suitable multiple outputs from your Script Component.


  • Thanks for the prompt response Phil,

    I have looked at the script component, but couldn't get the condition to work...the number of splits will be totally dynamic and driven by business rules, each rule would need to be applied in sequence, hence the use of the for each container, and chunk off sections of the source file. The columns added will be the same each time, but with different values - so essentially the process is identical each time with differing conditions.

    Do you have an example of the script (VB) that I could use? I am guessing it would always split into valid and invalid, valid building up to the final file and invalid becoming the source for the next iteration as it were...

    D

  • Dave R (UK) (2/20/2014)


    Thanks for the prompt response Phil,

    I have looked at the script component, but couldn't get the condition to work...the number of splits will be totally dynamic and driven by business rules, each rule would need to be applied in sequence, hence the thought of the for each container, and chunk of section of the source file. The columns added will be the same each time, but with different values - so essentially the process is identical each time with differing conditions.

    Do you have an example of the script (VB) that I could use? I am guessing it would always split into valid and invalid, valid building up to the final file and invalid becoming the source for the next iteration as it were...

    D

    Your logic is sound, but I'm afraid I don't have any example that does anything like that. However, the split itself would be achieved along the lines of the following horrid pseudo-code:

    Valid = "Y"

    for each condition

    if (found) then

    Valid = "N"

    Save additional column information

    Exit condition loop //(if you are happy to catch only the first match rather than all of them)

    End If

    next condition

    if (Valid=="Y") then

    Create new 'valid' output row

    Assign output row variables

    Else

    Create new 'invalid' output row

    Assign output row variables

    End If

    Note that before writing the above code you would have to define the two script component outputs manually.

    Note also that a script component can run in two modes: synchronous and asynchronous. For the above to work, I think it would have to run in async mode.


  • Thanks again Phil,

    that is sort of where I have got to, the problem I am having though is using a variable for the found condition....it si getting it to treat the variable as the whole condition instead of looking at the variable value and testing it against something....so

    I have varaible CONDITION populated with "ColA = = 54" as an example....in the script I want to write

    if CONDITION then

    which would equate to

    if ColA==54 then

    but I am thinking that it just seems this as a string to compare ie really wants

    if CONDITION == 1 then

    does that makes sense??

    Think I am going to end up having a table and dynamically setting the where clause of the SQL expression that pulls the dataset...

  • Dave R (UK) (2/20/2014)


    Thanks again Phil,

    that is sort of where I have got to, the problem I am having though is using a variable for the found condition....it si getting it to treat the variable as the whole condition instead of looking at the variable value and testing it against something....so

    I have varaible CONDITION populated with "ColA = = 54" as an example....in the script I want to write

    if CONDITION then

    which would equate to

    if ColA==54 then

    but I am thinking that it just seems this as a string to compare ie really wants

    if CONDITION == 1 then

    does that makes sense??

    Think I am going to end up having a table and dynamically setting the where clause of the SQL expression that pulls the dataset...

    Yes, I don't know whether that is possible (maybe a .NET guru can help here?)

    But even if it were possible, you could be opening yourself up to some sort of injection attack by passing through the conditions without parsing them.

    If you added some logic to your code to break each condition into four parts ... something like:

    (ColumnName, Operator, MatchValue, NewValue)

    You might find it easier to code.


  • Or, rather than ColumnName, use ordinal position for direct access to it in subsequent code.


  • Thanks for the suggestions Phil - given me food for thought as it were :w00t:

  • Well - took a couple of hours tinkering and a lot of thought, but I have it working the way I want !!!

    So, thought it worth while adding my solution below, I had to tweak my conditions file slightly - think the insipration came from Phil's post so thanks again Phil 🙂

    If anyone else has any ideas or suggestions on how to improve this or streamline it I would be interested in hearing them.

    My Solution

    I have a file with the conditions in that holds all the columns that are in the source file, similar to the below, and holds the valid values for each test:

    Condition# Col1 Col2 Col3 NewCol1

    1 Any 56 Any A

    2 454 56 Any B

    3 Any 56 451 C

    I load this into a recordset and loop through it populating variables Col1_Value, Col2_Value, Col3_Value and NewCol

    The condition in my conditional split is

    (Col1 == @[User::Col1_Value] || @[User::Col1_Value] == "Any") && (Col2 == @[User::Col2_Value] || @[User::Col2_Value] == "Any") && (Col3 == @[User::Col3_Value] || @[User::Col3_Value] == "Any")

    to be sent to the valid record output.

    On the first iteration records where Col2 == 56 are found, second where Col1 == 454 and last where Col3 == 451.

    A derived column uses the last value to add the results to the dataset.

    Working just as I wanted !!!

    To complete it, the source file will be loaded to a holding file which is the source for the dataflow. InvalidRows on each iteration will be sent to an InvalidRecords file. After the dataflow within the loop I will then replace the holding/source file with the latest set of invalid records, thus removing from the source for the dataflow the records which passed the latest test.

    Job Done :hehe:

  • Nice work, well done!

    You asked for suggestions ...

    I would have loaded the condition data into an array rather than named variables. It's much easier in code to loop round an array and there will be no worries about hitting limits imposed by the number of variables you have declared.


Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply