How to make a CASE STATEMENT (Case when ....then...) in SSIS

  • Here is my problem : I want to do a REAL "Case Statement" in my code (not only a single If).

    I want to do (it is a theoritically example) :

    CASE WHEN "MyValue" = 1 THEN "First"

    WHEN "MyValue" = 2 THEN "Second"

    WHEN "MyValue" = 3 THEN "Third"

    ELSE "Bah"

    END AS MyOutput

    You get it? A normal SQL "Case Statement" in SSIS tool. However, my source is a FILE, and I don't want to INSERT for that (and then be able to have an OLE_DB source with a SELECT).

    I found 2 ways to have my case statement working :

    1-Derived column

    Using the not trivial at all "Derived column". I really don't like this because of the complexity for the reader (and the one who want to modify the script). I would be something like :

    MyValue==1?"First":MyValue==2?"Second":MyValue==3?"Third":"Bah"))))

    It is not normal having a fear of getting more "case". Imagine this statement with 10 CASE WHEN.

    2-Script transformation

    The other way is using the script Transformation. It is the most "clear to read". To achieve my initial CASE requirement, the code would look like:

    Select case(row.MyValue)

    case 1

    row.MyOutput = "First"

    case 2

    row.MyOutput = "Second"

    case 3

    row.MyOutput = "Third"

    else

    row.MyOutput = "Bah"

    End Select

    Good! This is the less ugly! However, YOU ARE PLAYING IN .NET ! I have to teach SSIS to ETL beasts (that already know Data Stage or Informatika). They laugh when I say that they will be learning .NET ! Worst, nothing are set to help the programmers to find their way out with .NET. When you open the "script transformation", nothing to help you find "input", "Output" and "commonly used statement".

    Am I missing something or there is no easy way to do a poor little "SELECT CASE...WHEN...THEN..." with multiple test in SSIS?

    Simon L-Deslauriers

  • First, to handle your simple example, you could create a table that had "MyValue" and "MyOutput" as columns and simply use a lookup component. Not only would this be easier than either of your solutions, but it would allow you to handle additional cases without changing your package.

    Now, assuming you have a far more complicated CASE, another option would be to use a conditional split. Handle the case in the conditionals and use a simple derived column on each output to add the "MyOutput" and then finally use a UNION ALL to put it all back together. Although this is a lot more components, it is not going to add a lot of overhead and will be really easy to follow in the data flow.

    Now, the unfortunate reality for your situation is that ETL developers using SSIS will eventually need to dive into .Net at least at a basic level. The script component is too important to simply try to avoid all of the time. Anyone complaining that it is too hard to learn needs to get a reality check. Part of a developer's job is to learn to use new technology. So, you may really want to use this as an opportunity to get your people on-board - bring in a .Net trainer for a basic 1-day class for everyone.

  • You might want to use a Lookup transform.

    Instead of putting the logic into a case statement you can create a table with MyValue and MyOutput columns. Then run the output from your file into the lookup, join the two data sets on MyValue and return MyOutput from the reference table.

    To handle the “ELSE Bah” condition (when there is no match in the reference table) you can set the lookup to ignore errors. Then add a derived column task after the lookup with something like: ISNULL(MyOutput) ? “Bah”: MyOutput.

    This gives you the advantage that when you add or change MyValue conditions you don’t have to touch the code you can just insert/update a reference table.

  • Conditional Split is way to go here. It is SSIS's CASE statement.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Oops, Michael added his post before I refreshed. That's what happens when work gets in the way of forum posting! 😀

  • Eric Klovning (1/16/2009)


    Oops, Michael added his post before I refreshed. That's what happens when work gets in the way of forum posting! 😀

    I beat you and John to it today.

  • Funny. I didn't see Eric's post for the same refresh reason. I did see Michael's but I just wanted to second the Conditional Split transform.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • eheh, who's first who's last to post! 😉

    So, for the SSIS community here, should we say to everyone to use the conditionnal split for a CASE Statement?? Is it the best way?

    SELECT CASE with conditionnal split:

    Can you be more precise on how to to SELECT CASE with conditionnal split (+derived columsn for each output +union all)?

    Sincerly, I think is is a little bit too much component for a soo little requirement. Correct me if i am wrong : If i have 10 case statement (for one input columns), I will have 10 "derived columns" re-united again in an Union all?

    What if i have 3 sources columns being "select cased" + 1 lookup?

    SELECT CASE using Lookup transfo and some reference table

    About the lookup table, it is a good idea. If the clients are ok with creating some "reference table". However, if the customer requirement change slightly in time, the lookup table wont be able to adapt to this :

    case when MyInput="1" then ...

    when MyInput="2" and MyInput02="ok" then...

    when MyInput="2" and MyInput02="hm" then...

    when MyInput="2" and MyInput02="no" then...

    when myInput="3" then ....

    .....

    Sorry to be so picky (?) but I really want to create some "best practice" for my student.

    In good old SQL, it would fit in ONE little script. In SSIS, it is a complex ETL...daah

    Anyway, to get to the bottom of this, I was firstly wondering if I was missing something to achieve a case statement in SSIS. For the moment, the "Script Transformation" is the cleanest and flexible way to do it.

    .Net it will be!

    Thanks folks!

    P.S. Does Microsoft is planning to add some usefull and day-to-day used components like this one in future release of SSIS?

  • If you can use a lookup - use a lookup. Remember, you don't need an actual table for a lookup - your query could simply be a bunch of select statements unioned together:

    [font="Courier New"]SELECT 1 AS MyVal, 'Test' AS MyDesc

    UNION ALL SELECT 2 AS MyVal, 'Test2' AS MyDesc[/font]

    Having a table is nice because you would not have to open the package to add a new case value.

    Now, your new example is still possible with a lookup because you can join on two fields (MyInput and Myinput02) to get your value. Where you would have trouble with a lookup is when you have something like:

    [font="Courier New"]CASE

    WHEN MyInput="1" THEN "A"

    WHEN MySecondInput="1" THEN "B"

    WHEN MyInput="2" AND MyColor="Red" THEN "C"

    ELSE "F"

    END[/font]

    In these situations, you can get into something that needs to be hard-coded into your package.

    Typically, I would recommend going to a script component - especially if the number of CASE items is pretty high. I can see some benefit to using a CONDITIONAL SPLIT - mostly because it puts the logic clearly in the data flow so it is really easy to follow.

  • You could also use multiple derived column components to keep your expression simple. Put each case into it's own derived column and put them in the correct order.

    I cannot see MS doing anything else for a CASE statement. This is a case in which there are so many ways to do it already that creating a component specifically for it is probably not worthwhile.

  • I had to convert this to ssis:

    WHEN CAST(T1.[ORIG_DT] as datetime) >= T.[SETTLE_dt] THEN '1'

    WHEN CAST(T1.[ISSUE_DT] as datetime)>= T.[settle_dt] THEN '3'

    WHEN CAST(T1.[ISSUE_DT] as datetime) >= T.[Output_dt] AND T1.[ISS_STK] = '1' THEN '4'

    ELSE NULL

    END AS [offering_type_cd]

    I did the following:

    1) conditional split

    2) 4 derived columns

    3) union all

    The part that was confusing was that in the "union all" all of the other columns in the dataset showed up. I thought only the derived column (OfferingTypeCode) would appear in the union all.

    Also, I wasn't sure how to get NULL as one of the conditions. Currently, in the conditional split the last order I have is condition 1==0 and created Output OfferingTypeCode5. The expression in the derived column OfferingTypeCode5 is NULL(DT_I4). Is this correct?

    Thanks!

  • Do I need to provide more information? haven't received a reply. Thanks.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply