SSIS Data Flow with CASE statements

  • Hi

    I would like to know how I can add the following sample code to my source data in a Data Flow in SSIS, or what other options there are. The main issue is time, as we are talking about hundreds of millions of rows.

    select Sample,
        CASE
            WHEN Sample IS NULL THEN NULL
            WHEN SUBSTRING(Sample, 1, 6) IS NULL THEN ' '
            ELSE RTRIM(SUBSTRING(Sample, 1, 6))
        END AS [Sample_1_6]
    from TestTable

    What I have done at this stage is just to create a SQL Task with an INSERT INTO:

    INSERT INTO [dbo].[TestTable1]
        ([Sample], [Sample_1_6])
    select Sample,
        CASE
            WHEN Sample IS NULL THEN NULL
            WHEN SUBSTRING(Sample, 1, 6) IS NULL THEN ' '
            ELSE RTRIM(SUBSTRING(Sample, 1, 6))
        END AS [Sample_1_6]
    from TestTable

    If there is a way of adding this to a Data Flow so I can use fast load, that would really be the best solution. I know there are Derived Columns, but would this really be faster than the straight INSERT INTO in a SQL Task? If this is the way to go, what code would I use in the Derived Column, or is there any other option?

  • Your entire case statement can be rewritten as RTRIM(SUBSTRING(Sample, 1, 6)).

    This will return NULL when Sample is NULL. SUBSTRING on a non-null string will never return NULL, so you can ignore that branch of the case statement.

    Since no CASE is needed, I would just keep your INSERT statement as-is. SSIS will just have more network overhead than the pure T-SQL solution (unless SSIS is on the same box, of course).
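    If you do keep it in T-SQL, a minimal sketch of the simplified statement (reusing the [dbo].[TestTable1] target and TestTable source from your post) could look like this:

    INSERT INTO [dbo].[TestTable1] ([Sample], [Sample_1_6])
    SELECT Sample,
        RTRIM(SUBSTRING(Sample, 1, 6)) AS [Sample_1_6] -- NULL in, NULL out; otherwise the first 6 characters, right-trimmed
    FROM TestTable;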

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Would that also work for this CASE statement?

    CASE
        WHEN Sample IS NULL THEN NULL
        WHEN SUBSTRING(REPLICATE('0', 9 - LEN(Sample)) + CAST(Sample AS VARCHAR(9)), 1, 3) IS NULL THEN 0
        ELSE RTRIM(SUBSTRING(REPLICATE('0', 9 - LEN(Sample)) + CAST(Sample AS VARCHAR(9)), 1, 3))
    END AS [Sample_1_3]

    We have a lot of these types of scripts in triggers, so it would help if you could give me the script to simplify this. At times we populate it with '', other times with 0, as in this specific example.

    I am currently doing an ETL where a long-winded method was used, running everything through a cursor to populate about 5 or 6 fields like this and like the other example you helped with. If I can find a way to get this straight into a fast load, that would really be super. Thanks so far for the help.

  • This CASE statement does the following:

    * if Sample is NULL, return NULL

    * if Sample has a length greater than 9, return 0

    * else, return the first 3 characters of Sample left-padded with zeros to a length of 9

    Is this the correct business logic?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    I see your point, but this data is weird and we do have substrings, for example (4, 6) or (6, 20), so some of the values may in fact be NULL.

    So, for example, the field has a length of 30 characters, but say in one row the value is only 10 characters long:

    the substring (1, 3) would not be NULL,

    but the substring (2, 15) is totally different.

    In the first example, modified a bit, say your data looks like this:

    row 1 - 1234567890
    row 2 - null
    row 3 - 123456789012

    Running the select below would give you the following values:

    row 1 - ''
    row 2 - null
    row 3 - '12'

    select Sample,
        CASE
            WHEN Sample IS NULL THEN NULL
            WHEN SUBSTRING(Sample, 11, 2) IS NULL THEN ' '
            ELSE RTRIM(SUBSTRING(Sample, 11, 2))
        END AS [Sample_11_2]
    from TestTable
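    A quick, self-contained way to check that behaviour (just a throwaway table variable, names for illustration only):

    DECLARE @TestTable TABLE (Sample VARCHAR(30));
    INSERT INTO @TestTable (Sample)
    VALUES ('1234567890'), (NULL), ('123456789012');

    SELECT Sample,
        SUBSTRING(Sample, 11, 2) AS Raw_11_2, -- '' for row 1 (past the end, but not NULL), NULL for row 2, '12' for row 3
        RTRIM(SUBSTRING(Sample, 11, 2)) AS [Sample_11_2]
    FROM @TestTable;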

    Hope this makes sense

  • The issue with this really strange data is that I need to find a way to do a fast load of the data, if that is going to be possible with quite a number of these fields. My options are doing a fast load with a trigger on the destination table, which I think is going to be very slow considering there are over 500,000,000 rows to update. At this stage I am doing it with an INSERT INTO with the CASE statements, but it's also running for a while, although not as long as the previous method currently in use, which runs everything through a cursor to do the update.

    Off topic though: part of the ETL is changing data types, field names etc., so I create a new table with the correct field names and data types and then run a fast load on the tables without data updates. Now I am stuck on those tables where we have some fields that we want to auto-populate using the CASE statements.

  • OK.

    Keep in mind though that in the second CASE example, the simplified expression would give you NULL (rather than 0) when the length of Sample is greater than 9, because REPLICATE with a negative count returns NULL.

    Personally, I would check the length of Sample instead of taking the substring and then verifying whether it is NULL or not. That makes the intention clearer.

    The first branch of the CASE can always be dropped. You don't need to check for NULL and then return NULL.
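    Something along these lines, as a sketch (I'm assuming the 0 branch should stay a character value, '0', so the whole expression keeps a single data type):

    SELECT Sample,
        CASE
            WHEN LEN(Sample) > 9 THEN '0' -- overlong values: the original CASE returned 0 here
            ELSE RTRIM(SUBSTRING(REPLICATE('0', 9 - LEN(Sample)) + CAST(Sample AS VARCHAR(9)), 1, 3))
        END AS [Sample_1_3] -- a NULL Sample falls through to the ELSE and stays NULL
    FROM TestTable;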

    I wouldn't convert this to SSIS. Chances are this will be at least as fast in T-SQL.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    Thanks, what you say makes sense. I will keep the INSERT INTO in a SQL Task though, as we are adding all the other tables via SSIS. The performance is almost the same for this select, but the fast load on the other tables is great, and the big improvement, and the reason I keep it in SSIS, is cutting out all the other steps: renaming fields, changing data types, etc. The run times are roughly similar between SSIS and SQL, but I am working on SSIS settings that are gradually improving its performance compared to SQL. Our large clients' ETLs are currently taking about 2-4 weeks, so by adding it to SSIS I can also run different tasks at the same time. I think this is the greatest thing about SSIS at this point, unless I can get real increases in performance out of SSIS, which would be great.

    Any extra tips on getting more performance out of SQL and/or SSIS would be great. The tables are really huge; one table had 1,800,000,000 rows.

    Thanks for the advice, I will definitely try to implement as much as possible.
