Updating Rows in Incremental Loads – Level 4 of the Stairway to Integration Services Services

  • Comments posted to this topic are about the item Updating Rows in Incremental Loads – Level 4 of the Stairway to Integration Services Services

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Hi Andy,

    Thanks a lot for detailed discussion.

    But my concern in this implementation is related to slowness when the number of records are more. Unnecessary, this package will compare all the record in source table to all the records in destination table to find its state.

    My suggestion is to use staging table which will contain only the delta changes in the source table and comparison should be done in delta records.

  • Similar question to dheeraj.marwaha, although not the same. Had to do this recently for a process that changed, and after building it using lookups and running into trouble because it wasn't set-based, I'm confused.

    In short, if we're going to identify the add/change/delete records then do set-based updates from a staging table, why would we EVER want to use lookups in SSIS? Why not just grab the inbound file and load straight to a staging table, then compare in SQL? No need to write expressions, process the file to identify issues, etc?

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • I do this sort of operation fairly frequently and just looking at the SSIS package for even the simple case in this example sets me on edge. There are plenty of tricks for doing this from a staging table, like comparing checksums rather than individual fields. Also our processes identify that a records needs updating and they report against what was updated on a field by field basis, including previous and new values for just the changed fields. I really don't see how you would do that in SSIS while maintaining any performance.

  • All of the above comments boil down to: Should we use an ELT or ETL approach?

    ETL being what SSIS is designed to do - transform the data in a stream in the ETL tool.

    ELT being how DTS used to work - load the data into staging tables into the database and do all the work in the database.

    Its only worth using a ETL approach if we already know the state of the records in the source. Which in turn requires triggers and logs etc. in the source.

    The RBAR update constraint basically means you need to make this a ELT approach. Then you may as well make the entire thing ELT rather than having to contend with two different approaches.

    Thats my thoughts.

  • ddriver (10/12/2011)


    I do this sort of operation fairly frequently and just looking at the SSIS package for even the simple case in this example sets me on edge. There are plenty of tricks for doing this from a staging table, like comparing checksums rather than individual fields. Also our processes identify that a records needs updating and they report against what was updated on a field by field basis, including previous and new values for just the changed fields. I really don't see how you would do that in SSIS while maintaining any performance.

    I agree with every you say here, particularly the checksum method. I would add though that for small, and always will be small update sets, the RBAR method in SSIS is useful and quick to implement. I usually use the SSIS RBAR for systme monitoring type process such as sweeping through jobs on all servers looking for failure times or abnormal durations, where change logging is not a requirement.

  • Hi Andy, thanks for the great stairway!

    I'm having a problem with the condition in the Conditional Split. I copied it exactly as you have it:

    (FirstName != LkUp_FirstName) || ((ISNULL(MiddleName) ? “Humperdinck” : MiddleName) != (ISNULL(LkUp_MiddleName) ? “Humperdinck” : LkUp_MiddleName)) || (LastName != LkUp_LastName)

    But I get this error:

    Error at Data Flow Task [Conditional Split [104]]: Attempt to parse the expression "(FirstName != LkUp_FirstName) || ((ISNULL(MiddleName) ? “Humperdinck” : MiddleName) != (ISNULL(LkUp_MiddleName) ? “Humperdinck” : LkUp_MiddleName)) || (LastName != LkUp_LastName)" failed. The token " " at line number "1", character number "57" was not recognized. The expression cannot be parsed because it contains invalid elements at the location specified.

    Error at Data Flow Task [Conditional Split [104]]: Cannot parse the expression "(FirstName != LkUp_FirstName) || ((ISNULL(MiddleName) ? “Humperdinck” : MiddleName) != (ISNULL(LkUp_MiddleName) ? “Humperdinck” : LkUp_MiddleName)) || (LastName != LkUp_LastName)". The expression was not valid, or there is an out-of-memory error.

    Error at Data Flow Task [Conditional Split [104]]: The expression "(FirstName != LkUp_FirstName) || ((ISNULL(MiddleName) ? “Humperdinck” : MiddleName) != (ISNULL(LkUp_MiddleName) ? “Humperdinck” : LkUp_MiddleName)) || (LastName != LkUp_LastName)" on "output "Updated Rows" (120)" is not valid.

    Error at Data Flow Task [Conditional Split [104]]: Failed to set property "Expression" on "output "Updated Rows" (120)".

    Character number "57" is the space before the first '?'. If I remove the space, it still complains but about the '?' itself. I am running 2008R2.

    Any ideas?

  • There could be another character hiding out at character 57. My suggestion: Copy the expression to Notepad, make sure it's all on a single line, then copy from Notepad back into the Expression field.

    Hope this helps,

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Andy,

    I did that, and it did the trick. Thanks again!

  • I had the same Problem, when copying the expression over.

    It's the quotation marks.

    (FirstName != LkUp_FirstName) || ((ISNULL(MiddleName) ? Humperdinck : MiddleName) != (ISNULL(LkUp_MiddleName) ? Humperdinck : LkUp_MiddleName)) || (LastName != LkUp_LastName)

    Delete these quotes around Humperdinck and type them back in. This should fix the problem.

    Another way like already mentioned is copying your expression over to Notepad. This will remove all formatting

  • Andy,

    I have a question about the operation and flow of records. Let's make the following change to your data.. Let's assume that the data coming in was using a super key (FirstName, LastName, Email). Then we write the data using only the PK E-Mail, but we grab all the records.

    (Lets asume we had the following data already there.. Tim, Jones, tj@aol.com. : Now the data coming in.. (Tim, Jones, Timj@aol.com), (Tim, Jones, Jonest@aol.dom), (Tim, Jones, tj@aol.com)

    I think the operation would be.. All records would go to the matching and then tj@aol.com would be dropped, since it matches. The others would update the existing record, so only the last write would exist in the database.

    My test seems to prove this... just want to make sure my logic is correct.

  • Yes, I know this is way old but I've used Stairway series before and so I'm just now going through this one as I learn SSIS for work...

    First off, thanks for the great series! It's so easy to go through step-by-step and learn the concepts as you go!

    Secondly, I am at the part where I'm trying to click the 'New' button next to the Name of the table or the view on the OLE DB Destination Editor window. The problem is, there is no New button! What's up with that?

    Thanks and keep up the good work!

    EDIT: Nm, I figured it out. I had forgotten to join the Conditional Split first. Ooops!

    EDIT2: Sorry, but something I noticed is that even though I have the tasks in the correct order:

    TRUNCATE

    |

    DATA FLOW

    |

    APPLY STAGED UPDATES

    When I view the Progress tab after running it, it appears that Truncate is running last? I could verify this by checking the StageUpdates table after the initial run. It had no records. I assumed the order of task execution was determined by the visual design. What am I missing? thanks!

  • I had been scratching my head over this for 2 days. For me it was char 56. Copying the string to Notepad and stripping out the spaces worked for me too. I had no idea hidden characters could be inserted in this way.

  • Hey Andy.. When you are working on the OLE DB Destination, why would I get the error "No column information was returned"?

    I have the update statement, just like the one I did for this training.. If I put it into SSMS and replace the question marks, it updates the table just fine.

    I looked at the data flow path editor and it shows all the columns from the table read, and the column added from the lookup (PK for the table to write to).

  • Hello Andy,

    Thank you for a great series and explanations with the figures.

    Very straight forward.

    My current issue is that I can't not get the parameters shown when I am setting up on the OLE DB Command. I tried using the update * set * = ? and I don't see the parameters list available. like Param_0, Param_1

    Is there a way to manually setup the parameters or do you happen to know why they do not show up?

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply