How to handle the sequence of parallel objects after a conditional split for two different cases

  • Hello,

    In the data flow, in the Conditional Split transform, I have added two conditions:

    Case 1: IF NOT ISNULL() THEN Update

    Case 2: IF ISNULL() THEN Insert

    After that I added two Derived Column transforms, each with a data viewer grid and each linked to one output of the Conditional Split, to test the data.

    The problem is that the Insert path runs before the Update path; however, I want to update first and then insert, to avoid unnecessary updates on newly created records.

    So how do I control the sequence of objects that sit at the same level and the same position in the flow?

    Thanks and Cheers.

  • At first glance you don't have a problem. Your Conditional Split creates two separate data paths, one for inserts and one for updates. Because of this, the update path should not collide with the insert path, because they shouldn't contain the same records. This does make an assumption about the content of your data: that you don't expect to be updating and inserting the same key in the same run. If you do expect that, then I'll have to re-evaluate.

    CEWII

  • Well, the purpose of giving that example was to explain the context.

    I have seen similar situations where sometimes the Insert path and sometimes the Update path at the same level takes precedence.

    Is there any particular way to handle such a situation?

    Thanks

  • I understood the context and have done this before. In my case the record would always be either an insert OR an update in the same run, never both, so the insert and update didn't require any particular precedence and I never updated records that had just been inserted.

    If you need to do them in the same data flow, I think there is a CodePlex project that lets you hold up execution of one path until some condition is met.

    However, you can run into other problems. Because the default batch size is about 10,000 records, you can have the case where a record appears more than once in the same batch, so the flow thinks an insert is required more than once, and that causes problems. What are you using to fill in the field that you are checking for NULL?
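
    If the key can repeat in your source, one way to protect against that (just a rough sketch, with a made-up table and key column) is to de-duplicate in the source query before the rows ever reach the split:

    -- Hypothetical names: dbo.SourceTable, KeyCol, LastModified.
    -- Keep only the latest row per key so the same key never appears twice in a batch.
    WITH dedup AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY KeyCol ORDER BY LastModified DESC) AS rn
        FROM   dbo.SourceTable
    )
    SELECT *
    FROM   dedup
    WHERE  rn = 1;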

    CEWII

  • Thanks Elliott,

    Yes, I agree with your point.

    I am trying to do data archiving: moving old records from the live DB to an archive DB.

    So the live DB should contain only those records which match our defined criteria, e.g. not older than a year.

    All historic data should be in the archive database, which will be updated daily during off-peak hours. Further, we shall do all reporting from the archive database.

    So in SSIS I am checking with a Conditional Split over a left join (sketched below):

    Case 1: if the archive DB PK is NULL, then insert into the archive.

    Case 2: if it is NOT NULL, then update the records.
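
    To illustrate with made-up names (the real tables and keys differ, and for the sketch I assume both databases sit on the same instance), the source query and the split expressions look roughly like this:

    -- Hypothetical: dbo.MyTable exists in both Live and Archive, with key column Id.
    SELECT     l.*,
               a.Id AS ArchiveId      -- NULL when the row has not been archived yet
    FROM       Live.dbo.MyTable    AS l
    LEFT JOIN  Archive.dbo.MyTable AS a
           ON  a.Id = l.Id;

    -- Conditional Split expressions (SSIS expression syntax):
    --   Insert output:  ISNULL(ArchiveId)
    --   Update output:  !ISNULL(ArchiveId)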

    Do you think this is the right way going forward? It looks to me like a very laborious activity to build all of this in SSIS, because on top of the SSIS packages there are all the parameterised updates, and the maintenance cost of more than 100 tables will be quite high.

    Kindly give any better guidelines, or suggest which objects are better to use in such a situation?

    Thanks and Cheers.

  • MidBar (11/11/2010)

    OK, I just want to make sure I understand: you pull data from Live to Archive, with a criterion of within the last year, and at this point you don't know whether a record has already been archived or updated.

    OK, first: I love SSIS and it can do a lot, but I'm wondering if transactional replication might be a better fit here, especially with the number of tables we are talking about.

    If transactional replication is not to be considered: for a much smaller number of tables I would usually query the raw data and use a Lookup component to check the primary keys; in 2005 this is a little harder because you need to treat the error output as the insert pipeline. I typically use an OLE DB Command to update the records if there aren't a lot of them.
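
    For what it's worth, the OLE DB Command in that pattern is just a parameterised update along these lines (the table and columns here are made up; each ? gets mapped to a pipeline column in the component):

    -- Hypothetical archive table dbo.MyTable with data columns Col1, Col2 and key Id.
    UPDATE dbo.MyTable
    SET    Col1 = ?,
           Col2 = ?
    WHERE  Id = ?;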

    Now, with all that said, with 100 tables I'm not convinced that SSIS should do most of the work. A far easier way would be to script out the 100 tables, create a staging database on the archive server with those 100 tables, and use SSIS to truncate the staging tables and then fill them up with the raw data. Then use a sproc to build code dynamically that updates and then inserts into each archive table. This sproc could use the information schema to determine the primary keys for the comparison. Also, it is really inefficient to do bulk updates if nothing has changed; a read operation coupled with a much smaller write operation is generally more cost effective. This is done by comparing the data in both tables and only updating when there is a difference. The sproc isn't simple but it is very doable; I wrote one for a previous employer, and I wish I still had the code.
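
    Just to sketch the shape of it (this is not that sproc, and every name below is hypothetical), the code it generates per table looks something like this, with the column list and the key coming from the INFORMATION_SCHEMA views:

    -- Generated per table; dbo.MyTable is the archive table, stg.MyTable the staging copy.
    -- 1) Update only the rows that actually differ (NULLs would need extra handling).
    UPDATE a
    SET    a.Col1 = s.Col1,
           a.Col2 = s.Col2
    FROM   dbo.MyTable AS a
    JOIN   stg.MyTable AS s ON s.Id = a.Id
    WHERE  a.Col1 <> s.Col1
       OR  a.Col2 <> s.Col2;

    -- 2) Insert the rows that are not in the archive yet.
    INSERT INTO dbo.MyTable (Id, Col1, Col2)
    SELECT s.Id, s.Col1, s.Col2
    FROM   stg.MyTable AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.MyTable AS a WHERE a.Id = s.Id);

    -- The generating sproc reads the primary key columns from metadata, e.g.:
    SELECT kcu.COLUMN_NAME
    FROM   INFORMATION_SCHEMA.TABLE_CONSTRAINTS AS tc
    JOIN   INFORMATION_SCHEMA.KEY_COLUMN_USAGE  AS kcu
           ON kcu.CONSTRAINT_NAME   = tc.CONSTRAINT_NAME
          AND kcu.CONSTRAINT_SCHEMA = tc.CONSTRAINT_SCHEMA
    WHERE  tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
      AND  tc.TABLE_SCHEMA    = 'dbo'
      AND  tc.TABLE_NAME      = 'MyTable';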

    I know I threw out a lot here, but I would be happy to clarify any of it. Let me know.

    CEWII

  • Again thanks Elliott,

    Well, the first thing in my scenario, as I said, is that Live should keep only the records we still need. So the steps I want to perform here are:

    1. Prepare a staging table (tblStage1) which will identify the records changed since the last synchronization.

    2. Based on tblStage1, pick the relevant records and update them in Archive.

    3. Insert the newly created records from Live into Archive. This is why I wanted to control the sequence (update before insert); otherwise the update would also be applied to the newly inserted records (see the sketch after this list).

    4. Now everything is updated in Archive, so check for any unwanted records in Live and, if any exist, delete them. So here, can we use transactional replication? I think not...
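
    Roughly, per table, the sequence I have in mind is the following (all table and column names are made up just to show the order, and I am assuming both databases sit on the same instance):

    -- Hypothetical: dbo.MyTable in both Live and Archive; dbo.tblStage1 holds the keys
    -- changed since the last synchronization.

    -- Step 2: update the archive rows that changed in Live.
    UPDATE a
    SET    a.Col1 = l.Col1,
           a.Col2 = l.Col2
    FROM   Archive.dbo.MyTable AS a
    JOIN   Live.dbo.MyTable    AS l ON l.Id = a.Id
    JOIN   dbo.tblStage1       AS s ON s.Id = l.Id;

    -- Step 3: insert the rows that are new in Live.
    INSERT INTO Archive.dbo.MyTable (Id, Col1, Col2)
    SELECT l.Id, l.Col1, l.Col2
    FROM   Live.dbo.MyTable AS l
    WHERE  NOT EXISTS (SELECT 1 FROM Archive.dbo.MyTable AS a WHERE a.Id = l.Id);

    -- Step 4: delete from Live only what is safely archived and outside the criteria.
    DELETE l
    FROM   Live.dbo.MyTable AS l
    WHERE  l.CreatedDate < DATEADD(YEAR, -1, GETDATE())
      AND  EXISTS (SELECT 1 FROM Archive.dbo.MyTable AS a WHERE a.Id = l.Id);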

    So if I need to script everything in stored procedures, then I really don't see a reason for using SSIS for more than 100 tables, because if I have to create parameterised (?) update scripts for all of the tables, then at that point I will already have completed 50-70% of the work.

    So what to do... Is SSIS not meant for complex, large databases?

    Confused...

    Thanks and Cheers.

  • MidBar (11/13/2010)

    I may not have been clear. First, SSIS is good for complex and large databases, but that doesn't mean there isn't a lot of work required to support large, complex processes.

    OK, the staging tables are on the archive side; they are EXACT duplicates (structurally) of the base tables, and their purpose is to hold one year's worth of data. Now, before I go further, is there a reliable way in each record to tell whether it has changed, such as a create date and/or a last-modified date that is ALWAYS updated when the record is touched? If so, that makes the process easier; we will continue under the assumption that such a field does NOT exist.

    I had a process that needed about 37 tables and it was really simple. I created four data flows; the first two contained 16 OLE DB sources and 16 OLE DB destinations, and each source had a very simple query roughly equivalent to "SELECT * FROM dbo.tablename". Yours would need a little more logic to pull just a year. But those sources connected straight to the destinations and the mapping was straight through. I did all of those by hand; it took me about 2 hours. The other two data flows were roughly the same, just with fewer sources/destinations, because those tables were larger and dwarfed the others.
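
    For your case, each source query would just carry a date filter, something like this (it assumes there is a reliable date column to filter on, which is an assumption on my part):

    -- Hypothetical date column; any column that reliably reflects the record's age will do.
    SELECT *
    FROM   dbo.tablename
    WHERE  RecordDate >= DATEADD(YEAR, -1, GETDATE());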

    I had a process that only needed to know which table I wanted to sync; it generated a complete update-then-insert command on the fly and then ran it. I built a very simple sproc to sit on top of it that cycled through the list of staged tables. This was the actual sync. The sproc that generated the code knew nothing about the tables other than their names, and used the SQL metadata to build the commands.
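
    The driver sproc was nothing more than a loop; a minimal sketch (the list table and the generator proc name here are hypothetical) would be:

    -- dbo.StagedTables(TableName) lists the staged tables; dbo.usp_SyncOneTable is the
    -- (hypothetical) sproc that builds and runs the update-then-insert for one table.
    DECLARE @t sysname;

    DECLARE tbl_cur CURSOR LOCAL FAST_FORWARD FOR
        SELECT TableName FROM dbo.StagedTables;

    OPEN tbl_cur;
    FETCH NEXT FROM tbl_cur INTO @t;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        EXEC dbo.usp_SyncOneTable @TableName = @t;
        FETCH NEXT FROM tbl_cur INTO @t;
    END

    CLOSE tbl_cur;
    DEALLOCATE tbl_cur;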

    I spoke about transactional replication; you can set that up and it will capture all the changes to each record and apply them to your archive with very little intervention from you. You would replicate inserts and updates but not deletes. Once it is set up it largely just runs. However, only consider this if both servers are in the same datacenter; if they are across the country or connected over slow links, just don't, it is painful in the extreme in that case.
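
    On skipping the deletes: when you add each article you can tell replication not to propagate DELETE statements at all, roughly like this (the publication and article names are made up; check the rest of the article options for your own setup):

    -- @del_cmd = N'NONE' means deletes at the publisher are simply not sent to the
    -- subscriber; inserts and updates still replicate as normal.
    EXEC sp_addarticle
         @publication   = N'ArchivePublication',
         @article       = N'MyTable',
         @source_object = N'MyTable',
         @del_cmd       = N'NONE';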

    I might suggest breaking the insert/update process out of the delete process, but that's because I like to be a little paranoid with my data, and this makes it harder to lose data.

    Please let me know if I have got you on track or if there are other options you'd like to discuss.

    CEWII

  • Thanks Elliott,

    I am getting a lot from your experience.

    First thing: in replication, can we really stop just the deletes without affecting replication overall? Is there any specific property/method to handle this?

    Now the main thing

    Yes, I can generate the DML using INFORMATION_SCHEMA and metadata, but I wonder how we can dynamically handle the source and destination transformations, and on what logic we should map the columns dynamically?

    Is it really possible to handle those objects dynamically by putting a top-level SP on top that loops through table by table? Moreover, how do we maintain the parent-child table hierarchy?

    I am really struggling to learn this dynamic behaviour in SSIS.

    If you really have some sample script/solution to explain this then that would be of great help.

    Many thanks and Kind regards.

    Cheers

  • Hello,

    I hope you are fine.

    Can you continue replying to this, as I believe no one else will touch it any more?

    Thanks and Cheers.

  • SSIS does not natively handle dynamic column mapping in data flows, if that is what you are referring to.

    If you haven't even tried to resolve your issue, please don't expect the hard-working volunteers here to waste their time providing links to answers which you could easily have found yourself.
