Data Migration with SSIS in multiple steps

  • I am new to SSIS.

    I am planning to use SSIS for data migration. I have a big source table (record count: 35M). I would like to start the migration from old table to the new table before the actual release day. In that way, on the actual release day, the execution time is low.

    How can I do that?

    For example, today is April 03. And our actual release date is April 10. The source table has a record count of 35 Million (as of today). May be by April 10, it will grow up to 37 Million records.

    Option 1: I can run the SSIS package on April 10, and it will have to take care of 37 Million records.

    Option 2: Can I run the SSIS package on April 08 for records upto that point. and then run the package again on April 10 for the rest of the records?

    If option 2 is possible, how can I do that?

    Thanks in advance for your kind advise.

  • Depends,

    Does the data get updated on any of the existing rows?

    Does the Source table have a key column that you can use to tell if the data is in the new table or not?

    If the data does get updated that makes your job a lot tougher. Because it limits what data you can copy ahead of time because if it is updated after you copy it that what do you do?

    Have you practiced how long it takes to copy the table?

    35 Million Rows Seems like a lot but you might be surprised how long it takes to copy if you make sure your SSIS Package is using Fast load.

    Also Practicing may help you answer some other important questions like what to set the initial size of your database. This will prevent frequent autogrow events which would also slow the insert process down slightly.

    If the data does not get updated, and it does have a key column. then you certainly can copy "old" rows over ahead of time. and just copy the rest on release night. Again, Try to practice so you don't run into any gotcha's on release night.

  • Thanks you very much Ray M for your kind reply with valuable advise.

    [Ray M] Does the data get updated on any of the existing rows?

    [Allen] No, the data will NOT be updated on the existing row.

    [Ray M]Does the Source table have a key column that you can use to tell if the data is in the new table or not?

    [Allen] The source table has an identity column which I plan to use to find out whether the data is in new table or not.

    [Ray M]If the data does get updated that makes your job a lot tougher. Because it limits what data you can copy ahead of time because if it is updated after you copy it that what do you do?

    [Allen] Since the data will not get updated on the existing row, I am safe.

    [Ray M]Have you practiced how long it takes to copy the table?

    [Allen] 68 minutes.

    35 Million Rows Seems like a lot but you might be surprised how long it takes to copy if you make sure your SSIS Package is using Fast load.

    [Ray M]Also Practicing may help you answer some other important questions like what to set the initial size of your database. This will prevent frequent autogrow events which would also slow the insert process down slightly.

    If the data does not get updated, and it does have a key column. then you certainly can copy "old" rows over ahead of time. and just copy the rest on release night. Again, Try to practice so you don't run into any gotcha's on release night.

    [Allen] Thanks for the advise. I plan to do testing more on this.

    [Allen]

    Question: How do I create the package that it checks whether data has been transferred or not.

    Do I need to set some variable? Or some other thing? Thanks a million for your kind advise.

  • You can design your package in a way to figure this out.

    Use a sql task to query and stuff the resultset into a variable

    IF EXISTS (SELECT * FROM NewDatabase..Table)

    SELECT MAX(IDentityColValue) AS MaxVal FROM NewDatabase..Table

    ELSE

    SELECT 0 AS MaxVal

    Set a variable in the package for the MaxVal.

    Then in your data flow.

    Source Query will be

    SELECT <ColumnList> FROM OldDatabase..Table WHERE IdentityColValue > ?

    And set the Parameter to feed in the MaxVal Variable.

  • Also in your Destination Container make sure the Table Lock box is checked.

    You may also want to look at removing all indexes from your destination table.

    This may speed the load up. and then apply the indexes after.

    If you can test this and see if it improves the time.

    Here's a good article. Check out point 4, not all the items apply to your situation.

    http://sqlcat.com/sqlcat/b/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx

  • Thank you very much Ray M for your kind advise. I will utilize them and update you and the forum accordingly.

    Really appreciate.

    Regards, Allen.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply