Data migration using SSIS

  • Hi All,

    I am a newbie in SSIS. I am creating an SSIS package to migrate data from MySQL to SQL Server 2008.

    I have a small doubt: to migrate data from multiple tables to the corresponding tables in the target database, do I have to create that many source-destination pairs in my data flow, or is there some other way, such as attaching multiple destination task items to a single source item into which I have pulled data from all the tables?

    Please guide me, as it looks a bit weird to have 'n' pairs of source-destination task items for transferring data from 'n' tables. Also, can we specify the order of execution of those source-destination task items?

    Regards

    chitranjan

  • If I understand what you are asking, I think you have tableA, tableB, tableC, etc. in your MySQL database, and you want to move that data into tableA, tableB, tableC, etc. in your new SQL Server 2008 database.

    If that is the case, you would have a MySQL source and a SQL Server destination for each of your tables. This would allow you to map your data types and convert as needed between the two platforms. If you put all of them in the same data flow, you will not have control over the order in which SSIS decides to run them. If you need to control the load order, you would create multiple data flow tasks in your control flow and then connect them with OnSuccess precedence constraints. So you could have your first data flow task load tableA and tableB, and have your second data flow task load tableC and tableD. That is one of the beauties of SSIS.

    Before you start designing your SSIS package, I would make sure you have your new database schema fully designed and built in SQL Server. For one, you will need it in place in order to use it as a destination in your SSIS package, and secondly, it will help you to identify what changes you may have to make to your data inside your SSIS package.
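    For example, a target table might look something like this (a sketch with hypothetical table and column names, showing some common MySQL-to-SQL Server type mappings):

        -- Hypothetical target table in SQL Server 2008, illustrating
        -- typical type mappings from a MySQL source table.
        CREATE TABLE dbo.TableA (
            Id        INT IDENTITY(1,1) PRIMARY KEY, -- MySQL INT AUTO_INCREMENT
            Name      NVARCHAR(100) NOT NULL,        -- MySQL VARCHAR(100)
            Notes     NVARCHAR(MAX) NULL,            -- MySQL TEXT
            CreatedAt DATETIME      NOT NULL         -- MySQL DATETIME
        );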

  • Hi San

    Thanks for the reply. I am able to sequence the migration of the data using a sequence container and OnSuccess precedence constraints. It really helped to avoid errors where a foreign key references a master table.

    I am amazed at the speed with which data is migrated using SSIS packages.

    One strange thing I noticed while using a flat file destination task item to redirect table rows to a flat file when an error occurs: I was trying to use the same destination for multiple script components, but it would not allow me to, and I had to add a separate destination task item for each table.

  • Or you can even build each source-destination pair as an individual SSIS package and call these from a main SSIS package with an Execute Package Task. This way it's cleaner and can be managed better.

    Thanks,

    Amol

    Amol Naik

  • Not sure of all the features of MySQL, but it may have an import/export wizard that you would execute after you had your tables in place. Also, don't forget to check that your key constraints are consistent across both databases. Especially make sure they are being enforced on your source tables.
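    As a quick sanity check, a query like this (a sketch; INFORMATION_SCHEMA is available on both MySQL and SQL Server) lists the key constraints on each side so they can be compared:

        -- Lists PRIMARY KEY and FOREIGN KEY constraints per table;
        -- run on both the MySQL source and the SQL Server target.
        SELECT TABLE_NAME, CONSTRAINT_NAME, CONSTRAINT_TYPE
        FROM INFORMATION_SCHEMA.TABLE_CONSTRAINTS
        WHERE CONSTRAINT_TYPE IN ('PRIMARY KEY', 'FOREIGN KEY')
        ORDER BY TABLE_NAME, CONSTRAINT_TYPE;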


  • chitranjan.ranga (11/22/2009)

    do I have to create that many source-destination pairs in my data flow, or is there some other way, such as attaching multiple destination task items to a single source item into which I have pulled data from all the tables?

    If you have a series of simple table-to-table imports, the best approach is multiple source-destination pairs. If you really have a single source, say a complex multi-table query, you can use a multicast or conditional split transform to branch the data flow out to multiple destinations.

  • Is this something you are doing once or often? If it is once, you might look at the Import/Export Wizard and let it do much of the heavy lifting. I do see some potential order problems, but if you apply the foreign keys after the data copy it won't matter.

    Just a thought. Also, you can probably save the package and tweak it up some.
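    For example, one way to apply the keys after the load (a sketch; dbo.TableC is a hypothetical table name) is to disable the foreign key checks before the copy and re-validate them afterward:

        -- Disable FK checking on the table before the load...
        ALTER TABLE dbo.TableC NOCHECK CONSTRAINT ALL;

        -- ...load the data, then re-enable the constraints. WITH CHECK
        -- makes SQL Server verify the existing rows, so the constraints
        -- come back as trusted.
        ALTER TABLE dbo.TableC WITH CHECK CHECK CONSTRAINT ALL;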

    CEWII

  • You can also use BCP in/out to transfer data if that's possible. BCP, though, is not that easy to set up, but it is usually the fastest means of loading data.

    You can still use SSIS, with an Execute Process Task that fires the bcp in/out commands in a For Loop container.
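    A minimal sketch of the "in" side (the server, database, file names, and batch size are hypothetical; the data file would have to be exported from MySQL first, e.g. with SELECT ... INTO OUTFILE):

        rem Load a comma-delimited export file into SQL Server in 100,000-row
        rem batches: -T = trusted connection, -c = character mode,
        rem -t = field terminator, -b = rows per committed batch.
        bcp TargetDB.dbo.TableA in C:\export\tableA.csv -S MYSERVER -T -c -t, -b 100000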

    Thanks,

    Amol

    Amol Naik

  • If you can create a linked server to the MySQL source, you can skip SSIS altogether and just write INSERT/SELECT queries in T-SQL. Then you can put your whole import process into a stored procedure.

    SSIS gets more useful as you add more complex processing during the transfer, but for simple table loads the INSERT/SELECT approach is simple and easier to maintain.
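    For example (a sketch; MYSQL_SRC is a hypothetical linked server name, created beforehand with sp_addlinkedserver over the MySQL ODBC driver, and the table and column names are illustrative):

        -- Pull rows from the MySQL side via OPENQUERY and insert them
        -- into the SQL Server target table.
        INSERT INTO dbo.TableA (Name, Notes, CreatedAt)
        SELECT Name, Notes, CreatedAt
        FROM OPENQUERY(MYSQL_SRC, 'SELECT name, notes, created_at FROM tablea');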

  • But wouldn't moving large amounts of data over a linked server hurt performance? SSIS or BCP would still be efficient. I think the pipe is very limited for data transfer on a linked server.

    Correct me if I am wrong.

    Thanks,

    Amol

    Amol Naik

  • Worrying about linked server performance without knowing what data volumes are involved is probably an example of premature optimization. They perform very well in many situations. My only bad experiences with linked servers were with horrible older drivers for Teradata and Progress databases.

    Using SSIS to transfer the data is probably safe for beginners. There are ways to make it run faster by correctly setting the table lock and constraint options, running the package directly on the SQL Server to avoid extra network traffic, and figuring out how to get multiple data flows to run in parallel.

    The INSERT/SELECT method is an easy and safe method for beginners, unless he neglected to mention that he's importing millions of rows and will fill up the log drives or create deadlocks.

    The BCP method is a little harder to master than the others, and I'm not sure how to get it to load directly from MySQL. It will be very fast if you know how to get it to run with minimal logging, but you also have to know whether you need to run a full backup afterward to maintain the transaction log chain.
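    A sketch of that pattern in T-SQL (TargetDB and the backup path are hypothetical):

        -- Switch to bulk-logged recovery so the bulk load can be
        -- minimally logged (the load itself should use TABLOCK).
        ALTER DATABASE TargetDB SET RECOVERY BULK_LOGGED;

        -- ... run the bcp / bulk load here ...

        -- Switch back and take a log backup; the backup will contain the
        -- bulk-loaded extents and keeps the restore chain intact.
        ALTER DATABASE TargetDB SET RECOVERY FULL;
        BACKUP LOG TargetDB TO DISK = N'C:\backup\TargetDB_after_load.trn';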

    I would not recommend BCP to a beginner who's asking how to load some tables from another system. If they get it working some other way and then ask how to speed it up, then it might come up.

  • Quite frankly, I probably wouldn't recommend BCP even for experienced developers.

    A linked server is OK for fairly small amounts of data, but if we are talking more than a couple hundred thousand rows total, I would probably stay away from that as well, due to the issues already mentioned.

    I come back to the question I asked before, which I don't think I got an answer for: is this a one-shot, or is it something you will be doing repeatedly?

    CEWII

  • Hi,

    Thanks a lot, guys, for the loads of suggestions. For a newbie like me, it can't get better than hearing from so many people.

    Well, coming to the point: I also looked into the linked server option, but with it I don't think we get the flexibility of column-by-column mapping, or other transformations of column data types if we want them, so I went with the traditional source-destination approach.

    One small question for all you biggies out there. 🙂

    Since in the destination task item I am redirecting the error-causing rows to a flat file, in my log file I am not able to see the actual error description of why a row was not migrated, so we have to go to the flat file manually and analyze the failed rows. Can somebody tell me, given that I have enabled the "Redirect row" option instead of "Fail component" in the error output of the destination task item, how I can get hold of the actual error description?

  • Hi there,

    Just a small doubt: I have a huge table with 80 million records, with an overall size of 23 GB, which I want to migrate from MySQL to SQL Server 2008.

    I want to write an SSIS package which will read this table in chunks and keep dumping the records: say, on the first iteration the first 1 million records are migrated, then the records from 1 million to 2 million, and so on.

    I am just trying to find a way to do this; if any of you can give me some suggestions, it will be very beneficial. 🙂 Please reply back with your valuable suggestions. Thanks in advance.

  • Use the appropriate batch size parameter in the OLE DB Destination component; in this case it would be 1 million. Also set a commit size of 1 million. But I would say try to figure out the optimum values by doing some tests.
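    If you also want explicit chunking on the source side, one approach (a sketch, assuming the MySQL table has an ordered integer key id; the offset would be advanced by a For Loop container variable on each iteration) is a paged source query:

        -- MySQL-side source query for one iteration of the loop;
        -- the OFFSET advances by 1,000,000 each pass (values illustrative).
        SELECT id, col1, col2
        FROM big_table
        ORDER BY id
        LIMIT 1000000 OFFSET 3000000;

    For a table this size, a WHERE id > <last id seen> predicate instead of OFFSET will scale better, since OFFSET forces MySQL to scan past all the skipped rows on every iteration.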

    Thanks,

    Amol

    Amol Naik
