ETL from MySql

  • Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

  • thbaig - Tuesday, November 28, 2017 10:48 AM

    Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

    Simply put do not use a linked server to do this it's not the temp table that will slow down the process, extract the data on the mysql side to a flat file, transfer the file to someplace SQL Server has access to and load the file from there.

  • ZZartin - Tuesday, November 28, 2017 10:55 AM

    thbaig - Tuesday, November 28, 2017 10:48 AM

    Hi,
    I am writing ETL routine (stored proc based) to fetch data from mysql and insert in mssql database. I am using open query/linked server to fetch data in temp table and after some processing insert to mssql tables.
    we have  few big tables, like one of our table on mysql takes 1million + rows from application daily. I suspect that millions of rows will slow down process if I will use temp table . I will use batch processing.
    but my question is about best practice or effective process. Can you please suggest what could be efficient way to process millions of rows from mysql ?

    Thanks

    Simply put do not use a linked server to do this it's not the temp table that will slow down the process, extract the data on the mysql side to a flat file, transfer the file to someplace SQL Server has access to and load the file from there.

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

  • thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?


  • I would use SSIS to load the data into your staging table. SSIS will handle chunking out the data so you don't have to move 1 million row in one transaction. If you want to use a linked server chunk out the data yourself in a loop.

  • Phil Parkin - Tuesday, November 28, 2017 11:39 AM

    thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?

    dependency. i will need to manage a routine at mysql to extract data then copy to network, copy to mssql and then load. also as this will be recursive job, i will need to  do whole process for e very execution that don;t seems optimal

  • thbaig - Tuesday, November 28, 2017 12:38 PM

    Phil Parkin - Tuesday, November 28, 2017 11:39 AM

    thbaig - Tuesday, November 28, 2017 11:00 AM

    Thank you. but i have to run job periodically to  load and update data on mssql, as mysql will remain primary db for applicaiton

    OK, but what's the issue with the proposed solution?

    dependency. i will need to manage a routine at mysql to extract data then copy to network, copy to mssql and then load. also as this will be recursive job, i will need to  do whole process for e very execution that don;t seems optimal

    SSIS can do everything listed, but it's over-complicating things. Joe's idea is better. Don't use a linked server.


Viewing 7 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply