Data Flow very slow versus Linked Server

  • Hi,

    I have tried the Import wizard to import data from a view on a separate SQL server. This took approximately 1 hour to load 168,261 rows.When I used a linked server and a Select into statement it took 10 seconds. This is really baffling to me, I thought SSIS would be at least equal to the linked server or a bit faster. No matter how I adjust DefaultBufferSize or BufferMaxRows the elapsed time is one hour. Can someone kindly explain this or provide a suggestion to improve performance to the SSIS data flow?

    Thanks,

    Greg

  • Update - I used the ADO data flows and it brought the time down to 2 minutes versus 1 hour. Not sure what the difference is between OLE DB and the ADO data flow but it made a huge difference in elapsed time.

    Thanks,

    Greg

  • I suspect that the performance problem is because the insert destination is committing each row as it goes. Check the insert destination- the data access mode should be something like 'Table or View - Fast Load".

  • Also, in general, an OLE DB destination will be faster than an ADO.NET destination. So, I would switch back to OLE DB and do as happycat suggests and make sure you're using fastload.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • This took approximately 1 hour to load 168,261 rows.When I used a linked server and a Select into statement it took 10 seconds.

    This should not be the case. This difference is quiet huge. Are you sure both activity is doing the same thing.

    ----------
    Ashish

  • crazy4sql (6/15/2012)


    This took approximately 1 hour to load 168,261 rows.When I used a linked server and a Select into statement it took 10 seconds.

    This should not be the case. This difference is quiet huge. Are you sure both activity is doing the same thing.

    It doesn't surprise me at all. The original SSIS package was, most likely, doing 168,261 commits, whereas the SELECT INTO using the linked server only does a single commit. That is a huge difference in the processing required. The change I suggested to the SSIS package is intended to change the default behaviour of the ole db destination transformation to be similar to the SELECT INTO - although you actually have some control on how often the it commits.

  • Your problem may be due to the way your database servers was set up and ssis performing row-by-row inserts vs mass insert using a sql query. Now what about a hybrid solution here?

    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

  • akin.akinwumi (6/15/2012)


    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

    In my opinion that is actually the worst of both worlds. You're bypassing the valuable functionality built into SSIS for data movement while still allowing SQL Server to SQL Server communication. Sorry, but it is hard to imagine a worse decision being made in this scenario :sick:

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

    In my opinion that is actually the worst of both worlds. You're bypassing the valuable functionality built into SSIS for data movement while still allowing SQL Server to SQL Server communication. Sorry, but it is hard to imagine a worse decision being made in this scenario :sick:

    We will have to agree to disagree on that point. In some scenarios based on how the database servers are set up; a linked query will produce better performance that SSIS data flow task! So if you still intend to use the performance gain of the linked query but still require that the data flow matched with your ssis package architecture? Well, SSIS provides the functionality to do that.

    Use a 'EXECUTE SQL' task. 😉

  • akin.akinwumi (6/15/2012)


    opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

    In my opinion that is actually the worst of both worlds. You're bypassing the valuable functionality built into SSIS for data movement while still allowing SQL Server to SQL Server communication. Sorry, but it is hard to imagine a worse decision being made in this scenario :sick:

    We will have to agree to disagree on that point. In some scenarios based on how the database servers are set up; a linked query will produce better performance that SSIS data flow task! So if you still intend to use the performance gain of the linked query but still require that the data flow matched with your ssis package architecture? Well, SSIS provides the functionality to do that.

    Use a 'EXECUTE SQL' task. 😉

    All things being equal, even if you could not get an SSIS package to perform as well as a Linked Server setup (which I doubt) having SSIS push a process that does SQL Server to SQL Server communications in this manner is a ridiculous design pattern. Think about the debug scenarios and the environment maintenance required to support your proposed architecture. It is in the running for my last resort option.

    You're right about two things. 1. It depends, but only because that is almost always true. 2. I disagree with you.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

    In my opinion that is actually the worst of both worlds. You're bypassing the valuable functionality built into SSIS for data movement while still allowing SQL Server to SQL Server communication. Sorry, but it is hard to imagine a worse decision being made in this scenario :sick:

    We will have to agree to disagree on that point. In some scenarios based on how the database servers are set up; a linked query will produce better performance that SSIS data flow task! So if you still intend to use the performance gain of the linked query but still require that the data flow matched with your ssis package architecture? Well, SSIS provides the functionality to do that.

    Use a 'EXECUTE SQL' task. 😉

    All things being equal, even if you could not get an SSIS package to perform as well as a Linked Server setup (which I doubt) having SSIS push a process that does SQL Server to SQL Server communications in this manner is a ridiculous design pattern. Think about the debug scenarios and the environment maintenance required to support your proposed architecture. It is in the running for my last resort option.

    You're right about two things. 1. It depends, but only because that is almost always true. 2. I disagree with you.

    Yes - everything does "depend" on your circumstances, skills, preformance goals...you can add many items to this list.

    Personally, I prefer to develop code that can be debugged, maintained and supported. As a consultant, my decision on how to implement code changes depending on the current and likely future skills of those who will be debugging, maintaining and supporting my code (it is not likely to be me). I certainly have customers whose skill level and main interest is not SSIS. So, for these customers, I tend to use as much t-SQL code as I can. This may not produce a solution that performs as good as it could but that is only one design constraint.

    And as far as using the power of SSIS, I have had situations with very large data transfer where having a single very large transaction was a problem. The single final comit of many millions of row put a huge load on the server which was undesirable. By adjusting the number of row in each transaction, you can reduce that high load - the net effect is a slightly high over load (with several commits instead of one) but with server load not being max-ed out for a long time which allows other user activities to continue with less disruption. You still need to handle fails and that does become more complicated. So, no free lunches with this solution either. But, then, there never is.

    Personally, I would not be all that keen on using linked servers to import large volumes of data. There are are a number of problems with linked servers (e.g. MemToLeave may be too small and cause SQL Server to crash). For low volumes, they may not be a problem.

  • happycat59 (6/16/2012)


    opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    opc.three (6/15/2012)


    akin.akinwumi (6/15/2012)


    Copy your linked server query into an 'Execute SQL' task in SSIS. That way you get the best of both worlds!;-)

    In my opinion that is actually the worst of both worlds. You're bypassing the valuable functionality built into SSIS for data movement while still allowing SQL Server to SQL Server communication. Sorry, but it is hard to imagine a worse decision being made in this scenario :sick:

    We will have to agree to disagree on that point. In some scenarios based on how the database servers are set up; a linked query will produce better performance that SSIS data flow task! So if you still intend to use the performance gain of the linked query but still require that the data flow matched with your ssis package architecture? Well, SSIS provides the functionality to do that.

    Use a 'EXECUTE SQL' task. 😉

    All things being equal, even if you could not get an SSIS package to perform as well as a Linked Server setup (which I doubt) having SSIS push a process that does SQL Server to SQL Server communications in this manner is a ridiculous design pattern. Think about the debug scenarios and the environment maintenance required to support your proposed architecture. It is in the running for my last resort option.

    You're right about two things. 1. It depends, but only because that is almost always true. 2. I disagree with you.

    Yes - everything does "depend" on your circumstances, skills, preformance goals...you can add many items to this list.

    Personally, I prefer to develop code that can be debugged, maintained and supported. As a consultant, my decision on how to implement code changes depending on the current and likely future skills of those who will be debugging, maintaining and supporting my code (it is not likely to be me). I certainly have customers whose skill level and main interest is not SSIS. So, for these customers, I tend to use as much t-SQL code as I can. This may not produce a solution that performs as good as it could but that is only one design constraint.

    And as far as using the power of SSIS, I have had situations with very large data transfer where having a single very large transaction was a problem. The single final comit of many millions of row put a huge load on the server which was undesirable. By adjusting the number of row in each transaction, you can reduce that high load - the net effect is a slightly high over load (with several commits instead of one) but with server load not being max-ed out for a long time which allows other user activities to continue with less disruption. You still need to handle fails and that does become more complicated. So, no free lunches with this solution either. But, then, there never is.

    Personally, I would not be all that keen on using linked servers to import large volumes of data. There are are a number of problems with linked servers (e.g. MemToLeave may be too small and cause SQL Server to crash). For low volumes, they may not be a problem.

    Well said. Your comment about keeping things in T-SQL is relevant in a lot of shops. There is a balance that needs to be struck between existing skillsets that can maintain the code you leave behind and treating everything as a nail, where T-SQL is a hammer. Your comment about being able to tune the number of rows committed is also well taken. When processing large batches of data, doing a single insert-select using a Linked Server leaves no room for tuning the number of rows committed at one time without doing multiple round trips.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • There is a known issue with SSIS OLEDB Source - where using a table/view directly performs very badly.

    In addition to changing the batch size and commit size on the OLEDB destination, changing the source to use either a query or stored procedure will dramatically increase performance.

    I have been able to move processes that use linked servers to move data to SSIS packages and reduce overall processing by a lot. One process used to take 6 or more hours to pull data across a linked server. Converting to SSIS reduced that process to about 30 minutes.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • To add to the modifications Jeffrey mentioned you can also tune the SSIS package's buffer settings (DefaultBufferMaxRows and DefaultBufferMaxSize) to improve performance.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply