SSIS - Guidelines for building efficient pipelines

  • Can you suggest various ways in which efficient fast pipelines can be built in SSIS?

  • What's your opinion about it?

  • The first thing I'd suggest for future rerunnability is to break down the work that needs to be accomplished into steps such that each step could be rerunnable and be its own SSIS package.  There will undoubtedly be times you have to rerun part of a process.  If you have a SQL Agent job that calls several small SSIS packages you can choose where in the process you start from but if you have one big monolithic SSIS package you can't really control where it starts very easily.

    To address speed, there are some general practices that I believe should be followed:
    -  Only separate data sources if they are physically different (I saw someone once essentially rewriting queries for tables within the same database using SSIS transformations and it was awful)

    -  Focus more on the Extract and Load operations as much as possible, transformations within a data flow tend to slow things down considerably.  It's often better to load the data as is then transform it within the database with regular T-SQL

    -  OLE DB tends to be faster than ADO.NET as a data source or destination.  ADO.NET will usually give you better error messages when things go wrong, but there are some cases where OLE DB simply outperforms it.

    -  Make sure to define a flow for your failed destination rows, so that you only need to rerun the ones that failed and not the whole batch.

    -  Logging may not seem important during initial development and testing, but once the package is automated you will want to be able to review a text file or log table to see all the details of what happened.  SQL Agent may cut off the output.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply