T-SQL insert using SSIS Data Pump

  • Federico Iori

    SSCommitted

    Points: 1625

    Comments posted to this topic are about the item T-SQL insert using SSIS Data Pump

  • nick.mcdermaid

    Hall of Fame

    Points: 3779

    I'd be interested in a performance comparison with a simple INSERT INTO / SELECT solution which does not use SSIS.

  • Federico Iori

    SSCommitted

    Points: 1625

    You can try by yourself to compare .. as I have written, Insert Into is fully logged operations, requires more restrictive lock on target table and makes heavy use of transaction logs ..

    If you insert into an empty table, you can use the hint with(tablock) and the insert will be, under certain conditions, minimally logged, but it puts an X lock on the table to insert into , while OleDb Insert requires only an IS lock and makes better use of transaction logs because commits inserted rows in batch..

  • nick.mcdermaid

    Hall of Fame

    Points: 3779

    I probably won't have the opportunity to compare myself - thats why I read sites like this. But if I do have an insert performance issue, I might consider SSIS as a solution.

    Perhaps you could also consider using the SQL Server destination instead of the OLE DB destination in appropriate cases for even mor performance.

  • Federico Iori

    SSCommitted

    Points: 1625

    It depends how many rows and how many bytes per row you are going to insert.

    If you start making DWH / ETL with SqlServer, you will probably face this kind of issues ..

  • philcart

    SSC-Forever

    Points: 47713

    Sorry you lost me on a few fronts.

    1) you reference another article stating, "The method SSIS Data Pump represents the most effective way to load large amount of data into a SQL Server database because the SSIS Data Pump makes optimal usage of the transaction logs, parallelism, locking, and pipelining. A more detailed explanation is in the referenced article.". Yet I see no such detailed explanation in the referenced article.

    2) you call xp_cmdshell to execute your package. This won't be usable on systems that dis-allow usage of xp_cmdshell.

    3) your xp_cmdshell calls are executing a bunch of commands you can do in SSIS with the FileSystem task, or at worst the Script Task.

    --------------------
    Colt 45 - the original point and click interface

  • Peter Pirker

    Ten Centuries

    Points: 1079

    Excellent article.

    When somebody gets around to writing the VB code in C#, could you pls. post it?

  • Luke C

    Right there with Babe

    Points: 790

    Nice article!

    Luke C
    MCSE: Data Platform, MCP, MCTS, MCITP - Database Administrator & Database Developer

  • Federico Iori

    SSCommitted

    Points: 1625

    philcart (2/5/2013)


    Sorry you lost me on a few fronts.

    1) you reference another article stating, "The method SSIS Data Pump represents the most effective way to load large amount of data into a SQL Server database because the SSIS Data Pump makes optimal usage of the transaction logs, parallelism, locking, and pipelining. A more detailed explanation is in the referenced article.". Yet I see no such detailed explanation in the referenced article.

    You are right , there is no such detailed explanation.

    Purpose of my article was not explaining why SSIS Data Pump is the best method.

    You can give a look to :

    http://blog.staticvoid.co.nz/2012/8/17/mssql_and_large_insert_statements

    and note that SSIS data pump is nothing else than a pipelined bulk copy.

    2) you call xp_cmdshell to execute your package. This won't be usable on systems that dis-allow usage of xp_cmdshell.

    How could I launch from T-SQL a SSIS package ? The only way I know is using xp_cmdshell.. purpose of the article was a T-SQL stored procedure able to launch a SSIS package, and this requires xp_cmdshell

    3) your xp_cmdshell calls are executing a bunch of commands you can do in SSIS with the FileSystem task, or at worst the Script Task.

    Yes, some of xp_cmdshell calls could be moved into BULK_TRANSFER_DATA.dtsx

  • hdavidson88

    SSC Rookie

    Points: 47

    Nice article and I picked up some information I could use in our business. We have different applications that need client data dispersed throughout various locations, which have table table and column names which are not the same as our source database. Our approach was to store the application name, source table name, destination table name, source connection string, destination connection string, source column name, destination column name into a single SQL table. We then call the dynamic package passing the application that we want to load up and the package loops through this mapping table dynamically creating the connections and table and column mappings. Which works fine for table loads, I was looking for information on how to use stored procedures for the data flow task, to update data on destination sources.

  • Sean Bowden

    Old Hand

    Points: 359

    Federico Iori (2/5/2013)

    How could I launch from T-SQL a SSIS package ? The only way I know is using xp_cmdshell.. purpose of the article was a T-SQL stored procedure able to launch a SSIS package, and this requires xp_cmdshell

    We use SQL Agent to launch and manage our SSIS packages, it's pretty trivial and easy to implement (and you could easily make it dynamic and adapt into your system).

  • nick.mcdermaid

    Hall of Fame

    Points: 3779

    SQL Agent is tricky because:

    1. If you have only one SQL Agent job being your 'host' you can't run mutliple jobs in parallel

    2. If you want to have a non power-user trigger a SQL Agent job from some kind of client you need quite a bit of custom plumbing to get it all to work

    Here's another idea: use BULK INSERT instead of SSIS.

    But anyway all of these various methods of loading data have already been covered ad nauseum elsewhere.

  • Federico Iori

    SSCommitted

    Points: 1625

    BULK INSERT on Windows OS is not pipelined : first you need to save all to a file, then Bulk Insert from the file.

    SSIS use C# to achieve pipelining .

  • philcart

    SSC-Forever

    Points: 47713

    nick.mcdermaid (2/6/2013)


    SQL Agent is tricky because:

    1. If you have only one SQL Agent job being your 'host' you can't run mutliple jobs in parallel

    2. If you want to have a non power-user trigger a SQL Agent job from some kind of client you need quite a bit of custom plumbing to get it all to work

    Here's another idea: use BULK INSERT instead of SSIS.

    But anyway all of these various methods of loading data have already been covered ad nauseum elsewhere.

    We load end-of-day transaction files from multiple banks using the same package. We have multiple SQL Agent jobs and use the /config parameter to provide a different config file for each bank. Package has no problems executing in parallel.

    As for non-power users managing the jobs, that's done from a web-page that calls sp_start_job. The app pool identity of the site has the permissions required to manage the jobs.

    Federico Iori (2/6/2013)


    philcart (2/5/2013)


    2) you call xp_cmdshell to execute your package. This won't be usable on systems that dis-allow usage of xp_cmdshell.

    How could I launch from T-SQL a SSIS package ? The only way I know is using xp_cmdshell.. purpose of the article was a T-SQL stored procedure able to launch a SSIS package, and this requires xp_cmdshell

    That is precisely the wrong approach to take. Forget the gaping security implications of having to keep xp_cmdshell enabled for the whole server. Why on earth would you contemplate launching an executable within the memory space of the Database Engine, consume it's resources and possibly adversely affect load performance?

    --------------------
    Colt 45 - the original point and click interface

  • nick.mcdermaid

    Hall of Fame

    Points: 3779

    We load end-of-day transaction files from multiple banks using the same package. We have multiple SQL Agent jobs and use the /config parameter to provide a different config file for each bank. Package has no problems executing in parallel.

    As I said if you only have one SQL Agent job there is a problem, as you have many there is not a problem

    As for non-power users managing the jobs, that's done from a web-page that calls sp_start_job. The app pool identity of the site has the permissions required to manage the jobs.

    I would call building a web app 'custom plumbing'

    Anyway I don't think we disagree here. When we get down to specifics every solution has it's advantages and disadvantages.

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply