Pushing inserts into SQL Server with C#?

  • I want to insert about 5000 rows at a time (typically; it can be 10K+ at times) into SQL Server. My INSERT statements are generated by a C# program.

    One way would be to simply generate one big string containing all of these SQL statements and then have SQL Server execute it using `SqlCommand.ExecuteNonQuery();` (a rough sketch of this appears below). But there is a risk that SQL Server might not be able to execute a batch that big.

    Is there a way I could sort of push the SQL statements to some kind of "SQL Server buffer" (if there is one) and then finally execute them all?

    Thanks.
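
    For reference, here is a minimal sketch of that "one big string" approach, assuming the program already produces the INSERT statements as strings; the connection string parameter and the 300-second timeout are placeholder values. The replies below explain why a bulk or set-based approach is usually the better choice.

    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.Text;

    static class BigBatchSketch
    {
        // Concatenate the generated INSERT statements and run them as a single batch.
        public static void ExecuteAsOneBatch(string connectionString, IEnumerable<string> insertStatements)
        {
            var batch = new StringBuilder();
            foreach (string stmt in insertStatements)
            {
                // Ensure each statement is terminated exactly once.
                batch.Append(stmt.TrimEnd(';', ' ')).AppendLine(";");
            }

            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(batch.ToString(), conn))
            {
                cmd.CommandTimeout = 300;   // a large batch can take a while
                conn.Open();
                cmd.ExecuteNonQuery();      // one round trip for the whole batch
            }
        }
    }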

  • Perhaps look into doing a bulk insert call?

  • To make a decision you must answer some questions: first, are there dependencies between the rows in that big chunk of data?

    Second, what do you want to do if an insert fails? Roll back all the inserts, or just keep going and create some sort of alert/logging?

    Third, are there performance constraints? Can the batch run overnight, or must the data be persisted ASAP?

    I don't think a big blob/string is a good option, because you will need to parse/deserialize it (SQL side) and, in general, putting that kind of logic in a stored procedure (or whatever) is not good for performance, reliability, or maintenance.

    Search for transaction control and (maybe) stage tables and (maybe) integration strategies.

    You can end up doing something like this:

    procedure: insert raw data
    1 - begin transaction
    2 - loop (insert into stage table)
    3 - commit transaction

    procedure: process stage table
    1 - select unprocessed data from the stage table
    2 - loop over it
    3 - begin transaction
    4 - process and persist the data
    5 - if OK commit, else rollback
    6 - log the results for that "row"
    7 - end loop

    But of course that depends on your requirements.
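
    A rough client-side C# sketch of the outline above, for illustration only: the stage table dbo.StageRows, its columns, and the processing procedure dbo.ProcessStageRows are hypothetical names, and the per-row error handling/logging of steps 5-6 would live inside the procedure.

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    // Hypothetical shape of one raw row produced by the C# program.
    class RawRow
    {
        public int ColA;
        public string ColB;
    }

    static class StageTableSketch
    {
        public static void LoadAndProcess(string connectionString, IEnumerable<RawRow> rows)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Procedure "insert raw data": load the stage table inside one transaction.
                using (SqlTransaction tran = conn.BeginTransaction())
                {
                    using (var cmd = new SqlCommand(
                        "INSERT INTO dbo.StageRows (ColA, ColB) VALUES (@a, @b);", conn, tran))
                    {
                        cmd.Parameters.Add("@a", SqlDbType.Int);
                        cmd.Parameters.Add("@b", SqlDbType.NVarChar, 100);

                        foreach (RawRow row in rows)
                        {
                            cmd.Parameters["@a"].Value = row.ColA;
                            cmd.Parameters["@b"].Value = row.ColB;
                            cmd.ExecuteNonQuery();
                        }
                    }
                    tran.Commit();
                }

                // Procedure "process stage table": validate, persist and log on the server side.
                using (var cmd = new SqlCommand("dbo.ProcessStageRows", conn))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }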

  • Bulk insert into a single table is incredibly fast. Switching away from generating INSERT statements to putting the data into a local DataTable, and then either synchronizing the changes or using bulk methods, is much better, and it also maintains data type integrity.

    Here's a C# code example, where previous processing has already put the data I want into a DataTable:

    //now use SqlBulkCopy to get the data into the server, instead of RBAR:
    //note my connection string is based on String.Replace of this:
    //Private Const SqlConnectionFormat As String = "data source={0};initial catalog={1};user id={2};password={3};Trusted_Connection=False;Application Name=ReportGrid.exe;"
    //Private Const SqlTrustedConnectionFormat As String = "data source={0};initial catalog={1};Trusted_Connection=True;Application Name=ReportGrid.exe;"
    try
    {
        //dt is the DataTable that the earlier processing filled
        using (SqlConnection myConn = new SqlConnection(this.ConnectionString))
        {
            myConn.Open();
            using (SqlBulkCopy myBulkCopy = new SqlBulkCopy(myConn))
            {
                myBulkCopy.DestinationTableName = "[" + DestinationTable + "]";
                myBulkCopy.WriteToServer(dt);
            }
        }
    }
    catch (Exception ex)
    {
        Debug.Print(ex.Message);
    }

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!
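
    One note on the snippet above: when no column mappings are supplied, SqlBulkCopy matches source columns to destination columns by ordinal position, so it is safest to map by name when the DataTable's column order might not match the table. These lines could be added inside the same using block (the names dt and myBulkCopy come from the code above; the batch size and timeout values are only examples):

    myBulkCopy.BatchSize = 5000;        // send rows in batches instead of one block
    myBulkCopy.BulkCopyTimeout = 300;   // seconds; the default is 30
    // Map by column name so the DataTable column order doesn't have to match the table.
    foreach (System.Data.DataColumn col in dt.Columns)
    {
        myBulkCopy.ColumnMappings.Add(col.ColumnName, col.ColumnName);
    }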

  • I learned something today, thanks!

    (a bit of a shame for me, as a C# developer)

    The last time I needed to do something like that (plumbing data from a terabyte text file), Framework 2.0 was not released yet.

    But I'll certainly keep this in mind the next time I need to implement that kind of application.

  • I solved a similar problem before. Some developers for a 3rd party wrote front-end code that passed 250,000 individual inserts as part of an "initialization" for a new "project" in the code. After looking at their code, it turned out that the front end was doing all of that work based on just 4 parameters. I wrote a proc to do the same thing that the front-end code was doing, and it ran in sub-second times, not to mention it prevented a meltdown of the copper between the front end and the server. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • burfos (11/19/2013)


    Perhaps look into doing a bulk insert call?

    Thanks. How do I do that?

  • jcb (11/20/2013)


    To make a decision you must answer some questions: first, are there dependencies between the rows in that big chunk of data?

    Second, what do you want to do if an insert fails? Roll back all the inserts, or just keep going and create some sort of alert/logging?

    Third, are there performance constraints? Can the batch run overnight, or must the data be persisted ASAP?

    I don't think a big blob/string is a good option, because you will need to parse/deserialize it (SQL side) and, in general, putting that kind of logic in a stored procedure (or whatever) is not good for performance, reliability, or maintenance.

    Search for transaction control and (maybe) stage tables and (maybe) integration strategies.

    You can end up doing something like this:

    procedure: insert raw data
    1 - begin transaction
    2 - loop (insert into stage table)
    3 - commit transaction

    procedure: process stage table
    1 - select unprocessed data from the stage table
    2 - loop over it
    3 - begin transaction
    4 - process and persist the data
    5 - if OK commit, else rollback
    6 - log the results for that "row"
    7 - end loop

    But of course that depends on your requirements.

    Wow, that's quite a lot of tips for a beginner like me to digest. Thanks for all those points.

    There are no dependencies between them. But can you give me an example of cases where dependencies might arise?

    If an insert fails, then roll back.

    Thank you very much.
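
    Since the requirement is "roll back everything if any insert fails", it is worth noting that SqlBulkCopy (shown in Lowell's post above) can participate in an explicit transaction, so either the whole batch is persisted or nothing is. A minimal sketch, assuming a DataTable dt and a destination table name as in that post:

    using System.Data;
    using System.Data.SqlClient;

    static class TransactionalBulkCopySketch
    {
        public static void CopyAllOrNothing(string connectionString, DataTable dt, string destinationTable)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();
                using (SqlTransaction tran = conn.BeginTransaction())
                {
                    try
                    {
                        using (var bulk = new SqlBulkCopy(conn, SqlBulkCopyOptions.Default, tran))
                        {
                            bulk.DestinationTableName = destinationTable;
                            bulk.WriteToServer(dt);
                        }
                        tran.Commit();      // every row made it in, keep them all
                    }
                    catch
                    {
                        tran.Rollback();    // any failure: nothing is persisted
                        throw;
                    }
                }
            }
        }
    }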

  • Lowell (11/20/2013)


    Bulk insert into a single table is incredibly fast. Switching away from generating INSERT statements to putting the data into a local DataTable, and then either synchronizing the changes or using bulk methods, is much better, and it also maintains data type integrity.

    Here's a C# code example, where previous processing has already put the data I want into a DataTable:

    //now use SqlBulkCopy to get the data into the server, instead of RBAR:
    //note my connection string is based on String.Replace of this:
    //Private Const SqlConnectionFormat As String = "data source={0};initial catalog={1};user id={2};password={3};Trusted_Connection=False;Application Name=ReportGrid.exe;"
    //Private Const SqlTrustedConnectionFormat As String = "data source={0};initial catalog={1};Trusted_Connection=True;Application Name=ReportGrid.exe;"
    try
    {
        //dt is the DataTable that the earlier processing filled
        using (SqlConnection myConn = new SqlConnection(this.ConnectionString))
        {
            myConn.Open();
            using (SqlBulkCopy myBulkCopy = new SqlBulkCopy(myConn))
            {
                myBulkCopy.DestinationTableName = "[" + DestinationTable + "]";
                myBulkCopy.WriteToServer(dt);
            }
        }
    }
    catch (Exception ex)
    {
        Debug.Print(ex.Message);
    }

    Thanks. I will try this. It seems closer to my "dump into a SQL Server buffer" requirement.
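
    Before that snippet can run, the generated rows have to be loaded into a DataTable whose columns line up with the destination table. A minimal sketch with hypothetical column names (Col1, Col2) and made-up values standing in for the program's real data:

    using System.Data;

    static class DataTableBuilderSketch
    {
        public static DataTable Build()
        {
            // Columns should mirror the destination table's columns and types.
            var dt = new DataTable("MyDestinationTable");
            dt.Columns.Add("Col1", typeof(int));
            dt.Columns.Add("Col2", typeof(string));

            // Instead of formatting an INSERT statement per row, add each row here.
            for (int i = 0; i < 5000; i++)
            {
                DataRow row = dt.NewRow();
                row["Col1"] = i;
                row["Col2"] = "value " + i;
                dt.Rows.Add(row);
            }
            return dt;
        }
    }

    The resulting dt is what gets handed to SqlBulkCopy.WriteToServer in the quoted code.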

  • Jeff Moden (11/21/2013)


    I solved a similar problem before. Some developers for a 3rd party wrote front-end code that passed 250,000 individual inserts as part of an "initialization" for a new "project" in the code. After looking at their code, it turned out that the front end was doing all of that work based on just 4 parameters. I wrote a proc to do the same thing that the front-end code was doing, and it ran in sub-second times, not to mention it prevented a meltdown of the copper between the front end and the server. 😛

    Thanks for sharing your experience. I am new, so it's hard for me to fully understand why the 3rd party solution was slow. Can you answer some questions for me?

    1 - Individual inserts, as in row-by-agonizing-row? That's what I was ordered to do a while ago.

    2 - Can you tell me the differences between the front end and the stored procedure, and why the latter was faster?

    Thanks.

  • blasto_max (11/22/2013)


    Jeff Moden (11/21/2013)


    I solved a similar problem before. Some developers for a 3rd party wrote front-end code that passed 250,000 individual inserts as part of an "initialization" for a new "project" in the code. After looking at their code, it turned out that the front end was doing all of that work based on just 4 parameters. I wrote a proc to do the same thing that the front-end code was doing, and it ran in sub-second times, not to mention it prevented a meltdown of the copper between the front end and the server. 😛

    Thanks for sharing your experience. I am new, so it's hard for me to fully understand why the 3rd party solution was slow. Can you answer some questions for me?

    1 - Individual inserts, as in row-by-agonizing-row? That's what I was ordered to do a while ago.

    2 - Can you tell me the differences between the front end and the stored procedure, and why the latter was faster?

    Thanks.

    Sure.

    1 - Yes... as in RBAR. They created individual INSERT/VALUE statements for each of 250,000 rows and passed those to the SQL Server to process. I don't know much about front-end code but they should have used something like the good bulk code that Lowell posted. It would have been MUCH faster although it would still "heat up the pipe" with all of the data that it transfers from one machine to the server.

    2 - Yes. Each individual INSERT requires SQL Server to look at it and evaluate it so that it can decide whether or not to use a cached execution plan or to create a new one. That alone can take a comparatively substantial amount of time. Then there's what happens during an INSERT. I'm no expert on the exact details but, for each INSERT, SQL Server has to start an implicit transaction, take the necessary lock(s), write the data to the log, write the data to the table, commit the transaction, etc., etc., for each and every insert.

    All 250,000 INSERTs were generated by "boiler-plate" (templated or "standard") initialization code based on just 4 parameters. I wrote the same code using CROSS JOINs on some "standard" tables to generate the 250,000 rows using a single INSERT/SELECT. It still had to do all of that transactional and execution-plan stuff, but it only had to do it once to cover all 250,000 rows as a single INSERT instead of doing it 250,000 times. AND it didn't have to travel across the network, either. Instead, it operated at the internal bus speeds of the server and memory (and eventually to disk).

    Basically, the sproc avoided all of the bottlenecks and iterative problems that the front-end code introduced.

    I can't set it up to duplicate the nasty load that such a thing places on the network and the I/O system of the server from here but I can demonstrate what 250,000 individual INSERTs do to the server. Run the following RBAR code with the Actual Execution Plan turned off and see how long it takes. Then, turn on the Actual Execution Plan and see how many plans it generates.

    This first snippet of code just creates the test table with a typical PK.

    --===== This just creates a test table in TempDB
    CREATE TABLE #TestTable
            (
             TestTableID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
            ,SomeInt1 INT
            ,SomeInt2 INT
            ,SomeInt3 INT
            ,SomeInt4 INT
            )
    ;

    This is the RBAR method. Like I said, I can't actually make it send the data over the network for you to see how much that slows things down. On my 11-year-old single-CPU box, it takes about 18 seconds to run (and this code is nowhere near as complicated as what I had to do in the situation I described earlier).

    --===== Make sure the test table is empty for reruns
    TRUNCATE TABLE #TestTable;

    --===== Performance Trick 1 suppresses the display of row counts.
    SET NOCOUNT ON;

    --===== Demonstrate a 250,000 row RBAR INSERT of random Integers (20)
    DECLARE @Counter INT;
    SELECT @Counter = 1;

    WHILE @Counter <= 250000
    BEGIN
            INSERT INTO #TestTable
                    (SomeInt1,SomeInt2,SomeInt3,SomeInt4)
            SELECT  SomeInt1 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt2 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt3 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt4 = ABS(CHECKSUM(NEWID()))%1000+1
            ;
            SELECT @Counter = @Counter + 1;
    END
    ;

    We can get a fair bit more speed (half the duration) out of that by encapsulating all of the INSERTs in a single explicit transaction. It's still RBAR, though, and it takes about 10 seconds on my old machine.

    --===== Truncate the table for the next test
    TRUNCATE TABLE #TestTable;

    --===== Performance Trick 1 suppresses the display of row counts.
    SET NOCOUNT ON;

    --===== Demonstrate a 250,000 row RBAR INSERT of random Integers (20)
    DECLARE @Counter INT;
    SELECT @Counter = 1;

    BEGIN TRANSACTION --Performance trick #2
    WHILE @Counter <= 250000
    BEGIN
            INSERT INTO #TestTable
                    (SomeInt1,SomeInt2,SomeInt3,SomeInt4)
            SELECT  SomeInt1 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt2 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt3 = ABS(CHECKSUM(NEWID()))%1000+1
                   ,SomeInt4 = ABS(CHECKSUM(NEWID()))%1000+1
            ;
            SELECT @Counter = @Counter + 1;
    END
    ;
    COMMIT
    ;

    Although there's a faster set-based method for doing this (and, NO! It's not a recursive CTE, which is really RBAR in every sense of the term), I wanted to show you this method because of its simplicity, especially for one-off code. It creates the same number of rows but, on my old machine, it's almost 4 1/2 times faster (about 5 seconds) than the original 250,000 individual inserts.

    --===== Truncate the table for the next test
    TRUNCATE TABLE #TestTable;

    --===== Create the same kind of rows in a set-based fashion
    --      using a CROSS JOIN as a "Pseudo-Cursor" row source.
    --      Although it creates 250,000 rows, it's still a
    --      single INSERT.
    INSERT INTO #TestTable
            (SomeInt1,SomeInt2,SomeInt3,SomeInt4)
    SELECT TOP 250000
            SomeInt1 = ABS(CHECKSUM(NEWID()))%1000+1
           ,SomeInt2 = ABS(CHECKSUM(NEWID()))%1000+1
           ,SomeInt3 = ABS(CHECKSUM(NEWID()))%1000+1
           ,SomeInt4 = ABS(CHECKSUM(NEWID()))%1000+1
      FROM master.sys.all_columns ac1
     CROSS JOIN master.sys.all_columns ac2
    ;

    I hope that answers your second question.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
