Sending multiple rows to the Database from an Application: Part I

  • I've used the XML approach for bulk loading many times over. It is probably the most efficient way I've found to do this. However, I use the OPENXML statement to parse the XML for me, so I can treat the XML as if it were a table (a quick sketch is at the end of this post).

    I saw a post about user-defined table types for passing tables into stored procedures. I would advise against this. It sounds great on paper and I've used it myself (with regrets now). Everything works fine when you first set it up: you define the table type in SQL, build your sproc to accept a parameter of that type, pass the table from your code, and it works fine. Fast-forward six months, you have to change that table structure, and you find out that you can't modify a user-defined table type; you have to drop it and recreate it, but you can't drop it because your sproc is still referencing it. You find yourself squirreling around with the system just to make a simple change.

    With XML, you make the changes in your class and you make the changes in your sproc. Nice and simple.
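
    For anyone who hasn't seen the OPENXML pattern, here is a minimal sketch of what I mean; the table, element, and column names are made up for illustration and are not taken from the article.

    -- Minimal OPENXML sketch (all object names are illustrative).
    DECLARE @xml VARCHAR(8000);
    SET @xml = '<Orders>
                  <Order OrderID="1" CustomerID="42" Amount="19.95" />
                  <Order OrderID="2" CustomerID="43" Amount="5.00" />
                </Orders>';

    DECLARE @hDoc INT;

    -- Parse the XML once into an internal document handle.
    EXEC sp_xml_preparedocument @hDoc OUTPUT, @xml;

    -- From here on, the XML behaves like a table.
    INSERT INTO dbo.Orders (OrderID, CustomerID, Amount)
    SELECT OrderID, CustomerID, Amount
      FROM OPENXML(@hDoc, '/Orders/Order', 1)          -- 1 = attribute-centric mapping
           WITH (OrderID INT, CustomerID INT, Amount MONEY);

    -- Release the handle; it holds memory until it is freed.
    EXEC sp_xml_removedocument @hDoc;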

  • I'd like to second Stephen Byrne's question on the first page of this thread: what would be the best way to report errors on specific insert/update/delete operations? Stephen used the example of a 1,000-row operation in which an error occurs on the 999th row. I'll expand on that. What if errors occurred on rows 7, 345, and 999? What would you suggest for reporting the errors specific to those rows, while allowing the entire 1,000-row operation to complete?

  • I've had some success using temp tables to pass multi-row data into stored procs. This has worked well since SQL Server 2000, though it is best suited to passing smaller numbers of rows.
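
    In rough outline the pattern looks like this (the table, column, and proc names are invented for the example). The proc sees the temp table because it runs on the same connection that created it.

    -- Created once: a proc that reads a temp table the caller is expected to provide.
    CREATE PROCEDURE dbo.ImportNewOrders
    AS
    BEGIN
        INSERT INTO dbo.Orders (OrderID, CustomerID, Amount)
        SELECT OrderID, CustomerID, Amount
          FROM #NewOrders;      -- resolved at run time, per session
    END;
    GO

    -- Per call, on the application's connection:
    CREATE TABLE #NewOrders
    (
        OrderID    INT   NOT NULL,
        CustomerID INT   NOT NULL,
        Amount     MONEY NOT NULL
    );

    INSERT INTO #NewOrders (OrderID, CustomerID, Amount) SELECT 1, 42, 19.95;
    INSERT INTO #NewOrders (OrderID, CustomerID, Amount) SELECT 2, 43, 5.00;

    EXEC dbo.ImportNewOrders;

    DROP TABLE #NewOrders;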

  • As yet another alternative... pass a delimited string and use your favorite splitter...

    http://www.sqlservercentral.com/articles/T-SQL/63003/

    I also recommend NOT using a recursive CTE for such a thing because it's just too slow.
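
    For anyone new to the idea, the proc side is nothing more than this. It's only a sketch: it assumes an inline table-valued splitter named dbo.DelimitedSplit8K that takes (@pString, @pDelimiter) and returns (ItemNumber, Item) — use whichever splitter you favor — and the staging table and proc names are invented.

    -- Sketch only: assumes dbo.DelimitedSplit8K(@pString, @pDelimiter) exists
    -- and returns (ItemNumber, Item).  Table and proc names are illustrative.
    CREATE PROCEDURE dbo.ImportOrderIDs
        @OrderIDList VARCHAR(8000)              -- e.g. '1,2,3,4,5'
    AS
    BEGIN
        INSERT INTO dbo.OrderStaging (OrderID)
        SELECT CAST(split.Item AS INT)
          FROM dbo.DelimitedSplit8K(@OrderIDList, ',') AS split;
    END;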

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • For anyone using SQL Server 2008, there is another option that is extremely efficient and does not require a splitter function:

    http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/
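
    For the curious, the server side of that approach is just a table type plus a READONLY parameter; the names below are illustrative rather than taken from the article. On the client, ADO.NET passes the parameter as SqlDbType.Structured (a DataTable, or a streamed IEnumerable<SqlDataRecord>).

    -- Server side of the table-valued parameter approach (illustrative names).
    CREATE TYPE dbo.OrderTableType AS TABLE
    (
        OrderID    INT   NOT NULL,
        CustomerID INT   NOT NULL,
        Amount     MONEY NOT NULL
    );
    GO

    CREATE PROCEDURE dbo.ImportOrders
        @Orders dbo.OrderTableType READONLY     -- TVPs must be declared READONLY
    AS
    BEGIN
        -- Fully set-based: no splitting and no XML shredding required.
        INSERT INTO dbo.Orders (OrderID, CustomerID, Amount)
        SELECT OrderID, CustomerID, Amount
          FROM @Orders;
    END;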

    SQL# (https://SQLsharp.com/) (SQLCLR library of over 340 Functions and Procedures)
    Sql Quantum Lift (https://SqlQuantumLift.com/) (company)
    Sql Quantum Leap (https://SqlQuantumLeap.com/) (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • olimp23 (10/15/2010)


    Nice article with an interesting approach. I would use it in SQL Server 2000, but since SQL Server 2005 has good XML support I would choose XML. In fact, I have used an XML parameter to pass 50k+ rows from my Java application to SQL Server. I wonder which approach is faster.

    Heh... you should test it and let us know. 😉
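
    For anyone who wants to run that comparison, the 2005-style XML parameter that olimp23 describes generally looks something like this on the SQL Server side (element and object names invented for the example):

    -- Sketch of an XML parameter shredded with nodes()/value() (illustrative names).
    CREATE PROCEDURE dbo.ImportOrdersXml
        @Orders XML
    AS
    BEGIN
        INSERT INTO dbo.Orders (OrderID, CustomerID, Amount)
        SELECT o.value('@OrderID',    'INT'),
               o.value('@CustomerID', 'INT'),
               o.value('@Amount',     'MONEY')
          FROM @Orders.nodes('/Orders/Order') AS t(o);
    END;
    GO

    -- Example call:
    EXEC dbo.ImportOrdersXml
         @Orders = '<Orders><Order OrderID="1" CustomerID="42" Amount="19.95" /></Orders>';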

    --Jeff Moden

  • Daniel Macey (8/18/2009)


    Good to see an article on a tricky and quite important piece of code.

    Although, with SQL Server 2008 around, I am surprised that you did not mention the user-defined table type and the MERGE statement.

    Since you brought it up, do you have an example you'd like to share?
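
    Not to answer for Daniel, but as a rough idea of what the combination might look like — this sketch assumes a user-defined table type like the illustrative dbo.OrderTableType shown earlier in the thread:

    -- Sketch only: assumes a table type such as dbo.OrderTableType
    -- (OrderID INT, CustomerID INT, Amount MONEY) has already been created.
    CREATE PROCEDURE dbo.UpsertOrders
        @Orders dbo.OrderTableType READONLY
    AS
    BEGIN
        MERGE dbo.Orders AS tgt
        USING @Orders    AS src
           ON tgt.OrderID = src.OrderID
        WHEN MATCHED THEN
            UPDATE SET CustomerID = src.CustomerID,
                       Amount     = src.Amount
        WHEN NOT MATCHED BY TARGET THEN
            INSERT (OrderID, CustomerID, Amount)
            VALUES (src.OrderID, src.CustomerID, src.Amount);
    END;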

    --Jeff Moden

  • Gosh... my first take on this is that all the people in this discussion are citing XML as a good way to do this. Would you folks mind providing a little code on how to do that? Then we'll find out whether it's a good way to pass large numbers of parameters or not.

    My second take on it is that a SQLCLR splitter would probably be the best way to split up large "matrices" of data passed to the database. Of course, right after that I ask, "Except for certain process monitoring devices (which should be captured in a file to begin with), why would anyone need to pass so much data (i.e., a million rows) from an app to a database?"

    --Jeff Moden

  • Jeff Moden (10/17/2010)


    My second take on it is that a SQLCLR splitter would probably be the best way to split up large "matrices" of data passed to the database. Of course, right after that I ask, "Except for certain process monitoring devices (which should be captured in a file to begin with), why would anyone need to pass so much data (i.e., a million rows) from an app to a database?"

    Jeff, the speed of large sets in CLR depends on which version of SQL Server one is using. I have noticed that in SQL Server 2005, trying to split a million items from a comma-delimited list starts out fast but slows down over time. So for SQL Server 2005, a T-SQL UDF based on a Numbers Table is the fastest way that I have seen. However, the CLR issue was fixed in SQL Server 2008, and the SQLCLR-based split function is just as fast as the T-SQL / Numbers Table version. Of course, if someone is splitting a complex set and not a simple delimited list, then CLR might certainly be the way to go in either case.

    As far as "why" anyone would need to send a million rows: it could be reading in log files to parse the data and update one or more tables, which is not a simple BCP operation. In this case, as I noted above, the fastest way to send in large sets of rows is to use the streaming function combined with a Table-Valued Parameter. That allows a set-based approach that can start processing with the first row received, as opposed to having to receive the entire set of data first, which can be several MB if you are talking about 1 million+ rows.

    Take care,

    Solomon...

    SQL# (https://SQLsharp.com/) (SQLCLR library of over 340 Functions and Procedures)
    Sql Quantum Lift (https://SqlQuantumLift.com/) (company)
    Sql Quantum Leap (https://SqlQuantumLeap.com/) (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • I'm not a GUI/App programmer so I'll have to find another way to test what you propose, Solomon. Shouldn't be too difficult, though.

    If I were an App programmer, I'd write code to make whatever log file you speak of more easily fit the idea of a simple BCP import. Normalization of data isn't reserved for tables in a database, nor is it any less important outside one.

    --Jeff Moden

  • Jeff Moden (10/17/2010)


    I'm not a GUI/App programmer so I'll have to find another way to test what you propose, Solomon. Shouldn't be too difficult, though.

    If I were an App programmer, I'd write code to make whatever log file you speak of more easily fit the idea of a simple BCP import. Normalization of data isn't reserved for tables in a database, nor is it any less important outside one.

    Hey there. I do have some sample code in the article I noted above so you can do a simple test and then expand on it if you like. I did not test the XML route but someone else did and posted their findings in the discussion forum.

    As far as normalizing the log file, that is not always in your control. For example, where I currently work we take web server log files, parse various parts of each row using RegEx, get current DB info, and then INSERT or UPDATE various tables. This process was designed when we were on SQL Server 2000; with the new streaming method in 2008, it would be possible to do this in a more set-based way. But the point is that we only have so much control over the format of the log file.

    And doing BCP still requires having a staging or queue table in the DB to then do set-based work on after the loading is done. The streaming method that I have been mentioning bypasses the staging table, and it even bypasses the need to wait until the entire dataset is loaded, since it processes each row as it comes in, yet in a way that appears to be set-based. At that point the BCP route takes more time and additional schema. So even if one could control the format of the input file, you are still stuck with a slower and more complicated system.

    SQL# (https://SQLsharp.com/) (SQLCLR library of over 340 Functions and Procedures)
    Sql Quantum Lift (https://SqlQuantumLift.com/) (company)
    Sql Quantum Leap (https://SqlQuantumLeap.com/) (blog)
    Info sites: Collations • Module Signing • SQLCLR

  • TheSQLGuru (10/15/2010)


    I consider Erland's guidance on this topic to be an excellent resource: http://www.sommarskog.se/arrays-in-sql.html

    +1. Erland's treatment of this topic is extremely detailed, and covers each version of SQL Server from 2000 through 2008.

  • Attached is a zip file containing an extension method that you can use on any IEnumerable in .NET to create a DataTable from it.

    The DataTable can then be sent to a stored procedure directly as a parameter and operated on just like any other table.

    The instructions for its use are part of the EntityClass.cs file. I very quickly pulled this stuff out of some code I am working on. If you have difficulties with it, you can email me.

    Be sure to read the directions carefully. You must edit the C# code in 2 places to make sure Reflection knows how to identify the proper property/field attributes.

    Cheers,

    Ken

  • I'm finishing up a re-write of our import process. We import millions of rows through a Visual FoxPro app from the employee's computer. It can import CSV, fixed-length, or DBF files. Our old way was to process the data locally and insert the formatted data. Several approaches were tried using UNION ALL, multiple VALUES statements, and single insert statements in batches. All of these methods were slow, mainly because we kept the error-handling logic on the client. Here are the issues we faced.

    1) Parsing the data on the client side, then sending the data in batches. When a batch failed, we tried the batch again with single inserts to find the bad record and continue on. This was slow because we had to loop locally and create single statements from the batch, which in turn caused more network traffic.

    2) Our import process allowed importing fields that could not be stored in the destination by putting them in an aux table that stored the textual representation of a column (since most of our data is CSV or fixed length), so we had to return the inserted IDs to insert this extra data. This slowed things down with another round of network traffic and error handling.

    The best speed we could achieve was under 500 rows/sec.

    Our solution was to build a C# COM object that uses SqlBulkCopy and then calls a stored procedure that knows how to import the data based on the file type.

    The process removes as much error handling as possible from the client by using a staging table. The staging table is created by parsing the DataTable, resulting in one of the following structures:

    1) Fixed Length = Column_0 varchar(max)

    2) CSV = Column_0 varchar(max), Column_1 varchar(max).....

    3) DBF = Column_0 <mapped field type>, Column_1 <mapped field type>...

    All structures include a StagerRowID INT IDENTITY(1,1) column. After the stager is loaded, a call is made to a stored procedure that imports the data into the destination table in batches of 10K. If a batch insert fails, it switches to single inserts and logs the ones that fail into the StagerResults table. The extra data to be imported is also handled in this process.
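
    To make the flow concrete, here is a rough, simplified sketch of that batching logic. Apart from StagerRowID, StagerResults, and Column_0/Column_1, which come from the description above, the object and column names are invented; real code would map the columns per file type.

    -- Simplified sketch of the batched import with single-row fallback.
    DECLARE @BatchStart INT, @BatchSize INT, @MaxRowID INT, @RowID INT;

    SET @BatchSize  = 10000;
    SET @BatchStart = 1;
    SELECT @MaxRowID = MAX(StagerRowID) FROM dbo.Stager;

    WHILE @BatchStart <= @MaxRowID
    BEGIN
        BEGIN TRY
            -- Try the whole 10K batch as one set-based insert.
            INSERT INTO dbo.Destination (SomeIntCol, SomeTextCol)
            SELECT CAST(Column_0 AS INT), Column_1
              FROM dbo.Stager
             WHERE StagerRowID BETWEEN @BatchStart AND @BatchStart + @BatchSize - 1;
        END TRY
        BEGIN CATCH
            -- The batch failed: retry row by row and log the rows that still fail.
            SET @RowID = @BatchStart;
            WHILE @RowID <= @BatchStart + @BatchSize - 1 AND @RowID <= @MaxRowID
            BEGIN
                BEGIN TRY
                    INSERT INTO dbo.Destination (SomeIntCol, SomeTextCol)
                    SELECT CAST(Column_0 AS INT), Column_1
                      FROM dbo.Stager
                     WHERE StagerRowID = @RowID;
                END TRY
                BEGIN CATCH
                    INSERT INTO dbo.StagerResults (StagerRowID, ErrorMessage)
                    VALUES (@RowID, ERROR_MESSAGE());
                END CATCH;
                SET @RowID = @RowID + 1;
            END;
        END CATCH;

        SET @BatchStart = @BatchStart + @BatchSize;
    END;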

    Results:

    DBF with 233,415 records: current code 10:30 (~370.5 rows/sec); new code 2:30 (~1,556 rows/sec)

    Fixed length with 2,259,995 records: current code 85:30 (~440 rows/sec); new code 8:00 (~4,708 rows/sec)

    I'm in the process of optimizing our CSV path, but this is the best option we could find for importing large files from an employee's computer without file access to the server to do bulk copy/bcp, which doesn't meet our needs for tracking errors.

    /* Anything is possible but is it worth it? */
