Click here to monitor SSC
SQLServerCentral is supported by Redgate
Log in  ::  Register  ::  Not logged in
Home       Members    Calendar    Who's On

Add to briefcase «««12345

Sending multiple rows to the Database from an Application: Part I Expand / Collapse
Posted Sunday, October 17, 2010 6:45 PM

Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Tuesday, November 29, 2016 11:37 AM
Points: 526, Visits: 2,894
Jeff Moden (10/17/2010)
I'm not a GUI/App programmer so I'll have to find another way to test what you propose, Solomon. Shouldn't be too difficult, though.

If I were an App programmer, I'd write code to make whatever log file you speak of more easily fit the idea of a simple BCP import. Normalization of data isn't reserved to just tables in a database nor any less important outside a database.

Hey there. I do have some sample code in the article I noted above so you can do a simple test and then expand on it if you like. I did not test the XML route but someone else did and posted their findings in the discussion forum.

As far as normalizing the log file, that is not always in your control. For example, where I currently work we take web server log files, parse various parts of each row using RegEx, get current DB info and then INSERT or UPDATE various tables. This process was designed when we were on SQL Server 2000. With the new streaming method of 2008 though, it would be possible to do this more set-based. But the point is that we only have so much control over the format of the log file. And doing BCP still requires having a staging or queue table in the DB to then do set-based stuff on after the loading is done. The streaming method that I have been mentioning bypasses the staging table and it even bypasses the need to wait until the entire dataset is loaded into the table as it processes each row as it comes in but in a way that appears to be set-based. At that point the BCP route takes more time and additional Schema. So even if one could control the format of the input file you are still stuck with a slower and more complicated system.

SQL# -
Post #1005957
Posted Monday, October 18, 2010 7:57 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, July 1, 2014 5:00 PM
Points: 2, Visits: 118
TheSQLGuru (10/15/2010)
I consider Erland's guideance on this topic to be an excellent resource:

+1. Erland's treatment of this topic is extremely detailed, and covers each version of SQLS from 2000 through 2008.
Post #1006227
Posted Monday, October 18, 2010 5:27 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Saturday, March 26, 2011 12:41 PM
Points: 7, Visits: 59
Attached is a zip file containing an extension method that you can use on any IEnumerable in .NET that will create a DataTable from the IEnumerable.

The datatable can then be sent to a stored procedure directly as a paramter and be operated on just like any other table.

The instructions for its use are part of the EntityClass.cs file. I very quickly pulled this stuff out of some code I am working on. If you have difficulties with it, you can email me.

Be sure to read the directions carefully. You must edit the C# code in 2 places to make sure Reflection knows how to identify the proper property/field attributes.



  Post Attachments (5 views, 2.49 KB)
Post #1006661
Posted Tuesday, October 19, 2010 9:57 AM

Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Thursday, April 21, 2016 4:37 AM
Points: 392, Visits: 888
I'm finishing up on a re-write of our import process. We import millions of rows through a Visual FoxPro app from the employee's computer. It can import CSV, fixed length, or DBFs. Our old way was to process the data locally and insert the formatted data. Several approaches were tried using UNION ALL, multiple value statements, and several single inserts statements in batches. All of these methods were slow, mainly because we kept the error handling logic there. Here are the issues we faced.

1) Parsing the data on the client side then sending the data in batches. When a batch failed, try the batch again but with single inserts to find the bad record and continue on. This was slow because we had to loop locally and create single statements from the batch, which in turn, caused more network traffic.

2) Our import process allowed the importation of fields that could not be stored in the destination by storing them in an aux table that stored the textual representation of a column (since most of our data is in CSV or fixed length) so we had to return the inserted IDs to insert this extra data. This slows things down with another round of network traffic and error handling.

The best speed we could achieve was under 500 rows/sec.

Our solution was to build a C# COM object that utilizes SQLBulkCopy then call a stored procedure that knows how to import the data based on the file type.

The process removed as much error handling possible from the client by using a staging table. The staging table can be created by parsing the DataTable that results in the following structures:
1) Fixed Length = Column_0 varchar(max)
2) CSV = Column_0 varchar(max), Column_1 varchar(max).....
3) DBF = Column_0 <mapped field type>, Column_1 <mapped field type>...

All structures include a StagerRowID int IDENTITY(1,1). After the stager is done, a call is made to a stored procedure that imports the data into the destination table in batches of 10K. If an insert fails, switch to single inserts and log the ones that fail into the StagerResults table. And the extra data to be imported gets done in this process.

DBF with 233,415 records
* Current Code *
10:30 ~ 370.5 rows/sec
* New Code *
2:30 ~ 1556 rows/sec

Fixed length with 2,259,995 records
* Current Code *
85:30 ~ 440 rows/sec
* New Code *
8:00 ~ 4708 rows/sec

I'm in the process of optimizing our CSV process but this is the best option we could find for importing large files from a employee's computer without file access to the server to do bulk copy/bcp, which doesn't meet our needs in tracking errors.

/* Anything is possible but is it worth it? */
Post #1007091
« Prev Topic | Next Topic »

Add to briefcase «««12345

Permissions Expand / Collapse