Sending multiple rows to the Database from an Application: Part I


Solomon Rutzky
Hall of Fame
Group: General Forum Members
Points: 3324 Visits: 3029
Jeff Moden (10/17/2010)
I'm not a GUI/App programmer so I'll have to find another way to test what you propose, Solomon. Shouldn't be too difficult, though.

If I were an App programmer, I'd write code to make whatever log file you speak of more easily fit the idea of a simple BCP import. Normalization of data isn't reserved to just tables in a database nor any less important outside a database.


Hey there. I do have some sample code in the article I noted above, so you can do a simple test and then expand on it if you like. I did not test the XML route, but someone else did and posted their findings in the discussion forum.

As far as normalizing the log file goes, that is not always in your control. For example, where I currently work we take web server log files, parse various parts of each row using RegEx, pull in current DB info, and then INSERT or UPDATE various tables. This process was designed when we were on SQL Server 2000; with the new streaming method in 2008, it would be possible to do this in a more set-based way. But the point is that we only have so much control over the format of the log file.

Doing BCP also still requires a staging or queue table in the DB to do the set-based work on after the load is done. The streaming method I have been mentioning bypasses the staging table, and it even bypasses the need to wait until the entire dataset has been loaded into a table, because it processes each row as it comes in but in a way that appears to the database to be set-based. At that point the BCP route takes more time and additional schema, so even if one could control the format of the input file, you would still be stuck with a slower and more complicated system.
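To make the streaming idea concrete, here is a minimal sketch of the approach in C# (not the exact code from the article). The table type dbo.LogEntryType, the procedure dbo.ImportLogEntries, and the tab-delimited parsing are assumptions used only for illustration:

// Minimal sketch of the streaming TVP approach (SQL Server 2008+).
// Assumes a user-defined table type dbo.LogEntryType (Url NVARCHAR(2048), Hits INT)
// and a stored procedure dbo.ImportLogEntries with a matching READONLY parameter;
// both names are hypothetical.
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

static class TvpStreamingExample
{
    // Yields one SqlDataRecord per parsed log line; rows are streamed to the
    // server as they are produced, so the full set is never held in memory.
    static IEnumerable<SqlDataRecord> GetRecords(IEnumerable<string> logLines)
    {
        SqlMetaData[] meta =
        {
            new SqlMetaData("Url", SqlDbType.NVarChar, 2048),
            new SqlMetaData("Hits", SqlDbType.Int)
        };

        foreach (string line in logLines)
        {
            // Parsing is simplified here; the real process uses RegEx per the post.
            string[] parts = line.Split('\t');
            var record = new SqlDataRecord(meta);
            record.SetString(0, parts[0]);
            record.SetInt32(1, int.Parse(parts[1]));
            yield return record;
        }
    }

    static void Import(string connectionString, IEnumerable<string> logLines)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.ImportLogEntries", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            SqlParameter p = cmd.Parameters.AddWithValue("@LogEntries", GetRecords(logLines));
            p.SqlDbType = SqlDbType.Structured;
            p.TypeName = "dbo.LogEntryType";   // user-defined table type

            conn.Open();
            cmd.ExecuteNonQuery();             // rows stream in as the enumerator runs
        }
    }
}

The key point is that GetRecords is lazily enumerated while ExecuteNonQuery runs, so the client never buffers the whole file and the server sees the rows as a single set-based parameter.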

SQL# - http://www.SQLsharp.com/
skel-man
Forum Newbie
Group: General Forum Members
Points: 2 Visits: 118
TheSQLGuru (10/15/2010)
I consider Erland's guidance on this topic to be an excellent resource: www.sommarskog.se/arrays-in-sql.html


+1. Erland's treatment of this topic is extremely detailed, and covers each version of SQL Server from 2000 through 2008.
kennydevries
Forum Newbie
Group: General Forum Members
Points: 7 Visits: 59
Attached is a zip file containing an extension method that you can use on any IEnumerable in .NET to create a DataTable from it.

The DataTable can then be sent to a stored procedure directly as a parameter and operated on just like any other table.

The instructions for its use are part of the EntityClass.cs file. I pulled this out of some code I am working on rather quickly, so if you have difficulties with it, you can email me.

Be sure to read the directions carefully. You must edit the C# code in two places so that Reflection knows how to identify the proper property/field attributes.

Cheers,

Ken
Attachments
CodeSamples.zip (2.00 KB)
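For anyone who cannot grab the attachment, a rough sketch of this kind of reflection-based extension method (not Ken's actual code; the one-column-per-public-property mapping and the Nullable handling are simplifying assumptions) could look like this:

// Rough illustration of the general technique, not the attached code:
// build a DataTable from any IEnumerable<T> using reflection over T's public properties.
using System;
using System.Collections.Generic;
using System.Data;
using System.Reflection;

static class EnumerableToDataTableExtensions
{
    public static DataTable ToDataTable<T>(this IEnumerable<T> source)
    {
        var table = new DataTable(typeof(T).Name);
        PropertyInfo[] props = typeof(T).GetProperties(BindingFlags.Public | BindingFlags.Instance);

        // One column per public property; Nullable<T> columns use the underlying type.
        foreach (PropertyInfo prop in props)
        {
            Type columnType = Nullable.GetUnderlyingType(prop.PropertyType) ?? prop.PropertyType;
            table.Columns.Add(prop.Name, columnType);
        }

        foreach (T item in source)
        {
            DataRow row = table.NewRow();
            foreach (PropertyInfo prop in props)
            {
                row[prop.Name] = prop.GetValue(item, null) ?? DBNull.Value;
            }
            table.Rows.Add(row);
        }

        return table;
    }
}

The resulting DataTable can then be passed as a Structured (table-valued) parameter. Note that, unlike the streaming approach described above, this materializes every row in memory before anything is sent to the server.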
Gatekeeper
SSChasing Mays
Group: General Forum Members
Points: 621 Visits: 888
I'm finishing up a rewrite of our import process. We import millions of rows through a Visual FoxPro app from an employee's computer; it can import CSV, fixed-length, or DBF files. Our old approach was to process the data locally and insert the formatted data. Several approaches were tried using UNION ALL, multiple VALUES statements, and single INSERT statements sent in batches. All of these methods were slow, mainly because we kept the error-handling logic on the client. Here are the issues we faced.

1) Parsing the data on the client side and then sending it in batches. When a batch failed, we tried the batch again with single inserts to find the bad record and then continued on. This was slow because we had to loop locally and create single statements from the batch, which in turn caused more network traffic.

2) Our import process allowed fields that could not be stored in the destination table to be imported anyway, by saving the textual representation of the column in an auxiliary table (most of our data is CSV or fixed length). That meant we had to return the inserted IDs in order to insert this extra data, which slowed things down with another round of network traffic and error handling.

The best speed we could achieve was under 500 rows/sec.

Our solution was to build a C# COM object that uses SqlBulkCopy and then calls a stored procedure that knows how to import the data based on the file type.

The process removed as much error handling as possible from the client by using a staging table. The staging table is created by parsing the DataTable, resulting in one of the following structures:
1) Fixed Length = Column_0 varchar(max)
2) CSV = Column_0 varchar(max), Column_1 varchar(max).....
3) DBF = Column_0 <mapped field type>, Column_1 <mapped field type>...

All structures include a StagerRowID int IDENTITY(1,1) column. After the stager is loaded, a call is made to a stored procedure that imports the data into the destination table in batches of 10K rows. If a batch insert fails, it switches to single inserts and logs the rows that fail into the StagerResults table. The extra data to be imported is also handled in this step.
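As a rough sketch of the client-side leg of such a process (not the actual COM object; the staging-table handling and the dbo.ImportFromStager procedure name are assumptions for illustration), the SqlBulkCopy call might look like this:

// Sketch of the bulk-copy step described above. The staging table is assumed to
// already exist with Column_0..Column_N plus the StagerRowID identity column, and
// dbo.ImportFromStager is a hypothetical name for the server-side import procedure.
using System.Data;
using System.Data.SqlClient;

static class StagerLoader
{
    static void LoadAndImport(string connectionString, DataTable parsedRows, string stagingTable)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Bulk copy the parsed file contents into the staging table.
            using (var bulk = new SqlBulkCopy(conn))
            {
                bulk.DestinationTableName = stagingTable;
                bulk.BatchSize = 10000;
                bulk.BulkCopyTimeout = 0;        // no timeout for large files

                // Map columns by name so the StagerRowID identity column is left alone.
                foreach (DataColumn col in parsedRows.Columns)
                {
                    bulk.ColumnMappings.Add(col.ColumnName, col.ColumnName);
                }

                bulk.WriteToServer(parsedRows);
            }

            // Hand off to the server-side import, which moves rows to the destination
            // in 10K batches and falls back to single-row inserts on failure.
            using (var cmd = new SqlCommand("dbo.ImportFromStager", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 0;
                cmd.Parameters.AddWithValue("@StagingTable", stagingTable);
                cmd.ExecuteNonQuery();
            }
        }
    }
}

Keeping the failure handling in the stored procedure is what lets the batch/single-insert fallback happen server-side, without the extra round trips described in issue 1.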

Results:
DBF with 233,415 records
* Current Code *
10:30 (min:sec) ~ 370.5 rows/sec
* New Code *
2:30 ~ 1,556 rows/sec

Fixed length with 2,259,995 records
* Current Code *
85:30 ~ 440 rows/sec
* New Code *
8:00 ~ 4,708 rows/sec

I'm in the process of optimizing our CSV handling, but this is the best option we could find for importing large files from an employee's computer without file access to the server for bulk copy/bcp, which in any case doesn't meet our needs for tracking errors.

/* Anything is possible but is it worth it? */
Go

