Streaming Data Into SQL Server 2008 From an Application
Posted Wednesday, July 1, 2009 9:42 AM


Florian Reischl (7/1/2009)
@Solomon Rutzky
Just wanted to let you know, your article is referenced in the current German SQL PASS newsletter!


Sweet! Danke for sharing.

Zai Gezundt.





SQL# - http://www.SQLsharp.com/
Post #745540
Posted Saturday, December 12, 2009 12:57 PM


FZelle (6/10/2009)

Most people don't know that for every command sent, ADO.NET needs an open transaction.
If there is none, it will open one and commit or roll back depending on the result.

So if you open a transaction for those mass updates yourself, you save a lot of time.

So instead of using 3,000,000 commands and therefore 3,000,000 transactions, just use one command and an explicit transaction.
This alone cuts the time needed in this experiment down to about 25% of the original.


Hello FZelle. I am sorry it has taken so long to respond to this but shortly after this post we had our first child and many things got lost in the mix.

Thank you for pointing out the explicit transaction. Yes, I had not thought of that and I suspect that you are correct in that most people do not try this method. It is great to see such a simple update gain so much efficiency.
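
For anyone who wants to try it, this is roughly the pattern FZelle is describing (an untested sketch; the table and column names here are made up for illustration):

// Untested sketch: wrap the single-row INSERTs in one explicit transaction
// instead of letting ADO.NET implicitly commit each command on its own.
// dbo.ImportTable and its columns are placeholders.
using System.Data;
using System.Data.SqlClient;

public static class ExplicitTransactionExample
{
    public static void InsertRows(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            using (SqlTransaction tran = conn.BeginTransaction())
            using (var cmd = new SqlCommand(
                "INSERT INTO dbo.ImportTable (Id, Name) VALUES (@Id, @Name);",
                conn, tran))
            {
                cmd.Parameters.Add("@Id", SqlDbType.Int);
                cmd.Parameters.Add("@Name", SqlDbType.VarChar, 100);

                try
                {
                    for (int i = 0; i < 3000000; i++)
                    {
                        cmd.Parameters["@Id"].Value = i;
                        cmd.Parameters["@Name"].Value = "Row " + i;
                        cmd.ExecuteNonQuery();   // every row runs inside the one explicit transaction
                    }

                    tran.Commit();   // a single commit for the whole batch
                }
                catch
                {
                    tran.Rollback();
                    throw;
                }
            }
        }
    }
}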

Contrasting that with the method shown in this article, I still see a benefit in the new streaming method: it takes less than 30 seconds, as opposed to the 8 or so minutes the original approach would take even at 25% of its original time. In certain situations, the explicit transaction will also lock the table for longer than is acceptable. At my current company we have a table that is updated in this manner (many calls to a single-row insert / update) with millions of rows; while the explicit transaction would reduce that time greatly, it would also lock the table, which is not acceptable given that several processes need to write to the table at about the same time and cannot wait 8 minutes (assuming lock escalation occurs). So there are situations where the explicit transaction would indeed reduce the time but still not be a workable solution. However, as I said, this is still great information to know, and it helps a lot for people still on SQL Server 2005 who cannot use the streaming method.
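
For comparison, the streaming approach boils down to passing an IEnumerable<SqlDataRecord> as a Structured parameter. Here is a stripped-down, untested sketch (the TVP type dbo.ImportRowType and the proc dbo.ImportRows are placeholders, not the names from the article):

// Rough sketch of a streaming TVP call (SQL Server 2008+).
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class StreamingTvpExample
{
    // Yields rows one at a time, so the full set is never held in memory.
    private static IEnumerable<SqlDataRecord> GetRows()
    {
        var record = new SqlDataRecord(
            new SqlMetaData("Id", SqlDbType.Int),
            new SqlMetaData("Name", SqlDbType.VarChar, 100));

        for (int i = 0; i < 3000000; i++)
        {
            record.SetInt32(0, i);
            record.SetString(1, "Row " + i);
            yield return record;   // sent to the server as it is consumed
        }
    }

    public static void Import(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand("dbo.ImportRows", conn))
        {
            cmd.CommandType = CommandType.StoredProcedure;

            SqlParameter p = cmd.Parameters.Add("@Rows", SqlDbType.Structured);
            p.TypeName = "dbo.ImportRowType";
            p.Value = GetRows();   // IEnumerable<SqlDataRecord> is streamed, not buffered

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}

The key point is that GetRows() is lazy: rows are produced and sent as the server reads the parameter, so the application never materializes all 3 million of them at once.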

Lastly, to Bruce: Yes, the intention was to point out a new solution. I hope that I did not convey that this is the only solution, but it is certainly one that should be considered.


Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #833396
Posted Thursday, May 17, 2012 6:31 AM
1. Generally, you will find that 500-1,000 rows per transaction gives the optimal performance. Beyond that, you won't see much improvement (a rough sketch of this batched-commit pattern follows below).

2. Doing all 3 million rows in a single transaction adversely impacts server resources and performance.

3. If the upload table is going to be shared, you might also consider doing individual uploads to separate tables (one per user) and then executing an INSERT statement from each upload table into the shared table -- a cached upload.
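
Something along these lines (untested, with made-up table and column names) is the batched-commit pattern I mean:

// Rough sketch of committing in fixed-size batches rather than one row per
// transaction or all rows in a single transaction. The batch size of 1,000
// is only an example; dbo.ImportTable and its columns are placeholders.
using System.Data;
using System.Data.SqlClient;

public static class BatchedCommitExample
{
    public static void InsertRows(string connectionString, int totalRows)
    {
        const int batchSize = 1000;

        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            SqlTransaction tran = conn.BeginTransaction();
            var cmd = new SqlCommand(
                "INSERT INTO dbo.ImportTable (Id, Name) VALUES (@Id, @Name);",
                conn, tran);
            cmd.Parameters.Add("@Id", SqlDbType.Int);
            cmd.Parameters.Add("@Name", SqlDbType.VarChar, 100);

            for (int i = 0; i < totalRows; i++)
            {
                cmd.Parameters["@Id"].Value = i;
                cmd.Parameters["@Name"].Value = "Row " + i;
                cmd.ExecuteNonQuery();

                // Commit every batchSize rows and start a new transaction, so no
                // single transaction (or its locks and log usage) grows unbounded.
                if ((i + 1) % batchSize == 0)
                {
                    tran.Commit();
                    tran = conn.BeginTransaction();
                    cmd.Transaction = tran;
                }
            }

            tran.Commit();   // commit any remaining partial batch
            cmd.Dispose();
        }
    }
}

The right batch size is something to determine by testing on the target system.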



Post #1301695
Posted Monday, May 28, 2012 10:24 PM


mark hutchinson (5/17/2012)
1. Generally, you will find that 500-1,000 rows per transaction gives the optimal performance. Beyond that, you won't see much improvement.


Hi Mark and thanks for the reply.

I would advise against having, or promoting, a set range of rows per transaction, as there are far too many variables that play into overall performance. I have seen queries that work great at 80 rows and hang at 100, as well as other queries that are fine in the 5,000 - 10,000 range. It all comes down to testing to find what is optimal for a particular situation.

2. Doing all 3 million rows in a single transaction adversely impacts server resources and performance.


Can you be more specific, please? My testing shows that using a streaming TVP for 3 million rows actually has little impact on CPU and memory, so what resource(s) are you thinking of? Sure, the transaction log might grow slightly faster/larger, but that might be worth the trade-off. Also keep in mind that the 3 million rows is a contrived example to show the disparity between the different mechanisms used to get the data into the DB. If locking a table for 20 seconds is too long and increases blocking, then the operation can be batched, but that wasn't really the point of the article.


3. If the upload table is going to be shared, you might also consider doing individual uploads to separate tables (one per user) and then executing an INSERT statement from each upload table into the shared table -- a cached upload.


I don't see any particular gain here, as the INSERT into the destination table should take the same amount of time either way. But the beauty of the TVP approach, even if not streaming, is that the incoming data can first go into a temporary table (if any massaging of the data needs to occur, or if something needs to be indexed) and then into the shared table (possibly in batches), thus eliminating any need for additional schema (i.e. the per-user tables). Otherwise, assuming a more typical case of thousands, tens of thousands, or even several hundred thousand rows to be imported, the operation would likely happen quickly enough not to need those additional steps. But again, testing is key, and it is easy enough to add the intermediary temp table.
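
As a rough, untested sketch of that temp-table pattern (all object names here are placeholders):

// Sketch of "TVP into a temp table first": stage the incoming rows, index /
// massage them, then insert into the shared table. dbo.SharedTable and
// dbo.ImportRowType are invented names for illustration.
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using Microsoft.SqlServer.Server;

public static class TvpStagingExample
{
    public static void Import(string connectionString, IEnumerable<SqlDataRecord> rows)
    {
        const string sql = @"
            SELECT Id, Name
            INTO   #Staging
            FROM   @Rows;

            CREATE CLUSTERED INDEX IX_Staging_Id ON #Staging (Id);

            -- any massaging / de-duplication of #Staging would go here

            INSERT INTO dbo.SharedTable (Id, Name)
            SELECT Id, Name
            FROM   #Staging;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sql, conn))
        {
            SqlParameter p = cmd.Parameters.Add("@Rows", SqlDbType.Structured);
            p.TypeName = "dbo.ImportRowType";   // placeholder TVP type
            p.Value = rows;                     // can be a streamed IEnumerable<SqlDataRecord>

            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}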

Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #1307498
Posted Tuesday, May 29, 2012 7:07 AM
@Solomon

My transaction range was based on some testing I'd done a long time ago. I'm surprised that a batch Insert transaction would slow/hang at 100. I agree that the transaction size can be much larger, but the performance levels off after a point and the memory cost becomes a consideration.

My statement about a 3M-row transaction being ill-advised was also based on old testing. TVPs and transactions cache the insert data differently; since TVPs are a more recent feature, I would hope they are inherently efficient. My experience with huge transactions is that there is a real memory hit, and that slows everything on the system. If you have a server with sufficient physical memory, you would not experience this hit. I would expect you to see memory use spike in a performance monitor.

My suggestion about multiple destination staging tables was meant to address table locking for multiple simultaneous users' batch Insert activities.

TVP is a beautiful thing. My original comment was inquiring whether any performance test had been done using transactions. If TVP is sufficiently faster or more efficient, it might cause some people to change their batch Insert code.

Mark



Post #1307738