Streaming Data Into SQL Server 2008 From an Application
don_goodman
SSC Rookie
Group: General Forum Members
Points: 48 Visits: 82
The TVP feature is an improvement, yes. BUT you wrote dozens and dozens of lines of code that were only precompiled. The better path is at the end of your article: "Large datasets that are inserted directly (i.e. not transformed) will likely be faster via T-SQL BULK INSERT, bcp.exe..."
Here's the universally easy way to transform data:

1) Dump it to ASCII text on the mainframe, DB2, UniVerse, Oracle, Informix or MySQL; delimit the attributes with a character like a tab or comma or whatever; end the lines with CRLF to be fully compatible with Windows.
2) FTP pull/push the file to a disk on the SQL Server.
3) BULK INSERT or BCP the text into a staging table or a memory table like a TVP (watch out for your tempdb if you use a memory table).
4) Transform the data in the staging table or the memory table.
5) Dump the result into the destination.

This is the only sensible way to get data over a busy network into a SQL Server database. The rest just run slower and are buggier.
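
For illustration, a minimal C# sketch of steps 3 through 5 might look like the following, assuming the extract has already been FTPed to the SQL Server's disk. The staging table dbo.ImportStaging, the file path, and the trivial transformation are placeholders (dbo.ImportTest is the destination table used elsewhere in this thread), and the two statements could just as well be run as plain T-SQL:

using System;
using System.Data.SqlClient;

class BulkInsertSketch
{
    static void Main()
    {
        using (SqlConnection dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
        {
            dbConnection.Open();

            // Step 3: bulk load the tab-delimited extract into the staging table.
            // The file must be on (or reachable from) the SQL Server machine.
            using (SqlCommand bulkLoad = new SqlCommand(
                @"BULK INSERT dbo.ImportStaging
                  FROM 'C:\Import\extract.txt'
                  WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);", dbConnection))
            {
                bulkLoad.CommandTimeout = 0;   // large files can easily exceed the default 30 seconds
                bulkLoad.ExecuteNonQuery();
            }

            // Steps 4 and 5: transform in the staging table and move the result
            // into the destination with one set-based statement.
            using (SqlCommand transform = new SqlCommand(
                @"INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText)
                  SELECT CAST(Col1 AS int), LTRIM(RTRIM(Col2))
                  FROM dbo.ImportStaging;", dbConnection))
            {
                transform.CommandTimeout = 0;
                transform.ExecuteNonQuery();
            }
        }
    }
}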
Florian Reischl
SSCrazy
Group: General Forum Members
Points: 2113 Visits: 3934
I think the main difference between BCP/BULK INSERT and TVPs is the context in which they are used.

Since BCP requires special privileges, it's not a feature for front-end applications. TVPs can be used by the usual import processes, or even by GUIs that create larger numbers of objects.

Flo


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems
Matt Whitfield
Mr or Mrs. 500
Group: General Forum Members
Points: 527 Visits: 719
Solomon Rutzky (5/28/2009)

1) Using the IEnumerable streaming method that I described here provides the greatest benefit when doing transformations in the application layer. I realize that my example code was simplistic and did not show a transformation but this technique is more focused on the "sending" of the data than the "receiving" of it. And in these cases, bulk insert/copy operations do not address the issue of doing anything more than straight reads of the source data.

That's not strictly true - you can do whatever transformation you like on the data that you give to SqlBulkCopy. You don't point it at the source data; you give it the data through a DataTable, which you are able to transform before you pass it in...
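
For what it's worth, a rough sketch of that pattern; the file name, the tab delimiter, and the ToUpperInvariant() "transformation" are only placeholders, and dbo.ImportTest is the table from the article's sample:

using System;
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BulkCopySketch
{
    static void Main()
    {
        DataTable staged = new DataTable();
        staged.Columns.Add("SillyIDField", typeof(int));
        staged.Columns.Add("MeaninglessText", typeof(string));

        // Transform while loading: note that the whole table sits in memory before the copy.
        using (StreamReader reader = new StreamReader("Test.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] fields = line.Split('\t');
                staged.Rows.Add(int.Parse(fields[0]), fields[1].Trim().ToUpperInvariant());
            }
        }

        using (SqlConnection conn = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
        {
            conn.Open();
            using (SqlBulkCopy copy = new SqlBulkCopy(conn))
            {
                copy.DestinationTableName = "dbo.ImportTest";
                copy.WriteToServer(staged);
            }
        }
    }
}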

Solomon Rutzky (5/28/2009)

2) As far as using TVPs in general goes, regardless of the full-streaming feature when using IEnumerable they do, in my opinion, provide a much cleaner interface than using a staging or loading table. True, using them is not compatible with prior versions of SQL Server but many people do not need such compatibility. Unless there is a specific reason not to do so I believe that you should use all features available in a product. I have mainly worked at Software as a Service companies where we host the application and do not need to worry about what version of the DB (or even what vendor) a customer might have.

Well, I think I just disagree there. If something is compatible, it opens up flexibility. If I tell my boss 'well, this hardware is perfectly fine but you need to spend x thousand pounds to upgrade your license because I've chosen this implementation route', then that usually doesn't go down too well, in my experience... Personally, a slightly cleaner interface wouldn't make me want to go and tell my boss that! :-)

Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
Oleg Netchaev
SSCommitted
Group: General Forum Members
Points: 1707 Visits: 1811
This is a great article, Solomon. Thank you.

I do have some comments about the C# code. As daniel.janz already pointed out,

If you wrap an object in a using scope it will automatically be Disposed (and therefore closed) if an exception is thrown. The call to the Close() method in your finally block will then cause another exception because it is called on an already disposed object.

I can add another point, because it is amazing how many people consistently get it wrong. SqlCommand inherits from DbCommand, which in turn implements the IDisposable interface. This means that a variable of the SqlCommand reference type must be disposed, for example by enclosing it in a using block. The type does not implement Close(), so wrapping it in a using block will suffice. The using blocks exist entirely for our convenience; the compiler does not have a concept of a using block and simply restates our code as try/finally whenever it finds using {}. It is a matter of pure luck, based on the simple fact that Microsoft's wrappers underneath SqlCommand are written so well, that we can get away with forgetting to dispose it. If you needed to use OleDb to connect to something like Sybase (yes, I used the "S" word, sorry), you would see how quickly the code comes to a screeching halt, so badly that the connection-timeout messages from the database engine do not even make it back to the C# code, which in turn will happily sit on the same line for days, if you wish, until someone kills the process.

The implementation of the using blocks in your sample code needs to be adjusted in all places related to both the connection and the stream reader. When an IDisposable variable goes out of the using scope, the framework calls its Dispose() method, which will call Close() if present and then delegate the work to the protected Dispose(bool disposing), with disposing passed as true. This is by design, because some people opt to use using blocks, some prefer to explicitly call Close() and then Dispose(), and finally there are people who unfortunately do nothing, in which case the protected Dispose(bool disposing) is called from the destructor (finalizer), with false passed as the disposing parameter.

In other words, if you remove your finally blocks, add the respective calls to Close() inside the using scope if you want them, and wrap the SqlCommands in using blocks, the code will be fine.
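
For example, a minimal sketch of the shape described above; the proc name dbo.ImportData and the file name are placeholders standing in for the article's sample:

using System.Data;
using System.Data.SqlClient;
using System.IO;

class UsingBlockSketch
{
    static void Import()
    {
        using (SqlConnection conn = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
        using (SqlCommand importProc = new SqlCommand("dbo.ImportData", conn))
        using (StreamReader reader = new StreamReader("Test.txt"))
        {
            importProc.CommandType = CommandType.StoredProcedure;
            conn.Open();

            // ... read from the StreamReader and execute the command ...

        }   // Dispose() runs here for all three objects, even if an exception is thrown.
    }
}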

The article is great; it is good to learn about an alternative to the bulk methods.

Oleg
Solomon Rutzky
SSC Eights!
Group: General Forum Members
Points: 896 Visits: 2946
LP:

Correct: I had not thought of the XML solution, so the statement I made about single inserts being the “only” way to load data was incorrect. I will see about updating the article to be more accurate in that regard. Thanks for pointing that out. Really.

Brian:

Regarding being able to do this in PHP: I have not had time to check out the PHP or Mono (for *NIX) documentation, but if it supports enough basic .Net functionality then it is possible, since this does not need to be done via a SQL Server-specific library. I used the ADO.Net method, but from what I see it can also be done via ODBC using “data-at-execution” (whatever that is) or via OLE DB using a custom IRowset implementation.

Florian:

Thanks. And yes, I will try the single instance of the variable. I could have sworn I started with that and got an error, but maybe I am misremembering. I will try it again.

Daniel and Oleg:

Thanks for pointing out the misuse of the “using” blocks. I thought that I originally had it the way that both of you recommended, but had someone tell me to change it to what is currently shown. It is possible that I misunderstood that person. In either case, I did miss it on the SqlCommand, so thanks also, Oleg, for pointing that out.

Don:

I am not sure that I understand. Extracting, FTPing, BCPing, transforming, and finally loading into the destination seems more complex than simply reading the data from a local source, transforming each row as necessary, and sending it to the server. That is fewer steps, fewer processes, very little memory usage, and a single connection over the network to the DB, as opposed to the two network operations in your five-step process (FTP and BCP). Using FTP, the disk on the SQL Server has to work once to receive the data and write it locally, then twice more to read that local file and write it to a SQL Server data file. Then the disk has to work a fourth time to move the data from the “staging” table into the final destination table. In the scenario I mentioned, assuming the same setup with two servers, the disk on the SQL Server only works once, as it puts the data it receives from the application straight into the final destination table. That is 25% of the disk I/O (in simplistic terms) compared to the FTP / BCP / Transform method.

Matt:

1) Yes, you can transform the data before it goes into the DataTable when using SqlBulkCopy, but you still need to populate that DataTable in memory before you can send it to the destination. In doing so, the entire set is persisted in memory, whereas in the solution I am giving an example of, only a single row at a time is in memory.

2) I am not advocating that this feature alone is a “compelling” reason to upgrade to SQL Server 2008. If you work somewhere that does not have it or produces software that needs to work on various versions of various databases then by all means do what will work in most / all situations. BUT, what I AM saying is that if you already have SQL Server 2008 in-house then why not use a feature that is available to be used? That would not incur any more cost.
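
For reference, a minimal sketch of the streaming approach being discussed; the table type dbo.ImportTableType, the proc name dbo.ImportData, and the parameter name @ImportTable are assumptions rather than the article's exact names:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

class StreamingTvpSketch
{
    // The iterator yields one SqlDataRecord at a time, so only the current row is in memory.
    static IEnumerable<SqlDataRecord> ReadRows(string path)
    {
        SqlMetaData[] schema = new SqlMetaData[]
        {
            new SqlMetaData("SillyIDField", SqlDbType.Int),
            new SqlMetaData("MeaninglessText", SqlDbType.VarChar, 50)
        };

        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] fields = line.Split('\t');
                SqlDataRecord record = new SqlDataRecord(schema);
                record.SetInt32(0, int.Parse(fields[0]));   // transform each row here if needed
                record.SetString(1, fields[1]);
                yield return record;                         // streamed to the server one row at a time
            }
        }
    }

    static void Import()
    {
        using (SqlConnection conn = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
        using (SqlCommand importProc = new SqlCommand("dbo.ImportData", conn))
        {
            importProc.CommandType = CommandType.StoredProcedure;

            SqlParameter tvp = importProc.Parameters.AddWithValue("@ImportTable", ReadRows("Test.txt"));
            tvp.SqlDbType = SqlDbType.Structured;
            tvp.TypeName = "dbo.ImportTableType";   // the user-defined table type

            conn.Open();
            importProc.ExecuteNonQuery();
        }
    }
}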

SQL# - http://www.SQLsharp.com/
john hill
Grasshopper
Group: General Forum Members
Points: 14 Visits: 68
I enjoyed reading your article. Thanks for sharing it and providing the material.

I downloaded your code and tried adding an option to do an import via SQLXML parameter into a similar stored procedure.


The xml (untyped) approach is pretty simple:

CREATE PROCEDURE dbo.ImportDataXml (
    @inputXml xml
)
AS
SET NOCOUNT ON;

INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText)
SELECT r.n.value('@id', 'int'), r.n.value('@value', 'varchar(50)')
FROM @inputXml.nodes('/Root/Node') AS r(n);
GO
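
For completeness, the C# side might look roughly like the following; the element names mirror the .nodes('/Root/Node') query above, and everything else (the row count, the attribute values, the already-open connection passed in) is a placeholder:

using System.Data;
using System.Data.SqlClient;
using System.Text;

class XmlImportSketch
{
    static void Import(SqlConnection conn)   // assumes conn is already open
    {
        // Build the whole document in memory before sending it.
        StringBuilder xml = new StringBuilder("<Root>");
        for (int i = 0; i < 30000; i++)
        {
            xml.AppendFormat("<Node id=\"{0}\" value=\"Row {0}\" />", i);
        }
        xml.Append("</Root>");

        using (SqlCommand importProc = new SqlCommand("dbo.ImportDataXml", conn))
        {
            importProc.CommandType = CommandType.StoredProcedure;
            importProc.Parameters.Add("@inputXml", SqlDbType.Xml).Value = xml.ToString();
            importProc.ExecuteNonQuery();
        }
    }
}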


My testing showed it to be significantly slower using a small 30K-line file. Here are my results (~250 ms vs. ~1,250 ms):

[09:28 UTC-7] PS>./streamteste Full Test.txt
5/29/2009 9:28:03 AM -- Truncating the import table, clearing the caches, etc...
5/29/2009 9:28:03 AM -- Done setting up for the test
5/29/2009 9:28:03 AM -- Connecting to the Database...
5/29/2009 9:28:03 AM -- Calling proc to read and import the data...
5/29/2009 9:28:03 AM -- Proc is done reading and importing the data via Streamed TVP (0.241 Seconds))
5/29/2009 9:28:03 AM -- Done!
[09:28 UTC-7] PS>./streamteste xml Test.txt
5/29/2009 9:28:08 AM -- Truncating the import table, clearing the caches, etc...
5/29/2009 9:28:08 AM -- Done setting up for the test
5/29/2009 9:28:08 AM -- Connecting to the Database...
5/29/2009 9:28:08 AM -- Calling proc to Xml import the data...
5/29/2009 9:28:09 AM -- Proc is done importing the data via single Xml (1.296 Seconds)
5/29/2009 9:28:09 AM -- Done!

I attached my VS project and simple (testing) code.

Thanks,

John
Attachments
SQLStreamingTesting.zip (11 views, 177.00 KB)
Solomon Rutzky
SSC Eights!
Group: General Forum Members
Points: 896 Visits: 2946
john hill (5/29/2009)
I enjoyed reading your article. Thanks for sharing it and providing the material.

I downloaded your code and tried adding an option to do an import via SQLXML parameter into a similar stored procedure.

My testing showed it to be significantly slower using a small 30K line file. Here are my results(~ 250 ms vs 1250ms)


Hello John and thanks, I am glad that you liked the article.

And thanks so much for doing that test and providing the results and code. I was curious about the performance of the XML option. I am not surprised that it takes longer, given that it has to parse the XML string. Also, did you pay attention to the memory usage on the system? I am sure that it took up a good amount, given that it had to package up the entire XML string before sending it to the database. Again, this is the beauty of the IEnumerable feature with TVPs starting in SQL Server 2008: the application server does not need to use large amounts of memory in order to send large amounts of data to the database.

Take care,
Solomon...

SQL# - http://www.SQLsharp.com/
FZelle
Forum Newbie
Group: General Forum Members
Points: 1 Visits: 3
This is one main problem with mass data in ADO.NET.

Most people don't know that for every command sent, ADO.NET needs an open transaction.
If there is none, it will open one and commit or roll back depending on the result.

So if you open a transaction for those mass updates yourself, you save a lot of time.

So instead of using 3,000,000 commands and therefore 3,000,000 transactions, just use one command and one explicit transaction.
This alone cuts the time needed in this experiment down to about 25%.

So change the old-school connection and command loop to:

using (dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
{
    dbConnection.Open();

    // One explicit transaction for the whole batch instead of one implicit
    // transaction per ExecuteNonQuery().
    SqlTransaction trans = dbConnection.BeginTransaction();

    Console.WriteLine("{0} -- Calling the proc iteratively ...", DateTime.Now);

    using (SqlCommand importProc = new SqlCommand(
        "INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText) VALUES (@SillyIDField, @MeaninglessText)",
        dbConnection))
    {
        // Create the parameters once; only their values change inside the loop.
        SqlParameter sillyIDField = importProc.Parameters.AddWithValue("@SillyIDField", (int)tempTable.Rows[0][0]);
        SqlParameter meaninglessText = importProc.Parameters.AddWithValue("@MeaninglessText", (string)tempTable.Rows[0][1]);

        importProc.Transaction = trans;

        for (int rowNum = 0; rowNum < tempTable.Rows.Count; rowNum++)
        {
            sillyIDField.Value = (int)tempTable.Rows[rowNum][0];
            meaninglessText.Value = (string)tempTable.Rows[rowNum][1];

            importProc.ExecuteNonQuery();
        }
    }

    trans.Commit();

    // tempTable and elapsedSeconds come from the surrounding sample code.
    Console.WriteLine("{0} -- Done importing the data old-school style ({1} Seconds)", DateTime.Now, elapsedSeconds);
}



And by the way, DataRow.ItemArray is only needed if you want to get/set the row's values as an object[].
Bruce W Cassidy
SSC Eights!
Group: General Forum Members
Points: 841 Visits: 1033
Interesting start to a good topic.

I'm not sure I agree with the comment in the article that the "only way" to do things in the past has been with stored procedures that only take one row. It's been possible to pass XML to a stored procedure for some time now. I see someone has put up some code in the discussion to do precisely that. I believe that typed XML has some performance gains, so that might be a worthwhile comparison.

If your application must "stream" data to SQL Server, have you considered using a queue of some sort (SQL Server Service Broker or MSMQ, for example)? The overhead of using queues is not inconsiderable, but in some cases it may well be a better approach than having to stream the data.

Anyway, the point is that this is a new solution (and an excellent one) in SQL Server 2008, but by no means the only solution to the problem of having to pass multiple rows of typed data to SQL Server.

Florian Reischl
SSCrazy
Group: General Forum Members
Points: 2113 Visits: 3934
@Solomon Rutzky
Just wanted to let you know that your article is referenced in the current German SQL PASS newsletter! ;-)


The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems