
Streaming Data Into SQL Server 2008 From an Application
Posted Thursday, May 28, 2009 8:23 AM
SSC Rookie

The TVP feature is an improvement, yes. BUT you wrote dozens and dozens of lines of code that were only precompiled. The better path is the one at the end of your article: "Large datasets that are inserted directly (i.e. not transformed) will likely be faster via T-SQL BULK INSERT, bcp.exe..."
Here's the universally easy way to transform data:

1) Dump it to ASCII text on the mainframe, DB2, UniVerse, Oracle, Informix, or MySQL; delimit the attributes with a character like tab or comma or whatever; end the lines with CRLF to be fully compatible with Windows.
2) FTP pull/push the file to a disk on the SQL Server.
3) BULK INSERT or bcp the text into a staging table or a memory table like a TVP (watch out for your tempdb if you use a memory table).
4) Transform the data in the staging table or the memory table.
5) Dump the result into the destination.

This is the only sensible way to get data over a busy network into a SQL Server database; the rest just run slower and are buggier. (A rough sketch of steps 3-5 follows below.)
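For illustration, a rough C# sketch of steps 3-5 driven from a small routine. The staging table dbo.StagingImportTest, the file path, and the UPPER() "transformation" are made-up placeholders; the destination columns and connection string follow the article's sample code.

using System.Data.SqlClient;

static void StageAndTransform()
{
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    {
        dbConnection.Open();

        // Step 3: bulk load the delimited text file (already FTPed to the SQL Server) into a staging table.
        using (var bulkInsert = new SqlCommand(
            @"BULK INSERT dbo.StagingImportTest
              FROM 'C:\Import\Test.txt'
              WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK);", dbConnection))
        {
            bulkInsert.CommandTimeout = 0;   // large files can easily exceed the default 30 seconds
            bulkInsert.ExecuteNonQuery();
        }

        // Steps 4 and 5: transform inside the staging table, then move the result into the destination.
        using (var transform = new SqlCommand(
            @"INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText)
              SELECT SillyIDField, UPPER(MeaninglessText)   -- stand-in for a real transformation
              FROM dbo.StagingImportTest;", dbConnection))
        {
            transform.ExecuteNonQuery();
        }
    }
}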
Post #725102
Posted Thursday, May 28, 2009 8:33 AM
SSCommitted

I think the main difference between "BCP/BULK INSERT" and TVPs is the context in which they are used.

Since BCP/BULK INSERT requires special privileges, it's not a feature for front-end applications. TVPs can be used by the usual import processes or even by GUIs that create larger numbers of objects.
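For example, a minimal sketch of how a client application might fill and pass a TVP with nothing more than EXECUTE permission on a proc. The table type dbo.ImportTestType and the proc dbo.ImportViaTvp are placeholder names, not from the article.

using System.Data;
using System.Data.SqlClient;

// Assumes: CREATE TYPE dbo.ImportTestType AS TABLE (SillyIDField int, MeaninglessText varchar(50));
// and a proc dbo.ImportViaTvp (@importData dbo.ImportTestType READONLY) that inserts into dbo.ImportTest.
static void SendRows(DataTable rows)   // rows are built (and transformed) by the application or GUI
{
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    using (var importProc = new SqlCommand("dbo.ImportViaTvp", dbConnection))
    {
        importProc.CommandType = CommandType.StoredProcedure;

        SqlParameter tvp = importProc.Parameters.AddWithValue("@importData", rows);
        tvp.SqlDbType = SqlDbType.Structured;    // marks the parameter as a table-valued parameter
        tvp.TypeName = "dbo.ImportTestType";     // the user-defined table type on the server

        dbConnection.Open();
        importProc.ExecuteNonQuery();            // no bulkadmin / ADMINISTER BULK OPERATIONS needed
    }
}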

Flo



The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help | How to Post Performance Problems
Post #725113
Posted Thursday, May 28, 2009 10:47 AM
SSC-Addicted

Solomon Rutzky (5/28/2009)

1) Using the IEnumerable streaming method that I described here provides the greatest benefit when doing transformations in the application layer. I realize that my example code was simplistic and did not show a transformation but this technique is more focused on the "sending" of the data than the "receiving" of it. And in these cases, bulk insert/copy operations do not address the issue of doing anything more than straight reads of the source data.

That's not strictly true: you can do whatever transformation you like on the data that you give to SqlBulkCopy. You don't point it at the source data; you give it data through a DataTable, which you are able to transform before you hand it over...
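A rough sketch of that approach, assuming a tab-delimited source file and the article's dbo.ImportTest table (the file path is a placeholder):

using System.Data;
using System.Data.SqlClient;
using System.IO;

static void BulkCopyTransformed()
{
    // Build and transform the rows in memory first; the whole DataTable is materialized before sending.
    var table = new DataTable();
    table.Columns.Add("SillyIDField", typeof(int));
    table.Columns.Add("MeaninglessText", typeof(string));

    using (var reader = new StreamReader(@"C:\Import\Test.txt"))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            // Any transformation you like happens here, before the bulk copy ever sees the row.
            table.Rows.Add(int.Parse(fields[0]), fields[1].Trim());
        }
    }

    // Then bulk copy the already-transformed table to the destination.
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    {
        dbConnection.Open();
        using (var bulkCopy = new SqlBulkCopy(dbConnection))
        {
            bulkCopy.DestinationTableName = "dbo.ImportTest";
            bulkCopy.WriteToServer(table);
        }
    }
}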

Solomon Rutzky (5/28/2009)

2) As far as using TVPs in general goes, regardless of the full-streaming feature when using IEnumerable they do, in my opinion, provide a much cleaner interface than using a staging or loading table. True, using them is not compatible with prior versions of SQL Server but many people do not need such compatibility. Unless there is a specific reason not to do so I believe that you should use all features available in a product. I have mainly worked at Software as a Service companies where we host the application and do not need to worry about what version of the DB (or even what vendor) a customer might have.

Well, I think I just disagree there. If something is compatible, it opens up flexibility. If I tell my boss, 'Well, this hardware is perfectly fine, but you need to spend x thousand pounds to upgrade your license because I've chosen this implementation route,' that usually doesn't go down too well, in my experience... Personally, a slightly cleaner interface wouldn't make me want to go and tell my boss that!


Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
Post #725247
Posted Thursday, May 28, 2009 11:10 AM
SSCommitted

This is a great article, Solomon. Thank you.

I do have some comments about the C# code. As daniel.janz already pointed out,

If you wrap an object into a using scope it will automatically be Disposed (and therefore closed) if an exception is thrown. The call to the Close() method in your finally block will then cause another exception because it's called on an already disposed object.

I can add another point, because it is amazing how many people consistently get this wrong. SqlCommand inherits from DbCommand, which in turn implements the IDisposable interface. This means that a variable of the SqlCommand reference type must be disposed, for example by enclosing it in a using block. The type does not implement Close(), so wrapping it in a using block will suffice. The using blocks exist entirely for our convenience; the compiler does not have a concept of a using block and simply restates our code as try/finally whenever it finds using {}. It is a matter of pure luck, based on the simple fact that Microsoft's wrappers under SqlCommand are written so well, that we can get away with forgetting to dispose it. If you needed to use OleDb to connect to something like Sybase (yes, I used the "S" word, sorry), you would see how quickly the code comes to a screeching halt, so bad that the connection-timeout messages from the database engine would not even make it back to the C# code, which in turn would happily sit on the same line for days, if you wish, until someone killed the process.

The implementation of the using blocks in your sample code needs to be adjusted in all places related to both the connection and the stream reader. When an IDisposable variable goes out of using scope, the framework calls its Dispose() method, which will attempt to call Close() if present and then delegate the work to the protected Dispose(bool disposing), where disposing is passed as true. This is by design, because some people opt to use using blocks, some prefer to explicitly call Close() and then Dispose(), and finally there are people who unfortunately do nothing, in which case the protected Dispose(bool disposing) is called from the finalizer, passing false as the disposing parameter.

In other words, if you remove your finally blocks with their calls to the Close() methods, and wrap the SqlCommands in using blocks as well, the code will be fine.
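A minimal sketch of that pattern, with the connection, the command, and the reader all in using blocks and no finally or explicit Close() anywhere (the proc name and file path are placeholders):

using System.Data;
using System.Data.SqlClient;
using System.IO;

static void ImportFile()
{
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    using (var importProc = new SqlCommand("dbo.ImportData", dbConnection))   // SqlCommand is IDisposable too
    using (var fileReader = new StreamReader(@"C:\Import\Test.txt"))
    {
        importProc.CommandType = CommandType.StoredProcedure;
        // ... set up the parameters here; fileReader feeds whatever is being streamed ...

        dbConnection.Open();
        importProc.ExecuteNonQuery();
    }   // Dispose() runs for all three here, even if an exception was thrown; nothing else to close.
}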

The article is great; it is good to learn about an alternative to the bulk methods.

Oleg
Post #725266
Posted Thursday, May 28, 2009 11:05 PM
Old Hand

LP:

Correct: I had not thought of the XML solution, so my statement about single inserts being the "only" way to load data was incorrect. I will see about updating the article to be more accurate in that regard. Thanks for pointing that out. Really.




Brian:

Regarding being able to do this in PHP: I have not had time to check the PHP or Mono (for *NIX) documentation, but if it supports enough basic .NET functionality, then it should be possible, since this does not need to be done via a SQL Server specific library. I used the ADO.NET method, but from what I see it can also be done via ODBC using "data-at-execution" (whatever that is) or via OLE DB using a custom IRowset implementation.




Florian:

Thanks. And yes, I will try the single instance of the variable. I could have sworn I started with that and got an error but maybe I am misremembering. I will try it again.




Daniel and Oleg:

Thanks for pointing out the misuse of the "using" block. I thought that I originally had it the way that both of you recommended, but someone told me to change it to what is currently shown. It is possible that I misunderstood that person. In either case, I did miss it on the SqlCommand, so thanks also, Oleg, for pointing that out.




Don:

I am not sure that I understand. Extracting, FTPing, BCPing, transforming, and finally loading into the destination seems a bit more complex than simply reading the data from a local source, transforming each row as necessary, and sending it to the server. That is fewer steps, fewer processes, very little memory usage, and a single connection over the network to the DB, as opposed to the two network steps in your five-step process (FTP and BCP). With the FTP approach, the disk on the SQL Server has to work once to receive the data and write it locally, then twice more to read that local file and write it into a SQL Server data file. Then the disk has to work a fourth time to move the data from the "staging" table into the final destination table. In the scenario I described, assuming the same two-server setup, the disk on the SQL Server only has to work once, as it puts the data it receives from the application straight into the final destination table. That is 25% of the disk I/O (in simplistic terms) compared to the FTP / BCP / Transform method.




Matt:

1) Yes, you can transform the data before it goes into the DataTable when using SqlBulkCopy, but you still need to populate that DataTable in memory before you can send it to the destination. In doing so the data is fully materialized in memory, whereas in the solution that I am giving an example of, only a single row at a time is in memory.

2) I am not advocating that this feature alone is a "compelling" reason to upgrade to SQL Server 2008. If you work somewhere that does not have it, or you produce software that needs to work on various versions of various databases, then by all means do what will work in most or all situations. BUT what I AM saying is that if you already have SQL Server 2008 in-house, then why not use a feature that is available to be used? That would not incur any additional cost.
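For reference, a stripped-down sketch of that single-row-at-a-time streaming: an iterator that yields one SqlDataRecord per line, handed to a Structured parameter. The table type dbo.ImportTestType and proc dbo.ImportData are placeholder names; the columns follow dbo.ImportTest.

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.IO;
using Microsoft.SqlServer.Server;

static IEnumerable<SqlDataRecord> StreamRows(string filePath)
{
    var layout = new[]
    {
        new SqlMetaData("SillyIDField", SqlDbType.Int),
        new SqlMetaData("MeaninglessText", SqlDbType.VarChar, 50)
    };

    using (var reader = new StreamReader(filePath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            string[] fields = line.Split('\t');
            var record = new SqlDataRecord(layout);
            record.SetInt32(0, int.Parse(fields[0]));   // per-row transformation could happen here
            record.SetString(1, fields[1]);
            yield return record;                        // only this one row is held in memory
        }
    }
}

static void ImportStreamed(string filePath)
{
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    using (var importProc = new SqlCommand("dbo.ImportData", dbConnection))
    {
        importProc.CommandType = CommandType.StoredProcedure;
        SqlParameter tvp = importProc.Parameters.Add("@importData", SqlDbType.Structured);
        tvp.TypeName = "dbo.ImportTestType";
        tvp.Value = StreamRows(filePath);    // enumerated lazily while the rows are being sent

        dbConnection.Open();
        importProc.ExecuteNonQuery();
    }
}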





SQL# - http://www.SQLsharp.com/
Post #725602
Posted Friday, May 29, 2009 10:37 AM
Grasshopper

I enjoyed reading your article. Thanks for sharing it and providing the material.

I downloaded your code and tried adding an option to do the import via an xml parameter passed to a similar stored procedure.


The xml (untyped) approach is pretty simple:

CREATE PROCEDURE dbo.ImportDataXml
(
    @inputXml xml
)
AS
SET NOCOUNT ON;

-- Shred each /Root/Node element into one row of the import table
INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText)
SELECT  r.n.value('@id', 'int'),
        r.n.value('@value', 'varchar(50)')
FROM    @inputXml.nodes('/Root/Node') AS r(n);
GO
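And roughly how the proc gets called from the C# side; this is only a sketch, not the attached code itself. Note that the whole document is sent as one xml parameter value:

using System.Data;
using System.Data.SqlClient;

static void ImportViaXml(string xmlDocument)   // e.g. "<Root><Node id=\"1\" value=\"abc\" /> ... </Root>"
{
    using (var dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
    using (var importProc = new SqlCommand("dbo.ImportDataXml", dbConnection))
    {
        importProc.CommandType = CommandType.StoredProcedure;
        importProc.Parameters.Add("@inputXml", SqlDbType.Xml).Value = xmlDocument;   // entire document in one go

        dbConnection.Open();
        importProc.ExecuteNonQuery();
    }
}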


My testing showed it to be significantly slower using a small 30K-line file. Here are my results (~250 ms vs. ~1,250 ms):

[09:28 UTC-7] PS>./streamteste Full Test.txt
5/29/2009 9:28:03 AM -- Truncating the import table, clearing the caches, etc...
5/29/2009 9:28:03 AM -- Done setting up for the test
5/29/2009 9:28:03 AM -- Connecting to the Database...
5/29/2009 9:28:03 AM -- Calling proc to read and import the data...
5/29/2009 9:28:03 AM -- Proc is done reading and importing the data via Streamed TVP (0.241 Seconds)
5/29/2009 9:28:03 AM -- Done!
[09:28 UTC-7] PS>./streamteste xml Test.txt
5/29/2009 9:28:08 AM -- Truncating the import table, clearing the caches, etc...
5/29/2009 9:28:08 AM -- Done setting up for the test
5/29/2009 9:28:08 AM -- Connecting to the Database...
5/29/2009 9:28:08 AM -- Calling proc to Xml import the data...
5/29/2009 9:28:09 AM -- Proc is done importing the data via single Xml (1.296 Seconds)
5/29/2009 9:28:09 AM -- Done!

I attached my VS project and simple (testing) code.

Thanks,

John


  Post Attachments 
SQLStreamingTesting.zip (7 views, 177.02 KB)
Post #725944
Posted Monday, June 1, 2009 8:18 AM
Old Hand

john hill (5/29/2009)
I enjoyed reading your article. Thanks for sharing it and providing the material.

I downloaded your code and tried adding an option to do an import via SQLXML parameter into a similar stored procedure.

My testing showed it to be significantly slower using a small 30K-line file. Here are my results (~250 ms vs. ~1,250 ms)


Hello John and thanks, I am glad that you liked the article.

And thanks so much for doing that test and providing the results and code. I was curious about the performance of the XML option. I am not surprised that it takes longer, given that it has to parse the XML string. Also, did you pay attention to the memory usage on the system? I am sure it took up a good amount, given that it had to package up the entire XML string before sending it to the database. Again, this is the beauty of the IEnumerable streaming feature with TVPs starting in SQL Server 2008: the application server does not need to use large amounts of memory in order to send large amounts of data to the database.

Take care,
Solomon...





SQL# - http://www.SQLsharp.com/
Post #726685
Posted Wednesday, June 10, 2009 11:14 AM
Forum Newbie

This is one of the main problems with mass data in ADO.NET.

Most people don't know that for every command sent, ADO.NET needs an open transaction.
If there is none, it will open one and commit or roll back depending on the result.

So if you open a transaction for those mass updates yourself, you save a lot of time.

So instead of using 3,000,000 commands and therefore 3,000,000 transactions, just use one command and one explicit transaction.
This alone cuts the time needed in this experiment down to about 25%.

So change the old-school connection and command loop to:
using (dbConnection = new SqlConnection("Server=.\\sqlexpress;Database=Test;Trusted_Connection=True;"))
{
    dbConnection.Open();
    SqlTransaction trans = dbConnection.BeginTransaction();

    Console.WriteLine("{0} -- Calling the proc iteratively ...", DateTime.Now);

    using (SqlCommand importProc = new SqlCommand("INSERT INTO dbo.ImportTest (SillyIDField, MeaninglessText) VALUES (@SillyIDField, @MeaninglessText)", dbConnection))
    {
        SqlParameter sillyIDField = importProc.Parameters.AddWithValue("@SillyIDField", (int)tempTable.Rows[0][0]);
        SqlParameter meaninglessText = importProc.Parameters.AddWithValue("@MeaninglessText", (string)tempTable.Rows[0][1]);

        importProc.Transaction = trans;   // every insert runs inside the one explicit transaction

        for (int rowNum = 0; rowNum < tempTable.Rows.Count; rowNum++)
        {
            sillyIDField.Value = (int)tempTable.Rows[rowNum][0];
            meaninglessText.Value = (string)tempTable.Rows[rowNum][1];

            importProc.ExecuteNonQuery();
        }
    }
    trans.Commit();   // one commit instead of one implicit commit per insert
    Console.WriteLine("{0} -- Done importing the data old-school style ({1} Seconds)", DateTime.Now, elapsedSeconds);
}

And by the way, DataRow.ItemArray is only needed if you want to get or set the row's values as an object[].
Post #732497
Posted Thursday, June 11, 2009 8:27 PM
Right there with Babe

Interesting start to a good topic.

I'm not sure I agree with the comment in the article that the "only way" to do things in the past has been with stored procedures that only take one row. It's been possible to pass XML to a stored procedure for some time now. I see someone has put up some code in the discussion to do precisely that. I believe that typed XML has some performance gains, so that might be a worthwhile comparison.

If your application must "stream" data to SQL Server, have you considered using a queue of some sort (SQL Server Service Broker or MSMQ, for example)? The overhead of using queues is not inconsiderable, but in some cases it may well be a better approach than streaming the data.

Anyway, the point is that this is a new solution (and an excellent one) in SQL Server 2008, but by no means the only solution to the problem of passing multiple rows of typed data to SQL Server.
Post #733467
Posted Wednesday, July 1, 2009 6:18 AM
SSCommitted

@Solomon Rutzky
Just wanted to let you know that your article is referenced in the current German SQL PASS newsletter!



The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help | How to Post Performance Problems
Post #745277