Streaming Data Into SQL Server 2008 From an Application

  • Comments posted to this topic are about the item Streaming Data Into SQL Server 2008 From an Application

    SQL# https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
    Sql Quantum Lift https://SqlQuantumLift.com/ ( company )
    Sql Quantum Leap https://SqlQuantumLeap.com/ ( blog )
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • Nice topic Solomon.

    I have some suggestion on the log growth:

    In the SP you could use SELECT INTO, or enable minimally logged inserts on SQL Server 2008 (see http://msdn.microsoft.com/en-us/library/dd425070.aspx), to reduce log growth. Doing so also makes the comparison between TVP streaming and the bulk import methods fairer, because those methods are always minimally logged.
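
    For illustration, here is a rough sketch of the kind of change being suggested, shown as ad-hoc command text the same way the article's test harness runs its setup commands. The dbo.ImportStaging source table is hypothetical, and minimal logging additionally assumes the dbo.ImportTest target is a heap and the database uses SIMPLE or BULK_LOGGED recovery (see the linked Data Loading Performance Guide for the full list of conditions):

    // Hedged sketch only: INSERT ... WITH (TABLOCK) into a heap can be minimally
    // logged on SQL Server 2008 under the conditions noted above.
    using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
    using (SqlCommand minimalInsert = new SqlCommand(
        "INSERT INTO dbo.ImportTest WITH (TABLOCK) SELECT * FROM dbo.ImportStaging;",
        dbConnection))
    {
        dbConnection.Open();
        minimalInsert.ExecuteNonQuery();
    }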

    Kind Regards,

    DM Unseen AKA M. Evers

  • I don't really see the major benefit over using the SqlBulkCopy class here, which is also backwards compatible 2000 up.

    A minor benefit that I can see is that you don't need to have a 'loading table' in which data is placed before being transformed by a stored proc if that transformation is necessary.

    But it's a really minor benefit to break compatibility from 2000/2005.
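
    For readers who have not used it, a minimal sketch of the SqlBulkCopy pattern referred to above (the connection string and dbo.ImportTest target are carried over from the article's test code, and the SomeValue column is just an assumed example):

    // Requires System.Data and System.Data.SqlClient.
    // The rows are materialized in a DataTable first, then sent in one bulk operation.
    DataTable importData = new DataTable();
    importData.Columns.Add("SomeValue", typeof(int));
    for (int index = 0; index < 1000; index++)
    {
        importData.Rows.Add(index);
    }

    using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
    using (SqlBulkCopy bulkCopy = new SqlBulkCopy(dbConnection))
    {
        dbConnection.Open();
        bulkCopy.DestinationTableName = "dbo.ImportTest";
        bulkCopy.WriteToServer(importData);
    }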

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • Hello "dm_unseen" and thanks for that suggestion.

    Matt Whitfield (5/28/2009)


    I don't really see the major benefit over using the SqlBulkCopy class here, which is also backwards compatible 2000 up.

    A minor benefit that I can see is that you don't need to have a 'loading table' in which data is placed before being transformed by a stored proc if that transformation is necessary.

    But it's a really minor benefit to break compatibility from 2000/2005.

    Hello Matt. Regarding the similarity to BULK INSERT, SqlBulkCopy, and other methods, I would say that:

    1) Using the IEnumerable streaming method that I described here provides the greatest benefit when doing transformations in the application layer. I realize that my example code was simplistic and did not show a transformation, but this technique is more focused on the "sending" of the data than the "receiving" of it. And in these cases, bulk insert/copy operations do not address the issue of doing anything more than straight reads of the source data. (A rough sketch of this streaming pattern follows point 2 below.)

    2) As far as using TVPs in general goes, regardless of the full-streaming feature when using IEnumerable they do, in my opinion, provide a much cleaner interface than using a staging or loading table. True, using them is not compatible with prior versions of SQL Server but many people do not need such compatibility. Unless there is a specific reason not to do so I believe that you should use all features available in a product. I have mainly worked at Software as a Service companies where we host the application and do not need to worry about what version of the DB (or even what vendor) a customer might have.
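
    For anyone who has not read the article yet, here is the rough sketch of the streaming pattern referred to in point 1. The table type (dbo.ImportTestType), procedure name (dbo.ImportData), parameter name, and SomeValue column are illustrative assumptions rather than the article's exact names:

    // Requires System.Data, System.Data.SqlClient, System.Collections.Generic, and Microsoft.SqlServer.Server.
    // Each row is yielded one at a time, so the full set never sits in application memory.
    static IEnumerable<SqlDataRecord> GetRows()
    {
        SqlMetaData[] rowSchema = { new SqlMetaData("SomeValue", SqlDbType.Int) };
        for (int index = 0; index < 3000000; index++)
        {
            SqlDataRecord row = new SqlDataRecord(rowSchema);
            row.SetInt32(0, index);   // any per-row transformation would happen here
            yield return row;
        }
    }

    static void StreamImport()
    {
        using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
        using (SqlCommand importProc = new SqlCommand("dbo.ImportData", dbConnection))
        {
            importProc.CommandType = CommandType.StoredProcedure;
            SqlParameter tvpParam = importProc.Parameters.Add("@ImportTable", SqlDbType.Structured);
            tvpParam.TypeName = "dbo.ImportTestType";
            tvpParam.Value = GetRows();   // enumerated as the rows stream to the server
            dbConnection.Open();
            importProc.ExecuteNonQuery();
        }
    }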

    Take care,

    Solomon...

    SQL# https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
    Sql Quantum Lift https://SqlQuantumLift.com/ ( company )
    Sql Quantum Leap https://SqlQuantumLeap.com/ ( blog )
    Info sites: Collations     •     Module Signing     •     SQLCLR

  • From article: - "Prior to SQL Server 2008 the only way to send data to the database was to send it in one row at a time"

    Really misleading statement. You can send XML with a bunch of rows and translate it into a table inside a stored procedure. I have been doing it for years.
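
    For comparison, here is a small sketch of that XML pattern; the element and attribute names and the dbo.ImportDataXml procedure are made up for illustration, and the shredding shown in the comment is just one way to do it:

    // Requires System.Data, System.Data.SqlClient, and System.Text.
    // Inside the stored procedure the XML parameter can be shredded with something like:
    //   INSERT INTO dbo.ImportTest (SomeValue)
    //   SELECT T.C.value('@val', 'INT') FROM @ImportXml.nodes('/rows/r') AS T(C);
    StringBuilder importXml = new StringBuilder("<rows>");
    for (int index = 0; index < 1000; index++)
    {
        importXml.AppendFormat("<r val=\"{0}\" />", index);
    }
    importXml.Append("</rows>");

    using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
    using (SqlCommand importProc = new SqlCommand("dbo.ImportDataXml", dbConnection))
    {
        importProc.CommandType = CommandType.StoredProcedure;
        importProc.Parameters.Add("@ImportXml", SqlDbType.Xml).Value = importXml.ToString();
        dbConnection.Open();
        importProc.ExecuteNonQuery();
    }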

    But still, good article.

    LP.

  • I do the XML thing also to the stored procedure... But I am curious to know if there is any performance gain by using the table approach vs the xml approach... Also, will the table approach work with other languages than .net like php, etc...

  • Also, will the table approach work with other languages than .net like php, etc...

    This approach will most likely also work with PHP (have a look at: http://php.benscom.com/manual/en/class.dotnet.php). .NET is a runtime environment that supports various languages, not a language itself.

  • Looking at the site you mentioned that would make me think it would have to be PHP on a windows IIS Machine... What if it's LINUX or a variation thereof hence no windows dll's... Will this table call method, still be available to PHP??

  • Great job, Solomon!

    That's definitely a feature which was missing in my opinion.

    I think BCP is a very powerful and nice tool but it is much less flexible. XML is the way I have used until now, but it produces a large overhead. Anyway, the performance of XML vs. TVPs would be really interesting :-).

    Some little things (maybe optimizations) about your C# code:

    General )

    Since only the "Normal" approach requires a DataTable, all other methods could be optimized by using a simple object[] and a List<>.

    OldSchoolImport )

    Well, instead of creating 3,000,000 instances of a SqlCommand object, creating just one, preparing it, and just binding the new values would be more realistic usage 😉
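
    A sketch of what I mean (the SomeValue column is just a stand-in for whatever the import table actually has): one command, prepared once, with only the parameter value changing per row:

    // One SqlCommand, prepared once; only the bound value changes for each row.
    using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
    using (SqlCommand rowInsert = new SqlCommand("INSERT INTO dbo.ImportTest (SomeValue) VALUES (@SomeValue);", dbConnection))
    {
        dbConnection.Open();
        SqlParameter valueParam = rowInsert.Parameters.Add("@SomeValue", SqlDbType.Int);
        rowInsert.Prepare();   // parameter types must be set before preparing
        for (int index = 0; index < 3000000; index++)
        {
            valueParam.Value = index;
            rowInsert.ExecuteNonQuery();
        }
    }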

    Thanks for this work

    Flo

  • What if it's LINUX or a variation thereof hence no windows dll's...

    You could try it with Mono but I'm not sure if it will work because of the referenced Microsoft.SqlServer assembly, which might use P/Invoke to native Windows-specific DLLs.

    [EDIT]

    @Solomon: I think there is a small error in the following part of the code which could cause an exception:

    try
    {
        using (dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
        {
            dbConnection.Open();
            SqlCommand importProc = new SqlCommand("CHECKPOINT ; TRUNCATE TABLE dbo.ImportTest ; DBCC FREEPROCCACHE ; DBCC FREESYSTEMCACHE('ALL') ; CHECKPOINT ;", dbConnection);
            importProc.CommandType = CommandType.Text;
            Console.WriteLine("{0} -- Truncating the import table, clearing the caches, etc...", DateTime.Now);
            importProc.ExecuteNonQuery();
            Console.WriteLine("{0} -- Done setting up for the test", DateTime.Now);
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine("Error: {0}", exception.Message);
    }
    finally
    {
        dbConnection.Close();
    }

    If you wrap an object in a using scope it will automatically be disposed (and therefore closed) if an exception is thrown. The call to the Close() method in your finally block will then cause another exception because it is called on an already disposed object.
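
    A minimal sketch of the adjusted pattern (keeping the article's connection string and setup command): let the using block do the closing and drop the finally entirely.

    try
    {
        // The using block disposes (and therefore closes) the connection on every
        // exit path, including exceptions, so no finally / Close() is needed.
        using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
        {
            dbConnection.Open();
            SqlCommand importProc = new SqlCommand("CHECKPOINT ; TRUNCATE TABLE dbo.ImportTest ; DBCC FREEPROCCACHE ; DBCC FREESYSTEMCACHE('ALL') ; CHECKPOINT ;", dbConnection);
            importProc.CommandType = CommandType.Text;
            importProc.ExecuteNonQuery();
        }
    }
    catch (Exception exception)
    {
        Console.WriteLine("Error: {0}", exception.Message);
    }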

  • The TVP feature is an improvement, yes. BUT you wrote dozens and dozens of lines of code that were only precompiled. The better path is at the end of your article: "Large datasets that are inserted directly (i.e. not transformed) will likely be faster via T-SQL BULK INSERT, bcp.exe..."

    Here's the universally easy way to transform data

    1) dump it to ASCII text on the mainframe, DB2, UniVerse, Oracle, Informix, or MySQL; delimit the attributes with a character like tab or comma or whatever; end the lines with CRLF to be fully compatible with Windows.

    2) FTP pull/push the file to a disk on the SQL Server

    3) BULK INSERT or BCP the text into a staging table or a memory table like TVP (watch out for your tempdb if memory table)

    4) Transform the data in the staging table or the memory table

    5) dump the result into the destination

    This is the only sensible way to get data over a busy network into a SQL Server database. The rest just run slower and are buggier.
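
    To make step 3 concrete, here is a hedged sketch of issuing the BULK INSERT from client code; the file path, delimiters, and dbo.ImportStaging table are hypothetical, and the file must be visible to the SQL Server service account at that path:

    // Step 3 above, sent as ad-hoc T-SQL from the application.
    using (SqlConnection dbConnection = new SqlConnection("Server=localhost;Database=Test;Trusted_Connection=True;"))
    using (SqlCommand bulkLoad = new SqlCommand(
        "BULK INSERT dbo.ImportStaging FROM 'C:\\Feeds\\import.txt' " +
        "WITH (FIELDTERMINATOR = '\\t', ROWTERMINATOR = '\\n', TABLOCK);",
        dbConnection))
    {
        dbConnection.Open();
        bulkLoad.CommandTimeout = 0;   // large loads can exceed the default 30-second timeout
        bulkLoad.ExecuteNonQuery();
    }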

  • I think the main difference between "BCP/BULK INSERT" and TVPs is the context in which they are used.

    Since BCP requires special privileges, it is not a feature for front-end applications. TVPs can be used by the usual import processes or even by GUIs that create larger numbers of objects.

    Flo

  • Solomon Rutzky (5/28/2009)


    1) Using the IEnumerable streaming method that I described here provides the greatest benefit when doing transformations in the application layer. I realize that my example code was simplistic and did not show a transformation but this technique is more focused on the "sending" of the data than the "receiving" of it. And in these cases, bulk insert/copy operations do not address the issue of doing anything more than straight reads of the source data.

    That's not strictly true - you can do whatever transformation you like on the data that you give to SqlBulkCopy - you don't point it at source data, you give it data through a dataset - which you are able to transform before you put it in there...

    Solomon Rutzky (5/28/2009)


    2) As far as using TVPs in general goes, regardless of the full-streaming feature when using IEnumerable they do, in my opinion, provide a much cleaner interface than using a staging or loading table. True, using them is not compatible with prior versions of SQL Server but many people do not need such compatibility. Unless there is a specific reason not to do so I believe that you should use all features available in a product. I have mainly worked at Software as a Service companies where we host the application and do not need to worry about what version of the DB (or even what vendor) a customer might have.

    Well, I think I just disagree there. If something is compatible, it opens up flexibility. If I tell my boss 'well, this hardware is perfectly fine but you need to spend x thousand pounds to upgrade your license because I've chosen this implementation route' then that usually doesn't go down too well, in my experience... Personally, having a slightly cleaner interface wouldn't make me want to go and tell my boss that! 🙂

    Atlantis Interactive - SQL Server Tools
    My blog
    Why I wrote a sql query analyzer clone

  • This is a great article, Solomon. Thank you.

    I do have some comments about the C# code. As daniel.janz already pointed out,

    If you wrap an object in a using scope it will automatically be disposed (and therefore closed) if an exception is thrown. The call to the Close() method in your finally block will then cause another exception because it is called on an already disposed object.

    I can add another point because it is amazing how many people consistently do this incorrectly. SqlCommand inherits from DbCommand, which in turn implements the IDisposable interface. This means that a variable of the SqlCommand reference type must be disposed, for example by enclosing it in a using block. The type does not implement Close(), so wrapping it in a using block will suffice. The using blocks exist entirely for our convenience; the compiler does not have a concept of a using block and simply restates our code as try/finally whenever it finds using {}. It is a matter of pure luck, based on the simple fact that Microsoft's wrappers under SqlCommand are written so well, that we can get away with forgetting to dispose of it. If you needed to use OleDb to connect to something like Sybase (yes, I used the "S" word, sorry) then you would see how quickly the code comes to a screeching halt, so bad that the connection-timeout-related messages from the database engine do not even make it back to the C# code, which in turn will happily sit on the same line for days, if you wish, until someone kills the process.
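
    To illustrate the restating, the two forms below are equivalent; commandText and dbConnection are just placeholders from the surrounding code:

    // What the compiler effectively generates for:
    //     using (SqlCommand importProc = new SqlCommand(commandText, dbConnection))
    //     {
    //         importProc.ExecuteNonQuery();
    //     }
    SqlCommand importProc = new SqlCommand(commandText, dbConnection);
    try
    {
        importProc.ExecuteNonQuery();
    }
    finally
    {
        if (importProc != null)
        {
            ((IDisposable)importProc).Dispose();
        }
    }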

    The implementation of the using blocks in your sample code needs to be adjusted in all places related to both the connection and the stream reader. When an IDisposable type variable goes out of using scope, the framework calls its respective Dispose() method, which will attempt to call Close() if present and then delegate the work to the protected Dispose(bool disposing), where disposing will be passed as true. This is by design because some people opt to use the using blocks, some prefer to explicitly call Close() and then Dispose(), and finally there are people who unfortunately do nothing, in which case the protected Dispose(bool disposing) will be called from the destructor, passing false as the disposing parameter.

    In other words, if you remove your finally blocks, add respective calls to Close() methods, and wrap the SqlCommands into using blocks the code will be fine.

    The article is great, it is good to learn about alternative to bulk methods.

    Oleg

  • LP:

    Correct: I had not thought of the XML solution and so that would be an incorrect statement that I made about single inserts being the “only” way to load data. I will see about updating the article to be more accurate in that regard. Thanks for pointing that out. Really.


    Brian:

    Regarding being able to do this in PHP: I have not had time to check out the PHP or Mono (for *NIX) documentation, but if it supports enough basic .Net functionality, then it is possible since this does not need to be done via a SQL Server specific library. I used the ADO.Net method, but from what I see it can also be done via ODBC using “data-at-execution” (whatever that is) or via OLE DB using a custom IRowset implementation.


    Florian:

    Thanks. And yes, I will try the single instance of the variable. I could have sworn I started with that and got an error but maybe I am misremembering. I will try it again.


    Daniel and Oleg :

    Thanks for pointing out the misuse of the “using” block. I thought that I originally had it the way that both of you recommended but had someone tell me to change it to what is currently shown. It is possible that I misunderstood that person. In either case I did miss it on the SqlCommand so thanks also, Oleg, for pointing that out.


    Don:

    I am not sure that I understand. It seems that extracting, FTPing, BCPing, Transforming, and finally loading into the destination seems a bit more complex than simply reading the data from a local source, transforming each row as necessary, and sending it to the server. This is fewer steps, fewer processes, very little memory usage, and a single connection over the network to the DB as opposed to the two steps in your five step process (FTP and BCP). So in using FTP the disk on the SQL Server has to work one time to receive the data and write it locally, then twice more to read the local source and write it to a SQL Server datafile. Then the disk has to work a fourth time to move the data from the “staging” table into the final destination table. In the scenario I mentioned, assuming the same setup with two servers, the disk on the SQL Server would only work one time as it puts the data it receives from the application into the final destination table. This would be 25% the amount of disk I/O (in simplistic terms) as compared to the FTP / BCP / Transform method.


    Matt:

    1) Yes, you can transform the data before it goes into the DataTable when using SqlBulkCopy, but you still need to populate that DataTable in memory before you can send it to the destination. In doing so it is fully persisted in memory whereas in the solution that I am giving an example of, only a single row at a time is in memory.

    2) I am not advocating that this feature alone is a “compelling” reason to upgrade to SQL Server 2008. If you work somewhere that does not have it or produces software that needs to work on various versions of various databases then by all means do what will work in most / all situations. BUT, what I AM saying is that if you already have SQL Server 2008 in-house then why not use a feature that is available to be used? That would not incur any more cost.

    SQL# https://SQLsharp.com/ ( SQLCLR library of over 340 Functions and Procedures )
    Sql Quantum Lift https://SqlQuantumLift.com/ ( company )
    Sql Quantum Leap https://SqlQuantumLeap.com/ ( blog )
    Info sites: Collations     •     Module Signing     •     SQLCLR
