Sending multiple rows to the Database from an Application: Part II

  • dewit.john (10/15/2009)


    I have tackled this problem before. I retrieve the XML data directly from the ECB site, within a web application written in C#. The XML data is passed as a string when calling the stored procedure, and converted back to the xml type inside the SP.

    I have used this approach in the SP:

    set @xmlData = @xmlString

    -- N.B. Not testing the XML against an XSD !

    declare @ECBRates table
    ( ident int IDENTITY(1,1) NOT NULL
    , dt nvarchar(20)
    , currency nvarchar(4)
    , rate money )

    ;WITH XMLNAMESPACES (DEFAULT 'http://www.ecb.int/vocabulary/2002-08-01/eurofxref')
    insert into @ECBRates (dt, currency, rate)
    SELECT tab.col.value('../@time', 'nvarchar(20)')   AS [Date],
           tab.col.value('./@currency', 'nvarchar(4)') AS Currency,
           tab.col.value('./@rate', 'float')           AS Rate
    FROM @xmlData.nodes('//Cube[not(*)]') AS tab(col)
    ORDER BY 2, 1 DESC

    Now that I have the data in a table, I can process it the normal T-SQL way.

    I forgot to mention that I often call stored procedures from C#. Passing massive amounts of data to the database as XML means just 1 call, 1 connection. The .NET overhead is reduced this way, compared to sifting through the XML and doing multiple inserts (= multiple connections). Also, transaction processing within an SP is much easier than doing it in the C# .NET environment.
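    For reference, here is a minimal C# sketch of that single call. The procedure name (dbo.usp_LoadECBRates) and parameter name (@xmlString) are only placeholders, and connectionString/xmlString are assumed to exist already:

    using System.Data;
    using System.Data.SqlClient;

    // One round trip: the whole XML document travels as a single nvarchar(max) parameter.
    using (var con = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand("dbo.usp_LoadECBRates", con))
    {
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@xmlString", SqlDbType.NVarChar, -1).Value = xmlString;   // -1 = max

        con.Open();
        cmd.ExecuteNonQuery();   // one call, one connection, regardless of row count
    }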

    A good approach to the problem, and I have tried it. But I ran into performance issues with a large amount of data when inserting into the @table variable (it takes a lot of memory on the server).

  • It would be interesting to do a large performance test and find out if "performance" using this method belongs in the "Pro" or "Con" column.

    I also concern myself with such things as the "pipe". For example, the code (without the ROWS tag pair, just to be fair) from the article takes 136 characters to depict four people by first name and salary...

    136 characters with spaces

    <ROW Name="Richard" Salary="1100"/> <ROW Name="Cliff" Salary="1200"/> <ROW Name="Donna" Salary="13000"/> <ROW Name="Ann" Salary="1500"/>

    As a tab delimited parameter, it only takes 37 characters...

    Richard	1100	Cliff	1200	Donna	13000	Ann	1500

    Doing the math of (136-37)/37 we come up with a ratio of about 2.7 (and this is the simpler of the two forms of XML). Put another way, you can pass 2.7 times more traffic in the same amount of time using TAB delimited data than you can even with such simple XML. Also, it's been demonstrated many times on many threads on SSC that shredding XML is quite a bit slower than splitting TAB delimited parameters, even if you use a WHILE loop to split one of the "blob" datatypes (which can also hold up to 2 Gig). It all gets even slower if you want to correctly shred XML bearing special characters. It's also much easier to form TAB delimited parameters on the GUI side of the house than it is XML (as stated in the article).
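    For what it's worth, forming such a tab-delimited parameter on the client side is close to a one-liner in C#; a minimal sketch (the anonymous row type is just for illustration):

    using System.Linq;

    var people = new[]
    {
        new { Name = "Richard", Salary = 1100 },
        new { Name = "Cliff",   Salary = 1200 },
        new { Name = "Donna",   Salary = 13000 },
        new { Name = "Ann",     Salary = 1500 }
    };

    // One tab between fields, one newline between rows; the whole string is
    // passed as a single varchar(max) parameter and split on the server.
    string tabDelimited = string.Join("\n",
        people.Select(p => p.Name + "\t" + p.Salary).ToArray());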

    Although the article was well written, I see no redeeming qualities to passing XML parameters.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

    The easiest way to do that is by using the user-defined table type feature introduced in SQL Server 2008. Parameters may now be of a table type. This makes it easier for a developer to use, either in SQL or in Visual Studio.

    Say you create the following SQL user-defined table type:

    CREATE TYPE dbo.typTableType AS TABLE
    (
        [Id] INT NOT NULL,
        [Description] VARCHAR(50) NOT NULL
    )

    If you use it in a stored procedure, you have the following code:

    CREATE PROCEDURE dbo.usp_Insert
        @ParTable AS typTableType READONLY
    AS
    BEGIN
        -- Inside the procedure you use the parameter as a regular table:
        -- with a cursor, in any SELECT, in the FROM clause of an UPDATE, etc.
        -- For example (dbo.MyTable is only an illustration):
        -- INSERT INTO dbo.MyTable (Id, [Description])
        -- SELECT Id, [Description] FROM @ParTable;
    END

    The following is a Visual Basic example that uses the benefit of the table type:

    Let's assume you have defined, somewhere in the code, a DataTable named datMyTable whose columns match the table type:

    Dim conMyConnection As New SqlClient.SqlConnection("... connection string ...")
    Dim comMyCommand As New SqlClient.SqlCommand("dbo.usp_Insert", conMyConnection)
    Dim parMyTable As New SqlParameter("@ParTable", SqlDbType.Structured)

    parMyTable.TypeName = "dbo.typTableType"
    parMyTable.Value = datMyTable

    With comMyCommand
        .Parameters.Add(parMyTable)      ' add the SqlParameter object, not its name
        .CommandType = CommandType.StoredProcedure
        .Connection.Open()
        .ExecuteNonQuery()
        .Connection.Close()
    End With

    ' rest of the code...

    Table type parameters must be declared as READONLY in the stored procedure's definition.

    I think this is an easier way to work with multiple records passed to SQL.

  • Hi Nizamuddin,

    Thank you for the good clarification regarding my doubt. Is there any possibility to keep the data hard-coded in a file other than XML? If I use XML, how do I find its path?

    Please clarify.

    Just for completeness, there is another great way to send multiple rows to the database from the client side; it's covered in many articles, usually under the title "streaming data". I used this article "http://www.sqlservercentral.com/articles/SQL+Server+2008/66554/" to implement one with great success. This may require more work on the client side, but if the number of rows is very large (in my case it was around 100K), the speed of this method is on par with BULK solutions.
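    For anyone curious, here is a rough C# sketch of one common streaming pattern (an IEnumerable of SqlDataRecord passed as a table-valued parameter). It reuses the dbo.typTableType / dbo.usp_Insert names from the earlier post purely for illustration and is not taken from the linked article:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using Microsoft.SqlServer.Server;

    // Rows are produced lazily, so the client never holds all 100K rows in memory at once.
    static IEnumerable<SqlDataRecord> GetRecords(IEnumerable<KeyValuePair<int, string>> source)
    {
        var meta = new[]
        {
            new SqlMetaData("Id", SqlDbType.Int),
            new SqlMetaData("Description", SqlDbType.VarChar, 50)
        };

        foreach (var item in source)
        {
            var rec = new SqlDataRecord(meta);
            rec.SetInt32(0, item.Key);
            rec.SetString(1, item.Value);
            yield return rec;   // streamed to SQL Server as it is enumerated
        }
    }

    // Usage: pass the enumerable as a Structured parameter of the table type.
    // var p = cmd.Parameters.Add("@ParTable", SqlDbType.Structured);
    // p.TypeName = "dbo.typTableType";
    // p.Value = GetRecords(source);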

    Hope it helps,

    Sinan

  • All,

    About a year ago I did a full round of timing tests on all of the available methods to get multiple rows into the database. The fastest method was to pass .NET DataTables directly into stored procedures (as table-valued parameters).

    This was closely followed by using SqlBulkCopy. The technique for this is here:

    http://www.dotnetcurry.com/(X(1)S(pzoixbxwt3xhw0u5qurqex0g))/ShowArticle.aspx?ID=323&AspxAutoDetectCookieSupport=1
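    For reference, a minimal SqlBulkCopy sketch along those lines (the destination table name is a placeholder, and dt is assumed to be a DataTable whose columns line up with it):

    using System.Data;
    using System.Data.SqlClient;

    using (var con = new SqlConnection(connectionString))
    {
        con.Open();
        using (var bulk = new SqlBulkCopy(con))
        {
            bulk.DestinationTableName = "dbo.Employee";   // placeholder table name
            bulk.BatchSize = 5000;                        // send rows to the server in batches
            bulk.WriteToServer(dt);                       // one bulk operation, not row-by-row
        }
    }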

    Batching up multiple inserts with semicolons, and the new multi-row INSERT statement, were I think the next closest. However, beware that there is a maximum number of rows (1,000 per VALUES list) that can be inserted at once with the new syntax.
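    A rough C# sketch of that kind of batching (the target table, its columns, and the people list are invented for the example; a real implementation also has to stay under the 1,000-row VALUES limit and SQL Server's 2,100-parameter limit):

    using System.Data.SqlClient;
    using System.Text;

    // Builds one multi-row INSERT ... VALUES statement, parameterized per value.
    var sql = new StringBuilder("INSERT INTO dbo.Employee (Name, Salary) VALUES ");
    using (var con = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand())
    {
        for (int i = 0; i < people.Count; i++)
        {
            if (i > 0) sql.Append(", ");
            sql.AppendFormat("(@n{0}, @s{0})", i);
            cmd.Parameters.AddWithValue("@n" + i, people[i].Name);
            cmd.Parameters.AddWithValue("@s" + i, people[i].Salary);
        }
        cmd.CommandText = sql.ToString();
        cmd.Connection = con;
        con.Open();
        cmd.ExecuteNonQuery();   // one statement, many rows, one round trip
    }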

    I think XML was the slowest due to the extra parsing involved. However, it has been so long since I ran the tests that this and the two methods mentioned in the previous paragraph could easily have swapped places.

    I didn't have a chance to analyze Entity Framework, but from what I saw in SQL Profiler it was issuing a separate statement for each row that needed to be updated.

    Regards,

    While it might seem a great idea to use XML to pass large rowsets to the database in a single hit, and it is certainly easy to write the stored procedures, in reality XML processing in SQL Server does not perform well, often negating the benefits of the reduced round trips.

    This is well documented elsewhere, but in short:

    XML processing creates large structures in memory. sp_xml_preparedocument grabs a flat eighth of your server's memory, which is not released until sp_xml_removedocument is called. This is a real problem in production.

    It is CPU intensive, in both intensity and duration.

    Do not use a WHERE clause on your OPENXML; that is a non-sargable query. Filter after transferring the data to your temp table, or filter while constructing the XML.

    If you must use XML, access it only once and pass the results into a temporary table.

    Other options for mass data transfer include:

    String parsing: with care, this performs.

    Many SP parameters: this is not pretty, but it performs exceptionally well on the server side. The trick is to retain code maintainability by constructing your SP from metacode at upgrade time; you only need a couple of loops, one to create the variable sets and one to generate the UNION ALL statement that inserts them into a temp table.

    The above is the result of long, and bitter, experience!

  • //Difficult to create XML data at the application layer.//

    I no longer use (strong) Datasets in my applications, EXCEPT I do use them as a way to create well formed xml.

    EmployeeDS ds = new EmployeeDS();
    ds.Employee.AddNewEmployeeRow("222222222", "John", "Smith");
    ds.Employee.AddNewEmployeeRow("333333333", "Mary", "Jones");
    string wellFormedXml = ds.GetXml();

    /*
    <EmployeeDS>
      <Employee>
        <SSN>222222222</SSN>
        <FirstName>John</FirstName>
        <LastName>Smith</LastName>
      </Employee>
      <Employee>
        <SSN>333333333</SSN>
        <FirstName>Mary</FirstName>
        <LastName>Jones</LastName>
      </Employee>
    </EmployeeDS>
    */

    POCO to Strong DataSet pseudo-code:

    EmployeeDS ds = new EmployeeDS();
    foreach (Employee pocoEmp in someEmployeeCollection)
    {
        ds.Employee.AddNewEmployeeRow(pocoEmp.SocialSecurityNumber, pocoEmp.FirstName, pocoEmp.LastName);
    }
    string wellFormedXml = ds.GetXml();

    I then pass that very well formed xml down to the stored procedure.

    I then write "converters" so I can take my POCO objects and then turn them into strong-datasets.

    This is basically the ONLY place I use the strong datasets.

    Take my POCO objects, at the last possible moment, convert to strong dataset, and ship them down to tsql-land.

    .......

    PLEASE be aware of this issue:

    http://connect.microsoft.com/SQLServer/feedback/details/250407/insert-from-nodes-with-element-based-xml-has-poor-performance-on-sp2-with-x64

    I actually convert my element based xml to attribute based xml before sending to sql server if using sql server 2005.

    ............

    This method has been around for a while (I'm not knocking the article, this method NEEDS to be advertised more)

    http://support.microsoft.com/kb/315968

    ........

    Here is one performance benefit not discussed very often.

    When you bulk insert and/or update (or merge/upsert), the indexes only need to be rebuilt ONE time (or two times if doing 2 calls for insert/update), instead of one row at a time (RBAR, as JeffM calls it: http://www.sqlservercentral.com/Forums/Topic642789-338-1.aspx#bm643053).

    Again, this is a HUGE deal sometimes.

    ...........

    My advice:

    Use .nodes instead of OPENXML.

    Do your filtering on the DotNet side of things. Allow the xml you send to the stored procedure to be "perfect Xml" where no business decisions have to be made. Just xml-shred it and CRUD it. And then get out.

    Make sure you performance-test element-based xml if you're using 2005 (and maybe 2008?). Convert to attribute-based if you experience issues.

    Use strong datasets to create well formed xml. (Hint, remove the default namespace ("Tempuri") of the dataset to make your life easier).

    If you have multiple entity updates going into the xml.... (let's say 3 for this discussion)

    1. Create 3 @variable and/or 3 #temp tables.

    2. Shred all the xml into the 3 @variable and/or #temp tables.

    3. After all the shredding is done, then do a BEGIN TRAN.

    4. insert/update (merge/upsert) from the 3 @variable and/or #temp tables.

    5. COMMIT TRAN

    @variable and/or #temp tables ?? You have to test to see which works better. From personal experience, there is no blanket rule statement.

    I usually start with @variable, but if I see issues, I experiment with #temp tables.

    Advantages:

    You can code 1 tsql stored procedure to handle insert/updates (or merge/upserts) .... for 1 or 10 or 100 or more Entity updates.

    Your signature never changes, you just pass in @xmlDoc (as xml or ntext in older sql server versions)

    INDEXES are rebuilt after the batch insert/update (merge/upsert) and NOT ROW BY ROW. (<<That my friend is worth the price of admission alone into this club)

    Disadvantages: You will lose a little bit of performance by not sending in a bunch of scalar values.

    MOST TIMES THIS IS NOT A FACTOR, do NOT use this as a justification for avoiding this approach.

    You'll have to learn something new. Most people like learning something new. Some people would rather RBAR because that's what they've done for years and years.

    .................

    Last advice:

    Have a test database with 1,000,000 rows in it to test against. (For example, if you need to update dbo.Employee rows, put 1,000,000 employee rows in the table.)

    Then test your RBAR vs Bulk(Set based) methods. Set-based wins every time.

    Don't "prove" your RBAR performance in a table with 10 rows in it.

  • I'm putting this in a separate post so it doesn't get buried.

    For those of you who have conducted tests of the various ways to do this......the issue below could be the reason you got mixed results.

    PLEASE be aware of this performance issue with element based xml:

    http://connect.microsoft.com/SQLServer/feedback/details/250407/insert-from-nodes-with-element-based-xml-has-poor-performance-on-sp2-with-x64

    (Thanks Erland for finding it)

    I actually convert my element based xml to attribute based xml before sending to sql server if using sql server 2005.

    ...I haven't done many 2008 tests to see if the issue was actually ever resolved. Erland makes a few comments at the post.

  • Very useful article detailing the methods we can use for manipulating several data rows from an application.

    On SQL 2008 I'd also mention the table valued parameters, which can save a lot of work...

  • We take a different approach not mentioned in these articles, and it seems to work very well for our always-connected client/server apps.

    We do a lot of large/wide data set processing in stored procedures, and passing huge text or XML files from the UI did not seem to be the best approach.

    Rather than using XML or table functions, we use temp tables. From within the application we create/append data to a temp table with a unique identifier for that "batch". Then we simply call a stored procedure, passing the batch_id, and the multi-row data set is available for use by the proc for whatever processing needs to be performed.

    The temp table is SPID specific, so there is no risk of data collision between users, nor between multiple processes for the same user when multi-threading (because of the batch_id). Once the proc is done with the data (which is almost always the case), it simply purges that data from the temp table. If the temp table becomes empty, it is dropped altogether, so the overhead on the server is very low.

    Yes, the initial loading of the temp tables is performed row-by-row behind the scenes, but that is achieved with a single line of code from the app.
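    A minimal C# sketch of that pattern, assuming a temp table named #BatchData and a procedure dbo.usp_ProcessBatch (both names invented here); the key point is that everything runs on one open connection, because a # temp table is only visible to the session that created it:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    using (var con = new SqlConnection(connectionString))
    {
        con.Open();   // keep this one connection open for the whole batch

        // Create the session-scoped temp table if this is the first batch on this connection.
        using (var create = new SqlCommand(
            "IF OBJECT_ID('tempdb..#BatchData') IS NULL " +
            "CREATE TABLE #BatchData (batch_id uniqueidentifier, Id int, [Description] varchar(50));", con))
        {
            create.ExecuteNonQuery();
        }

        Guid batchId = Guid.NewGuid();

        // Append this batch's rows (shown row by row here; SqlBulkCopy into the
        // temp table also works, as long as it uses this same connection).
        using (var ins = new SqlCommand(
            "INSERT INTO #BatchData (batch_id, Id, [Description]) VALUES (@b, @id, @d);", con))
        {
            ins.Parameters.Add("@b", SqlDbType.UniqueIdentifier).Value = batchId;
            var pId = ins.Parameters.Add("@id", SqlDbType.Int);
            var pDesc = ins.Parameters.Add("@d", SqlDbType.VarChar, 50);
            foreach (var row in rows)          // rows: the application's data set
            {
                pId.Value = row.Id;
                pDesc.Value = row.Description;
                ins.ExecuteNonQuery();
            }
        }

        // One call to the proc, which processes and then purges its batch from #BatchData.
        using (var proc = new SqlCommand("dbo.usp_ProcessBatch", con))
        {
            proc.CommandType = CommandType.StoredProcedure;
            proc.Parameters.Add("@batch_id", SqlDbType.UniqueIdentifier).Value = batchId;
            proc.ExecuteNonQuery();
        }
    }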

    I am also using this same method in several places. This is great for when you want all the records to succeed or none. I would rather control the transaction rollback in a procedure than leave it to the UI.

    This is a great article. Thanks for posting.

    Disregarding the point that sending multiple elements at once is questionable design, your "con" statements regarding XML show a lack of comfort with XML.

    Using OPENXML or XQuery is just as obvious and natural as T-SQL once one spends an equivalent amount of time using it.

    Creating XML is superbly easy with .NET languages, which have built-in methods such as DataSet.WriteXml.

    Your article shows a lack of time spent working with XML.

    In fact that's how I worked - I built the XML in .NET, validated it against a schema collection and passed it to a stored procedure for insert / update.

    I can see immediate usefulness in a project where I intend to use it today: I have large tables in Access databases spread all over the country. Ancient app, many users. Not going away. But... we must centralize the data for control and for management review and reporting. There's more involved obviously, but this little gem of an article shows me how I can organize my transmission into 5K chunks and send them to a stored procedure in SQL Server that dumps them into a holding table - no processing except landing in the dump table. Later, I can follow up on the server with a scheduled job to process the rows into the proper tables.

    User in Access has minimum transmission time (no waiting for inserts on server end). Data gets centralized (yes, GUIDs are added at transmission source and used throughout from then on).

    Fantastic article. Exactly what I needed at exactly the right moment.

    The collection into XML cost about 20 lines, including the routine to convert a recordset data row into an xml node.
