• "Difficult to create XML data at the application layer."

    I no longer use (strong) Datasets in my applications, EXCEPT I do use them as a way to create well formed xml.

    EmployeeDS ds = new EmployeeDS();
    ds.Employee.AddNewEmployeeRow("222222222", "John", "Smith");
    ds.Employee.AddNewEmployeeRow("333333333", "Mary", "Jones");
    string wellFormedXml = ds.GetXml();

    /*
    <EmployeeDS>
      <Employee>
        <SSN>222222222</SSN>
        <FirstName>John</FirstName>
        <LastName>Smith</LastName>
      </Employee>
      <Employee>
        <SSN>333333333</SSN>
        <FirstName>Mary</FirstName>
        <LastName>Jones</LastName>
      </Employee>
    </EmployeeDS>
    */

    POCO to Strong DataSet Pseudo Code:

    EmployeeDS ds = new EmployeeDS();
    foreach (Employee pocoEmp in someEmployeeCollection)
    {
        ds.Employee.AddNewEmployeeRow(pocoEmp.SocialSecurityNumber, pocoEmp.FirstName, pocoEmp.LastName);
    }
    string wellFormedXml = ds.GetXml();

    I then pass that well-formed xml down to the stored procedure.

    I write "converters" so I can take my POCO objects and turn them into strong datasets.

    This is basically the ONLY place I use the strong datasets.

    Take my POCO objects, at the last possible moment, convert to strong dataset, and ship them down to tsql-land.
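    Something along these lines for the hand-off (a minimal sketch; the procedure name, parameter name, and connection string are placeholders, not from the original post):

    using System.Data;
    using System.Data.SqlClient;

    // "dbo.uspEmployeeUpsert" and connectionString are hypothetical placeholders.
    using (SqlConnection conn = new SqlConnection(connectionString))
    using (SqlCommand cmd = new SqlCommand("dbo.uspEmployeeUpsert", conn))
    {
        cmd.CommandType = CommandType.StoredProcedure;

        // Pass the DataSet-generated xml as a real xml parameter
        // (fall back to NText/NVarChar on pre-2005 servers).
        cmd.Parameters.Add("@xmlDoc", SqlDbType.Xml).Value = wellFormedXml;

        conn.Open();
        cmd.ExecuteNonQuery();
    }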

    .......

    PLEASE be aware of this issue:

    http://connect.microsoft.com/SQLServer/feedback/details/250407/insert-from-nodes-with-element-based-xml-has-poor-performance-on-sp2-with-x64

    I actually convert my element-based xml to attribute-based xml before sending it to SQL Server, if using SQL Server 2005.
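    A rough sketch of that kind of conversion (not the exact converter from the original post; it assumes the EmployeeDS shape shown above and uses LINQ to XML):

    using System.Linq;
    using System.Xml.Linq;

    // Rewrites <Employee><SSN>x</SSN>...</Employee> rows into <Employee SSN="x" ... />
    // before the xml is shipped to SQL Server 2005.
    static string ToAttributeBasedXml(string elementBasedXml)
    {
        XDocument doc = XDocument.Parse(elementBasedXml);
        foreach (XElement row in doc.Root.Elements("Employee"))
        {
            foreach (XElement child in row.Elements().ToList())
            {
                row.SetAttributeValue(child.Name, child.Value);
                child.Remove();
            }
        }
        return doc.ToString(SaveOptions.DisableFormatting);
    }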

    ............

    This method has been around for a while (I'm not knocking the article, this method NEEDS to be advertised more)

    http://support.microsoft.com/kb/315968

    ........

    Here is one performance benefit not discussed very often.

    When you bulk insert and/or update (or merge/upsert), the indexes only need to be maintained ONE time per statement (or two times if you do separate insert and update calls), instead of once per row as with row-by-row processing (RBAR, as JeffM calls it: http://www.sqlservercentral.com/Forums/Topic642789-338-1.aspx#bm643053 ).

    Again, this is a HUGE deal sometimes.

    ...........

    My advice:

    Use .nodes instead of OPENXML.

    Do your filtering on the DotNet side of things. Allow the xml you send to the stored procedure to be "perfect Xml" where no business decisions have to be made. Just xml-shred it and CRUD it. And then get out.
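    For example, a minimal .nodes() shred of the EmployeeDS xml shown above (a sketch; the column types are assumptions). The OPENXML version in the KB article does the same thing, but you also have to call sp_xml_preparedocument / sp_xml_removedocument and manage a document handle:

    DECLARE @xmlDoc xml;
    SET @xmlDoc = N'<EmployeeDS>
      <Employee><SSN>222222222</SSN><FirstName>John</FirstName><LastName>Smith</LastName></Employee>
      <Employee><SSN>333333333</SSN><FirstName>Mary</FirstName><LastName>Jones</LastName></Employee>
    </EmployeeDS>';

    SELECT  e.x.value('(SSN/text())[1]',       'char(9)')      AS SSN,
            e.x.value('(FirstName/text())[1]', 'nvarchar(50)') AS FirstName,
            e.x.value('(LastName/text())[1]',  'nvarchar(50)') AS LastName
    FROM    @xmlDoc.nodes('/EmployeeDS/Employee') AS e(x);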

    Make sure you performance-test the element-based xml if you're using 2005 (and maybe 2008?). Convert to attribute-based if you experience issues.

    Use strong datasets to create well formed xml. (Hint, remove the default namespace ("Tempuri") of the dataset to make your life easier).
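    One way to act on that hint (an assumption on my part; you can also blank out the targetNamespace in the XSD designer) is to clear it at runtime before serializing:

    EmployeeDS ds = new EmployeeDS();
    ds.Namespace = string.Empty;   // drop the default http://tempuri.org/... namespace
    // ... add rows ...
    string wellFormedXml = ds.GetXml();   // <EmployeeDS> now serializes without an xmlns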

    If you have multiple entity updates going into the xml (let's say 3 for this discussion), the pattern looks like this (see the T-SQL sketch below):

    1. Create 3 @variable and/or 3 #temp tables.

    2. Shred all the xml into the 3 @variable and/or #temp tables.

    3. After all the shredding is done, then do a BEGIN TRAN.

    4. insert/update (merge/upsert) from the 3 @variable and/or #temp tables.

    5. COMMIT TRAN

    @variable or #temp tables? You have to test to see which works better. From personal experience, there is no blanket rule.

    I usually start with @variable, but if I see issues, I experiment with #temp tables.
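    Here is a sketch of that pattern for a single entity; the procedure name, the dbo.Employee table, and the column types are assumptions based on the EmployeeDS example above, not code from the original post. With 3 entities you would add two more @variable/#temp tables and repeat the shred plus the set-based apply for each one inside the same transaction.

    CREATE PROCEDURE dbo.uspEmployeeUpsert
        @xmlDoc xml
    AS
    BEGIN
        SET NOCOUNT ON;

        /* Steps 1 and 2: shred the xml into a @variable (or #temp) table, outside any transaction. */
        DECLARE @Employee TABLE
        (
            SSN       char(9)      NOT NULL PRIMARY KEY,
            FirstName nvarchar(50) NOT NULL,
            LastName  nvarchar(50) NOT NULL
        );

        INSERT INTO @Employee (SSN, FirstName, LastName)
        SELECT  e.x.value('(SSN/text())[1]',       'char(9)'),
                e.x.value('(FirstName/text())[1]', 'nvarchar(50)'),
                e.x.value('(LastName/text())[1]',  'nvarchar(50)')
        FROM    @xmlDoc.nodes('/EmployeeDS/Employee') AS e(x);

        /* Steps 3 to 5: keep the transaction short, just the set-based apply. */
        BEGIN TRAN;

            UPDATE  trg
            SET     trg.FirstName = src.FirstName,
                    trg.LastName  = src.LastName
            FROM    dbo.Employee AS trg
                    INNER JOIN @Employee AS src ON src.SSN = trg.SSN;

            INSERT INTO dbo.Employee (SSN, FirstName, LastName)
            SELECT  src.SSN, src.FirstName, src.LastName
            FROM    @Employee AS src
            WHERE   NOT EXISTS (SELECT 1 FROM dbo.Employee AS trg WHERE trg.SSN = src.SSN);

        COMMIT TRAN;
    END
    GO

    The shredding happens before the BEGIN TRAN on purpose: all of that work stays outside the transaction, so locks are only held for the set-based apply.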

    Advantages:

    You can code one tsql stored procedure to handle insert/updates (or merge/upserts) for 1 or 10 or 100 or more entity updates.

    Your signature never changes; you just pass in @xmlDoc (as xml, or ntext in older SQL Server versions).

    INDEXES are maintained once after the batch insert/update (merge/upsert) and NOT ROW BY ROW. (<< That, my friend, is worth the price of admission alone into this club.)

    Disadvantages: You will lose a little bit of performance versus sending in a bunch of scalar parameters, because the xml has to be shredded.

    MOST TIMES THIS IS NOT A FACTOR; do NOT use it as a justification for avoiding this approach.

    You'll have to learn something new. Most people like learning something new. Some people would rather RBAR because that's what they've done for years and years.

    .................

    Last advice:

    Have a test database with 1,000,000 rows in it to test against. (For example, if you need to update dbo.Employee rows, put 1,000,000 employee rows in the table.)

    Then test your RBAR vs bulk (set-based) methods. Set-based wins every time.

    Don't "prove" your RBAR performance in a table with 10 rows in it.