Shredding XML into Parent/Child tables

Question

Shredding XML into Parent/Child tables

sblough

SSC Enthusiast

Points: 104
More actions
April 24, 2009 at 7:51 am

#384438

Newbie here,
Just joined and searched the posts briefly but couldn't find what I want...hopefully I can explain it well.
I need to store the contents of XML into a SQL Server 2005 database.
I originally had a "Parent" table with 4 "child" tables. The children were related to the parent by a uniqueidentifier field.
I used the following snippet in a stored procedure to populate the tables and it worked well.
NOTE: TEST_RESULTS = Parent and UUT_PROPERTIES = child
*****************************************************
DECLARE @newID uniqueidentifier
BEGIN
SET @newID = NEWID()
BEGIN TRY
-- insert the new TEST_RESULTS record
;WITH XMLNAMESPACES(DEFAULT 'http://tempuri.org/TestResults.xsd')
INSERT INTO dbo.TEST_RESULTS
SELECT @newID,
newTR.value('SystemID[1]', 'varchar(20)') SystemID,
newTR.value('DateTime[1]', 'datetime') DateTime,
newTR.value('TestPassed[1]', 'bit') TestPassed,
newTR.value('Name[1]', 'varchar(50)') Name,
newTR.value('ProcedureNumber[1]', 'varchar(20)') ProcedureNumber,
newTR.value('ProcedureRevision[1]', 'varchar(10)') ProcedureRevision,
newTR.value('TestSWVersion[1]', 'varchar(10)') TestSoftwareVersion,
newTR.value('FailedStep[1]', 'varchar(50)') FailedStep,
newTR.value('Comments[1]', 'varchar(MAX)') Comments
FROM @testData.nodes('/TestResults[1]') node(newTR)
-- insert any available UUT_PROPERTIES records
;WITH XMLNAMESPACES(DEFAULT 'http://tempuri.org/TestResults.xsd')
INSERT INTO dbo.UUT_PROPERTIES
SELECT @newID,
newUUTProp.value('Name[1]', 'varchar(50)') Name,
newUUTProp.value('Value[1]', 'varchar(255)') Value
FROM @testData.nodes('/TestResults/UUTProperty') node(newUUTProp)
*****************************************************
XML snippet:
{?xml version="1.0"?}
{TestResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://tempuri.org/TestResults.xsd"}
{SystemID}EQ-01486{/SystemID}
{DateTime}2009-04-23T14:22:27Z{/DateTime}
{TestPassed}false{/TestPassed}
{Name}Stellant DCU Module Test{/Name}
{ProcedureNumber}999999{/ProcedureNumber}
{ProcedureRevision}Z{/ProcedureRevision}
{TestSWVersion}1.0.0{/TestSWVersion}
{FailureReason xsi:nil="true" /}
{Comments xsi:nil="true" /}
{UUTProperty}
{Name}SRU Software Version{/Name}
{Value}004.001_sh_hu{/Value}
{/UUTProperty}
{/TestResults}
********************************************************
Where I'm getting lost is:
I modified the XML and SQL tables. There is now a new table (UUT_DATA_ELEMENTS) between the old parent and children. The children our now linked to the new table with another uniqueidentifier.
new XML snippet:
{?xml version="1.0"?}
{TestResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://tempuri.org/TestResults.xsd"}
{SystemID}EQ-01486{/SystemID}
{DateTime}2009-04-23T14:22:27Z{/DateTime}
{TestPassed}false{/TestPassed}
{Name}Stellant DCU Module Test{/Name}
{ProcedureNumber}999999{/ProcedureNumber}
{ProcedureRevision}Z{/ProcedureRevision}
{TestSWVersion}1.0.0{/TestSWVersion}
{FailureReason xsi:nil="true" /}
{Comments xsi:nil="true" /}
{UUTDataElement}
{Name}SRU Software Version{/Name}
{Type}UUTProperty{/Type}
{Order}1{/Order}
{UUTProperty}
{Value}004.001_sh_hu{/Value}
{/UUTProperty}
{/UUTDataElement}
{/TestResults}
************************************************
I now need to process each UUTDataElement node and depending on the value of Type create the appropriate child record using the uniqueidentifer I will create for each UUTDataElement.
Any help pointing me in the right direction would be appreciated.

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

Take a look at the "Output Clause" in Books Online. It allows you to grab the UniqueIDs (or whatever else) from data you insert into a table, store it in a temp table, and then use that later in the script/proc.

Does that help?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 2

Thanks GSquared,

I'll look at the OUTPUT clause but what I really need help with is how do I iterate through each UUTDataElement node (like a for...each). If I can do this then I can create me new uniqueidentifier for that record and us it when adding my child node (UUTProperty) depending on the chile Type.

My experience so far with XML has been dumping all nodes of a particlar type into a table all at once. Now I have to add parent...add child(with parent id), add parent...add child(with parent id), add parent...add child(with parent id).

GSquared SSC Guru Points: 260824 More actions · Answer 3

If you attach a text file with a sample of the XML, and post the create scripts for the tables, I'll see if I can help you get what you need.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 4

GSquared,

Attached is a script to create the tables and sample xml that has data to populate each table. The trouble I'm having is with the UUTDataElements parent node and UUTproperty, UUTMeasurement, UUTPassFailTest and UUTStringCompTest child nodes.

Thanks in advance and let me know if oyu need anything else. Also, if you see naything else that I'm way off on let me know, I'm new to XML so I'v been doing alot of this by trial and error.

Steve

sblough SSC Enthusiast Points: 104 More actions · Answer 5

GSquared,

I just realized I forgot to take the name field out of the UUTproperty, UUTMeasurement, UUTPassFailTest and UUTStringCompTest equivalent database tables. The name is now in the UUTDataElement node. Maybe this was my issue the whole time! You should be fine if you just delete the name field from the four db tables. I can send a new script if you like.

Steve

GSquared SSC Guru Points: 260824 More actions · Answer 6

Take a look at what I've attached. Test it, see if you can extend it out to include your other tables, or if I have any sort of critical logic failure in it.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 7

I think the only error in logic was my fault. You must not have seen my last post. I mistakenly sent you a DB script without the name field being removed from the child tables but you are joining on Name. I'll play with it a little to see if I can at least insert the UUTDataElelemtn records.

Thanks for all the help.

GSquared SSC Guru Points: 260824 More actions · Answer 8

Nope. I took the name thing into account. It's only being used at the DataElements level, which is where you said it belongs.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 9

GSquared,

Awesome! Works great!

I assume I just have to add three more sections similar to the UUTProperty section to get my other tables populated.

Thanks for all your help. After your initial suggestion to look at the output clause I was able to get the UUTDataElement records in but it would have talken me a week to figure out the child tables.

on a side note...

This is for a .NET library that my development group will use in our test applications (we write test applications to test medical devices). The idea was to use the libray to maintain an object that holds test result data. When complete the test result data can either be saved to file, written to a database or reported on paper. I decided to use XML so that it could be saved directly to file, transformed via XSLT to HTML and printed and the stored procedure you helped me with to save to the DB.

My main reason doing it this way was the test system will not always be onsite connected to a DB but we still want to collect data. XML seemed like a reasonable approach since the amount of data will not be very large (I'm guessing 50 data points at most) Does this seem like a reasonble approach to you?

Again thanks for the help! You saved me a lot of time.

GSquared SSC Guru Points: 260824 More actions · Answer 10

Since that's pretty much what XML exists for, it does seem like a reasonable approach.

Of course, that raises the question of, "why bother shredding the XML into relational tables in the first place?" Would it work just as well to store the XML in an XML-typed column, and just query that with XQuery when you need the details? Or is there a reason to normalize it and make it relational?

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 11

I guess since we have other relational databases I just assumed that was the way to go. Once the data is stored in the DB it will be queried for process control, failure analysis, etc.

Is there an advantage to leaving it in XML vs relational for queries? One thing to note is we have users that are used to querying relational databases. usually via Excel with a script to retrieve the data. I assume leaving in XML will require either custom procedures for specific reports or a whole lot of training to query the XML.

GSquared SSC Guru Points: 260824 More actions · Answer 12

The main advantage to leaving it as XML is that it's less work, both for you and for the server. It has a secondary advantage in that minor schema changes don't need to be built into the database, they can just be done in the XML.

The disadvantage is querying, but a good view or two can often solve that.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

sblough SSC Enthusiast Points: 104 More actions · Answer 13

Gsquared,

Just wanted to thank you for all the help. Everything is working like a charm!

One related question I thought you might have an opinion on:

Instead of using Identity columns a peer recommended using the GUIDs. The more I'm reading on them has me wondering if I should use Identity instead or if there is a difference either way. This is going to be one database. No replication from other servers. Do you have an opinion on when / when not to use the GUIDs? I've been reading various opinions. Seems like the DBAs are concerned about performance issues but the developers love the flexibility of know the key beforehand.

GSquared SSC Guru Points: 260824 More actions · Answer 14

There are advantages and disadvantages to each.

First, neither one is a "natural key", so they're the same on that point.

Second, GUIDs take more disk space, RAM and bandwidth than Int values. Doesn't matter on some systems, does on others.

Third, GUIDs, unless you generate them in the database with NewSequentialID(), are non-sequential, which means they make a horrible clustered index candidate, while IDs are, by definition, sequential, so they make a better clustered index. Since the clustered index can make a big difference in table performance, this matters. If you cluster on a "CreatedDate" type column, then client-created GUIDs are okay on this point, but it does mean your primary key and your clustered index will be separate, and that can again increase disk/RAM/bandwidth usage.

Fourth, GUIDs can be generated in the client code, which has advantages for devs. It means one less trip of data to-from the server, since they don't have to insert into the table and use something like Scope_Identity to get the ID value. That means the pages/screens can be just that much faster, and a little bit easier to build. It's not a big advantage, it's a small one, but if the web server or application server is more overloaded than the database server (not common, but can happen), it can be worth it.

Generally, the database server is the heaviest bottleneck on performance, so using GUIDs and the effects that can have on IO and table/index fragmentation, is generally a poor tradeoff. But that's certainly not an absolute.

Really, the only way to tell which will be better for you, is fire up a test harness for each, and do load testing to see which performs better over a significant amount of time and transactions.

Usually, it's going to be IDs, but every now and again, it'll be GUIDs.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon