February 24, 2011 at 12:59 pm
I'm new to using XML with SQL and I've been given a XML file that doesn't fit the normal examples, that I've read, for importing.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<wcsxml allrelations="false" batch="1" createtime="12/29/2010 11:21:30" delimiter="|" destination="MAS" source="145" version="4.04" whslrno="32711">
<app id="MOB" />
<app id="IBS" />
<relation allrows="true" columns="CustomerID|CustomerGroupID|InvoiceNum|InvoiceDate|InvoiceAmt" name="account" />
<relation allrows="true" columns="GroupCode|GroupDesc|SequenceNum" name="group">
<row values="10|MINIMUM 34 YRS|1" />
<row values="11|ENTRY LEVEL/CA|2" />
<row values="12|28-34 YEARS|3" />
<row values="20|35+ YEARS|4" />
<row values="21|35-49 YEARS|5" />
<row values="22|50+ YEARS|6" />
<row values="30|AGE NOT IDENT.|7" />
</relation>
<relation allrows="true" columns="ContentCode|ContentDesc|NonFlag|SequenceNbr" name="content">
<row values="AP000| .0 Percent|Y|1" />
</relation>
</wcsxml>
Any help on getting me started would be greatly appreciated.
Mark
February 24, 2011 at 1:19 pm
Are you familiar with the T-SQL version of XQuery?
With that, it's pretty easy to shred this XML.
This should get you started:
DECLARE @XML XML = '<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<wcsxml allrelations="false" batch="1" createtime="12/29/2010 11:21:30" delimiter="|" destination="MAS" source="145" version="4.04" whslrno="32711">
<app id="MOB" />
<app id="IBS" />
<relation allrows="true" columns="CustomerID|CustomerGroupID|InvoiceNum|InvoiceDate|InvoiceAmt" name="account" />
<relation allrows="true" columns="GroupCode|GroupDesc|SequenceNum" name="group">
<row values="10|MINIMUM 34 YRS|1" />
<row values="11|ENTRY LEVEL/CA|2" />
<row values="12|28-34 YEARS|3" />
<row values="20|35+ YEARS|4" />
<row values="21|35-49 YEARS|5" />
<row values="22|50+ YEARS|6" />
<row values="30|AGE NOT IDENT.|7" />
</relation>
<relation allrows="true" columns="ContentCode|ContentDesc|NonFlag|SequenceNbr" name="content">
<row values="AP000| .0 Percent|Y|1" />
</relation>
</wcsxml>';
SELECT X.Y.query('.')
FROM @XML.nodes('/wcsxml/app') X(Y);
SELECT X.Y.query('.')
FROM @XML.nodes('/wcsxml/relation') X(Y);
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2011 at 11:14 am
Awesome! Thank you for your assistance. I'm now able to shred table names out of that mess by modifying it a little so that it looks like:
SELECT T.C.value('@name','VARCHAR(64)')
FROM @XML.nodes('/wcsxml/relation') AS T(C)
I can go even further and use:
SELECT T.C.value('@name','VARCHAR(64)')
FROM @XML.nodes('/wcsxml/relation[1]') AS T(C)
This pulls out a single table name.
Now comes my last problem; the column names and values. I've read up on tsql xquery and have played around a bit in an attempt to shred out columns and values, but I've been unsuccessful so far.
Could I trouble you as to a hint on how to shred out the delimited column names? I've come up with a rather clunky way of doing it by use of a loop and parsing each line, but this is VERY slow. :crazy:
February 25, 2011 at 11:46 am
Is this what you're looking for?
SELECT T.C.value('@name','VARCHAR(64)'),
T.C.value('local-name(.)','VARCHAR(64)')
FROM @XML.nodes('/wcsxml/relation') AS T(C)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 25, 2011 at 3:06 pm
Unfortunately no. I was able to get that far, but the issue is how the file is designed. Forgive me if you already have figured out the following.
The columns are identified in a pipe-delimited format within the relation tag.
For instance the line:
<relation allrows="true" columns="ContentCode|ContentDesc|NonFlag|SequenceNbr" name="content">
This indicates the table name is "content" and the columns are, ContentCode, ContentDesc, NonFlag, and SequenceNbr.
The row tags indicated a pipe-delimited value set for those columns.
<row values="AP000| .0 Percent|Y|1" />
The optimal way it should have been designed is:
<content>
<ContentCode>AP000</ContentCode>
<ContentDesc> .0 Percent</ContentDesc>
<NonFlag>Y</NonFlag>
<SequenceNbr>1</SequenceNbr>
</content>
I now have the un-enviable task of figuring out how to parse the delimters and import it correctly. As I said in my earlier email, I have this working through a loop, but it's slow as the actual file is much larger than my example.
Mark
February 25, 2011 at 3:36 pm
Here's my approach to shred the xml file:
I'd apply a fast split string function to the shredded data. It might help performance to insert the shredded data into a staging table before applying the split function. Needs to be tested and verified.
; WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,
X.Y.value('@columns[1]','varchar(200)') AS col,
U.V.value('@values[1]','varchar(200)') AS val
FROM @XML.nodes('/wcsxml/relation') X(Y)
CROSS APPLY X.Y.nodes('row') U(V)
)
SELECT cc.row, cols.item AS col, vals.item AS val
FROM cte cc
CROSS APPLY dbo.DelimitedSplit8K(col,'|') cols
CROSS APPLY dbo.DelimitedSplit8K(val,'|') vals
WHERE cols.ItemNumber = vals.ItemNumber
Once you have the data shredded as shown you could either use a CrossTab query (as described in the related link in my signature) if you know the column names or use a DynamicCrossTab (also referenced in my signature).
Depending on the data volume it might help to store the split results into another (indexed) temp table. Again, this needs to be tested.
Regarding the DelimitedSplit8K function: please search this site for "DelimitedSplit8K" and you'll find the related code. see the last link in my signature (totally forgot about it - shame on me...)
February 25, 2011 at 4:05 pm
I think This will get you closer
SELECT X.Y.value('(../@name)[1]','varchar(20)') as TableName
,x.y.value('(../@columns)[1]','varchar(128)') as ColumnList
,x.y.value('(@values)[1]','varchar(128)') as ColumnValues
FROM @XML.nodes('//relation/row') X(Y);
I'm still working through it, but my thought was to use the pivot and a string shred to normalize the data.
This will create two columns with | delimited lists
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 25, 2011 at 4:07 pm
LutzM (2/25/2011)
Here's my approach to shred the xml file:I'd apply a fast split string function to the shredded data. It might help performance to insert the shredded data into a staging table before applying the split function. Needs to be tested and verified.
; WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,
X.Y.value('@columns[1]','varchar(200)') AS col,
U.V.value('@values[1]','varchar(200)') AS val
FROM @XML.nodes('/wcsxml/relation') X(Y)
CROSS APPLY X.Y.nodes('row') U(V)
)
SELECT cc.row, cols.item AS col, vals.item AS val
FROM cte cc
CROSS APPLY dbo.DelimitedSplit8K(col,'|') cols
CROSS APPLY dbo.DelimitedSplit8K(val,'|') vals
WHERE cols.ItemNumber = vals.ItemNumber
Once you have the data shredded as shown you could either use a CrossTab query (as described in the related link in my signature) if you know the column names or use a DynamicCrossTab (also referenced in my signature).
Depending on the data volume it might help to store the split results into another (indexed) temp table. Again, this needs to be tested.
Regarding the DelimitedSplit8K function:
please search this site for "DelimitedSplit8K" and you'll find the related code.see the last link in my signature (totally forgot about it - shame on me...)
Nix my response - didn't see Lutz's answer while working on a solution.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 25, 2011 at 5:00 pm
Thanks for the help! This looks promising and I'll play with it some more over the weekend. This has become less of a task and more of a personal challenge now!
February 25, 2011 at 5:10 pm
I played with it some more. Using combo from what lutz and I did already.
; WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,
X.Y.value('(@name)[1]','varchar(20)') as TableName,
X.Y.value('@columns[1]','varchar(200)') AS col,
U.V.value('@values[1]','varchar(200)') AS val
FROM @XML.nodes('//relation') X(Y)
CROSS APPLY X.Y.nodes('row') U(V)
), split as (
SELECT cc.row,cc.TableName as TableName, cols.item AS col, vals.item AS val
FROM cte cc
CROSS APPLY admin.dbo.DelimitedSplit8K(col,'|') cols
CROSS APPLY admin.dbo.DelimitedSplit8K(val,'|') vals
WHERE cols.ItemNumber = vals.ItemNumber
)
Select (TableName), GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr
From (Select TableName, val,row,col
From split) as SourceTable
Pivot
( min(val) for Col in (GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr)
) as Pivt;
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
February 26, 2011 at 2:19 am
CirquedeSQLeil (2/25/2011)
I played with it some more. Using combo from what lutz and I did already.
; WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,
X.Y.value('(@name)[1]','varchar(20)') as TableName,
X.Y.value('@columns[1]','varchar(200)') AS col,
U.V.value('@values[1]','varchar(200)') AS val
FROM @XML.nodes('//relation') X(Y)
CROSS APPLY X.Y.nodes('row') U(V)
), split as (
SELECT cc.row,cc.TableName as TableName, cols.item AS col, vals.item AS val
FROM cte cc
CROSS APPLY admin.dbo.DelimitedSplit8K(col,'|') cols
CROSS APPLY admin.dbo.DelimitedSplit8K(val,'|') vals
WHERE cols.ItemNumber = vals.ItemNumber
)
Select (TableName), GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr
From (Select TableName, val,row,col
From split) as SourceTable
Pivot
( min(val) for Col in (GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr)
) as Pivt;
I would have used CrossTab instead of Pivot, but the concept would have been the same.
Nice job, Jason!
As long as yon know the column names you'll need to deal with the solution Jason posted will work just fine. But if you need to deal with unknown column names you might want to consider using the CrossTab method. I, personally, think it's easier to convert at static CrossTab into a dynamic version than doing the same with the PIVOT operator (I'm not saying it's impossible. I'm just not used to it. And since there is very little -if any- benefit when using Pivot I've decided to stay with CrossTab...)
February 26, 2011 at 9:31 am
LutzM (2/26/2011)
CirquedeSQLeil (2/25/2011)
I played with it some more. Using combo from what lutz and I did already.
; WITH cte AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY (SELECT 1) ) AS row,
X.Y.value('(@name)[1]','varchar(20)') as TableName,
X.Y.value('@columns[1]','varchar(200)') AS col,
U.V.value('@values[1]','varchar(200)') AS val
FROM @XML.nodes('//relation') X(Y)
CROSS APPLY X.Y.nodes('row') U(V)
), split as (
SELECT cc.row,cc.TableName as TableName, cols.item AS col, vals.item AS val
FROM cte cc
CROSS APPLY admin.dbo.DelimitedSplit8K(col,'|') cols
CROSS APPLY admin.dbo.DelimitedSplit8K(val,'|') vals
WHERE cols.ItemNumber = vals.ItemNumber
)
Select (TableName), GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr
From (Select TableName, val,row,col
From split) as SourceTable
Pivot
( min(val) for Col in (GroupCode,GroupDesc,SequenceNum,ContentCode,ContentDesc,NonFlag,SequenceNbr)
) as Pivt;
I would have used CrossTab instead of Pivot, but the concept would have been the same.
Nice job, Jason!
As long as yon know the column names you'll need to deal with the solution Jason posted will work just fine. But if you need to deal with unknown column names you might want to consider using the CrossTab method. I, personally, think it's easier to convert at static CrossTab into a dynamic version than doing the same with the PIVOT operator (I'm not saying it's impossible. I'm just not used to it. And since there is very little -if any- benefit when using Pivot I've decided to stay with CrossTab...)
I would recommend going that route too frankly. I just didn't have the time to figure out the dynamic crosstab at the moment. Both methods are topics I would need to practice more with to get more comfortable in them.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 1, 2011 at 1:16 pm
Thanks loads guys!!
The whole process still takes about 2 1/2 hours, but it's a whole lot better than before! Most of the time is because of the structure of that file! There are actually 49 tables and over a million records associated with it..., so it's not the most efficient method, but it's the only method.
I've requested they change the XML format, but they are extremely reluctant.
Again, thanks for the help.
March 1, 2011 at 1:23 pm
Mark Derryberry (3/1/2011)
Thanks loads guys!!The whole process still takes about 2 1/2 hours, but it's a whole lot better than before! Most of the time is because of the structure of that file! There are actually 49 tables and over a million records associated with it..., so it's not the most efficient method, but it's the only method.
I've requested they change the XML format, but they are extremely reluctant.
Again, thanks for the help.
You are welcome. Glad to have been of some help.
How much of an improvement did you see?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
March 1, 2011 at 1:27 pm
Over 50% from my initial attempts! I was pushing about 6 hours and the methods you all proposed and some additional things I had to do given the scope. It's down to about 2 1/2 on a server. I say server because my desktop dev machine almost choked to death. 🙂
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply