Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Flattening XML problem Expand / Collapse
Author
Message
Posted Saturday, July 7, 2012 3:28 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
Using, SQL2005, I've been given a project that requires shredding some predefined XML. I'm having a hell of a time doing it. I'm new to the XML data type and the various methods.

Given the XML below, what is a practical way to "flatten it", so that I can normalize it into tables:

Account: AccountNum, AccountStatus
Player: AccountNum, PlayerID, FirstName, LastName
PlayerAddresses: AccountNum, PlayerID, AddressType, City, State, Zip
AddressPhones: AccountNum, PlayerID, AddressType, PhoneNumber, PhoneType

declare @myXML XML

select @myXML =
'<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary", City="City1" State="State1" Zip="Zip1">
<FutureUse />
</Address>
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<Address AddressType="billing", City="City1" State="State1" Zip="Zip1">
<FutureUse />
</Address>
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>'

I've tried using examples from the following two links but am having no success:

http://www.sqlservercentral.com/articles/Miscellaneous/2996/
http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx

Thanks!
Post #1326504
Posted Sunday, July 8, 2012 8:54 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
I apologize - the XML example I provided was not valid. I attempted to abstract the actual XML because it's much bigger and I wanted to simplify it for this post. But I did a poor job. The following appears to pass a validator:

<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>
Post #1326569
Posted Sunday, July 8, 2012 3:08 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
After reading more carefully through this:

http://blogs.msdn.com/b/simonince/archive/2009/04/24/flattening-xml-data-in-sql-server.aspx

I was able to construct the following query, that gets the results in a set that I can then easily normalize. I understand how this works... I'd like to say I could do this again without effort, but it's tedious. In addition I have no clue whether this is efficient. Efficiency isn't a big deal in my case, since my sets will be small and not high volume, but still would be nice to know.

Here is the solution I came up with:


select distinct
accountRef.value('@AccountNum', 'varchar(10)') AS AccountNum,
accountRef.value('@AccountStatus', 'varchar(20)') AS AccountStatus,
playerRef.value('@PlayerID', 'varchar(10)') as PlayerID,
addressRef.value('@AddressType', 'varchar(10)') as AddressType,
addressRef.value('@City', 'varchar(10)') as City,
addressRef.value('@State', 'varchar(10)') as Zip,
phoneRef.value('@PhoneNumber', 'varchar(15)') as Phone,
phoneRef.value('@PhoneType', 'varchar(15)') as PhoneType
from @myXML.nodes('/AccountDetailsRsp') as account(accountRef) cross apply
accountRef.nodes('//PlayerInfo') as player(playerRef) cross apply
playerRef.nodes('//AddressList/PlayerAddress/Address') as addresses(addressRef) cross apply
playerRef.nodes('//AddressList/PlayerAddress/Phone') as phones(phoneRef)
order by accountNum, AddressType, Phone

AccountNum AccountStatus PlayerID AddressType City Zip Phone PhoneType
---------- -------------------- ---------- ----------- ---------- ---------- --------------- ---------------
1 AccountStatus1 1 billing City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 billing City1 State1 PhoneNumber2 Type2
1 AccountStatus1 1 primary City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 primary City1 State1 PhoneNumber2 Type2

(4 row(s) affected)
Post #1326624
Posted Monday, July 9, 2012 12:17 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: 2 days ago @ 7:27 AM
Points: 35,769, Visits: 32,437
That actually doesn't work so well. Try it with the following data and see...

<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip2" State="State2" City="City2" />
<FutureUse />
<Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>



What I get from that using your code is...

AccountNum AccountStatus  PlayerID AddressType City  Zip    Phone        PhoneType
---------- -------------- -------- ----------- ----- ------ ------------ ---------
1 AccountStatus1 1 billing City2 State2 PhoneNumber1 Type1
1 AccountStatus1 1 billing City2 State2 PhoneNumber2 Type2
1 AccountStatus1 1 billing City2 State2 PhoneNumber3 Type3
1 AccountStatus1 1 primary City1 State1 PhoneNumber1 Type1
1 AccountStatus1 1 primary City1 State1 PhoneNumber2 Type2
1 AccountStatus1 1 primary City1 State1 PhoneNumber3 Type3

(6 row(s) affected)



I hope someone else drops in because I don't know enough about XML to fix it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1326667
Posted Monday, July 9, 2012 10:12 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
It's interesting that there isn't more responders... I read into that that everyone hates this XML stuff.

From what I can see, what makes this especially difficult is that for any given player, there are multiple addresses, and for any given address there are multiple phones. However, the only key value for an address (and therefore the way to relate the Phones to the Address) is the AddressType attribute, that is part of the Address node, which is a sibling of the Phone nodes. It would have been much better to include those attributes in the parent PlayerAddress node (and not have the Address node). Then the key AddressType value could have been associated with the Phones. Unfortunately, this is what I've been given to work with.

The only resolution I see is a RBAR solution. First get a result set that includes the attributes for each Player as columns, and a column with the XML fragment for the PlayerAddresses. You can do that with this:


select
accountRef.value('@AccountNum', 'varchar(10)') AS AccountNum,
accountRef.value('@AccountStatus', 'varchar(20)') AS AccountStatus,
playerRef.value('@PlayerID', 'varchar(10)') as PlayerID,
playerRef.value('@FirstName', 'varchar(10)') as FirstName,
playerRef.value('@LastName', 'varchar(10)') as LastName,
playerRef.query('AddressList') as AddressList
into #tempTbl
from @myXML.nodes('/AccountDetailsRsp') as account(accountRef) cross apply
accountRef.nodes('//PlayerInfo') as player(playerRef)

Then, for each player,
1. count the number of PlayerAddress nodes
2. using the row filter, query for each PlayerAddress node individually, parsing the details and storing

I hope not to have to do that... hopefully someone else knows a better way.
Post #1326960
Posted Monday, July 9, 2012 11:25 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:28 AM
Points: 7,179, Visits: 15,775
Your query doesn't match the sample data. That said you're on the right track. You just didn't get the cross apply structures quite right.

Assuming you do it this way - you will get the appropriate data joins:

declare @xml xml

set @xml=cast('<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
</AccountDetailsRsp>' as XML)

select act.value('(@AccountNum)[1]','Int') account,
player.value('(@PlayerID)[1]','Int') play,
c.value('(Address/@AddressType)[1]','nvarchar(100)') addrtype,
d.value('(@PhoneNumber)[1]','nvarchar(100)') phone
from @xml.nodes('AccountDetailsRsp') account(act) cross apply
act.nodes('PlayerInfo') p(player)
cross apply player.nodes('AddressList/PlayerAddress') x(c)
cross apply c.nodes('Phone') y(d) --this need to be relative to PlayerAddress, NOT above

Notice that I am stopping just shy of going to PlayerAddress within the cross apply, so you can point out that you are joining based on their mutual relation to playerAddress. I could throw in another cross apply if need be but it isn't necessary in this case.

Edit: typoed the node name they both join by.


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #1326993
Posted Monday, July 9, 2012 12:08 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
Ah! I tried something like that, referencing PlayerAddress rather than PlayerAddress/Address and then specifying the path "Address/@AddressType" but I would get an error:

XQuery [value()]: 'value()' requires a singleton (or empty sequence), found operand of type 'xdt:untypedAtomic *'

I did not know what that was, and couldn't figure out from documentation, but I see the [1] filter resolves that. I'll need to study it more to fully understand, but this does in fact work.

Thank you very much!

Post #1327026
Posted Monday, July 9, 2012 3:43 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: 2 days ago @ 7:27 AM
Points: 35,769, Visits: 32,437
Matt Miller (#4) (7/9/2012)
Your query doesn't match the sample data. That said you're on the right track. You just didn't get the cross apply structures quite right.

Assuming you do it this way - you will get the appropriate data joins:

declare @xml xml

set @xml=cast('<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName1" LastName="LastName1"> <AddressList> <PlayerAddress> <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/> <FutureUse /> <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" /> <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" /> </PlayerAddress> <PlayerAddress> <Address AddressType="billing" Zip="Zip2" State="State2" City="City2" /> <FutureUse /> <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" /> </PlayerAddress> </AddressList> </PlayerInfo>
</AccountDetailsRsp>' as XML)

select act.value('(@AccountNum)[1]','Int') account,
player.value('(@PlayerID)[1]','Int') play,
c.value('(Address/@AddressType)[1]','nvarchar(100)') addrtype,
d.value('(@PhoneNumber)[1]','nvarchar(100)') phone
from @xml.nodes('AccountDetailsRsp') account(act) cross apply
act.nodes('PlayerInfo') p(player)
cross apply player.nodes('AddressList/PlayerAddress') x(c)
cross apply c.nodes('Phone') y(d) --this need to be relative to PlayerAddress, NOT above

Notice that I am stopping just shy of going to PlayerAddress within the cross apply, so you can point out that you are joining based on their mutual relation to playerAddress. I could throw in another cross apply if need be but it isn't necessary in this case.

Edit: typoed the node name they both join by.


You have no idea how much this helped me on a "little" automation project I'm working on. I even figured out that OUTER APPLY instead of CROSS APPLY will allow for missing nodes at the "leaf" level. I also was having a heck of a time understanding the stucture of things like this because everyone seems to have their own style and they never really explain the basics. Because of this wonderfully simple yet complex example and actually understanding what the "edge" table looks like behind the scenes (which is part of the reason for my "extra" OUTER APPLYs), I've finally nailed down something that I thought was going to eat up a whole lot more of my spare time.

Thank you BOTH for posting the original question and such nice examples.

Here's what I ended up with, BTW. I just need to document things a bit (all though the naming convention I used pretty much tells it all) and I'm all set for the next phase of my project.

DECLARE @XML XML

SET @XML=CAST('
<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">
<PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />
<Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip2" State="State2" City="City2" />
<FutureUse />
<Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />
</PlayerAddress>
</AddressList>
</PlayerInfo>
<PlayerInfo PlayerID="2" FirstName="FirstName2" LastName="LastName2">
<AddressList>
<PlayerAddress>
<Address AddressType="primary" City="City3" State="State3" Zip="Zip3"/>
<FutureUse />
<Phone PhoneNumber="PhoneNumber4" PhoneType="Type4" />
<Phone PhoneNumber="PhoneNumber5" PhoneType="Type5" />
</PlayerAddress>
<PlayerAddress>
<Address AddressType="billing" Zip="Zip4" State="State4" City="City4" />
<FutureUse />

</PlayerAddress>
</AddressList>
</PlayerInfo>
</AccountDetailsRsp>'
AS XML)

SELECT AccountNum = x1.AccountDetailsRsp.value ('@AccountNum' ,'VARCHAR(28)'),
AccountStatus = x1.AccountDetailsRsp.value ('@AccountStatus' ,'VARCHAR(28)'),
PlayerID = x2.PlayerInfo.value ('@PlayerID' ,'VARCHAR(28)'),
FirstName = x2.PlayerInfo.value ('@FirstName' ,'VARCHAR(28)'),
LastName = x2.PlayerInfo.value ('@LastName' ,'VARCHAR(28)'),
AddressType = x5.Address.value ('@AddressType' ,'VARCHAR(28)'),
City = x5.Address.value ('@City' ,'VARCHAR(28)'),
State = x5.Address.value ('@State' ,'VARCHAR(28)'),
Zip = x5.Address.value ('@Zip' ,'VARCHAR(28)'),
PhoneNumber = x6.Phone.value ('@PhoneNumber' ,'VARCHAR(28)'),
PhoneType = x6.Phone.value ('@PhoneType' ,'VARCHAR(28)')
FROM @XML.nodes ('AccountDetailsRsp') x1 (AccountDetailsRsp)
OUTER APPLY AccountDetailsRsp.nodes ('PlayerInfo') x2 (PlayerInfo)
OUTER APPLY PlayerInfo.nodes ('AddressList') x3 (AddressList)
OUTER APPLY AddressList.nodes ('PlayerAddress') x4 (PlayerAddress)
OUTER APPLY PlayerAddress.nodes ('Address') x5 (Address) --this needs to be relative to PlayerAddress
OUTER APPLY PlayerAddress.nodes ('Phone') x6 (Phone) --this needs to be relative to PlayerAddress




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1327188
Posted Monday, July 9, 2012 4:29 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Friday, December 5, 2014 3:36 PM
Points: 18, Visits: 242
That's awesome that it happened to help you (and hopefully others) as well. Of course Matt gets credit for the hard part. I am a frequent reader and searcher of this forum, and have gotten plenty of great info from both of you. Including making good use of the famous tally table! I'll try to help someone (with something simple!) for good karma.
Post #1327201
Posted Monday, July 9, 2012 6:53 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 7:28 AM
Points: 7,179, Visits: 15,775
Woot! Happy that helped you both. Was a somewhat weak explanation, but apparently got the message across.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #1327222
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse