Flattening XML problem

  • Sure. Would be happy to review it. Send me what you have whenever you're ready.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Will do and thanks. I appreciate it very much.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Matt Miller (#4) (7/11/2012)


    Sure. Would be happy to review it. Send me what you have whenever you're ready.

    Thanks again for your offer, Matt. I sent you the code via PM. I didn't want to post it here because 1) I know comparatively very little about XML and didn't want to post something here that someone might make the mistake of using if it's bad code and 2) I hate XML 😛 enough to write an article about how to get around all of the stuff most folks have to go through to write XML shredders to meet their needs. The code I sent you actually interrogates the XML and writes the necessary T-SQL to shred virtually any well-formed XML even if attributes or "data nodes" are missing from "entities" much like you've done on this very thread. It even "right sizes" the data columns it returns in the result set and can easily be copied and tweaked to change the datatypes of the result set. Heh... it even prints instructions for how to easily modify the code so that it works on a whole column of XML from a table. :w00t:

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • rice.tx (7/9/2012)


    Of course Matt gets credit for the hard part.

    You said a mouthful there. Matt's example has changed my life. I don't actually hate XML anymore... I only loath it now. 😀

    In all seriousness, if the code I sent Matt passes his muster or I can tweak any corrections/enhancements he may have, even "XML Dummies" like me :unsure: will have no problems with shredding XML to a flattened structure.

    Yep... I realize that a lot of people (I've been Googling the subject like crazy) have written code to flatten XML but it's either been horribly complex or actually doesn't do anything automatically in a "point'n'shoot" fashion without knowing (literally) anything about the actual structure of the XML.

    BTW... do you mind if I use the example XML data you posted for the article (presuming the code passed Matt's muster, of course)? It's a great example because, although it's nice and short, it has a couple of "levels" of "entities and attributes" built into it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Crud. I might not be done with the code, yet. Is the following considered to be a "normal well formed" type of XML?

    SET @pXML = '

    <items>

    <item id="0001" type="donut">

    <name>Cake</name>

    <ppu>0.55</ppu>

    <batters>

    <batter id="1001">Regular</batter>

    <batter id="1002">Chocolate</batter>

    <batter id="1003">Blueberry</batter>

    </batters>

    <topping id="5001">None</topping>

    <topping id="5002">Glazed</topping>

    <topping id="5005">Sugar</topping>

    <topping id="5006">Sprinkles</topping>

    <topping id="5003">Chocolate</topping>

    <topping id="5004">Maple</topping>

    </item>

    </items>

    '

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Ah... I see. It's a legal "hybrid" of both attribute and element based XML. Back to the drawing board.

    Matt, if you're reading this, forget about the code I PMd you. It won't handle the hybrid stuff.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sometimes, I get lucky. I've now got code that will automatically create a flattened result set from any of the following XML. I'll send it to Matt for review before I post it because I could certainly be wrong like I was before. I'm not quite back to hating XML but I am loathing it a bit more. 😀

    --=============================================================================

    -- Build some test data for this code review. This is NOT a part of the

    -- solution. The @pXML variable could be a parameter in a stored proc.

    --=============================================================================

    --===== This will contain the actual XML document to be shredded and flattened.

    DECLARE @pXML XML;

    --===== This is an "Attribute Based" example.

    -- "FirstName1" and "Zip2" attributes and the "Phone" entity for

    -- "PhoneNumber6" have intentionally been left out to demonstrate that the

    -- XML DOESN'T have to be "perfect" meaning that not every "entity" needs

    -- to have all the attributes. It still needs to be "well formed", though.

    SET @pXML = '

    <AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">

    <PlayerInfo PlayerID="1" LastName="LastName1">

    <AddressList>

    <PlayerAddress>

    <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>

    <FutureUse />

    <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

    <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

    </PlayerAddress>

    <PlayerAddress>

    <Address AddressType="billing" State="State2" City="City2" />

    <FutureUse />

    <Phone PhoneNumber="PhoneNumber3" PhoneType="Type3" />

    </PlayerAddress>

    </AddressList>

    </PlayerInfo>

    <PlayerInfo PlayerID="2" FirstName="FirstName2" LastName="LastName2">

    <AddressList>

    <PlayerAddress>

    <Address AddressType="primary" City="City3" State="State3" Zip="Zip3"/>

    <FutureUse />

    <Phone PhoneNumber="PhoneNumber4" PhoneType="Type4" />

    <Phone PhoneNumber="PhoneNumber5" PhoneType="Type5" />

    </PlayerAddress>

    <PlayerAddress>

    <Address AddressType="billing" Zip="Zip4" State="State4" City="City4" />

    <FutureUse />

    </PlayerAddress>

    </AddressList>

    </PlayerInfo>

    </AccountDetailsRsp>'

    --===== This is an "Element Based" example. Uncomment it to see the code work

    -- with it instead of the "Attribute Based" example.

    -- "Skills" have been left out for "Simon" and "Age" has been left out

    -- for "Sally" to demonstrate that the XML DOESN'T have to be perfect

    -- meaning that not every "entity" needs to have all the attributes.

    -- It still needs to be "well formed", though.

    --SET @pXML = '

    --<Root>

    -- <Person>

    -- <Name>Simon</Name>

    -- <Age>20</Age>

    -- </Person>

    -- <Person>

    -- <Name>Peter</Name>

    -- <Age>21</Age>

    -- <Skills>

    -- <Skill>Cooking</Skill>

    -- <Skill>Carpentry</Skill>

    -- <Skill>Hunting</Skill>

    -- </Skills>

    -- </Person>

    -- <Person>

    -- <Name>Sally</Name>

    -- <Skills>

    -- <Skill>Cooking</Skill>

    -- <Skill>Carpentry</Skill>

    -- </Skills>

    -- </Person>

    --</Root>

    --'

    ;

    --===== This is an "Hybrid Based" example. Uncomment it to see the code work

    -- with it instead of the other examples.

    --SET @pXML = '

    --<items>

    --<item id="0001" type="donut">

    --<name>Cake</name>

    --<ppu>0.55</ppu>

    --<batters>

    --<batter id="1001">Regular</batter>

    --<batter id="1002">Chocolate</batter>

    --<batter id="1003">Blueberry</batter>

    --</batters>

    --<topping id="5001">None</topping>

    --<topping id="5002">Glazed</topping>

    --<topping id="5005">Sugar</topping>

    --<topping id="5006">Sprinkles</topping>

    --<topping id="5003">Chocolate</topping>

    --<topping id="5004">Maple</topping>

    --</item>

    --</items>

    --'

    ;

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Had to start over because of the nuances of "hybrid" XML but I believe I may have nailed down the code to automatically flatten the 3 different types (Attribute Only, Element Only, and Hybrid of the two) of NON-TYPED, NON-XSD XML (the type of stuff most people work with, properly formed, of course).

    First, it was a bit unfair of me to try to latch onto Matt for specific help. It's also a bit unfair of me to hold the code back until I can get an article out on the subject.

    With that in mind, here's the code I ended up with. To see it work, just replace the XML example in the code with your favorite well formed XML and let it rip (or you can use the simple Hybrid example I included). As always, the details are in the comments in the code.

    As an overview, the following code build the "Edge" table (an Adjacency List Hierarchy, really) that SQL Server uses behind the scenes, adds some information to it, and builds the SQL to return the flattened XML as a single derived table result set. It also prints the XML so you can tweak it for production use.

    BWAAA-HAAAA!!! I now know more about XML than I ever wanted to. Hopefully, I won't ever have to get into the XSD stuff. 😛 Thanks again to Rice.tx for the original problem and to Matt for posting a reasonable solution for me to automate.

    --===== Put your XML document between the single quotes below like I did.

    SET NOCOUNT ON;

    DECLARE @pXML XML;

    SET @pXML =

    '

    <items>

    <item id="0001" type="Donut">

    <name>Cake</name>

    <ppu>0.55</ppu>

    <batter id="1001">Regular</batter>

    <batter id="1002">Chocolate</batter>

    <batter id="1003">Blueberry</batter>

    <topping id="5001">None</topping>

    <topping id="5002">Glazed</topping>

    <topping id="5005">Sugar</topping>

    <topping id="5006">Sprinkles</topping>

    <topping id="5003">Chocolate</topping>

    <topping id="5004">Maple</topping>

    </item>

    </items>

    '

    ;

    --=============================================================================

    -- Copy the "Edge" table including some necessary extra columns to TempDB

    -- where we can work on it.

    --=============================================================================

    --===== Create the edge table that we'll need

    IF OBJECT_ID('tempdb..#Edge','U') IS NOT NULL

    DROP TABLE #Edge

    ;

    --===== This creates the "Edge" table and gives us a "handle" to refer to it.

    DECLARE @DocHandle INT;

    EXEC sp_xml_preparedocument @DocHandle OUTPUT, @pXML;

    --===== This copies the "Edge" table to a new table on-the-fly so we can work

    -- on it. We also create a couple of extra columns that we'll need to

    -- build the SQL to flatten the XML with.

    SELECT ID = ISNULL(ID,0), --ISNULL makes a NOT NULL column.

    ParentID,

    NodeType, --1="Entity/Table", 2="Attribute/Column", 3="Data Node"

    --4= a new node type to handle "Element Based" data.

    ParentLocalName = CAST(NULL AS NVARCHAR(4000)),

    LocalName,

    Text = CAST(Text AS NVARCHAR(MAX)),

    HLevel = CAST(NULL AS INT),

    Width = CAST(NULL AS INT),

    DeDupeName = CAST(NULL AS NVARCHAR(4000)),

    Alias = CAST(NULL AS NVARCHAR(10))

    INTO #Edge

    FROM OPENXML (@DocHandle, '/') --This means "Prepare EVERYTHING" in the XML

    ;

    --===== We no longer need the real "Edge" table. Drop it to conserve resources.

    EXEC sp_xml_removedocument @DocHandle;

    --===== Create an index to make things run a bit faster.

    -- May have to add to this for XML with a lot of nodes/rows.

    CREATE UNIQUE CLUSTERED INDEX IX_#Edge_ID_ParentID

    ON #Edge (ID,ParentID)

    ;

    --=============================================================================

    -- We need to know the Hierarchical Level and parent LocalName of each

    -- node to build the SELECT list. While we're at it, we need to know the

    -- MAXimum width of the data for each data node so we can "right size" the

    -- columns for the result "table".

    --=============================================================================

    WITH

    cteBuildLevel AS

    ( --=== This is the "anchor" part of the recursive CTE

    SELECT anchor.ID,

    HLevel = 1,

    LocalName,

    ParentLocalName

    FROM #Edge AS anchor

    WHERE ParentID IS NULL

    UNION ALL -------------------------------------------------------------------

    --==== This is the "recursive" part of the CTE that adds 1 for each level

    -- and carries the previous LocalName forward as the ParentLocalName for

    -- the next level of rows.

    SELECT recur.ID,

    HLevel = cte.HLevel+1,

    LocalName = recur.LocalName,

    ParentLocalName = cte.LocalName

    FROM #Edge AS recur

    INNER JOIN cteBuildLevel AS cte

    ON cte.ID = recur.ParentID

    )--==== This does the actual update of the "edge" table from above.

    -- It also calculates the column width of the data so we can modify

    -- the resulting T-SQL later to make it so not every column has to

    -- be a TEXT or NVARCHAR(MAX) column.

    UPDATE tgt

    SET tgt.HLevel = bp.HLevel,

    tgt.ParentLocalName = bp.ParentLocalName,

    tgt.Width = DATALENGTH(tgt.Text)

    FROM #Edge tgt

    INNER JOIN cteBuildLevel bp

    ON tgt.ID = bp.ID

    ;

    --=============================================================================

    -- We need to create table aliases for both the SELECT list and the FROM

    -- clause. Type 1 and Type 4 (not yet created) nodes will appear in the

    -- FROM clause. Type 2 and Type 4 nodes will appear in the SELECT list.

    --=============================================================================

    --===== Mark Type 1 Nodes with an alias

    WITH

    cteUniqueLocalNames AS

    ( --=== Get the unique LocalNames for Type 1 nodes

    SELECT DISTINCT

    HLevel,LocalName

    FROM #Edge

    WHERE NodeType = 1

    ),

    cteCreateAliases AS

    ( --=== Number the unique LocalNames in order by their hierarchical level.

    SELECT HLevel,

    LocalName,

    Alias = 'x'

    + RIGHT('0000' + CAST(

    ROW_NUMBER() OVER (ORDER BY HLevel, LocalName)

    AS NVARCHAR(10)),4)

    FROM cteUniqueLocalNames

    ) --=== Update all Type 1 (Entity) nodes with the Alias from above

    UPDATE edge

    SET edge.Alias = alias.Alias

    FROM #Edge edge

    INNER JOIN cteCreateAliases alias

    ON edge.LocalName = alias.LocalName

    AND edge.HLevel = alias.HLevel

    AND edge.NodeType = 1

    ;

    --===== Mark Type 2 Nodes with the Alias of their parent nodes.

    -- For "Attribute Based" nodes, Type 2 Nodes will have data associated

    -- with it so the node needs to appear in the SELECT list. This marks the

    -- Type 2 Nodes with the alias of their parents so that we can use the

    -- correct alias for the nodes especially when there are duplicate named

    -- nodes.

    UPDATE type2node

    SET type2node.Alias = parent.Alias

    FROM #Edge type2node

    INNER JOIN #Edge parent

    ON type2node.ParentID = parent.ID

    AND type2node.NodeType = 2

    ;

    --=============================================================================

    -- "Attribute Based" nodes will have a Type 2 node that needs to appear in

    -- the SELECT list. "Element Based" nodes don't have Type 2 nodes. They

    -- appear, instead, as Type 1 nodes with a direct attached Type 3 data

    -- node. Well mark those nodes as Type 4 (Element Based) to make building

    -- the SELECT list a whole lot easier later on.

    --=============================================================================

    --===== Mark the Type 1 parents of Type 3 as Type 4 "Element" nodes.

    UPDATE parent

    SET parent.NodeType = 4 --Element Node (formally a Type 1 Node)

    FROM #Edge type3node

    INNER JOIN #Edge parent

    ON type3node.ParentID = parent.ID

    AND type3node.NodeType = 3 --Data Node

    AND parent.NodeType = 1 --Entity Node

    ;

    --=============================================================================

    -- We need to update the "Attribute" and "Element" nodes with the width

    -- of their respective data nodes so we can more easily "right size" the

    -- columns in the SELECT list.

    --=============================================================================

    --===== Copy the column Width from the data nodes to the column nodes.

    -- 2="Attribute Based", 4="Element Based" are column nodes.

    UPDATE tgt

    SET tgt.Width = src.Width

    FROM #Edge tgt

    INNER JOIN #Edge src

    ON tgt.LocalName = src.ParentLocalName

    WHERE tgt.NodeType IN (2,4) --Attribute and Element nodes respectively

    AND src.NodeType = 3 --Data node

    ;

    --=============================================================================

    -- Last but not least, the possibility of duplicate column names exists.

    -- In order to prevent that type of duplication, we create a new set of

    -- column names by prepending the ParentLocal name and appending the Alias

    -- to the LocalName. We'll use these names as the column names in the

    -- SELECT list if a dupe exists.

    --=============================================================================

    --===== Create unique column names to use in the SELECT list.

    WITH

    cteLocalNames AS

    ( --=== Find unique combinations of LocalName and Alias so we get a count of

    -- column names to find duplicates

    SELECT DISTINCT LocalName, Alias

    FROM #Edge

    WHERE NodeType IN (2,4)

    ),

    cteDuplicateNames AS

    (

    SELECT LocalName

    FROM cteLocalNames

    GROUP BY LocalName

    HAVING COUNT(*) > 1

    )

    UPDATE edge

    SET DeDupeName = edge.ParentLocalName + '_' + edge.LocalName + '_' + edge.Alias

    FROM #Edge edge

    INNER JOIN cteDuplicateNames dupe

    ON edge.LocalName = dupe.LocalName

    ;

    --=============================================================================

    -- Create the FROM clause from the Type 1 (Entity) and Type 4 (Element)

    -- Nodes. Each OUTER APPLY is a subset of its parent making a list of

    -- "cascading OUTER APPLYs" or (to coin an acronym) "cOAs".

    --=============================================================================

    --===== Create a place to concatenate the cOAs to build the entire FROM clause.

    DECLARE @From VARCHAR(MAX);

    SELECT @From = '';

    --===== Create the FROM clause using cOAs

    WITH

    cteFromClauseNodes AS

    ( --=== Gather the node info that will show up in the FROM clause.

    SELECT DISTINCT

    ParentLocalName, LocalName, Alias

    FROM #Edge

    WHERE NodeType IN (1,4)

    )

    SELECT @From = @From

    + ISNULL(' OUTER APPLY ' + ParentLocalName, ' FROM @pXML')

    + '.nodes (' + QUOTENAME(LocalName,'''') + ') '

    + Alias + ' '

    + '(' + QUOTENAME(LocalName) + ')' + CHAR(10)

    FROM cteFromClauseNodes

    ORDER BY Alias

    ;

    --=============================================================================

    -- Create the SELECT LIST using the unique column names from the Type 2

    -- (Attribute) and Type 4 (Element) Nodes.

    -- Note how the Type 2 and Type 4 nodes need to be handled differently.

    -- Also note that this is where we handle the possibility of needing a MAX

    -- datatype column.

    --=============================================================================

    --===== Create a place to concatenate the columns to build the entire

    -- SELECT LIST.

    DECLARE @Select VARCHAR(MAX);

    --===== Create the SELECT LIST.

    WITH

    cteFromClauseNodes AS

    ( --=== Gather the unique node info that will show up in the SELECT LIST.

    SELECT NodeType,

    ParentLocalName,

    LocalName,

    ColumnName = ISNULL(DeDupeName, LocalName),

    Alias,

    Width = CASE --= Change to MAX datatype if bigger than NVARCHAR(4000)

    WHEN MAX(Width) <= 4000

    THEN CAST(MAX(Width) AS VARCHAR(10))

    ELSE 'MAX'

    END

    FROM #Edge

    WHERE NodeType IN (2,4)

    GROUP BY NodeType, ParentLocalName, LocalName, Alias, DeDupeName

    )

    SELECT @Select = ISNULL(@Select + ',' + CHAR(10)+ SPACE(8), '')

    + '[' + ColumnName + ']' + ' = '

    + CASE

    WHEN NodeType = 2 --Node Type 4 implied for the ELSE by WHERE

    THEN Alias + '.[' + ParentLocalName + '].value '

    ELSE Alias + '.[' + LocalName + '].value '

    END

    + CASE

    WHEN NodeType = 2 --Node Type 4 implied for the ELSE by WHERE

    THEN '(''@' + LocalName + '''' +

    + ', ''NVARCHAR(' + Width + ')'')'

    ELSE '(''(text())[1]'', ''NVARCHAR(' + Width + ')'')'

    END

    FROM cteFromClauseNodes

    WHERE NodeType IN (2,4)

    ORDER BY Alias --Simplifies troubleshooting if needed

    ;

    --=============================================================================

    -- Put the final SQL together along with some helpful hints for use.

    -- Note that we add a ROW_NUMBER column to ensure uniqueness if the

    -- de-duplication effort somehow managed to fail.

    --=============================================================================

    --===== Declare a variable to hold the final SQL and hints

    DECLARE @SQL NVARCHAR(MAX);

    --===== Put the final SQL all together along with some helpful hints for use.

    SELECT @SQL = '

    /*=============================================================================

    The following T-SQL is what shredded the XML you gave it.

    If desired, please modify the datatypes and/or column names to suit your

    needs. You could also add "INTO #SomeTableName" just before the FROM clause to

    "auto-magically" build and store the results into a temporary or other table

    as a "staging" table on-the-fly to work from.

    --Jeff Moden

    =============================================================================*/

    '

    + ' SELECT RowNum = ROW_NUMBER() ' --Ensure unique rows

    + 'OVER(ORDER BY (SELECT NULL)),' + CHAR(10) + SPACE(8)

    + @Select + CHAR(10)

    + @From

    + ';' + '

    --=============================================================================

    -- Hint on how to convert the T-SQL to work on a table column.

    --=============================================================================

    /*

    Change the FROM clause in the SQL above in a similar manner to below to shred

    an entire column of similar (ie: same nodes/structure) XML.

    Change from: (Note that "... etc x" means

    "the rest of the line of existing code.")

    FROM @pXML.nodes (... etc 1

    OUTER APPLY ... etc 2

    OUTER APPLY ... etc 3

    ... etc x

    Change to:

    FROM dbo.YourTableName x0

    CROSS APPLY YourXMLDeDupeName.nodes( ... etc 1

    OUTER APPLY ... etc 2

    OUTER APPLY ... etc 3

    ... etc x

    */

    '

    ;

    --=============================================================================

    -- We're ready to rock. Display a message to the user as to where to find

    -- things, print the SQL for possible copy and modification, and execute

    -- the SQL to show that it actually works without error.

    --=============================================================================

    --===== Tell the operator where to find the final SQL.

    SELECT [NOTICE!!!] = 'Please see "Messages" tab for the rendered T-SQL'

    UNION ALL

    SELECT [NOTICE!!!] = 'that created the "table" below from the XML.'

    ;

    --===== Print the SQL before we execute it to make troubleshooting easier.

    PRINT @SQL;

    --===== Show that the SQL works as advertised and so that we can examine the

    -- results to tweak the generated SQL for datatypes, etc, later.

    EXECUTE sp_executesql @SQL,

    N'@pXML XML',

    @pXML

    ;

    Here's the result from the "Messages" tab...

    /*=============================================================================

    The following T-SQL is what shredded the XML you gave it.

    If desired, please modify the datatypes and/or column names to suit your

    needs. You could also add "INTO #SomeTableName" just before the FROM clause to

    "auto-magically" build and store the results into a temporary or other table

    as a "staging" table on-the-fly to work from.

    --Jeff Moden

    =============================================================================*/

    SELECT RowNum = ROW_NUMBER() OVER(ORDER BY (SELECT NULL)),

    [item_id_x0002] = x0002.[item].value ('@id', 'NVARCHAR(8)'),

    [type] = x0002.[item].value ('@type', 'NVARCHAR(10)'),

    [batter_id_x0003] = x0003.[batter].value ('@id', 'NVARCHAR(8)'),

    [batter] = x0003.[batter].value ('(text())[1]', 'NVARCHAR(14)'),

    [name] = x0004.[name].value ('(text())[1]', 'NVARCHAR(8)'),

    [ppu] = x0005.[ppu].value ('(text())[1]', 'NVARCHAR(8)'),

    [topping_id_x0006] = x0006.[topping].value ('@id', 'NVARCHAR(8)'),

    [topping] = x0006.[topping].value ('(text())[1]', 'NVARCHAR(8)')

    FROM @pXML.nodes ('items') x0001 ([items])

    OUTER APPLY items.nodes ('item') x0002 ([item])

    OUTER APPLY item.nodes ('batter') x0003 ([batter])

    OUTER APPLY item.nodes ('name') x0004 ([name])

    OUTER APPLY item.nodes ('ppu') x0005 ([ppu])

    OUTER APPLY item.nodes ('topping') x0006 ([topping])

    ;

    --=============================================================================

    -- Hint on how to convert the T-SQL to work on a table column.

    --=============================================================================

    /*

    Change the FROM clause in the SQL above in a similar manner to below to shred

    an entire column of similar (ie: same nodes/structure) XML.

    Change from: (Note that "... etc x" means

    "the rest of the line of existing code.")

    FROM @pXML.nodes (... etc 1

    OUTER APPLY ... etc 2

    OUTER APPLY ... etc 3

    ... etc x

    Change to:

    FROM dbo.YourTableName x0

    CROSS APPLY YourXMLDeDupeName.nodes( ... etc 1

    OUTER APPLY ... etc 2

    OUTER APPLY ... etc 3

    ... etc x

    */

    Last but not least, here's the flattened result set from the XML sample in the code including "right sized" columns.

    RowNum item_id_x0002 type batter_id_x0003 batter name ppu topping_id_x0006 topping

    -------------------- ------------- ---------- --------------- -------------- -------- -------- ---------------- --------

    1 0001 Donut 1001 Regular Cake 0.55 5001 None

    2 0001 Donut 1001 Regular Cake 0.55 5002 Glazed

    3 0001 Donut 1001 Regular Cake 0.55 5005 Sugar

    4 0001 Donut 1001 Regular Cake 0.55 5006 Sprinkle

    5 0001 Donut 1001 Regular Cake 0.55 5003 Chocolat

    6 0001 Donut 1001 Regular Cake 0.55 5004 Maple

    7 0001 Donut 1002 Chocolate Cake 0.55 5001 None

    8 0001 Donut 1002 Chocolate Cake 0.55 5002 Glazed

    9 0001 Donut 1002 Chocolate Cake 0.55 5005 Sugar

    10 0001 Donut 1002 Chocolate Cake 0.55 5006 Sprinkle

    11 0001 Donut 1002 Chocolate Cake 0.55 5003 Chocolat

    12 0001 Donut 1002 Chocolate Cake 0.55 5004 Maple

    13 0001 Donut 1003 Blueberry Cake 0.55 5001 None

    14 0001 Donut 1003 Blueberry Cake 0.55 5002 Glazed

    15 0001 Donut 1003 Blueberry Cake 0.55 5005 Sugar

    16 0001 Donut 1003 Blueberry Cake 0.55 5006 Sprinkle

    17 0001 Donut 1003 Blueberry Cake 0.55 5003 Chocolat

    18 0001 Donut 1003 Blueberry Cake 0.55 5004 Maple

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff your work is amazing. I am learning a _lot_ from it. Thank you!

  • kenambrose (10/17/2012)


    Jeff your work is amazing. I am learning a _lot_ from it. Thank you!

    Thanks for the feedback, Ken. I really appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Another article I'm looking forward to reading.

    Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (10/19/2012)


    Another article I'm looking forward to reading.

    Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P

    It is interesting. I just ventured across the 2nd iteration just yesterday. (Sorry Jeff - I don't think I ever saw a followup to the initial thread).

    Jeff - a few things: it's a definitely good start. It is VERY good at identifying and sniffing out the internal structure: the #edge table is a very interesting artifact, and I am very curious to keep looking through it to see how we could improve upon it. This script also fairly good at flattening certain kinds of XML and can be a good starting point to create a staging table..

    There's a big issue to look at however. There seems to be a presumption that the internal structure is very loose and doesn't in fact neatly fit into tables already. This has an interesting side effect: when the XML IS somewhat based on "tables", the flattening process tends to turn the output into something reasonably horrible. There's a distinct triangle join effect: try running the script against this kind of XML (a lot of the XML message I get look a lot like this).

    declare @x varchar(max)

    select @x='<root>

    <locs>

    <location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>

    </locs>

    <Mascots>

    <Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <Leagues>

    <League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <roster>

    <teams>

    <team name="blue team" id="1"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">

    the

    <player name="Bob"/>

    cow

    <player name="Joe">

    <age>21</age>

    </player>

    jumped over

    <player name="Gary"/>

    the

    <player name="Bruce"/>

    <player name="Bruce"/>

    moon

    </team>

    <team name="red team" id="2"

    Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"

    locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Mary"/>

    <player name="Susie">

    <age>45</age>

    </player>

    <player name="Gary"/>

    <player name="Shelia"/>

    <player name="Bruce"/>

    </team>

    <team name="grey team" id="3"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Bob"/>

    <player name="Joe">

    <age>21</age>

    </player>

    <player name="Gary"/>

    <player name="Bruce"/>

    <player name="Bruce"/>

    </team>

    </teams>

    </roster>

    </root>';

    The fly in the ointment is that you basically are getting some "reference" tables outside of the main structure. By blowing out the internal structure (which isn't half bad if you are trying to relate the entities to each other), it's now going to be VERY hard to determine how many of "each thing" needs to be inserted/imported when you actually go to use the flattened output.

    Specifically, you might not pick up on the fact that you actually have 2 players called "Bruce" on several of the teams, since everything ELSE is duplicated as well.Or - there' no good way to know that some of those "reference values" aren't actually referenced elsewhere (with the cartesian product here - it kind of looks like they are).

    Now the example also happens to be using an XML construct called id/idref (essentially PK/FK in XML-speak). I just don't see how you could reasonably sniff that out without a schema.

    Again - great building blocks on discovering the internals of the file without a preexisting schema. I am just not quite sure they're leveraged correctly at this point (or properly put boundaries on what this actually does).I need to keep stewing on this a bit more, but just wanted to post a little feedback on it.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff seems to have beat me to it, but here's a similar XML parsing procedure I've used extensively.

    CREATE PROCEDURE dbo.ParseXML

    @strXML AS XML

    ,@rootnode NVARCHAR(255)

    AS

    BEGIN

    /*

    EXEC dbo.ParseXML

    '<items>

    <item id="0001" type="Donut">

    <name>Cake</name>

    <ppu>0.55</ppu>

    <batter id="1001">Regular</batter>

    <batter id="1002">Chocolate</batter>

    <batter id="1003">Blueberry</batter>

    <topping id="5001">None</topping>

    <topping id="5002">Glazed</topping>

    <topping id="5005">Sugar</topping>

    <topping id="5006">Sprinkles</topping>

    <topping id="5003">Chocolate</topping>

    <topping id="5004">Maple</topping>

    </item>

    </items>'

    ,'items'

    EXEC dbo.ParseXML

    '<AccountDetailsRsp AccountNum="1" AccountStatus="AccountStatus1">

    <PlayerInfo PlayerID="1" FirstName="FirstName1" LastName="LastName1">

    <AddressList>

    <PlayerAddress>

    <Address AddressType="primary" City="City1" State="State1" Zip="Zip1"/>

    <FutureUse>Example Text1</FutureUse>

    <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

    <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

    </PlayerAddress>

    <PlayerAddress>

    <Address AddressType="billing" City="City1" State="State1" Zip="Zip1"/>

    <FutureUse>Example Text2</FutureUse>

    <Phone PhoneNumber="PhoneNumber1" PhoneType="Type1" />

    <Phone PhoneNumber="PhoneNumber2" PhoneType="Type2" />

    </PlayerAddress>

    </AddressList>

    </PlayerInfo>

    </AccountDetailsRsp>'

    ,'AccountDetailsRsp'

    */

    SET NOCOUNT ON

    DECLARE

    @strText AS NVARCHAR(MAX)

    ,@idoc INT

    ,@id INT

    ,@parentid INT

    IF OBJECT_ID('tempdb..#ChildList') IS NOT NULL

    DROP TABLE #ChildList

    CREATE TABLE #ChildList (

    [RowNum] INT IDENTITY(1,1) NOT NULL,

    [parentid] INT NULL,

    [id] INT NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    IF OBJECT_ID('tempdb..#NodeList') IS NOT NULL

    DROP TABLE #NodeList

    CREATE TABLE #NodeList (

    [RowNum] INT NOT NULL,

    [id] INT NULL,

    [parentid] INT NULL,

    [nodetype] INT NULL,

    [localname] NVARCHAR(MAX) NULL,

    [text] NVARCHAR(MAX) NULL,

    PRIMARY KEY (RowNum),

    UNIQUE (RowNum))

    SET @id = 1

    SET @parentid = NULL

    /* Get rid of tabs and extra spaces */

    SET @strText = CAST(@strXML AS NVARCHAR(MAX))

    SET @strText =

    REPLACE(

    REPLACE(

    REPLACE(

    REPLACE(

    @strText

    ,' ',' '+CHAR(7))

    ,CHAR(7)+' ','')

    ,CHAR(7),'')

    ,CHAR(9),' ')

    SET @strXML = CONVERT(XML,@strText)

    /* Validate the XML */

    EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML

    /* Parse the XML data */

    ;WITH cte

    AS (

    SELECT

    CAST(p1.parentid AS INT) AS parentid

    ,CAST(p1.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p1

    UNION ALL

    SELECT

    CAST(p2.parentid AS INT) AS parentid

    ,CAST(p2.id AS INT) AS id

    FROM

    OPENXML (@idoc,@rootnode,2) AS p2

    JOIN

    cte

    ON CAST(cte.id AS INT) = CAST(p2.ParentID AS INT)

    WHERE

    CAST(p2.parentid AS INT) = @parentid

    )

    INSERT INTO #ChildList

    SELECT *

    FROM cte

    INSERT INTO #NodeList

    SELECT

    #ChildList.RowNum

    ,xmllist.id

    ,xmllist.parentid

    ,xmllist.nodetype

    ,xmllist.localname

    ,CAST(xmllist.[text] AS NVARCHAR(MAX)) AS [text]

    FROM #ChildList

    INNER JOIN

    OPENXML (@idoc,@rootnode,2) AS xmllist

    ON #ChildList.id = xmllist.id

    WHERE

    #ChildList.RowNum > 0

    /* Display the results */

    ;WITH RecursiveNodes(RowNum,id,parentid,nodepath,localname,[text],nodetype)

    AS (

    SELECT

    #NodeList.RowNum

    ,#NodeList.id

    ,#NodeList.parentid

    ,CAST('/' + REPLACE(REPLACE(REPLACE(REPLACE(#NodeList.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,#NodeList.localname

    ,CAST(#NodeList.[text] AS NVARCHAR(MAX)) AS [text]

    ,0 AS nodetype

    FROM #ChildList

    INNER JOIN

    #NodeList

    ON #ChildList.id = #NodeList.id

    WHERE

    #NodeList.parentid IS NULL

    AND #ChildList.RowNum > 0

    AND #NodeList.RowNum > 0

    UNION ALL

    SELECT

    n.RowNum

    ,n.id

    ,n.parentid

    ,CAST(r.nodepath + '/'+ REPLACE(REPLACE(REPLACE(REPLACE(n.localname,'&',''),'?',''),' ',''),'.','') AS NVARCHAR(255)) AS nodepath

    ,n.localname

    ,n.[text]

    ,n.nodetype

    FROM #NodeList AS n

    INNER JOIN

    RecursiveNodes AS r

    ON n.parentid = r.id

    WHERE

    n.RowNum > 0

    AND r.RowNum > 0

    AND n.parentid >= 0

    )

    SELECT

    ROW_NUMBER() OVER (ORDER BY Result.RowNum) AS RowNum

    ,Result.id

    ,Result.parentid

    ,Result.nodepath

    ,Result.nodetype

    ,Result.nodename

    ,Result.property

    ,Result.value

    ,Result.nodecontents

    FROM

    (

    SELECT

    rn.RowNum

    ,rn.id

    ,rn.parentid

    ,rn.nodepath

    ,(CASE

    WHEN rn.nodetype = 0 THEN 'Root'

    WHEN rn.nodetype = 1 THEN 'Node'

    WHEN rn.nodetype = 2 THEN 'Property'

    ELSE 'Data'

    END) AS nodetype

    ,(CASE

    WHEN rn.nodetype = 0 THEN rn.localname

    WHEN rn.nodetype = 1 THEN rn.localname

    WHEN rn.nodetype = 2 THEN (SELECT TOP(1) localname FROM RecursiveNodes WHERE id = rn.parentid)

    ELSE NULL

    END) AS nodename

    ,(CASE

    WHEN rn.nodetype = 2 THEN rn.localname

    ELSE NULL

    END) AS property

    ,(CASE

    WHEN rn.nodetype = 2 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

    ELSE NULL

    END) AS value

    ,(CASE

    WHEN rn.nodetype = 1 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.id)

    WHEN rn.nodetype = 2 THEN (SELECT TOP(1) [text] FROM RecursiveNodes WHERE parentid = rn.parentid and [text] is not null)

    ELSE NULL

    END) AS nodecontents

    FROM

    RecursiveNodes AS rn

    WHERE

    rn.localname <> '#text'

    ) AS Result

    WHERE

    Result.id >= 0

    AND (Result.id = 0

    OR property IS NOT NULL

    OR value IS NOT NULL

    OR nodecontents IS NOT NULL)

    END

    Output example:

    RowNumidparentidnodepathnodetypenodenamepropertyvaluenodecontents

    10NULL/itemsRootitemsNULLNULLNULL

    232/items/item/idPropertyitemid0001NULL

    342/items/item/typePropertyitemtypeDonut NULL

    452/items/item/nameNodenameNULLNULLCake

    562/items/item/ppuNodeppuNULLNULL0.55

    687/items/item/batter/idPropertybatterid1001Regular

    71110/items/item/batter/idPropertybatterid1002Chocolate

    81413/items/item/batter/idPropertybatterid1003Blueberry

    91716/items/item/topping/idPropertytoppingid5001None

    102019/items/item/topping/idPropertytoppingid5002Glazed

    112322/items/item/topping/idPropertytoppingid5005Sugar

    122625/items/item/topping/idPropertytoppingid5006Sprinkles

    132928/items/item/topping/idPropertytoppingid5003Chocolate

    143231/items/item/topping/idPropertytoppingid5004Maple

  • Matt Miller (#4) (10/19/2012)


    dwain.c (10/19/2012)


    Another article I'm looking forward to reading.

    Wish I'd seen this thread sooner as I might have been able to help. Not that I claim to know that much about XML but somehow I manage to brute-force it and get it to work.:-P

    It is interesting. I just ventured across the 2nd iteration just yesterday. (Sorry Jeff - I don't think I ever saw a followup to the initial thread).

    Jeff - a few things: it's a definitely good start. It is VERY good at identifying and sniffing out the internal structure: the #edge table is a very interesting artifact, and I am very curious to keep looking through it to see how we could improve upon it. This script also fairly good at flattening certain kinds of XML and can be a good starting point to create a staging table..

    There's a big issue to look at however. There seems to be a presumption that the internal structure is very loose and doesn't in fact neatly fit into tables already. This has an interesting side effect: when the XML IS somewhat based on "tables", the flattening process tends to turn the output into something reasonably horrible. There's a distinct triangle join effect: try running the script against this kind of XML (a lot of the XML message I get look a lot like this).

    declare @x varchar(max)

    select @x='<root>

    <locs>

    <location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>

    </locs>

    <Mascots>

    <Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <Leagues>

    <League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <roster>

    <teams>

    <team name="blue team" id="1"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">

    the

    <player name="Bob"/>

    cow

    <player name="Joe">

    <age>21</age>

    </player>

    jumped over

    <player name="Gary"/>

    the

    <player name="Bruce"/>

    <player name="Bruce"/>

    moon

    </team>

    <team name="red team" id="2"

    Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"

    locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Mary"/>

    <player name="Susie">

    <age>45</age>

    </player>

    <player name="Gary"/>

    <player name="Shelia"/>

    <player name="Bruce"/>

    </team>

    <team name="grey team" id="3"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Bob"/>

    <player name="Joe">

    <age>21</age>

    </player>

    <player name="Gary"/>

    <player name="Bruce"/>

    <player name="Bruce"/>

    </team>

    </teams>

    </roster>

    </root>';

    The fly in the ointment is that you basically are getting some "reference" tables outside of the main structure. By blowing out the internal structure (which isn't half bad if you are trying to relate the entities to each other), it's now going to be VERY hard to determine how many of "each thing" needs to be inserted/imported when you actually go to use the flattened output.

    Specifically, you might not pick up on the fact that you actually have 2 players called "Bruce" on several of the teams, since everything ELSE is duplicated as well.Or - there' no good way to know that some of those "reference values" aren't actually referenced elsewhere (with the cartesian product here - it kind of looks like they are).

    Now the example also happens to be using an XML construct called id/idref (essentially PK/FK in XML-speak). I just don't see how you could reasonably sniff that out without a schema.

    Again - great building blocks on discovering the internals of the file without a preexisting schema. I am just not quite sure they're leveraged correctly at this point (or properly put boundaries on what this actually does).I need to keep stewing on this a bit more, but just wanted to post a little feedback on it.

    Haven't done a deep dive on everything above, yet, but wouldn't a big part of the problem be the malformed XML that's included? A snippet from the XML you included from above follows...

    <Mascots>

    <Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <Leagues>

    <League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    Notice that the Leagues tag isn't properly closed.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Sorry about that, I was reformatting the XML after I pasted it in, and screwed up the stuff I posted as a result.

    the corrected XML is

    declare @x varchar(max)

    select @x='<root>

    <locs>

    <location name="New York" id="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Philadelphia" id="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"/>

    <location name="Boston" id="CCCCCCCC-4925-4DB0-88B5-CBC5408CCEDF"/>

    </locs>

    <Mascots>

    <Mascot name="Tiger" id="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Buffalo" id="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Hawk" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Chicken" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Orca" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    <Mascot name="Corsair" id="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF"/>

    </Mascots>

    <Leagues>

    <League name="Major league" id="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Minor League" id="AAAAAAAA-1111-4DB0-88B5-CBC5408CCEDF"/>

    <League name="Bush League" id="AAAAAAAA-2222-4DB0-88B5-CBC5408CCEDF"/>

    </Leagues>

    <roster>

    <teams>

    <team name="blue team" id="1"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="DDDDDDDD-4925-4DB0-88B5-CBC5408CCEDF">

    the

    <player name="Bob"/>

    cow

    <player name="Joe">

    <age>21</age>

    </player>

    jumped over

    <player name="Gary"/>

    the

    <player name="Bruce"/>

    <player name="Bruce"/>

    moon

    </team>

    <team name="red team" id="2"

    Leagueref="AAAAAAAA-222-4DB0-88B5-CBC5408CCEDF"

    locationref="BBBBBBBB-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="EEEEEEEE-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Mary"/>

    <player name="Susie">

    <age>45</age>

    </player>

    <player name="Gary"/>

    <player name="Shelia"/>

    <player name="Bruce"/>

    </team>

    <team name="grey team" id="3"

    Leagueref="AAAAAAAA-0000-4DB0-88B5-CBC5408CCEDF"

    locationref="AAAAAAAA-4925-4DB0-88B5-CBC5408CCEDF"

    mascotref="FFFFFFFF-4925-4DB0-88B5-CBC5408CCEDF">

    <player name="Bob"/>

    <player name="Joe">

    <age>21</age>

    </player>

    <player name="Gary"/>

    <player name="Bruce"/>

    <player name="Bruce"/>

    </team>

    </teams>

    </roster>

    </root>';

    The closing tag used to be there when I tested (the output is too long to post).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply