Importing an XML file into a sql table

  • dallibabs

    Say Hey Kid

    Points: 706

    Edit to make things more understandable, I wish to import my XML file into an sql table.

    The XML  has a structure like this

    <ICECAT-interface xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://data.icecat.biz/xsd/ICECAT-supplier_mapping.xsd">
        <SupplierMappings Generated="20190215010005">
            <SupplierMapping supplier_id="1" name="HP">
                <Symbol>COMPAQ</Symbol>
               <Symbol>HP SUPPL</Symbol>
                <Symbol>HEWLETT PACKARD</Symbol>
                <Symbol>HP</Symbol>
                <Symbol>hp systems</Symbol>
                <Symbol>hp peripherals</Symbol>
    </SupplierMapping>
            <SupplierMapping supplier_id="2" name="Toshiba">
                <Symbol>TOSHIBA</Symbol>
                <Symbol>TOSHANDH</Symbol>
                <Symbol>TOSCNOTE</Symbol>
    </SupplierMapping>

    I wish to pull the <SupplierMapping> supplier_id and name, but also the Child <Symbol> as well so as for the example above my table would look like

    ---------------------------------------
    :supplier_id: name : Symbol :
    :1                :HP     : COMPAQ:
    :1                :HP     : HP SUPPL
    :1                :HP     : HEWLETT PACKARD
    :1                :HP     : HP
    :1                :HP     : hp systems
    :1                :HP     : hp peripherals.
    :2                :Toshiba     : TOSHIBA
    :2                :Toshiba     : TOSHANDH
    :2                :Toshiba     : TOSCNOTE
    ---------------------------------------------------
    However with the below sql it returns results like this.

    ---------------------------------------
    :supplier_id: name : Symbol :
    :1                :HP     : COMPAQ:
    :2                :Toshiba     : TOSHIBA
    ---------------------------------------------------
    ie it returns the first <Symbol> entry but does not repeat.

    this is what i am trying to achieve but as im new to XML its hardd putting this into words.

    thanks you for any tips or guides.

    Hi all I am trying to import an XML file into my sql server

    I have used the below script to import the file as a single blob

    USE ICECATtesting
    GO
    drop table XMLwithOpenXML

    CREATE TABLE XMLwithOpenXML
    (
    Id INT IDENTITY PRIMARY KEY,
    XMLData XML,
    LoadedDateTime DATETIME
    )

    INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
    SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
    FROM OPENROWSET(BULK 'C:\Scripts\icecat\supplier_mapping.xml', SINGLE_BLOB) AS x;

    SELECT * FROM XMLwithOpenXML

    I then run the following code to query the blob


    USE ICECATtesting
    GO

    DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)

    SELECT @XML = XMLData FROM XMLwithOpenXML

    EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML

    SELECT supplier_id, name, Symbol
    FROM OPENXML(@hDoc, 'ICECAT-interface/SupplierMappings/SupplierMapping')
    WITH
    (
    supplier_id [varchar](150) '@supplier_id',
    name [varchar](150) '@name',
    Symbol [varchar](250) 'Symbol'
    )

    EXEC sp_xml_removedocument @hDoc
    GO

    this returns results but I know that there are missing rows where the Symbol XML element has multiple entries see sample of some of the file below

    <SupplierMapping supplier_id="4536" name="Joby">    <Symbol>Joby</Symbol>    </SupplierMapping>    <SupplierMapping supplier_id="4537" name="ACT">    <Symbol>ACT</Symbol>    <Symbol>ADVANCED CABLE TECH</Symbol>    <Symbol>ADVANCED CABLE TECHNOLOGY</Symbol>    <Symbol>IEC LOCK</Symbol>    </SupplierMapping>


    <SupplierMapping supplier_id="4536" name="Joby">    <Symbol>Joby</Symbol>   </SupplierMapping>   <SupplierMapping supplier_id="4537" name="ACT">    <Symbol>ACT</Symbol>    <Symbol>ADVANCED CABLE TECH</Symbol>    <Symbol>ADVANCED CABLE TECHNOLOGY</Symbol>    <Symbol>IEC LOCK</Symbol>   </SupplierMapping>

    <SupplierMapping supplier_id="4536" name="Joby">    <Symbol>Joby</Symbol>    </SupplierMapping>    <SupplierMapping supplier_id="4537" name="ACT">    <Symbol>ACT</Symbol>    <Symbol>ADVANCED CABLE TECH</Symbol>    <Symbol>ADVANCED CABLE TECHNOLOGY</Symbol>    <Symbol>IEC LOCK</Symbol>    </SupplierMapping>

    The main problem is the supplier ID and name have multiple <Symbol> entries  and I would like to return all of the data.

    Any suggestions please.

    sorry im not sure how to paste the xml in a friendly format.

  • Scott Coleman

    One Orange Chip

    Points: 27406

    You will only get as many rows as the nodes at the level of your XPATH expression.  If your XPATH expression tells it to drill down to SupplierMapping nodes, then it will only find the first Symbol node under that.  You need to drill down to <Symbol> in a .nodes() function to get all the rows.

    One approach is to add another XPATH to return all the Symbol nodes under each SupplierMapping node

    SELECT  SupplierMapping.value('@supplier_id', 'int'),
            SupplierMapping.value('@name', 'varchar(100)'),
            Symbol.value('.[1]', 'varchar(100)')
    FROM @sm.nodes('/ICECAT-interface/SupplierMappings/SupplierMapping') x(SupplierMapping)
    CROSS APPLY SupplierMapping.nodes('Symbol') sm(Symbol);

    Another approach would be to go straight to Symbol nodes in the first place, then use parent references to get the other attributes

    SELECT  x.Symbol.value('../@supplier_id', 'int'),
            x.Symbol.value('../@name', 'varchar(100)'),
            x.Symbol.value('.[1]', 'varchar(100)')
    FROM @sm.nodes('/ICECAT-interface/SupplierMappings/SupplierMapping/Symbol') x(Symbol)

    Both queries return 9 rows
    1 HP    COMPAQ
    1 HP    HP SUPPL
    1 HP    HEWLETT PACKARD
    1 HP    HP
    1 HP    hp systems
    1 HP    hp peripherals
    2 Toshiba  TOSHIBA
    2 Toshiba  TOSHANDH
    2 Toshiba  TOSCNOTE

  • Jonathan Szeto

    SSC Enthusiast

    Points: 106

    In your OPENXML query, you need to drill all the way down to the Symbol node of your node tree. So it should look like this:


    SELECT supplier_id, name, Symbol
    FROM OPENXML(@hDoc, 'ICECAT-interface/SupplierMappings/SupplierMapping/Symbol')
    WITH
    (
    supplier_id [varchar](150) '../@supplier_id',
    name [varchar](150) '../@name',
    Symbol [varchar](250) '.'
    )

    The single dot (".") in the XML path means to use the current node level (Symbol). The double dots ("..") instructs SQL Server to move up to the parent node (SupplierMapping) and then obtain the values from the appropriate attributes.

  • dallibabs

    Say Hey Kid

    Points: 706

    Thankyou both

    Scott Coleman and Jonathan Szeto for your replies.

    I used Jonathan's code as it just pasted straight in and worked. I under stand what you said about moving up a branch in the tree and using .. to move back to the parent node, I did try this before but as I was trying to use 'Symbol' to select the Symbol node it wasnt working. I also tried "./@Symbol" , "@Symbol" but obviously neither were correct.

    Thankyou Jonathan for your help.

    @scott 

    I also tried both of your examples.

    however I kept getting errors, Was I supposed to replace  the @hDoc with the @sm-2 variable ?
    I tried this and kept getting errors , at first saying I should declare the variable and when I did replace @hDoc  with @sm-2 I got The XMLDT method 'nodes' can only be invoked on columns of type xml.


    USE ICECATtesting
    GO

    DECLARE @XML AS XML, @sm-2 AS INT, @SQL NVARCHAR (MAX)

    SELECT @XML = XMLData FROM XMLwithOpenXML

    EXEC sp_xml_preparedocument @sm-2 OUTPUT, @XML

    SELECT SupplierMapping.value('@supplier_id', 'int'),
       SupplierMapping.value('@name', 'varchar(100)'),
       Symbol.value('.[1]', 'varchar(100)')
    FROM @sm.nodes('/ICECAT-interface/SupplierMappings/SupplierMapping') x(SupplierMapping)
    CROSS APPLY SupplierMapping.nodes('Symbol') sm(Symbol);

    EXEC sp_xml_removedocument @sm-2
    GO

    did I use the code incorrectly or am I misunderstanding ?, I already have a result so thank you but I would like to learn from your example as well.

    thanks again for your time

    John.

  • Eirikur Eiriksson

    SSC Guru

    Points: 182369

    Scott Coleman - Monday, February 18, 2019 12:42 PM

    You will only get as many rows as the nodes at the level of your XPATH expression.  If your XPATH expression tells it to drill down to SupplierMapping nodes, then it will only find the first Symbol node under that.  You need to drill down to <Symbol> in a .nodes() function to get all the rows.

    One approach is to add another XPATH to return all the Symbol nodes under each SupplierMapping node

    SELECT  SupplierMapping.value('@supplier_id', 'int'),
            SupplierMapping.value('@name', 'varchar(100)'),
            Symbol.value('.[1]', 'varchar(100)')
    FROM @sm.nodes('/ICECAT-interface/SupplierMappings/SupplierMapping') x(SupplierMapping)
    CROSS APPLY SupplierMapping.nodes('Symbol') sm(Symbol);

    Strongly recommend that you use the text() function to retrieve the element value, prevents a reconstruct of the XML for the output.
    😎


    SELECT SupplierMapping.value('@supplier_id', 'int'),
       SupplierMapping.value('@name', 'varchar(100)'),
       Symbol.value('(./text())[1]', 'varchar(100)')
    FROM @TXML.nodes('/ICECAT-interface/SupplierMappings/SupplierMapping') x(SupplierMapping)
    CROSS APPLY SupplierMapping.nodes('Symbol') sm(Symbol);

  • Scott Coleman

    One Orange Chip

    Points: 27406

    dallibabs - Saturday, February 23, 2019 4:26 PM

    @scott 

    I also tried both of your examples.

    however I kept getting errors, Was I supposed to replace  the @hDoc with the @sm-2 variable ?
    I tried this and kept getting errors , at first saying I should declare the variable and when I did replace @hDoc  with @sm-2 I got The XMLDT method 'nodes' can only be invoked on columns of type xml.

    OPENXML requires sp_xml_preparedocument and sp_xml_removedocument, and operates on an internal version of the XML pointed to by the @hdoc integer variable.  All prepared documents in all sessions on the server are stored in the same memory structure.

    The XML functions .nodes(), .query(), and .value() operate directly on an XML variable or column value, it does not need to be prepared.  They only work with XML variables or columns, and their names are case-sensitive.  nodes() is a table function, and must have both a table and column alias ( "FROM @xyz.nodes('/...') xtable(xnode)" ).  If you use these, forget about prepared documents or hdoc variables.

    I used "DECLARE @sm-2  XML" in my script to hold the XML text, in your original post this was the @XML variable.  In Eirikur's post it is @TXML, and he is absolutely correct about using '(./text())[1]' for a performance boost.  I just wanted to point you to using '../@name' parent expressions or the CROSS APPLY alternative, and not go into that level of detail.

    I had to modify your XML to get it to work, as I recall it is missing closing tags at the end for </SupplierMappings> and </ICECAT-Interface>.  But if the XML is valid, the queries I posted should work.

  • dallibabs

    Say Hey Kid

    Points: 706

    Scott Coleman - Tuesday, February 26, 2019 12:58 PM

    dallibabs - Saturday, February 23, 2019 4:26 PM

    @scott 

    I also tried both of your examples.

    however I kept getting errors, Was I supposed to replace  the @hDoc with the @sm-2 variable ?
    I tried this and kept getting errors , at first saying I should declare the variable and when I did replace @hDoc  with @sm-2 I got The XMLDT method 'nodes' can only be invoked on columns of type xml.

    OPENXML requires sp_xml_preparedocument and sp_xml_removedocument, and operates on an internal version of the XML pointed to by the @hdoc integer variable.  All prepared documents in all sessions on the server are stored in the same memory structure.

    The XML functions .nodes(), .query(), and .value() operate directly on an XML variable or column value, it does not need to be prepared.  They only work with XML variables or columns, and their names are case-sensitive.  nodes() is a table function, and must have both a table and column alias ( "FROM @xyz.nodes('/...') xtable(xnode)" ).  If you use these, forget about prepared documents or hdoc variables.

    I used "DECLARE @sm-2  XML" in my script to hold the XML text, in your original post this was the @XML variable.  In Eirikur's post it is @TXML, and he is absolutely correct about using '(./text())[1]' for a performance boost.  I just wanted to point you to using '../@name' parent expressions or the CROSS APPLY alternative, and not go into that level of detail.

    I had to modify your XML to get it to work, as I recall it is missing closing tags at the end for </SupplierMappings> and </ICECAT-Interface>.  But if the XML is valid, the queries I posted should work.

    Thankyou after changing the XML variable it worked fine, thanks for pointing out where I went wrong, as its quite a small amount of data both ways work very fast but it is nice to know how to go about it in a couple of different ways.

    thankyou both.

  • dallibabs

    Say Hey Kid

    Points: 706

    I have another file I am trying to import all is fine using your methods but there are multiple repeating child entries <Description> see file below but Wish to only extract the first entry, IE the first row of description and its values ID and Value.  I have played around for hours now and have googled for a long time too but im not sure how to put this into words.

    If anyone can give me any direction it would be much appreciated.

    thanks again.

    example of xml

        <?xml version="1.0" encoding="UTF-8"?>
        <!--source: Icecat.biz 2018-->
        <ICECAT-interface>
            <Response Date="Wed Jul 11 19:00:01 2018" ID="0" Request_ID="1531328401" Status="1">
                <CategoriesList>
                    <Category ID="1" LowPic="" Score="" Searchable="0" ThumbPic="" UNCATID="" Visible="0">
                        <ParentCategory ID="1">
                        </ParentCategory>
                    </Category>
                    <Category ID="2" LowPic="" Score="114668888" Searchable="0" ThumbPic="" UNCATID="43160000" Visible="0">
                        <ParentCategory ID="2">   
                        </ParentCategory>
                         <Description ID="548795" Value="Instructions for a computer's processor to perform specific operations e.g. system software such as Windows and iOS, application software such as internet browsers and apps." langid="1"/>
                        <Description ID="2" Value="" langid="2"/>
                        <Description ID="2645" Value="" langid="3"/>
                        <Description ID="12301" Value="" langid="4"/>
                </Category>
                <Category ID="4" LowPic="http://images.icecat.biz/img/low_pic/4-5517.jpg" Score="1195709" Searchable="0" ThumbPic="http://images.icecat.biz/thumbs/CAT4.jpg"     UNCATID="43161501" Visible="0">
                   <ParentCategory ID="3"> 
                   </ParentCategory>
                    <Description ID="1723" Value="A lot of applications use large complicated databases for storing information. The softw these databases fast and easy." langid="1"/>
                    <Description ID="1724" Value="Voor veel toepassingen kans bij te houden. De software uit deze categorie is ontworpen om dit snel en efficiënt te doen." langid="2"/>
                    <Description ID=="2888983" Value="PC-Dienstprogramme" langid="53"/>
                </Category>
            </CategoriesList>
        </Response>
    </ICECAT-interface>

    Here is my sql to extract the data

    USE ICECATtesting
    GO

    DECLARE @sm-2 as XML , @hDoc AS INT, @SQL NVARCHAR (MAX)

    SELECT @sm-2 = XMLData FROM XMLwithOpenXML

    EXEC sp_xml_preparedocument @hDoc OUTPUT, @sm-2

    SELECT CategoriesList.value('@ID', 'int') as ID,
       CategoriesList.value('@LowPic', 'varchar(100)') as LowPic,
            CategoriesList.value('@Score', 'int') as Score,
       CategoriesList.value('@Searchable', 'int') as Searchable,
            CategoriesList.value('@ThumbPic', 'varchar(250)') as ThumbPic,
       CategoriesList.value('@UNCATID', 'int') as UNCATID,
       CategoriesList.value('@Visible', 'int') as Visible,
            ParentCategory.value('@ID' , 'int') as ParentID,
            Description.value('@ID' , 'int') as DescriptionID,
            Description.value('@Value' , 'varchar(max)') as DescriptionValue
            FROM @sm.nodes
            
            ('/ICECAT-interface/Response/CategoriesList/Category') x(CategoriesList)
    CROSS APPLY CategoriesList.nodes('ParentCategory') sm(ParentCategory)
    CROSS APPLY CategoriesList.nodes('Description') sm2(Description);

    EXEC sp_xml_removedocument @hDoc

    GO

  • Eirikur Eiriksson

    SSC Guru

    Points: 182369

    The XML you posted is not valid and cannot be converted to the XML data type in SQL Server, can you please post a valid example?
    😎

  • dallibabs

    Say Hey Kid

    Points: 706

    Eirikur Eiriksson - Wednesday, February 27, 2019 2:11 AM

    The XML you posted is not valid and cannot be converted to the XML data type in SQL Server, can you please post a valid example?
    😎

    Apologies i was trying to cut down a large file and got some of the tags in the wrong places,

    Thanks.

  • Scott Coleman

    One Orange Chip

    Points: 27406

    The tags were OK, but the XML header said it was UTF-8 and there were illegal characters in it (not in proper UTF-8 encoding).
    Specifically the fancy e in efficiënt.  I removed the encoding="UTF-8" declaration and it worked fine.

    If you're going to parse your XML with .nodes() then remove the sp_xml_preparedocument and sp_xml_removedocument functions.  They are not needed.

    Only use added levels of "CROSS APPLY xyz.nodes()" if you want extra rows.  You don't need them just to parse sub-elements.  As your XML shows, any node can appear multiple times at the same level, and the .value() function will not deal with multiple values.  So you need to add "[1]" to element names to clarify you want the first node of that name.  (You don't need to do this for attributes, attributes must have unique names.)  (You can put more complex expressions inside the square brackets if you don't always want the first one, such as looking for the Description element that has the longest Value attribute.)

    So if you want one row per category element, you don't need any CROSS APPLY to get the sub-elements.  Just more precise XPATH expressions.SELECT CategoriesList.value('@ID', 'int') as ID,
       CategoriesList.value('@LowPic', 'varchar(100)') as LowPic,
       CategoriesList.value('@Score', 'int') as Score,
       CategoriesList.value('@Searchable', 'int') as Searchable,
       CategoriesList.value('@ThumbPic', 'varchar(250)') as ThumbPic,
       CategoriesList.value('@UNCATID', 'int') as UNCATID,
       CategoriesList.value('@Visible', 'int') as Visible,
       CategoriesList.value('ParentCategory[1]/@ID', 'int') as ParentID,
       CategoriesList.value('Description[1]/@ID', 'int') as DescriptionID,
       CategoriesList.value('Description[1]/@Value', 'varchar(max)') as DescriptionValue
    FROM @sm.nodes('/ICECAT-interface/Response/CategoriesList/Category') x(CategoriesList);

  • dallibabs

    Say Hey Kid

    Points: 706

    Scott thank you so much, i have been messing with this all day trying multiple cross joins and then trying to filter the results with a where =.

    in every tutorial i found they kept saying use sp_xml_preparedocument and sp_xml_removedocument and as that worked before I thought it was needed.

    Im going to play with this some more to get some more results so it sticks in my brain.

    Thankyou again.🙂

  • Solomon Rutzky

    SSCoach

    Points: 16175

    Scott Coleman - Wednesday, February 27, 2019 1:25 PM

    The tags were OK, but the XML header said it was UTF-8 and there were illegal characters in it (not in proper UTF-8 encoding).
    Specifically the fancy e in efficiënt.  I removed the encoding="UTF-8" declaration and it worked fine.

    Hi there. Just to be clear, the error actually is the XML syntax and not the encoding. The UTF-8 encoding is fine as long as you use VARCHAR and not NVARCHAR for this XML document. So if you are setting directly to an XML variable, you need to not use an upper-case "N" prefix on the string literal. Using an upper-case "N" makes it NVARCHAR which is a conflict for the declaration stating that it is UTF-8, which is an 8-bit encoding, not 16 bit like NVARCHAR. Or, you can remove the XML declaration (as you did), in which case it will work in both cases, with and without the upper-case "N" prefix, or in a VARCHAR or NVARCHAR column / variable.

    The actual error is towards the end of the XML document, specifically the following element:

    <Description ID=="2888983" 

    which results in the following error:

    Msg 9413, Level 16, State 1, Line XXXXX
    XML parsing: line 22, character 17, A string literal was expected

    The problem is the extraneous "=" after the name of the "ID" attribute. Remove the extra "=" and all should be good to go.

    Take care,
    Solomon..

    SQL#https://SQLsharp.com/ ( SQLCLR library ofover 340 Functions and Procedures)
    Sql Quantum Lifthttps://SqlQuantumLift.com/ ( company )
    Sql Quantum Leaphttps://SqlQuantumLeap.com/ ( blog )
    Info sitesCollations     •     Module Signing     •     SQLCLR

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply