Problem Extracting Data from XML File into Field in SQL Server Table

  • Hi all,

    Apologies in advance if I am asking something really obvious but I have an XML file that is provided daily, from which I need to extract the data and import it into a table in SQL Server.  I think the issue I am having is down to namespaces but I don't fully understand how to work with them and despite reading numerous posts, I still cannot get any of the suggested solutions to work with my data so was hoping that somebody may be able to point me in the right direction please?

    OK, the data comes from the following file (I have truncated it to a couple of records for ease of reading):


    <jobs xmlns="https://adview.online/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://adview.online/XMLSchema jobs.xsd ">
    <job>
      <id>123456789</id>
      <url>https://adview.online/dispatch/job/feed/job1</url&gt;
      <title>Parts Analyst</title>
      <desc>Parts Analyst  Job Description</desc>
      <town>West Midlands West Midlands (County) Birmingham</town>
      <salary-from>20000.00</salary-from>
      <salary-to>25000.00</salary-to>
      <salary-type>annum</salary-type>
      <job-type>contract</job-type>
      <job-status>part-time</job-status>
      <category>-N/A-</category>
      <reference />
      <company>Test Company</company>
      <expiration-date>2017-07-24</expiration-date>
    </job>
    <job>
      <id>123456987</id>
      <url>https://adview.online/dispatch/job/feed/data-processor</url&gt;
      <title>Data Processor</title>
      <desc>Data Processor Job Desc</desc>
      <town>North West Cheshire Chester</town>
      <salary-from>15000.00</salary-from>
      <salary-to>20000.00</salary-to>
      <salary-type>annum</salary-type>
      <job-type>permanent</job-type>
      <job-status>full-time</job-status>
      <category>Information Technology</category>
      <reference>ABC123</reference>
      <company>Test Company 2</company>
      <expiration-date>2017-07-24</expiration-date>
    </job>
    </jobs>


    The code I am using below works if I remove the xmlns="https://adview.online/XMLSchema" text from the <jobs> tag at the start of the XML file, and successfully returns the data for each job, however with the xmlns code in place, I get no records returned (I have simplified the code to just do a select statement rather than an insert/update one for testing purposes):  There are 2 lines that I have commented out on the sp_xml_preparedocument line and the openxml line as I was using these to try to get the namespaces to work but they just return all null values rather than the actual values in the fields (so I think I was getting warmer but not quite there).

    USE MyDB
    GO
    DROP TABLE xmlimport
    CREATE TABLE XMLImport
    (
    Id INT IDENTITY PRIMARY KEY,
    XMLData XML,
    LoadedDateTime DATETIME
    )
    INSERT INTO XMLImport(XMLData, LoadedDateTime)
    SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE()
    FROM OPENROWSET(BULK 'E:\imports\testfile1.xml', SINGLE_BLOB) AS x;
    SELECT * FROM XMLImport
    GO
    DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)
    SELECT @XML = XMLData FROM XMLImport
    EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML
    --EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML, N'<root xmlns:n="https://adview.online/XMLSchema" />'
    SELECT title, company
    FROM OPENXML(@hDoc, '/jobs/job', 2)
    --FROM OPENXML(@hDoc, '/n:jobs/n:job', 2)
    WITH
    (
    [title] varchar(170) 'title',
    [company] varchar(100) 'company'
    )
    EXEC sp_xml_removedocument @hDoc
    GO


    As I said, I'm really not that familiar with XML (which unfortunately will probably be quite obvious having read this message!!) so any advice would be very much appreciated as I am running out of ideas and really not sure what else to try.

    Thanks,

    Ian

  • You can much more easily get the data out of your XML by using XQuery on your table. So the below would be used after you insert your data into your XMLImport table:
    WITH XMLNAMESPACES(DEFAULT 'https://adview.online/XMLSchema') --Declare the Namespace
    SELECT c.j.value('(title/text())[1]','varchar(50)') AS Title, --Returns the value of the first 'title' node
           c.j.value('(company/text())[1]','varchar(50)') AS Company --Returns the value of the first 'company' node
    FROM XMLImport X
      CROSS APPLY X.XMLColumn.nodes('/jobs/job') c(j); --Cross apply to the jobs/job node, so that two rows (with the sample data) are created

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Hi Thom,

    Sorry for the delay replying.  Thank you very much for your help - the solution you suggested worked fine and imported the data correctly.

    Just one thing I am a little unsure on - in the code you posted, each value is prefixed with c.j..  Could I just ask what this refers to and whether it always needs to be c.j. or if it is just an alias and can be any characters?  Just so I know for when I am replicating this for other scripts that I'm working on.

    Thanks again for your help which I am extremely grateful for.

    Best regards,

    Ian

  • c(j) was just an alias. You can use what ever you like for it (for example jobs(job), and then you use jobs.job.value).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply