SQLServerCentral Article

XML Workshop : Utilizing Relational Data In XML Files

,

For today’s XML workshop, we will be reading and importing datasets from multiple XML files containing relational data. Traditional methods of transferring large datasets are still prevalent in today’s connected world. Every database administrator will inevitably be required to script out or build an ad hoc extract, transform and load process to migrate one or multiple datasets from formatted files to their managed environment. Some will even have the luxury of dealing with industry-standard file formats, such as XML. Let us explore a quick and easy method of importing XML files to a similarly structured user table.

Importing XML Data Sources

The following are two datasets formatted in XML. Each includes three patients, and all six patients have a unique identification number that will be loaded into a user table. For the purpose of this example, the identification numbers have a one-to-one relationship with the primary key of our destination. You may copy and paste these examples into files and store them in a working directory ( C:\XML\ ) on the server hosting your SQL Server instance.

[xml-example_01.xml]

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <patient>
        <id>1</id>
        <name>Bob</name>
        <birthdate>1999-01-10</birthdate>
    </patient>
    <patient>
        <id>2</id>
        <name>Sam</name>
        <birthdate>1989-05-11</birthdate>
    </patient>
    <patient>
        <id>3</id>
        <name>Richard</name>
        <birthdate>1990-12-01</birthdate>
    </patient>
</root>

[xml-example_02.xml]

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <patient>
        <id>4</id>
        <name>Matt</name>
        <birthdate>1991-01-13</birthdate>
    </patient>
    <patient>
        <id>5</id>
        <name>Joe</name>
        <birthdate>1982-06-14</birthdate>
    </patient>
    <patient>
        <id>6</id>
        <name>Test Patient</name>
        <birthdate>1993-12-09</birthdate>
    </patient>
</root>

Before importing from the XML sources, verify that the contents of each file is in a valid format. Execute the following T-SQL query to extract the XML contents and convert the binary result into a readable XML string.

SELECT  CONVERT(XML, BulkColumn) [XmlSource]
FROM    OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t

The OPENROWSET() table function may be used as an ad hoc method of creating an OLE DB data connection to another database or, in our case, an external file. For today’s example, we are accessing the file using a BULK operation in order to efficiently pull the contents of the file into memory. Should the XML be malformed or contain an invalid character, the conversion will fail with an error message like the two examples below :

Msg 9436, Level 16, State 1, Line 2

XML parsing: line 17, character 7, end tag does not match start tag

or

Msg 9455, Level 16, State 1, Line 2

XML parsing: line 5, character 4, illegal qualified name character

When working with source files in a production environment I suggest storing the XML string in an archive for auditing purposes. This will allow for easy retrieval in the future. The additional step is especially useful should you be applying any data transformations during the load process.

Next, read the parsed XML string.

DECLARE @idoc INT
DECLARE @result XML
SELECT  @result = CONVERT(XML, BulkColumn)
FROM    OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t
EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result;
SELECT  *
FROM    OPENXML(@idoc, 'root/patient', 2)
              WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t

The result set ought to look familiar with a defined schema :

id   name       birthdate

---- ---------- -----------------------

1    Bob        1999-01-10 00:00:00.000

2    Sam        1989-05-11 00:00:00.000

3    Richard    1990-12-01 00:00:00.000

Let us examine the code. First, an XML variable ( @result ) is declared to store the contents of the XML source file and then parsed into memory using the sp_xml_preparedocument system stored procedure. An integer variable ( @idoc ) is declared as well to store a session-specific identity that points to the internally parsed XML document. The pointer is used by the OPENXML() table function that allows the user to easily define a schema for the XML.

Notice that in our example we use the “element-centric” option ( 2 ) when parsing the XML at the ‘root/patient’ path with the OPENXML() table function. Our example source files use only text node values. Should the examples use only attribute values, we would need to use the “attribute-centric” option ( 1 ). Mixed mode ( 8 ) is beyond the scope of this workshop, but I suggest reading more about schema declaration with x-path definitions on MSDN.

With either element- or attribute-centric parsing we need to define an output schema. Our example source files follow a simple and balanced XML schema. In other words, each parent node ( [patient] ) has the same child nodes ( [id], [name], and [birthdate] ). The script above translates each child node as a field, which we are going to import into a table with a matching schema.

CREATE TABLE [dbo].[Test]
    (
      [Id] [int] NOT NULL ,
      [Name] [varchar](200) NULL ,
      [BirthDate] [datetime] NULL ,
      PRIMARY KEY CLUSTERED ( [Id] ASC )
    )

Run the query above to create a user table that matches the XML source, and then execute the script below to import the contents of the XML file.

DECLARE @idoc INT
DECLARE @result XML
SELECT  @result = CONVERT(XML, BulkColumn)
FROM    OPENROWSET(BULK 'C:\XML\xml-example_01.xml', SINGLE_BLOB) t
EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result;
INSERT  INTO [dbo].[Test]
        ( [Id] ,
          [Name] ,
          [BirthDate]
        )
        SELECT  [id] ,
                [name] ,
                [birthdate]
        FROM    OPENXML(@idoc, 'root/patient', 2)
              WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t
EXEC sys.sp_xml_removedocument @idoc;

The query will read the results into the user table. Notice that sp_xml_removedocument is executed at the end of the transaction. The system stored procedure will remove the parsed XML data from memory and prevent resource starvation when using multiple large XML files in a single session. Once complete, execute the following script to import the second file :

DECLARE @idoc INT
DECLARE @result XML
SELECT  @result = CONVERT(XML, BulkColumn)
FROM    OPENROWSET(BULK 'C:\XML\xml-example_02.xml', SINGLE_BLOB) t
EXEC sys.sp_xml_preparedocument @idoc OUTPUT, @result;
INSERT  INTO [dbo].[Test]
        ( [Id] ,
          [Name] ,
          [BirthDate]
        )
        SELECT  [id] ,
                [name] ,
                [birthdate]
        FROM    OPENXML(@idoc, 'root/patient', 2)
              WITH ( id INT, name VARCHAR(200), birthdate DATETIME ) t
EXEC sp_xml_removedocument @idoc;

Next, run a select query on the user table.

SELECT  [Id] ,
        [Name] ,
        [BirthDate]
FROM    [dbo].[Test]

Results :

Id   Name              BirthDate

---- ----------------- -----------------------

1    Bob               1999-01-10 00:00:00.000

2    Sam               1989-05-11 00:00:00.000

3    Richard           1990-12-01 00:00:00.000

4    Matt              1991-01-13 00:00:00.000

5    Joe               1982-06-14 00:00:00.000

6    Test Patient      1993-12-09 00:00:00.000

All patients were successfully loaded into the new user table with unique primary key values. If the table had an identity column, setting the IDENTITY_INSERT option ON prior to the load will allow a full import from the source files, identities and all, should no other record exist with the same primary key value. Do note that there are limitations to this method of importing an XML source. The script above will only import one file at a time, and the OPENXML() table function will only read up to 250 elements. Overcoming these limitations will require moving into SSIS ETL builds, PowerShell scripting, or application development.

Rate

4.53 (17)

You rated this post out of 5. Change rating

Share

Share

Rate

4.53 (17)

You rated this post out of 5. Change rating