Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

A Generic Process to Convert XML Data - Part 1

By Leo Peysakhovich, (first published: 2008/08/25)

There are many articles on XML and how to work with XML structures. SQLServerCentral.com is one of many great sources with an abundance of articles. In this article I would like to show how XML can be used in real life scenarios and how XML can be validated by the process against XML schema registered in SQL Server.

My company collects a lot of statistical data from various applications. The data elements are often either changed or added, and sometimes both. When an adjustment is required, it's required to be modified in many stored procedures because usually the number of parameters is changed.

Recently we decided to create a generic solution that will require applications to pass only few standard parameters: stats data in XML, XML schema name, and application code. The third parameter is irrelevant for this article and will be omitted in further discussion.

Below I have drawn out a general architectural idea. As you can see, the application(s) output data into a general staging XML table. It can be one table per application or one table per server, or table(s) for any other grouping. The decision should be made based on the number of records, number of applications, and practicality. The number of records should be the last concern because application(s) will only insert statistical data and never consume it.

Let start with a simple example of an XML schema and pretend that the statistical data consists of three fields: companyid integer, prodno varchar, and userid integer. To be able to validate the XML value against the registered XML schema, the custom schema has to be created and registered in SQL Server. Remember that the schema registration is per database.

To avoid the schema registration in every database, I created a special staging database per server to hold table(s) with staging statistical data. The same database can be used to hold tables with extracted "raw" statistical data. But I would recommend create another independent database. I will not explain in detail the XML schema because it is not a purpose of the article. Instead I will touch on the necessary parts of the XML syntax's.

 drop XML SCHEMA COLLECTION stat

go

 create XML SCHEMA COLLECTION stat AS

'<?xml version="1.0" encoding="ISO-8859-1" ?>

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://ns" xmlns="http://ns">

      <xs:element name="statroot">

        <xs:complexType>

            <xs:sequence>

           <xs:element name="statrecord" minOccurs="0">

             <xs:complexType>

                <xs:sequence>

                      <xs:element name="companyid" type="xs:integer" />

                      <xs:element name="prodno" type="xs:string" />

                      <xs:element name="userid" type="xs:integer" />

                  </xs:sequence>

                  </xs:complexType>

             </xs:element>

            </xs:sequence>

      </xs:complexType>

     </xs:element>

  </xs:schema> '

 To verify the proper schema registration we can declare the XML variable type of schema "stat" and assign data.

 declare @xmlvar1 xml (stat);

set @xmlvar1 = '  <xs:statroot xmlns:xs="http://ns">

           <statrecord>

                    <companyid>6</companyid>

                    <prodno>prd1234</prodno>

                    <userid>12345</userid>     

              </statrecord>

            </xs:statroot>     '

As I said before, different applications and different parts of the same application usually output different statistical data. This means that we will have data packed into multiple XML schemas that have to be inserted into one staging table. The following table can be created for such purposes that were described above.

 CREATE TABLE [dbo].[XML_TestTbl](

      [XML_ID] [bigint] IDENTITY(1,1) NOT NULL primary key,

      [Table_NM]  AS (('XML_TestTbl')),

      [XML_Data] [xml] NOT NULL,

      [Schema_NM] [varchar](20) NOT NULL,

[InsertDate] [datetime] NOT NULL default ( getdate() ) )

Computed field [Table_NM]  AS (('XML_TestTbl')) is created for the future generic process and needs to be able to link extracted XML record with staging table in case that multiple staging tables exist.

As I mentioned before, some additional standard fields can be added to the table. For example, we can add application name or module code, or both.

The next step before we extract XML data from table XML_TestTbl into standard table with columns, is to verify that data is matching with registered schema in SQL Server. The simplest way to do it is to allow SQL Server to do the job for us by inserting data into another table with a column of XML data type of the registered schema.

 declare @verify table (XML_Data XML(stat) );

 Insert into @verify(XML_Data)

Select XML_Data from XML_TestTbl where schema_nm = 'stat';

If there are no errors, our verification for this schema is done, and it is possible to load data into regular raw data tables with regular columns by using the following statement as an example.

 SELECT STATEMENT 1  

select 

   [XML_Data].query('//customerid').value('.','integer') as customerid,

   [XML_Data].query('//prodno').value('.','varchar(max)') as prodno,

   [XML_Data].query('//userid').value('.','integer') as userid

from dbo.XML_TestTbl

where schema_nm = 'stat';

As you can see from the statement, each row in XML_TestTbl in field XML_Data has to keep only one row for output to do the easiest XML to column data conversion. To do this, XML inserted into table XML_TestTbl (XML_Data), has to be split to keep only one to one row. For example, it can be done by the code below. However, you can do it by many other ways

 declare @xmlvar1 xml ;

set @xmlvar1 = '

      <root>

           <statrecord>

                    <companyid>6</companyid>

                    <prodno>prd06</prodno>

                    <userid>12345</userid>     

              </statrecord>

      </root>

      <root>

            <statrecord>

                     <companyid>3</companyid>

                    <prodno>prd03</prodno>

                    <userid>12345</userid>     

              </statrecord>

      </root>    '

 select T1.rt.query('.')

from @xmlvar1.nodes('/*/*') T1(rt)

 As you can see, if this is the case and multiple rows can be sent by application(s) with one XML record, the XML structure has to have another tag <root> between <statroot> and <statrecord> tags.

One of the issues with SELECT STATEMENT 1 is that the string value does not have a limit. Although in most cases we do know the size of the inserted data. Lets say, prodno has to be varchar(6). Let us limit the size of the prodno in the registration schema by adding schema's own data type.

 drop XML SCHEMA COLLECTION stat

go

create XML SCHEMA COLLECTION stat AS

'<?xml version="1.0" encoding="ISO-8859-1" ?>

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="http://ns" xmlns="http://ns">

      <xs:simpleType name="myST">

       <xs:restriction base="xs:string">

          <xs:maxLength value="6"/>

       </xs:restriction>

    </xs:simpleType>

      <xs:element name="statroot">

        <xs:complexType>

            <xs:sequence>

           <xs:element name="statrecord" minOccurs="0">

             <xs:complexType>

                <xs:sequence>

                      <xs:element name="companyid" type="xs:integer" />

                        <xs:element name="prodno" type="myST" />

                        <xs:element name="userid" type="xs:integer" />

                          </xs:sequence>

                        </xs:complexType>

             </xs:element>

            </xs:sequence>

      </xs:complexType>

     </xs:element>

  </xs:schema>   '

 declare @xmlvar1 xml (stat);

 set @xmlvar1 = '  <xs:statroot xmlns:xs="http://ns">

                  <statrecord>

                  <companyid>6</companyid>

                    <prodno>prd12</prodno>

                    <userid>12345</userid>     

              </statrecord>

            </xs:statroot>  '

   

If we are trying to set the value for prodno = ‘prd1234' where data length is more than 6 characters we will have an error:

 Msg 6926, Level 16, State 1, Line 4

XML Validation: Invalid simple type value: 'prd1234'. Location: /*:statroot[1]/*:statrecord[1]/*:prodno[1]

Now we know how to add schema custom data type from the base one defined in SQL Server.

As you can see from the example, the first and last line for XML requires a name space of registered XML schema. A namespace is required for the custom data type declaration.

To minimize the amount of data passed by application, add some security, and provide generic decoding solution, which incidentally will be the topic of my next article; it is possible to save XML data in table XML_TestTbl without first and last tags like this:

<statrecord>

                  <companyid>6</companyid>

                    <prodno>prd12</prodno>

                    <userid>12345</userid>     

              </statrecord>

 First and last tags will be added by the generic process to verify data and they are not required for the XML data extraction. However, the schema must be registered with root element tag. SQL Server XML meta data tables will be used in the verification process. Let us see how to get a namespace for the first tag

<xs:statroot xmlns:xs="http://ns">

 We need to use sys.xml_schema_collections and sys.xml_schema_namespaces

 select sn.name as namespace_nm, sc.name as schema_nm

from sys.xml_schema_collections sc

                        inner join sys.xml_schema_namespaces sn

                                    on sc.xml_collection_id  = sn.xml_collection_id

where sc.name = 'stat'

 Table sys.xml_schema_collections © 2008 Microsoft

Returns a row per XML schema collection. An XML schema collection is a named set of XSD definitions. The XML schema collection itself is contained in a relational schema, and it is identified by a schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id, namespace_id, and schema_id, name.

Column name

Data type

Description

xml_collection_id

int

ID of the XML schema collection. Unique within the database.

schema_id

int

ID of the relational schema that contains this XML schema collection.

principal_id

int

ID of the individual owner if different from the schema owner. By default, schema-contained objects are owned by the schema owner. However, an alternate owner may be specified by using the ALTER AUTHORIZATION statement to change ownership.

NULL = No alternate individual owner.

name

sysname

Name of the XML schema collection.

create_date

datetime

Date the XML schema collection was created.

modify_date

datetime

Date the XML schema collection was last altered.

   

Table sys.xml_schema_namespaces © 2008 Microsoft

Returns a row per XSD-defined XML namespace. The following tuples are unique: collection_id, namespace_id, and collection_id, and name.

Column name

Data type

Description

xml_collection_id

int

ID of the XML schema collection that contains this namespace.

name

nvarchar (4000)

Name of XML namespace. Blank name indicates no target namespace.

xml_namespace_id

int

1-based ordinal that uniquely identifies the XML namespace in the database.

   

It means that the next code can be utilized to verify XML data with root element constructed by process.

 declare @schemanm varchar(20), @namespacenm varchar(20), @xmlvar1 varchar(1000);

declare @verify table (XML_Data XML(stat) );

 

set @schemanm = 'stat';

 

select @namespacenm = sn.name  

from sys.xml_schema_collections sc

            inner join sys.xml_schema_namespaces sn

                  on sc.xml_collection_id  = sn.xml_collection_id

where sc.name = @schemanm;

 

set @xmlvar1 = ' <statrecord>

                              <companyid>6</companyid>

                    <prodno>prd12</prodno>

                    <userid>12345</userid>     

              </statrecord>    '

 

Insert into @verify(XML_Data)

select ' <xs:statroot xmlns:xs="' + @namespacenm + '">' +

            cast(@xmlvar1 as varchar(max) ) +

      '</xs:statroot>'

 select xml_data from @verify;

 

Conclusion

This article describes the theory of generic process architecture that allows store data with different input XML schemas into one staging location and then creates a process that will output data into designated standard tables. It is important for such data that is not required for the immediate consumption and that will be used by applications or reports later with some form of aggregation or the other data processed form. In our case, I used this idea to gather statistical data. Different type of statistical "raw" data is gathered from the application(s) into one table and extracted daily. Then, once a week staging table clean up by the process which deleting processed records. In my next article I will show you how to create generic code for the XML data verification and extraction and I will explain some of the many additional options of the process.

Resources:

spGenericXML_Step.txt
Total article views: 16165 | Views in the last 30 days: 5
 
Related Articles
FORUM

updating xml schema collections

adding and removing xml elements

ARTICLE

XML Schema validation in SQL Server

An introduction to the XML Schema Collection and XML validation.

FORUM

help on xml schema

adding attribute to element using xml schema in sql server

FORUM

XML schema collection confused

multiple schema's one xml field

ARTICLE

XML Workshop VI - Typed XML and SCHEMA Collection

The sixth installment of the XML Workshop for SQL Server shows how the XML Schemas fit into the XML ...

Tags
xml    
xsd    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones