Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

A Generic Process to Convert XML Data - Part 2

By Leo Peysakhovich, (first published: 2008/08/26)

My previous article, A Generic Process to Convert XML Data - Part 1, described the theory of generic process architecture that allows store data with different input XML schemas into one staging location and then creates a process that will output data into designated standard tables. It is important for the data that is not required for the immediate consumption and that will be used by applications or reports later with some form of aggregation or the other in the processed form. In our case, I am using this idea to gather statistical data. Different types of statistical RAW data is gathered from the application(s) into one table and extracted daily. Then once a week the staging table is cleaned up by the process which deletes processed records.

Let see how we can create generic process. The point of the process is to extract data based on the standard parameters from application e.g. from the staging table XML_TestTbl by the Schema_NM.

CREATE TABLE [dbo].[XML_TestTbl](

[XML_ID] [bigint] IDENTITY(1,1) NOT NULL primary key,

[Table_NM] AS (('XML_TestTbl')),

[XML_Data] [xml] NOT NULL,

[Schema_NM] [varchar](20) NOT NULL,

[InsertDate] [datetime] NOT NULL default ( getdate() ) )

In our example stat schema, process requires root element, data elements, and data types to be able to create select statement from the staging table and root element with namespace name for verification purposes.

select

[XML_Data].query('//customerid').value('.','integer') as customerid,

[XML_Data].query('//prodno').value('.','varchar(max)') as prodno,

[XML_Data].query('//userid').value('.','integer') as userid

from dbo.XML_TestTbl

where schema_nm = 'stat';

To get the process required XML structure we have to select information from the multiple system views.

select sco.name, sn.name, sc.name, sc1.name, sc2.name, sc1.kind,

sc1.kind_desc, sf.kind, sf.kind_desc, sf.is_fixed, sf.value

from sys.xml_schema_components sc

inner join sys.xml_schema_component_placements scp on scp.xml_component_id = sc.xml_component_id

inner join sys.xml_schema_components sc1 on scp.placed_xml_component_id = sc1.xml_component_id

inner join sys.xml_schema_components sc2 on sc1.base_xml_component_id = sc2.xml_component_id

inner join sys.xml_schema_collections sco on sco.xml_collection_id = sc.xml_collection_id

inner join sys.xml_schema_facets sf on sf.xml_component_id = sc1.xml_component_id

inner join sys.xml_schema_namespaces sn on sn.xml_collection_id = sc.xml_collection_id

where sco.name = 'stat';

sys.xml_schema_component_placements © 2008 Microsoft

Returns a row per placement for XML schema components.

Column name

Data type

Description

xml_component_id

int

ID of the XML schema component that owns this placement.

placement_id

int

ID of the placement. This is unique within the owning XML schema component.

placed_xml_component_id

int

ID of the placed XML schema component.

is_default_fixed

bit

1 = The default value is a fixed value. This value cannot be overridden in an XML instance.

0 = The value can be overridden.(default)

min_occurrences

int

Minimum number of placed component occurs.

max_occurrences

int

Maximum number of placed component occurs.

default_value

nvarchar (4000)

Default value if one is supplied. Is NULL if a default value is not supplied.

sys.xml_schema_collections © 2008 Microsoft

Returns a row per XML schema collection. An XML schema collection is a named set of XSD definitions. The XML schema collection itself is contained in a relational schema, and it is identified by a schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id, namespace_id, and schema_id, name.

Column name

Data type

Description

xml_collection_id

int

ID of the XML schema collection. Unique within the database.

schema_id

int

ID of the relational schema that contains this XML schema collection.

principal_id

int

ID of the individual owner if different from the schema owner. By default, schema-contained objects are owned by the schema owner. However, an alternate owner may be specified by using the ALTER AUTHORIZATION statement to change ownership.

NULL = No alternate individual owner.

name

sysname

Name of the XML schema collection.

create_date

datetime

Date the XML schema collection was created.

modify_date

datetime

Date the XML schema collection was last altered

sys.xml_schema_facets © 2008 Microsoft

Returns a row per facet (restriction) of an xml-type definition (corresponds to sys.xml_types).

Column name

Data type

Description

xml_component_id

int

ID of XML component (type) to which this facet belongs.

facet_id

int

ID (1-based ordinal) of facet, unique within component-id.

kind

char(2)

Kind of facet:

LG = Length

LN = Minimum Length

LX = Maximum Length

PT = Pattern (regular expression)

EU = Enumeration

IN = Minimum Inclusive value

IX = Maximum Inclusive value

EN = Minimum Exclusive value

EX = Maximum Exclusive value

DT = Total Digits

DF = Fraction Digits

WS = White Space normalization

kind_desc

nvarchar (60)

Description of kind of facet:

LENGTH

MINIMUM_LENGTH

MAXIMUM_LENGTH

PATTERN

ENUMERATION

MINIMUM_INCLUSIVE_VALUE

MAXIMUM_INCLUSIVE_VALUE

MINIMUM_EXCLUSIVE_VALUE

MAXIMUM_EXCLUSIVE_VALUE

TOTAL_DIGITS

FRACTION_DIGITS

WHITESPACE_NORMALIZATION

is_fixed

bit

1 = Facet has a fixed, prespecified value.

0 = No fixed value. (default)

value

nvarchar (4000)

Fixed, pre-specified value of the facet.

We discussed sys.xml_schema_components and sys.xml_schema_namespaces in part 1 of this article. View sys.xml_schema_components keeping not only the references to the registered custom schema elements but also the references to the SQL Server XML base data types and schema custom created data types. By joining this table we can find out each of the elements data type, data type it derived from, and the length of the derived data type.

inner join sys.xml_schema_components sc1 on scp.placed_xml_component_id = sc1.xml_component_id

In this part of the statement placed_xml_component_id is the id of XML data type for the data element. In our case, companyid and userid have data type 'integer' and prodno has schema defined data type data type 'myST'.

inner join sys.xml_schema_components sc2 on sc1.base_xml_component_id = sc2.xml_component_id

In this part of the statement base _xml_component_id is the id of XML data the actual element data type is derived. In our case, companyid and userid have derived from (parent data type) 'decimal' and prodno data type 'myST' derived from string. It will help generic process convert data properly when extracting from XML field.

select

[XML_Data].query('//customerid').value('.','integer') as customerid,

[XML_Data].query('//prodno').value('.','varchar(6)') as prodno,

[XML_Data].query('//userid').value('.','integer') as userid

from dbo.XML_TestTbl

where schema_nm = 'stat';

inner join sys.xml_schema_facets sf on sf.xml_component_id = sc1.xml_component_id

This relationship returns the schema custom data type restrictions. In our case the length of myST data type in bytes.

The main portion of the process is stored in procedure spGenericXML_Step.txt that takes XML table name, database name, and schema name to validate data and generate select statement.

declare @selectCMD varchar(max);

exec dbo.spGenericXML_Step

@XML_ID = 0,

@fromTBLNM = 'dbo.XML_TestTbl', -- has to be with schema name - dbo.test

@fromDBNM = 'test',

@XMLSchemaNM = 'stat',

@selectCMD = @selectCMD OUTPUT;

print @selectCMD;

Result:

select XML_ID, Table_NM, [XML_Data].query('//companyid').value('.','integer ') , [XML_Data].query('//prodno').value('.','varchar(6) ') ,

[XML_Data].query('//userid').value('.','integer ') from asiwork.dbo.XML_TestTbl

where Schema_NM = 'stat' and XML_ID > 0;

This stored procedure can be used from a wrapper procedure. The wrapper can take the output statement and insert data to the specific destination table with regular columns. The table name and database from which XML data will be extracted can be defined in custom meta data based on the schema. As I mentioned before, the XML data table can hold data for the different XML schemas. The output regular data table can be defined through meta data for the process or both, source XML and destination tables can be hard coded. It is up to the person who is writing the process.

Why do we need XML_ID in the statement and as parameter to the stored procedure?

The process of extracting XML data will be running daily (weekly, monthly). One of the parameters that will define the processed rows will be XML_ID. If the rows are processed, the process will insert XML_ID with other data into output table. Output table structure can be like this one:

Create table XML_Output (

Recid int identity(1,1) primary key, Companyid int, Userid int,

Prodno varchar(6), External_XML_ID bigint, External_Table_NM varchar(50))

Then processed XML_IDs are known and will bypass process. Another process that will clean up XML data table can find processed XML_IDs and clean up only the ones that already processed. Field External_table_name gives the information about source XML data table.

Conclusion

This article described only one of various ways XML can be used in the database world. This architecture and process can be extended by the use of custom meta data, adding data validation and data mining layers. Additional XML feature can be used in XML schema. The article code was simplified for the sake of clarity.

Resources:

spGenericXML_Step.txt
Total article views: 12617 | Views in the last 30 days: 3
 
Related Articles
FORUM

USING sys.xml_schema_collections WITH xml_schema_namespace()

How to list all AdventureWorks XML Schemas one by one?

FORUM

XML Schema Collections

Is there a way to validate multiple xml doc types into one schema

FORUM

XML schema collection confused

multiple schema's one xml field

ARTICLE

XML Workshop VI - Typed XML and SCHEMA Collection

The sixth installment of the XML Workshop for SQL Server shows how the XML Schemas fit into the XML ...

FORUM

Script Component

Troubleshooting Script Component

Tags
xml    
xsd    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones