A Generic Process to Convert XML Data - Part 2

,

My previous article, A Generic Process to Convert XML Data - Part 1, described the theory of generic process architecture that allows store data with different input XML schemas into one staging location and then creates a process that will output data into designated standard tables. It is important for the data that is not required for the immediate consumption and that will be used by

applications or reports later with some form of aggregation or the other in the

processed form. In our case, I am using this idea to gather statistical data.

Different types of statistical RAW data is gathered from the application(s) into

one table and extracted daily. Then once a week the staging table is cleaned up by the

process which deletes processed records.

Let see how we can create

generic process. The point of the process is to extract data based on the

standard parameters from application e.g. from the staging table XML_TestTbl

by the Schema_NM.

CREATE

TABLE [dbo].[XML_TestTbl](

[XML_ID] [bigint] IDENTITY(1,1)

NOT NULL primary

key,

[Table_NM] AS

(('XML_TestTbl')),

[XML_Data] [xml] NOT NULL,

[Schema_NM]

[varchar](20) NOT

NULL,

[InsertDate] [datetime]

NOT NULL default ( getdate() ) )

In our example stat schema, process requires root element,

data elements, and data types to be able to create select statement from the

staging table and root element with namespace name for verification purposes.

select

[XML_Data].query('//customerid').value('.','integer')

as customerid,

[XML_Data].query('//prodno').value('.','varchar(max)')

as prodno,

[XML_Data].query('//userid').value('.','integer')

as userid

from dbo.XML_TestTbl

where schema_nm =

'stat';

To get the process required XML structure we have to select

information from the multiple system views.

select sco.name,

sn.name,

sc.name, sc1.name,

sc2.name, sc1.kind,

sc1.kind_desc, sf.kind,

sf.kind_desc, sf.is_fixed, sf.value

from sys.xml_schema_components sc

inner join sys.xml_schema_component_placements scp

on scp.xml_component_id =

sc.xml_component_id

inner join sys.xml_schema_components sc1

on scp.placed_xml_component_id = sc1.xml_component_id

inner join sys.xml_schema_components sc2

on sc1.base_xml_component_id = sc2.xml_component_id

inner join sys.xml_schema_collections sco

on sco.xml_collection_id = sc.xml_collection_id

inner join sys.xml_schema_facets sf

on sf.xml_component_id = sc1.xml_component_id

inner join sys.xml_schema_namespaces sn

on sn.xml_collection_id = sc.xml_collection_id

where sco.name

= 'stat';

sys.xml_schema_component_placements

© 2008 Microsoft

Returns a row per placement for XML schema

components.

Column

name

Data

type

Description

xml_component_id

int

ID of the XML schema component that owns this

placement.

placement_id

int

ID of the placement. This is unique within

the owning XML schema component.

placed_xml_component_id

int

ID of the placed XML schema component.

is_default_fixed

bit

1 = The default value is a fixed value. This

value cannot be overridden in an XML instance.

0 = The value can be overridden.(default)

min_occurrences

int

Minimum number of placed component occurs.

max_occurrences

int

Maximum number of placed component occurs.

default_value

nvarchar (4000)

Default value if one is supplied. Is NULL if

a default value is not supplied.

sys.xml_schema_collections

© 2008 Microsoft

Returns a row per XML schema collection. An XML

schema collection is a named set of XSD definitions. The XML schema collection

itself is contained in a relational schema, and it is identified by a

schema-scoped Transact-SQL name. The following tuples are unique: xml_collection_id,

namespace_id, and schema_id, name.

Column

name

Data

type

Description

xml_collection_id

int

ID of the XML schema collection. Unique

within the database.

schema_id

int

ID of the relational schema that contains

this XML schema collection.

principal_id

int

ID of the individual owner if different from

the schema owner. By default, schema-contained objects are owned by the

schema owner. However, an alternate owner may be specified by using the

ALTER AUTHORIZATION statement to change ownership.

NULL = No alternate individual owner.

name

sysname

Name of the XML schema collection.

create_date

datetime

Date the XML schema collection was created.

modify_date

datetime

Date the XML schema collection was last

altered

sys.xml_schema_facets

© 2008 Microsoft

Returns a row per facet (restriction) of an

xml-type definition (corresponds to sys.xml_types).

Column

name

Data

type

Description

xml_component_id

int

ID of XML component (type) to which this

facet belongs.

facet_id

int

ID (1-based ordinal) of facet, unique within

component-id.

kind

char(2)

Kind of facet:

LG = Length

LN = Minimum Length

LX = Maximum Length

PT = Pattern (regular expression)

EU = Enumeration

IN = Minimum Inclusive value

IX = Maximum Inclusive value

EN = Minimum Exclusive value

EX = Maximum Exclusive value

DT = Total Digits

DF = Fraction Digits

WS = White Space normalization

kind_desc

nvarchar (60)

Description of kind of facet:

LENGTH

MINIMUM_LENGTH

MAXIMUM_LENGTH

PATTERN

ENUMERATION

MINIMUM_INCLUSIVE_VALUE

MAXIMUM_INCLUSIVE_VALUE

MINIMUM_EXCLUSIVE_VALUE

MAXIMUM_EXCLUSIVE_VALUE

TOTAL_DIGITS

FRACTION_DIGITS

WHITESPACE_NORMALIZATION

is_fixed

bit

1 = Facet has a fixed, prespecified value.

0 = No fixed value. (default)

value

nvarchar (4000)

Fixed, pre-specified value of the facet.

We discussed sys.xml_schema_components

and sys.xml_schema_namespaces in part 1 of this article. View sys.xml_schema_components

keeping not only the references to the registered custom schema elements

but also the references to the SQL Server XML base data types and schema custom

created data types. By joining this table we can find out each of the elements

data type, data type it derived from, and the length of the derived data type.

inner join sys.xml_schema_components sc1

on scp.placed_xml_component_id = sc1.xml_component_id

In this part of the statement placed_xml_component_id

is the id of XML data type for the data element. In our case, companyid and

userid have data type 'integer' and prodno has schema defined data type data

type 'myST'.

inner

join sys.xml_schema_components sc2 on

sc1.base_xml_component_id = sc2.xml_component_id

In this part of the statement base _xml_component_id

is the id of XML data the actual element data type is derived. In our

case, companyid and userid have derived from (parent data type) 'decimal'

and prodno data type 'myST' derived from string. It will help generic

process convert data properly when extracting from XML field.

select

[XML_Data].query('//customerid').value('.','integer')

as customerid,

[XML_Data].query('//prodno').value('.','varchar(6)')

as prodno,

[XML_Data].query('//userid').value('.','integer')

as userid

from dbo.XML_TestTbl

where schema_nm =

'stat';

inner

join sys.xml_schema_facets sf on

sf.xml_component_id = sc1.xml_component_id

This relationship returns the schema custom data type

restrictions. In our case the length of myST data type in bytes.

The main portion of the process is stored in procedure spGenericXML_Step.txt

that takes XML table name, database name, and schema name to validate data and

generate select statement.

declare

@selectCMD varchar(max);

exec dbo.spGenericXML_Step

@XML_ID = 0,

@fromTBLNM = 'dbo.XML_TestTbl',

-- has to be with schema name - dbo.test

@fromDBNM =

'test',

@XMLSchemaNM = 'stat',

@selectCMD = @selectCMD OUTPUT;

print

@selectCMD;

Result:

select

XML_ID, Table_NM,

[XML_Data].query('//companyid').value('.','integer

') ,

[XML_Data].query('//prodno').value('.','varchar(6)

') ,

[XML_Data].query('//userid').value('.','integer

') from

asiwork.dbo.XML_TestTbl

where

Schema_NM = 'stat'

and XML_ID >

0;

This stored procedure can be used from a wrapper procedure. The wrapper can take the output statement and insert data to the specific destination table with regular columns. The table name and database from which XML data will be extracted can be defined in custom meta data based on the schema. As I mentioned before, the XML data table can hold data for the different XML schemas. The output regular data table can be defined through meta data for the process or both, source XML and destination tables can be hard coded. It is up to the person who is writing the process.

Why do we need XML_ID in the

statement and as parameter to the stored procedure?

The process of extracting XML data will be running daily (weekly, monthly). One of the

parameters that will define the processed rows will be XML_ID. If the rows are processed, the process will insert XML_ID with other data into output table. Output table structure can be like this one:

Create table XML_Output (

Recid int identity(1,1) primary key,

Companyid int,

Userid int,

Prodno varchar(6),

External_XML_ID bigint,

External_Table_NM varchar(50))

Then processed XML_IDs are known and will bypass process.

Another process that will clean up XML data table can find processed XML_IDs and

clean up only the ones that already processed. Field External_table_name gives

the information about source XML data table.

Conclusion

This article described only one of various ways XML can be

used in the database world. This architecture and process can be extended by the

use of custom meta data, adding data validation and data mining layers.

Additional XML feature can be used in XML schema. The article code was

simplified for the sake of clarity.

Resources

Rate

5 (5)

Share

Share

Rate

5 (5)