A Generic Process to Convert XML Data - Part 1

  • Comments posted to this topic are about the item A Generic Process to Convert XML Data - Part 1

  • So, basically you are talking about an Audit table that just stores the "Audits" in a different Xml format? I've been doing that for a while but never relied on the Schemas (I guess I should). Basically, a View can be created for each "type" or "xml schema" so that if you DID need to see what audits took place, it's pretty easy to report on.

    I've recently done something similar with regard to importing multiple Products into our database while still supporting an old XML format. So, basically, I created two XML Schema Collections (called ProductSchema_v01 and ProductSchema_v02). The stored procedure then ensures that the incoming XML conforms to one of the two schemas and imports the data appropriately based on the version.

    CREATE PROCEDURE [dbo].[usp_ImportProducts]
    (
        @Products xml
    )
    AS
    DECLARE @v1 xml(ProductSchema_v01);
    DECLARE @v2 xml(ProductSchema_v02);

    BEGIN TRY
        SET @v1 = @Products;       -- try the current schema first
    END TRY
    BEGIN CATCH
        BEGIN TRY
            SET @v2 = @Products;   -- fall back to the old schema
        END TRY
        BEGIN CATCH
            RAISERROR('Invalid Xml...', 16, 1);
            RETURN;                -- neither schema matched, so stop here
        END CATCH
    END CATCH

    IF (@v1 IS NOT NULL)
        ....
    ELSE IF (@v2 IS NOT NULL)
        ....
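    A minimal sketch of what one of the `....` import branches might contain, using `nodes()`/`value()` to shred the typed XML. Note the element names (`ProductList`, `Product`, `ProductNo`, `Name`) and the target table are hypothetical placeholders, not taken from the actual ProductSchema collections:

```sql
-- Hypothetical shred of a v1 payload; element and table names are assumed.
INSERT INTO dbo.Products (ProductNo, ProductName)
SELECT
    p.value('(ProductNo/text())[1]', 'varchar(50)'),
    p.value('(Name/text())[1]',      'nvarchar(100)')
FROM @v1.nodes('/ProductList/Product') AS t(p);
```

    The same pattern, with the v2 element names, would go in the `@v2` branch.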

  • Good one...

  • Good article! One question did come to mind as I read it: would it be possible to modify your method to use a single XML schema for validation against multiple namespaces:

    DROP XML SCHEMA COLLECTION stat;
    GO
    CREATE XML SCHEMA COLLECTION stat
    AS
    '<?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="leo:stat:a" xmlns="leo:stat:a">
      <xs:element name="statroot">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="statrecord" minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="companyid" type="xs:integer" />
                  <xs:element name="prodno" type="xs:string" />
                  <xs:element name="userid" type="xs:integer" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>';

    GO

    ALTER XML SCHEMA COLLECTION stat
    ADD '<?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" targetNamespace="leo:stat:b" xmlns="leo:stat:b">
      <xs:element name="statroot">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="statrecord" minOccurs="0">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="co" type="xs:integer" />
                  <xs:element name="prod" type="xs:string" />
                  <xs:element name="uid" type="xs:integer" />
                  <xs:element name="ext" type="xs:string" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>';

    GO

    declare @xmlvar1 xml (stat);
    set @xmlvar1 = '<xs:statroot xmlns:xs="leo:stat:a">
      <statrecord>
        <companyid>6</companyid>
        <prodno>prd12</prodno>
        <userid>12345</userid>
      </statrecord>
    </xs:statroot>
    <xs:statroot xmlns:xs="leo:stat:b">
      <statrecord>
        <co>6</co>
        <prod>prd12</prod>
        <uid>12345</uid>
        <ext>Test</ext>
      </statrecord>
    </xs:statroot>';

    This would require you to pass in the namespace qualifiers in your input XML, but might eliminate the need to pass the XML schema collection name.

    Thanks!
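    Following up on that idea: once both instance types live in one collection, a procedure could branch on the root element's namespace instead of taking a schema-collection name as a parameter. A rough sketch against the `stat` collection above (the variable names and PRINT placeholders are made up for illustration):

```sql
DECLARE @input xml;
SET @input = N'<statroot xmlns="leo:stat:b">
  <statrecord><co>6</co><prod>prd12</prod><uid>12345</uid><ext>Test</ext></statrecord>
</statroot>';

DECLARE @doc xml(stat);
SET @doc = @input;  -- validates against either namespace in the collection

IF @doc.exist('declare namespace a="leo:stat:a"; /a:statroot') = 1
    PRINT 'leo:stat:a payload';   -- shred using companyid/prodno/userid
ELSE IF @doc.exist('declare namespace b="leo:stat:b"; /b:statroot') = 1
    PRINT 'leo:stat:b payload';   -- shred using co/prod/uid/ext
```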

  • You can do it. But I did not add it, because I need to be able to handle different XML schemas with different namespaces generically. You will see that in Part 2. Also, there should be a drawing of the process here, but for some reason it didn't show up. I emailed the publisher; they should correct it.

  • Why is this a SQL-centric solution? Would performance be further improved with some C# code on an application server doing the XML heavy lifting?

  • I don't see how performance could be improved by using an application solution. SQL is already set-based and can insert, update, and delete anything from the XML you send in, in one pass. A CLR or C# solution, on the other hand, will need to issue multiple statements to the database (and since you're talking about an application service, you're also talking about multiple network round trips to and from SQL Server). No, this would slow down performance drastically. You'd be surprised how efficient the XML query engine is in SQL 2005 and up.

  • Eric Inman (7/3/2009)


    Why is this a SQL-centric solution? Would performance be further improved with some C# code on an application server doing the XML heavy lifting?

    If you're interested in guaranteeing data stored in the database is valid you have to use the database engine to check it, just like any other database-side constraints. You can always check data in client apps or in the middle tier before it gets stored in the database, but there's no guarantee that these methods of enforcing constraints on data can't be circumvented (by a DBA with direct table access, for instance).

  • tymberwyld (7/3/2009)


    I don't see how performance could be improved by using an application solution. SQL is already set-based and can insert, update, and delete anything from the XML you send in, in one pass. A CLR or C# solution, on the other hand, will need to issue multiple statements to the database (and since you're talking about an application service, you're also talking about multiple network round trips to and from SQL Server). No, this would slow down performance drastically. You'd be surprised how efficient the XML query engine is in SQL 2005 and up.

    The insert, update, and delete are of course set-based, but parsing the DOM is about as RBAR as you can get. Who cares if it's not apparent to the developer writing the code? In modern web applications, round trips are not as costly as they used to be in client-server setups. The data tier and the web tier are usually a few firewalls away on a very fast backplane. That is no argument to introduce XPath and its friends into the data tier.

    Do not even get me started on what the TDS protocol will do to a 200 KB XML blob, which is what you will start getting if your app group realizes you are in bed with the XML mistress.

    Have you taken into account what XML parsing will do to tempdb? All of this for what, to be able to play with XPath in your stored procs? No thanks. XML is a data structure for the application tier.

  • Mike C (7/4/2009)


    Eric Inman (7/3/2009)


    Why is this a SQL-centric solution? Would performance be further improved with some C# code on an application server doing the XML heavy lifting?

    If you're interested in guaranteeing data stored in the database is valid you have to use the database engine to check it, just like any other database-side constraints. You can always check data in client apps or in the middle tier before it gets stored in the database, but there's no guarantee that these methods of enforcing constraints on data can't be circumvented (by a DBA with direct table access, for instance).

    Understood, but I just don't think that passing around a string with over 95% waste in every packet gets that done. I enforce the data requirements in my domain, not the data interface. Over the last few years I have noticed that people are moving more toward enforcing the domain data in the data interface layer and not in the domain itself. I don't know how this can hold up when said data becomes more than a trivial XML hierarchy.

  • Eric Inman (7/4/2009)


    Mike C (7/4/2009)


    If you're interested in guaranteeing data stored in the database is valid you have to use the database engine to check it, just like any other database-side constraints. You can always check data in client apps or in the middle tier before it gets stored in the database, but there's no guarantee that these methods of enforcing constraints on data can't be circumvented (by a DBA with direct table access, for instance).

    Understood, but I just don't think that passing around a string with over 95% waste in every packet gets that done. I enforce the data requirements in my domain, not the data interface. Over the last few years I have noticed that people are moving more toward enforcing the domain data in the data interface layer and not in the domain itself. I don't know how this can hold up when said data becomes more than a trivial XML hierarchy.

    Ahhh, the old "wasted packets" argument. Fortunately for all of us there are many methods of optimizing network traffic through compression, in any of several network protocol layers; the 95% waste you're referring to is highly compressible, as XML itself tends to be. The trade-off for eliminating that waste is proprietary technologies that are difficult to work with, require a lot of binary data manipulation and custom programming, and don't play well with firewalls. After all, DCOM is relatively efficient over the wire, but the trade-off for that savings in network bandwidth is proprietary technology plus higher admin and maintenance costs.

    XML Schema, which the author writes about in this article, makes it fairly easy to enforce domain level constraints on individual elements and attributes of your XML data. Of course you can always shred your XML in the business logic layer and store it as purely relational data (which is what I normally recommend, except in the case of exceptional circumstances or hard requirements to the contrary).

    XML Schema supports fairly complex data constraints, including regular expressions for strings, ranges and sets for numeric values, etc. Jacob Sebastian has written a free e-book ("The Art of XSD") that describes it in detail. I don't have a link handy, but it's distributed by Red Gate -- definitely worth Googling if you want to see what types of constraints you can define with it.
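    To illustrate those facets: a schema collection can constrain a string element with a regular expression and an integer element with a range. The collection and element names below are made up for the example:

```sql
CREATE XML SCHEMA COLLECTION dbo.OrderSchema AS N'
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="sku">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <!-- SKU must look like ABC-1234 -->
              <xs:pattern value="[A-Z]{3}-[0-9]{4}"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
        <xs:element name="qty">
          <xs:simpleType>
            <xs:restriction base="xs:integer">
              <!-- quantity limited to the range 1..1000 -->
              <xs:minInclusive value="1"/>
              <xs:maxInclusive value="1000"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>';

-- A malformed sku or out-of-range qty now fails at assignment time:
DECLARE @o xml(dbo.OrderSchema);
SET @o = N'<order><sku>ABC-1234</sku><qty>5</qty></order>';
```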

  • Eric Inman (7/4/2009)


    The insert, update, and delete are of course set-based, but parsing the DOM is about as RBAR as you can get. Who cares if it's not apparent to the developer writing the code? In modern web applications, round trips are not as costly as they used to be in client-server setups. The data tier and the web tier are usually a few firewalls away on a very fast backplane. That is no argument to introduce XPath and its friends into the data tier.

    Do not even get me started on what the TDS protocol will do to a 200 KB XML blob, which is what you will start getting if your app group realizes you are in bed with the XML mistress.

    Have you taken into account what XML parsing will do to tempdb? All of this for what, to be able to play with XPath in your stored procs? No thanks. XML is a data structure for the application tier.

    Right, don't get me wrong, I'm not advocating XML for every data operation. However, sometimes there's no getting around RBAR. Mostly, I've only had the need to parse small amounts of XML, and only when transmitting data through Service Broker. It's never been a performance issue because all of the processing happens asynchronously and on different threads; the client application just "fires and forgets". Of course, I'm talking about data that doesn't need to be "real-time" or anything. The one thing I do hate is creating another application that needs to be maintained by IS/IT and requires a programmer to dig in and change the logic. By having the XML processed within SQL Server, any DBA (with a little reading) can adjust the XML processing logic when business needs change, without recompiling any external dependencies (i.e. applications).

  • I'm trying to get a grip on the XML to DB concept, so I executed the following, copy/pasted from the article.

    I don't understand why I'm getting 0 rows returned.

    I most likely forgot something or I need clarification on the subject.

    Pointers would be appreciated.

    Thanks,

    Henk

    PS: For a reason that's beyond me, the code snippet after "-- final code line" is automagically inserted by the web interface of this post-editor.

    The line ending in " -- final code line" delimits the code that I executed

    create XML SCHEMA COLLECTION stat AS
    '
    '
    CREATE TABLE [dbo].[XML_TestTbl](
        [XML_ID] [bigint] IDENTITY(1,1) NOT NULL primary key,
        [Table_NM] AS (('XML_TestTbl')),
        [XML_Data] [xml] NOT NULL,
        [Schema_NM] [varchar](20) NOT NULL,
        [InsertDate] [datetime] NOT NULL default ( getdate() ) )
    go
    declare @verify table (XML_Data XML(stat) );
    declare @xmlvar1 xml (stat);
    set @xmlvar1 = '
    6
    prd1234
    12345
    '
    Insert into @verify(XML_Data)
    Select XML_Data from XML_TestTbl where schema_nm = 'stat';
    select
        [XML_Data].query('//customerid').value('.','integer') as customerid,
        [XML_Data].query('//prodno').value('.','varchar(max)') as prodno,
        [XML_Data].query('//userid').value('.','integer') as userid
    from dbo.XML_TestTbl
    where schema_nm = 'stat' -- final code line
