XML import from a Cross-tab Report

willysantara, 2017-06-08 (first published: 2017-06-01)

ETL is truly a continuing journey of exploration.

It is often the case that source databases are, for whatever reason, inaccessible as SQL databases so surrogates have to be used. These can be files of various sorts, scheduled and exported. Today's tale relates to that most verbose of formats - XML.

As with so many ETL projects, the scope and complexity of the ETL phase has been understated, and so quick-and-dirty - that is to say - Spackle - is the order of the day.

The XML in question is not a pure database extract: it is an XML representation of a cross-tab statistical report from a hotel reservation system. What is required is a simple list of the various statistics, their values, and the grouping to which each belongs. These rows can then be further shaped and finally inserted into a standard star-schema warehouse.

Attempt 1 used SSIS for the XML file read, via an XML Data Source. This resulted in a formidable XSD file to describe the source file contents and assign datatypes. But as should be obvious from the XML, most of the document object model is quite irrelevant for our purposes. Prime example - Column Numbers, which don't pass the 'So What?' test for the warehouse destination.

In another of those real-world sidebars, the entire reservation system is under review. This extract may have a life only of months, so making this whole exercise as simple as possible is an important consideration. It may have to be rejigged for quite another data source - very possibly non-XML - once a replacement system is decided upon.

Recalling that this is an XML rendition of a report, one has to ask - what if a user in the source application tweaks the report definition? The XSD is likely to be rendered instantly obsolete and, as it is embedded deep within an SSIS Control Flow, the task would then fail. Failed tasks mean SSIS changes and re-deployment - not a very maintainable proposition. So XML Data Source plus XSD definitions won't do.

Attempt 2 results from a decision to use pure SQL and the old standby, OPENROWSET, which has been well covered in these very forums (or, for the purist, fora). There are two reasons for this choice.

1 - as pure SQL, the script can be wrapped up as a stored procedure on the BI server, and thus quickly amended to suit any application report definition changes. As many such reports need to be ETL'ed, the procedure can be invoked by a simple SSIS Filewatcher task (Konesans or similar) and the relevant filename passed to it.

2 - as SSIS is so monstrously picky about datatypes, a 'With Result Sets' invocation of the procedure will allow explicit datatyping at the point of extraction - always the best in my experience.
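
As a minimal sketch of the second point, the batch below shows the shape of a 'With Result Sets' invocation. It uses a stand-in dynamic statement rather than the real procedure so that it runs on its own (SQL Server 2012 or later); the column names mirror the intended output, and the literal values are invented for illustration.

```sql
-- Stand-in for the stored procedure: any statement returning three columns.
-- The literals are made-up sample values, not data from the hotel report.
EXEC (N'SELECT ''Corporate'' AS a, ''Room Nights'' AS b, 120 AS c')
WITH RESULT SETS
(
    (
        [Segment]   nvarchar(100),   -- names and types are imposed here,
        [Statistic] nvarchar(100),   -- regardless of what the inner query
        [Value]     decimal(18,5)    -- happened to call or type its columns
    )
);
```

The same clause can follow a call to a stored procedure, which is how the caller rather than the query dictates the names and datatypes that SSIS sees.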

Attempt 2 looks 'down' the XML tree, so the initial Nodes definition in the Cross Apply is '//RES_STATISTICS2/LIST_G_LAST_YEAR/G_LAST_YEAR/LIST_G_CROSS/G_CROSS/LIST_G_SUBGROUP/G_SUBGROUP/LIST_G_DETAIL/G_DETAIL/LIST_G_HEADING1/G_HEADING1/LIST_G_AMOUNT/G_AMOUNT'

This slavishly follows the tree top-down but will, clearly, suffer if that report definition is changed (number of layers, names of elements), and it is a rather formidable string. The procedure does have the advantage of running, but a little optimising seems possible.

Attempt 3 simplifies the Cross Apply nodes to the rather simpler './/G_AMOUNT' and mirabile dictu, the script still works. Job done. On to the next challenge.
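
To see that the two Nodes definitions select the same elements, here is a small sketch against a hypothetical miniature of the report. The element names are taken from the paths quoted above; the nesting of MASTER_VALUE and HEADING_1 and all the values are assumptions made for the example, not the real file.

```sql
-- Hypothetical miniature of the report XML (invented values).
DECLARE @x xml = N'
<RES_STATISTICS2><LIST_G_LAST_YEAR><G_LAST_YEAR><LIST_G_CROSS><G_CROSS>
 <LIST_G_SUBGROUP><G_SUBGROUP>
  <MASTER_VALUE>Corporate</MASTER_VALUE>
  <LIST_G_DETAIL><G_DETAIL><LIST_G_HEADING1>
   <G_HEADING1>
    <HEADING_1>Room Nights</HEADING_1>
    <LIST_G_AMOUNT><G_AMOUNT><SUM_AMOUNT>120</SUM_AMOUNT></G_AMOUNT></LIST_G_AMOUNT>
   </G_HEADING1>
   <G_HEADING1>
    <HEADING_1>Revenue</HEADING_1>
    <LIST_G_AMOUNT><G_AMOUNT><SUM_AMOUNT>15000.50</SUM_AMOUNT></G_AMOUNT></LIST_G_AMOUNT>
   </G_HEADING1>
  </LIST_G_HEADING1></G_DETAIL></LIST_G_DETAIL>
 </G_SUBGROUP></LIST_G_SUBGROUP>
</G_CROSS></LIST_G_CROSS></G_LAST_YEAR></LIST_G_LAST_YEAR></RES_STATISTICS2>';

-- Attempt 2 spells out every layer; Attempt 3 asks for any G_AMOUNT descendant.
SELECT
    (SELECT COUNT(*)
     FROM @x.nodes('//RES_STATISTICS2/LIST_G_LAST_YEAR/G_LAST_YEAR/LIST_G_CROSS/G_CROSS/LIST_G_SUBGROUP/G_SUBGROUP/LIST_G_DETAIL/G_DETAIL/LIST_G_HEADING1/G_HEADING1/LIST_G_AMOUNT/G_AMOUNT') AS p(n)
    ) AS FullPathRows,
    (SELECT COUNT(*)
     FROM @x.nodes('.//G_AMOUNT') AS p(n)
    ) AS DescendantRows;
-- Both counts are 2 for this sample.
```

The short form only stays equivalent as long as G_AMOUNT elements do not also appear elsewhere in the document with a different meaning.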

I explain this as follows. It is actually wonderfully simple, and I hope it is for readers too:

Find the lowest/innermost element in the document tree, at which required data resides. In this case, it is the SUM_AMOUNT, which has a parent G_AMOUNT.
Point the Cross Apply nodes at the G_AMOUNT parent element - this determines the number of rows in the output. './/G_AMOUNT' means 'ignore all of the parents, just get all G_AMOUNTs'.
The Select from the SourcePackage CTE can then acquire the other values, by stepping up or down a known number of parents/children from G_AMOUNT - 1 click down for SUM_AMOUNT, 2 clicks up for HEADING_1, 6 clicks up for MASTER_VALUE.
This bottom-up 'layer counting' approach completely ignores intermediate element names: it avoids the need to quote them at all.
If the source application report definition changes, the procedure can be easily maintained, either as to element names needed for the output columns, or as to the number of parents back up (or children down) the document tree to climb for each required value. This can be achieved by just looking at the XML.
The procedure is executed 'with result sets' to be absolutely sure of datatypes. The procedure is itself a dynamic SQL execution because, for reasons best known to the authors, a BULK command won't take a parameter as its file name.
With dynamic SQL inside the procedure, SSIS is unlikely to be able to 'sniff out' the datatypes for its dataflows unaided. And we have all had experience of letting SSIS decide these for itself... best not to.
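
The 'layer counting' can be tried without any file at all. The sketch below applies the same relative paths to a hypothetical XML variable; the structure is inferred from the element names and click counts given above, and the values (including the zero row and the 'Grand Total' subgroup) are invented to exercise the filters.

```sql
-- Hypothetical sample: one real segment, one zero-valued statistic, one Grand Total.
DECLARE @x xml = N'
<RES_STATISTICS2><LIST_G_LAST_YEAR><G_LAST_YEAR><LIST_G_CROSS><G_CROSS><LIST_G_SUBGROUP>
 <G_SUBGROUP>
  <MASTER_VALUE>Corporate</MASTER_VALUE>
  <LIST_G_DETAIL><G_DETAIL><LIST_G_HEADING1>
   <G_HEADING1><HEADING_1>Room Nights</HEADING_1>
    <LIST_G_AMOUNT><G_AMOUNT><SUM_AMOUNT>120</SUM_AMOUNT></G_AMOUNT></LIST_G_AMOUNT></G_HEADING1>
   <G_HEADING1><HEADING_1>No Shows</HEADING_1>
    <LIST_G_AMOUNT><G_AMOUNT><SUM_AMOUNT>0</SUM_AMOUNT></G_AMOUNT></LIST_G_AMOUNT></G_HEADING1>
  </LIST_G_HEADING1></G_DETAIL></LIST_G_DETAIL>
 </G_SUBGROUP>
 <G_SUBGROUP>
  <MASTER_VALUE>Grand Total</MASTER_VALUE>
  <LIST_G_DETAIL><G_DETAIL><LIST_G_HEADING1>
   <G_HEADING1><HEADING_1>Room Nights</HEADING_1>
    <LIST_G_AMOUNT><G_AMOUNT><SUM_AMOUNT>120</SUM_AMOUNT></G_AMOUNT></LIST_G_AMOUNT></G_HEADING1>
  </LIST_G_HEADING1></G_DETAIL></LIST_G_DETAIL>
 </G_SUBGROUP>
</LIST_G_SUBGROUP></G_CROSS></LIST_G_CROSS></G_LAST_YEAR></LIST_G_LAST_YEAR></RES_STATISTICS2>';

SELECT
      -- 6 clicks up from G_AMOUNT lands on G_SUBGROUP, the parent of MASTER_VALUE
      Props.Prop.value('../../../../../../MASTER_VALUE[1]', 'nvarchar(100)') AS [Segment]
      -- 2 clicks up lands on G_HEADING1, the parent of HEADING_1
    , Props.Prop.value('../../HEADING_1[1]', 'nvarchar(100)')                AS [Statistic]
      -- 1 click down: SUM_AMOUNT is a child of G_AMOUNT
    , Props.Prop.value('SUM_AMOUNT[1]', 'decimal(18,5)')                     AS [Value]
FROM @x.nodes('.//G_AMOUNT') AS Props(Prop)   -- one output row per G_AMOUNT
WHERE Props.Prop.value('SUM_AMOUNT[1]', 'decimal(18,5)') <> 0
  AND Props.Prop.value('../../../../../../MASTER_VALUE[1]', 'nvarchar(100)') <> N'Grand Total';
-- Only the Corporate / Room Nights / 120.00000 row survives the two filters.
```

If a report change adds or removes a layer, only the number of '../' steps in the affected column needs adjusting.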

Caveat:

The source files, being renditions of summary reports, are inherently small. This method may not scale well as file sizes increase. YMMV, as always.

Images below: the xml (top of doc only) and the result set - exactly as required.

CREATE PROCEDURE [dbo].[sp_Get_Statistics] ( @filename nvarchar(100) )
AS
-- OPENROWSET(BULK ...) will not take a variable as its file name,
-- so the whole statement is assembled as a string and executed dynamically.
DECLARE @sql NVARCHAR(4000) = '
with SourcePackage as
(
    SELECT CAST(pkgblob.BulkColumn AS XML) pkgXML
    FROM OPENROWSET(bulk ''' + @filename + ''', single_blob) AS pkgblob
)
SELECT
      Props.Prop.value(''../../../../../../MASTER_VALUE[1]'',''nvarchar(100)'') [Segment]
    , Props.Prop.value(''../../HEADING_1[1]'',''nvarchar(100)'') [Statistic]
    , Props.Prop.value(''SUM_AMOUNT[1]'', ''decimal(18,5)'') [Value]
FROM SourcePackage t
CROSS APPLY pkgXML.nodes(''.//G_AMOUNT'') Props(Prop)
where Props.Prop.value(''SUM_AMOUNT[1]'', ''decimal(18,5)'') <> 0.00000
  and Props.Prop.value(''../../../../../../MASTER_VALUE[1]'',''nvarchar(100)'') <> ''Grand Total''
';
-- Run the assembled statement; the caller supplies datatypes via WITH RESULT SETS.
EXEC(@sql);
GO
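
One thing worth considering, since the file name is spliced straight into the statement: a name containing a single quote would break the string. The sketch below is a possible hardening, not part of the original procedure; the file name is hypothetical and the statement is printed rather than executed so that it runs without the file being present.

```sql
-- Hypothetical file name containing a single quote, to show the escaping.
DECLARE @filename nvarchar(260) = N'C:\Import\night auditor''s stats.xml';

-- Double any embedded single quotes before splicing the name into the statement.
DECLARE @sql nvarchar(max) =
      N'SELECT CAST(pkgblob.BulkColumn AS XML) AS pkgXML '
    + N'FROM OPENROWSET(BULK ''' + REPLACE(@filename, N'''', N'''''')
    + N''', SINGLE_BLOB) AS pkgblob;';

-- Inspect the generated statement instead of running it.
PRINT @sql;
-- SELECT CAST(pkgblob.BulkColumn AS XML) AS pkgXML FROM OPENROWSET(BULK 'C:\Import\night auditor''s stats.xml', SINGLE_BLOB) AS pkgblob;
```

The REPLACE call is the only change of substance; the rest of the statement is assembled exactly as before.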
