SQL Server 2000
One of the many useful features gained by SQL Server 2000 is the ability to handle XML documents.
This means that any computer language capable of opening an XML files and calling SQL Server stored procedures can make use of this new ability.
SQL Server can use any of the data types used to hold character information.
TEXT and NTEXT cannot be used as local variables and can only be passed as arguments to a stored procedure.
Bare in mind that the Unicode data types consume twice as much memory as the ASCII equivalents.
| ASCII Type |
Unicode Type |
Max Size |
Comment |
| CHAR |
NCHAR |
8000 |
Fixed length |
| VARCHAR |
NVARCHAR |
8000 characters |
Variable length |
| TEXT |
NTEXT |
2Gb |
May require you to use SET TEXTSIZE as the @@TEXTSIZE parameter determines how much data is returned from a TEXT/NTEXT field by a SELECT statement. |
| SQL Command |
Comment |
| sp_xml_preparedocument |
Parses the XML document using MSXML 2.0.
WARNING: - This will use 1/8th of the memory available to SQL Server.
If the procedure is successful it outputs a handle to the XML document as stored in the SQL Server cache.
|
| sp_xml_removedocument |
Removes the XML document from the cache and destroys the handle.
|
OPENXML |
Returns the rowset from the prepared XML document.
|
In the same way that any C++
object created should always be explicitly destroyed and any pointer to a VB
object should always have the equivalent SET <obj>=Nothing so you should
never use sp_xml_preparedocument in a procedure without using
sp_xml_removedocument.
The example document contains an
XML schema used within a VB program to draw a form for maintaining web page
metadata.
The XML document is written out as
a file by a content management system (CMS) in order to allow off-line editing.
The off-line editing process
outputs the amended XML file that can then be read and the salient portions
written back out into the CMS.
<?xml version="1.0"?>
<Page>
<AuthorsForm>
<MetaDataItem
MaxSize="100">
<Name>Class
Number</Name>
<Value><![CDATA[null]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Class Number value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="100">
<Name>Title</Name>
<Value><![CDATA[Prisoners:
getting further help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>True</Editable>
<ToolTip>Please
enter a Title value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="50">
<Name>Author</Name>
<Value><![CDATA[David
Poole]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Author value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="50">
<Name>Translator</Name>
<Value><![CDATA[Renee
Arnoux]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>True</Editable>
<ToolTip>Please
enter a Translator value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="100">
<Name>Subject</Name>
<Value><![CDATA[Civil
Rights]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Subject value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="200">
<Name>Description</Name>
<Value><![CDATA[Prisoners:
getting further help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Description value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="100">
<Name>Publisher</Name>
<Value><![CDATA[HMSO]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Publisher value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="10">
<Name>Date</Name>
<Value><![CDATA[2002-1-09]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Date value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="150">
<Name>Resource
Type</Name>
<Value><![CDATA[Document]]></Value>
<Type>text</Type>
<CTRLType>combobox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Resource Type value.</ToolTip>
</MetaDataItem>
<MetaDataItem
MaxSize="100">
<Name>Resource
Identifier</Name>
<Value><![CDATA[prisoners_getting_further_help]]></Value>
<Type>text</Type>
<CTRLType>textbox</CTRLType>
<CTRLFont>MS
Sans Serif</CTRLFont>
<Mandatory>True</Mandatory>
<Editable>False</Editable>
<ToolTip>Please
enter a Resource Identifier value.</ToolTip>
</MetaDataItem>
</AuthorsForm>
</Page>
The most important point to note is that as OPENXML is querying an XML document then regardless of the settings within SQL Server the query is case sensitive.
Replacing Page/AuthorsForm/MetaDataItem with page/authorsform/metadataitem will not
work. Neither will changing the case of the field names within the WITH statement.
Let us suppose that you have a stored procedure defined as follows.
CREATE PROC usp_ParseXML @stXML NTEXT AS
DECLARE
@iDocumentHandle Int
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML
(@iDocumentHandle , 'Page/AuthorsForm/MetaDataItem',3)
WITH (Name VARCHAR(50) , Value VARCHAR(50), MaxSize
CHAR(3) )
exec sp_xml_removedocument @iDocumentHandle
1.
We declare the variable @iDocumentHandle to hold the handle for our XML
document.
2.
We tell SQL Server to prepare our document.
3.
Our OPENXML statement accepts three arguments.
Ø @iDocumentHandle
containing our handle to our document.
Ø The
XPATH to our metadata items.
Ø An
argument to say whether or not we are interested in Attributes or Elements.
4. Our
WITH clause says which elements and/or attributes we are interested in.
5.
We tell SQL Server to remove our document from the cache and
destroy the pointer.
Just for the sake of argument, to
call our stored procedure we will have done something like the following
DECLARE @stXML VARCHAR(8000)
SET @stDoc=
'<?xml version="1.0"?>
<Page> …..etc
exec usp_ParseXML @stDoc
In this case we would get the
results set as follows
| Name |
Value |
MaxSize |
| Class Number |
null |
100 |
| Title |
Prisoners: getting further help |
100 |
| Author |
David Poole |
50 |
| Translator |
Renee Arnoux |
50 |
| Subject |
Civil Rights |
100 |
| Description |
Prisoners: getting further help |
200 |
| Publisher |
HMSO |
100 |
| Date |
2002-1-09 |
10 |
| Resource Type |
Document |
150 |
| Resource Identifier |
prisoners_getting_further_help |
100 |
The OPENXML statement allows all
of its arguments to be passed as parameters.
This means that the xpath and
flags themselves can be passed as parameters.
This gives you considerable flexibility however as the WITH statement
cannot accept parameters this means that you will probably have to accept
output as an EDGE table (see Books Online).
As the querying of the document
is not handled by SQL Server but via MS XML the xpath argument can be any valid
xpath supported by the MS XML API. Remember that this is MSXML2.
Page/AuthorsForm/MetaDataItem[*="null"]
will return any metadata item where any child element has a value of null.
Page/AuthorsForm/MetaDataItem[Name="Title"]
will return any metadata item where the Name element is set to Title.
In our example XML document Name and
Value
are elements where as MaxSize is an attribute.
The 3rd parameter of OPENXML tells SQL Server what we are interested in.
The parameter is a bit flag so
we can combine flags to tell SQL Server to return both, which is what we have
done in the preceding example.
The flag settings are as follows.
| Value |
Meaning |
| 0, 1 or omitted |
Return only the attributes. |
| 2 |
Return only the elements. |
| 3 (combination of 1 and 2) |
Return both attributes and elements |
This has the following affect on our results.
Flags set to 0, 1 or omitted
Here we can see that only the MaxSize attribute has been returned.
| Name |
Value |
MaxSize |
| NULL |
NULL |
100 |
| NULL |
NULL |
100 |
| NULL |
NULL |
50 |
| NULL |
NULL |
50
|
| NULL |
NULL |
100
|
| NULL |
NULL |
200 |
| NULL |
NULL |
100 |
| NULL |
NULL |
10 |
| NULL |
NULL |
150 |
| NULL |
NULL |
100 |
Flags set to 2
Here we can see that the Name
and Value elements have been returned but the MaxSize element has been ignored.
| Name |
Value |
MaxSize |
| Class Number |
NULL |
NULL |
| Title |
Prisoners: getting further help |
NULL |
| Author |
David Poole |
NULL |
| Translator |
Renee Arnoux |
NULL |
| Subject |
Civil Rights |
NULL |
| Description |
Prisoners: getting further help |
NULL |
| Publisher |
HMSO |
NULL |
| Date |
2002-1-09 |
NULL |
| Resource Type |
Document |
NULL |
| Resource Identifier |
prisoners_getting_further_help |
NULL |
The WITH statement can be omitted, in which case you will receive what is known as an EDGE table (see books online).
Where it is used the WITH statement can contain field definitions or the name of a table containing the field definitions.
As mentioned earlier the names of the fields must match the names of the elements and are case sensitive.
If you want the fieldnames retrieved to be something other than those in the XML document then the field definitions themselves have an xpath portion.
For example, let us suppose that we want to return the MaxSize attribute as TextBoxSize, then we would amend our
WITH statement as follows.
WITH(Name VARCHAR(50) , Value VARCHAR(50),
TextBoxSize CHAR(3) 'MaxSize')
Within the WITH clause the xpath
portion of the field definition also allows you to reference the ancestors or
parents of the elements you are referencing.
The example given in books
online shows a hierarchy of
Ø Customers
o
Order Headers
§
Order Details
We would query the Order Details
line but our xpath arguments would allow us to retrieve information from the
Order Header and Customer level of the document.
As a live example let us look at
an XML document used to update content within a CMS system.
<?xml version="1.0"?>
<Page>
<SlotBlock>
<SlotBlockStyle>heading3</SlotBlockStyle>
<SlotBlockContent>
<PreObject>This
is my heading</PreObject>
<PostObject
/>
<ObjectID>-1</ObjectID>
<ObjectType>B</ObjectType>
</SlotBlockContent>
</SlotBlock>
<SlotBlock>
<SlotBlockStyle>Paragraph</SlotBlockStyle>
<SlotBlockContent>
<PreObject><![CDATA[This
is my paragraph]]></PreObject>
<PostObject
/>
<ObjectID>0</ObjectID>
<ObjectType
/>
</SlotBlockContent>
<SlotBlockContent>
<PreObject><![CDATA[This
is another paragraph]]></PreObject>
<PostObject
/>
<ObjectID>0</ObjectID>
<ObjectType>T</ObjectType>
</SlotBlockContent>
</SlotBlock>
</Page>
A <SlotBlock> is a
paragraph.
<SlotBlockStyle> is the
paragraph style.
<SlotBlockContent>
identifies a piece of text within the paragraph.
We want to query the elements
within the SlotBlockContent elements so we would write the following.
DECLARE @stDoc VARCHAR(8000)
DECLARE @stxPath VARCHAR(50)
DECLARE @iFlags Int
DECLARE @iDocumentHandle Int
SET @iFlags=2
SET @stxPath='Page/SlotBlock/SlotBlockContent'
SET @stDoc=……
Our
example document
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML
(@iDocumentHandle , @stxpath,@iFlags)
WITH( PreObject
VARCHAR(50) ,
PostObject
VARCHAR(50),
ObjectType
CHAR(1) )
exec sp_xml_removedocument @iDocumentHandle
This would return the following
data
| PreObject |
PostObject |
ObjectType |
| This is my heading |
|
B |
| This is my paragraph |
|
|
| This is another paragraph |
|
T |
If we want to grab the
<SlotBlockStyle> element from the level above then we would have to add the xpath for the element as shown by the code as marked in bold below.
DECLARE @stDoc VARCHAR(8000)
DECLARE @stxPath VARCHAR(50)
DECLARE @iFlags Int
DECLARE @iDocumentHandle Int
SET @iFlags=2
SET @stxPath='Page/SlotBlock/SlotBlockContent'
SET @stDoc=……
Our
example document
exec sp_xml_preparedocument @iDocumentHandle OUTPUT, @stdoc
SELECT *
FROM OPENXML
(@iDocumentHandle , @stxpath,@iFlags)
WITH( ParagraphStyle
VARCHAR(50) '../SlotBlockStyle'
PreObject
VARCHAR(50) ,
PostObject
VARCHAR(50),
ObjectType
CHAR(1) )
exec sp_xml_removedocument @iDocumentHandle
This would return the following data
| ParagraphStyle |
PreObject |
PostObject |
ObjectType |
| heading3 |
This is my heading |
|
B |
| Paragraph |
This is my paragraph |
|
|
| Paragraph |
This is another paragraph |
|
T |
The "../" in the xpath
definition for ParagraphStyle tells SQL Server to jump up a level to retrieve
SlotBlockStyle.
If we added "../../"
this would mean jump up two levels to read a specified element.
Finally, let us suppose that
there were other elements within <SlotBlockContent> that we didn't know
anything about.
We could return information on
this using an overflow directive.
WITH( ParagraphStyle
VARCHAR(50) '../SlotBlockStyle'
PreObject
VARCHAR(50) ,
PostObject
VARCHAR(50),
ObjectType
CHAR(1) ,
Overflow
VARCHAR(250) '@mp:xmltext' )
The @mp:xmltext field is a
special field that will return the parent element and any child elements that
have not been explicitly declared in the WITH statement.
If there are no additional child
elements then @mp:xmltext will simply return an empty parent element, in this
case <SlotBlockContent/>