query a column of xml files

Question

query a column of xml files

ertao0215

SSC Enthusiast

Points: 109
More actions
February 12, 2015 at 6:03 am

#295324

hey, everyone
I have a table with lots of xml files in one column(more than 1000), like this
1. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...
2. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...
3. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...
each is big
I want to query some values for all to see the duration
but now I can only query one of them
declare @bp xml
select @bp=xml
from bloodpressureohneschema
;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)
select * from(
select
m.c.value('(./bp:time/bp:value)[1]','date') as time,
m.c.value('(./bp:data/bp:items[1]/bp:value[1]/bp:magnitude)[1]','int') as value
from @bp.nodes('/bp:content/bp:data/bp:events') as m(c)
)m
is there somewhere I can make better?
2010-04-30 20:47:00.00012273
2010-04-30 20:48:00.00012273
2010-04-30 23:55:00.00011976
2010-04-30 23:56:00.00011976
2010-04-30 23:57:00.00011976
2010-04-30 23:58:00.00012480
2010-04-30 23:59:00.00012480

@bp seem like only declare one xml file or I should seek some other way?

Alan Burstein SSC Guru Points: 61152 More actions · Answer 1

There are certainly a few things you can do to improve this. Can you post a sample XML file?

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

ertao0215 SSC Enthusiast Points: 109 More actions · Answer 2

<content xsi:type="OBSERVATION" archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1" xmlns="http://schemas.openehr.org/v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<name>

<value/>

</name>

<protocol xsi:type="ITEM_TREE" archetype_node_id="at0011">

<name>

<value>protocol</value>

</name>

</protocol>

<data archetype_node_id="at0001">

<name>

<value>data</value>

</name>

<origin xsi:nil="true"/>

<events xsi:type="POINT_EVENT" archetype_node_id="at0006">

<name>

<value>any event</value>

</name>

<time>

<value>2013-06-19T09:20:00</value>

</time>

<data xsi:type="ITEM_TREE" archetype_node_id="at0003">

<name>

<value>data</value>

</name>

<items xsi:type="ELEMENT" archetype_node_id="at0004">

<name>

<value/>

</name>

<value xsi:type="DV_QUANTITY">

<magnitude>140</magnitude>

<units>mm[Hg]</units>

<precision>0</precision>

</value>

</items>

<items xsi:type="ELEMENT" archetype_node_id="at0005">

<name>

<value/>

</name>

<value xsi:type="DV_QUANTITY">

<magnitude>99</magnitude>

<units>mm[Hg]</units>

<precision>0</precision>

</value>

</items>

</data>

</events>

</data>

</content>

there are many <events/> tags, I just paste one.

thanks

Alan Burstein SSC Guru Points: 61152 More actions · Answer 3

Ok, I put together some sample data that emulates (crudely) what it is I think you are working with... in the script below there is a table that contains multiple xml files...

The column data looks like this (only 3 files included for this demo)

Then I have two different queries to get the data that you are looking for. Code below:

IF OBJECT_ID('tempdb..#sampledata') IS NOT NULL DROP TABLE #sampledata;

CREATE TABLE #sampledata (did int primary key, xmldata xml not null);

DECLARE @bp xml =

'

<content xsi:type="OBSERVATION" archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1"

xmlns="http://schemas.openehr.org/v1"

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<name>

<value/>

</name>

<protocol xsi:type="ITEM_TREE" archetype_node_id="at0011">

<name>

<value>protocol</value>

</name>

</protocol>

<data archetype_node_id="at0001">

<name>

<value>data</value>

</name>

<origin xsi:nil="true"/>

<events xsi:type="POINT_EVENT" archetype_node_id="at0006">

<name>

<value>any event</value>

</name>

<time>

<value>2013-06-19T09:20:00</value>

</time>

<data xsi:type="ITEM_TREE" archetype_node_id="at0003">

<name>

<value>data</value>

</name>

<items xsi:type="ELEMENT" archetype_node_id="at0004">

<name>

<value/>

</name>

<value xsi:type="DV_QUANTITY">

<magnitude>140</magnitude>

<units>mm[Hg]</units>

<precision>0</precision>

</value>

</items>

<items xsi:type="ELEMENT" archetype_node_id="at0005">

<name>

<value/>

</name>

<value xsi:type="DV_QUANTITY">

<magnitude>99</magnitude>

<units>mm[Hg]</units>

<precision>0</precision>

</value>

</items>

</data>

</events>

</data>

</content>

'

DECLARE @bpt varchar(8000) = cast(@bp as varchar(8000));

INSERT #sampledata

VALUES (1,(

SELECT CAST

(@bpt + replace(replace(@bpt,'2013-06-19T09:20:00','2014-11-13T02:20:02'),'140','399')+

(replace(replace(@bpt,'2013-06-19T09:20:00','2003-03-13T02:20:06'),'140','15')) AS xml)))

--SELECT * FROM #sampledata;

-- not specifying the text node, gets me a parallel query plan

;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)

select * from

(

select

m.c.value('(bp:data/bp:events/bp:time/bp:value)[1]','date') as time,

m.c.value('(bp:data/bp:events/bp:data/bp:items[1]/bp:value[1]/bp:magnitude)[1]','int') as value

from #sampledata t

CROSS APPLY t.xmldata.nodes('bp:content') as m(c)

)m

-- specifying the text node, me a serial plan that is faster and less costly

;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)

select * from

(

select

m.c.value('(bp:data/bp:events/bp:time/bp:value/text())[1]','date') as time,

m.c.value('(bp:data/bp:events/bp:data/bp:items[1]/bp:value[1]/bp:magnitude/text())[1]','int') as value

from #sampledata t

CROSS APPLY t.xmldata.nodes('bp:*') as m(c)

)m

DROP TABLE #sampledata;

GO

The first query gets a parallel query plan, the second gets a serial parallel plan but runs at about the same speed. In each case, to solve the problem you were having getting data from all the XML files in your column, I start at the root node (bp:content). Let me know if this helps.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

ertao0215 SSC Enthusiast Points: 109 More actions · Answer 4

Your code works, thanks.

but what I need is that, I can query all xml files in whole table at one time, each xml file has thousands events, and each event contains (time, systolisch and diastolisch)

the result most like this

ertao0215 SSC Enthusiast Points: 109 More actions · Answer 5

according your code, I think I know how to change it. thanks a lot.