query a column of xml files

  • hey, everyone

    I have a table with lots of xml files in one column(more than 1000), like this

    1. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...

    2. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...

    3. <content xmlns:xsi="http://www.w3.org/2001/XMLSchema...

    each is big

    I want to query some values for all to see the duration

    but now I can only query one of them

    declare @bp xml

    select @bp=xml

    from bloodpressureohneschema

    ;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)

    select * from(

    select

    m.c.value('(./bp:time/bp:value)[1]','date') as time,

    m.c.value('(./bp:data/bp:items[1]/bp:value[1]/bp:magnitude)[1]','int') as value

    from @bp.nodes('/bp:content/bp:data/bp:events') as m(c)

    )m

    is there somewhere I can make better?

  • There are certainly a few things you can do to improve this. Can you post a sample XML file?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • <content xsi:type="OBSERVATION" archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1" xmlns="http://schemas.openehr.org/v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <name>

    <value/>

    </name>

    <protocol xsi:type="ITEM_TREE" archetype_node_id="at0011">

    <name>

    <value>protocol</value>

    </name>

    </protocol>

    <data archetype_node_id="at0001">

    <name>

    <value>data</value>

    </name>

    <origin xsi:nil="true"/>

    <events xsi:type="POINT_EVENT" archetype_node_id="at0006">

    <name>

    <value>any event</value>

    </name>

    <time>

    <value>2013-06-19T09:20:00</value>

    </time>

    <data xsi:type="ITEM_TREE" archetype_node_id="at0003">

    <name>

    <value>data</value>

    </name>

    <items xsi:type="ELEMENT" archetype_node_id="at0004">

    <name>

    <value/>

    </name>

    <value xsi:type="DV_QUANTITY">

    <magnitude>140</magnitude>

    <units>mm[Hg]</units>

    <precision>0</precision>

    </value>

    </items>

    <items xsi:type="ELEMENT" archetype_node_id="at0005">

    <name>

    <value/>

    </name>

    <value xsi:type="DV_QUANTITY">

    <magnitude>99</magnitude>

    <units>mm[Hg]</units>

    <precision>0</precision>

    </value>

    </items>

    </data>

    </events>

    </data>

    </content>

    there are many <events/> tags, I just paste one.

    thanks

  • Ok, I put together some sample data that emulates (crudely) what it is I think you are working with... in the script below there is a table that contains multiple xml files...

    The column data looks like this (only 3 files included for this demo)

    Then I have two different queries to get the data that you are looking for. Code below:

    IF OBJECT_ID('tempdb..#sampledata') IS NOT NULL DROP TABLE #sampledata;

    CREATE TABLE #sampledata (did int primary key, xmldata xml not null);

    DECLARE @bp xml =

    '

    <content xsi:type="OBSERVATION" archetype_node_id="openEHR-EHR-OBSERVATION.blood_pressure.v1"

    xmlns="http://schemas.openehr.org/v1"

    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    <name>

    <value/>

    </name>

    <protocol xsi:type="ITEM_TREE" archetype_node_id="at0011">

    <name>

    <value>protocol</value>

    </name>

    </protocol>

    <data archetype_node_id="at0001">

    <name>

    <value>data</value>

    </name>

    <origin xsi:nil="true"/>

    <events xsi:type="POINT_EVENT" archetype_node_id="at0006">

    <name>

    <value>any event</value>

    </name>

    <time>

    <value>2013-06-19T09:20:00</value>

    </time>

    <data xsi:type="ITEM_TREE" archetype_node_id="at0003">

    <name>

    <value>data</value>

    </name>

    <items xsi:type="ELEMENT" archetype_node_id="at0004">

    <name>

    <value/>

    </name>

    <value xsi:type="DV_QUANTITY">

    <magnitude>140</magnitude>

    <units>mm[Hg]</units>

    <precision>0</precision>

    </value>

    </items>

    <items xsi:type="ELEMENT" archetype_node_id="at0005">

    <name>

    <value/>

    </name>

    <value xsi:type="DV_QUANTITY">

    <magnitude>99</magnitude>

    <units>mm[Hg]</units>

    <precision>0</precision>

    </value>

    </items>

    </data>

    </events>

    </data>

    </content>

    '

    DECLARE @bpt varchar(8000) = cast(@bp as varchar(8000));

    INSERT #sampledata

    VALUES (1,(

    SELECT CAST

    (@bpt + replace(replace(@bpt,'2013-06-19T09:20:00','2014-11-13T02:20:02'),'140','399')+

    (replace(replace(@bpt,'2013-06-19T09:20:00','2003-03-13T02:20:06'),'140','15')) AS xml)))

    --SELECT * FROM #sampledata;

    -- not specifying the text node, gets me a parallel query plan

    ;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)

    select * from

    (

    select

    m.c.value('(bp:data/bp:events/bp:time/bp:value)[1]','date') as time,

    m.c.value('(bp:data/bp:events/bp:data/bp:items[1]/bp:value[1]/bp:magnitude)[1]','int') as value

    from #sampledata t

    CROSS APPLY t.xmldata.nodes('bp:content') as m(c)

    )m

    -- specifying the text node, me a serial plan that is faster and less costly

    ;WITH XMLNAMESPACES('http://schemas.openehr.org/v1' as bp,'http://www.w3.org/2001/XMLSchema-instance' as xsi,'OBSERVATION' as type)

    select * from

    (

    select

    m.c.value('(bp:data/bp:events/bp:time/bp:value/text())[1]','date') as time,

    m.c.value('(bp:data/bp:events/bp:data/bp:items[1]/bp:value[1]/bp:magnitude/text())[1]','int') as value

    from #sampledata t

    CROSS APPLY t.xmldata.nodes('bp:*') as m(c)

    )m

    DROP TABLE #sampledata;

    GO

    The first query gets a parallel query plan, the second gets a serial parallel plan but runs at about the same speed. In each case, to solve the problem you were having getting data from all the XML files in your column, I start at the root node (bp:content). Let me know if this helps.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Your code works, thanks.

    but what I need is that, I can query all xml files in whole table at one time, each xml file has thousands events, and each event contains (time, systolisch and diastolisch)

    the result most like this

  • 2010-04-30 20:47:00.00012273

    2010-04-30 20:48:00.00012273

    2010-04-30 23:55:00.00011976

    2010-04-30 23:56:00.00011976

    2010-04-30 23:57:00.00011976

    2010-04-30 23:58:00.00012480

    2010-04-30 23:59:00.00012480

  • @bp seem like only declare one xml file or I should seek some other way?

  • according your code, I think I know how to change it. thanks a lot.

  • Viewing 6 posts - 1 through 6 (of 6 total)

    You must be logged in to reply to this topic. Login to reply