XML Shred skips around if node is missing expected elements

  • Hello, 

    I have bulk imported xml to a local table  called CAQH_Return_XML. as a blob so that I can then shred it..  Also note that I am using dynamic sql so that we can use the variable @i to iterate through new singleton values as we loop through each child. 

    Below is a pared down excerpt of the code

    set @i = @startLoop

    IF @DEBUG = 1
    BEGIN
    PRINT 'Starting SET @execString ProviderEduction Inserts for NPI ' + @NPI + ' on loop ' + @i
    END

    set @execString =
    '
    WITH XMLNAMESPACES(' + '''urn:GeoAccess.PDE.Provider''' + ' as hr
    ,DEFAULT' + '''http:////www.w3.org/2001/XMLSchema''' +
    ')

    INSERT INTO ProviderEducation
    SELECT Distinct
      T.rows.value(' + '''(/hr:Provider/hr:NPI)[1]''' + ',' +'''varchar(40)''' + ') AS [NPI]
      , T.rows.value(' + '''(/hr:Provider/hr:LastName)[1] ''' + ',' + '''varchar(40)''' + ') AS [LastName]
      ,T.rows.value('+ '''(/hr:Provider/hr:Education/hr:InstitutionName)[' + @i + ']''' + ',' +'''varchar(500)''' + ') AS [InstitutionName]
      ,T.rows.value('+ '''(/hr:Provider/hr:Education/hr:City)[' + @i + ']''' + ',' +'''varchar(500)''' + ') AS [City]
      ,T.rows.value('+ '''(/hr:Provider/hr:Education/hr:State)[' + @i + ']''' + ',' +'''varchar(500)''' + ') AS [State]
      ,T.rows.value(' + '''(/hr:Provider/hr:Education/hr:StartDate)[' + @i + ']''' + ',' + '''datetime''' + ') AS [StartDate]
      ,T.rows.value(' + '''(/hr:Provider/hr:Education/hr:EndDate)[' + @i + ']''' + ',' +'''datetime''' + ') AS [EndDate]
      ,T.rows.value(' + '''(/hr:Provider/hr:Education/hr:CompletionDate)[' + @i + ']''' + ',' + '''datetime''' + ') AS [CompletionDate]
        ,T.rows.value(' + '''(/hr:Provider/hr:Education/hr:Degree/hr:DegreeAbbreviation)[' + @i + ']''' + ',' + '''varchar(40)''' + ') AS [DegreeAbbreviation]
        ,T.rows.value(' + '''(/hr:Provider/hr:Education/hr:ProgramCompletedFlag)[' + @i + ']''' + ',' + '''varchar(40)''' + ') AS [ProgramCompletedFlag]
        , ' + @i + ' AS [DegreeNumber]
    FROM
      [dbo].[CAQH_Return_XML]
    CROSS APPLY
      FileInfo.nodes(' + '''/hr:Provider/hr:Education/hr:InstitutionName''' + ') T(rows)
      where T.rows.value(' + '''(/hr:Provider/hr:NPI)[1]''' + ',' + '''varchar(40)''' + ') = ' + '''' + @NPI + '''' + ';'

    SET @startLoop = @startLoop + 1
    End 


    All of this is working very well for me except for one problem.. The xml we receive is inconsistent about what elements are in each record.. When the process does not find an element it's expecting in the record it's working on, it drops to the next record that has that element to retrieve a value, and then skips back up to the original record it was working on.. so the relational output ends up being a combination of values from multiple records..

    Any suggestions as to how I can either convert missing elements to elements with null or blank values OR skip the element if it is not there and parse the next element in the record?

    Quick disclaimer.. This is pretty much my first time trying to shred xml.. so speak slowly please...

  • It's because all of your values are defined using absolute paths instead of relative paths.  Only the nodes definition should use an absolute path.  Changing to relative paths would allow you to get rid of the loop and the dynamic sql.

    Also, your code is showing with emoticons in some places.  I believe it's because you did not use the SQL Code tags that are recommended.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Wednesday, February 22, 2017 3:50 PM

    It's because all of your values are defined using absolute paths instead of relative paths.  Only the nodes definition should use an absolute path.  Changing to relative paths would allow you to get rid of the loop and the dynamic sql.

    Also, your code is showing with emoticons in some places.  I believe it's because you did not use the SQL Code tags that are recommended.

    Drew

    Hi Drew, Thank you so much for reading and replying to my post.. I apologize for the in line code and weird emos, I used the SQL Code tags to box up the code, but can't seem to get the emos to go away..

    As I mentioned, I'm pretty dense about shredding..so I have some follow on questions for you..

    The dynamic code is so it will iterate through each instance of the Education node for a single provider record.. The attached files are an XML example of what this code is trying to shred, and the results of shredding that example xml.. Note that there are 5 instances of Education for this one provider record, we want to shred all 5 in to a SQL table, then move on to the next provider.

    Since I am querying a blob in a sql table rather than the xml doc itself, I'm not sure I understand your comment regarding absolute vs relative path in this context.. here is that same code in a static (not dynamic) version.

    When I run this code, as expected because of the static singleton, I get only the first instance of the education node.. no way to iterate through.  If I remove the singleton, the code breaks., can you show me how this code should look using relative path and without the singleton?

    Thanks again..
    R


    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' as hr
    ,DEFAULT'http:////www.w3.org/2001/XMLSchema')

    INSERT INTO ProviderEducationTest
    SELECT Distinct
      T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') AS [NPI]
      , T.rows.value('(/hr:Provider/hr:LastName)[1] ','varchar(40)') AS [LastName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:InstitutionName)[1]','varchar(500)') AS [InstitutionName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:City)[1]','varchar(500)') AS [City]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:State)[1]','varchar(500)') AS [State]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:StartDate)[1]','datetime') AS [StartDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:EndDate)[1]','datetime') AS [EndDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:CompletionDate)[1]','datetime') AS [CompletionDate]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:Degree/hr:DegreeAbbreviation)[1]','varchar(40)') AS [DegreeAbbreviation]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:ProgramCompletedFlag)[1]','varchar(40)') AS [ProgramCompletedFlag]
        , 1 AS [DegreeNumber]
    FROM
      [dbo].[CAQH_Return_XML]
    CROSS APPLY
      FileInfo.nodes('/hr:Provider/hr:Education/hr:InstitutionName') T(rows)
      where T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') = 'xxxxxxxxxx';

  • Please post readily usable code.  A picture of the xml doesn't help. I have added the Provider root, as well as the NPI and LastName nodes to your sample data.

    The following code shreds your sample xml data

    declare @T table (FileInfo xml);

    insert into @T(FileInfo)
    values (N'
    <Provider>
    <NPI>123</NPI>
    <LastName>xxx</LastName>
    <Education ID="1000">
    <InstitutionName>Oregon Graduate School of Professional Psychology</InstitutionName>
    <Address>2043 College Way</Address>
    <City>Forest grove</City>
    <State>OR</State>
    <PostalCode>97116</PostalCode>
    <StartDate>1983-09-01T00:00:00</StartDate>
    <EndDate>1987-08-01T00:00:00</EndDate>
    <ProgramCompletedFlag>1</ProgramCompletedFlag>
    <CompletionDate>1987-08-01T00:00:00</CompletionDate>
    <PhoneNumber>5033526151</PhoneNumber>
    <EducationTypeName>Professional School</EducationTypeName>
    <Degree>
      <DegreeAbbreviation>PSYD</DegreeAbbreviation>
    </Degree>
    </Education>
    <Education ID="1001">
    <InstitutionName>University of Oregon</InstitutionName>
    <Address>1585 E 13th Avenue</Address>
    <City>Eugene</City>
    <State>OR</State>
    <PostalCode>97403</PostalCode>
    <StartDate>1977-09-01T00:00:00</StartDate>
    <EndDate>1979-06-01T00:00:00</EndDate>
    <ProgramCompletedFlag>0</ProgramCompletedFlag>
    <IncompleteExplanation>another school for Doctorate</IncompleteExplanation>
    <PhoneNumber>5413461000</PhoneNumber>
    <EducationTypeName>Professional School</EducationTypeName>
    <Degree>
      <DegreeAbbreviation>MS</DegreeAbbreviation>
    </Degree>
    </Education>
    <Education ID="1002">
    <InstitutionName>Psychological resident:Private Practice</InstitutionName>
    <Address>4320 S.W. 110th</Address>
    <City>Beaverton</City>
    <Province>Oregon</Province>
    <State>OR</State>
    <PostalCode>97225</PostalCode>
    <StartDate>1987-08-01T00:00:00</StartDate>
    <EndDate>1989-05-01T00:00:00</EndDate>
    <HospitalDepartmentName>Private practice</HospitalDepartmentName>
    <TrainingArea>Psychological resident:private practice</TrainingArea>
    <ProgramCompletedFlag>1</ProgramCompletedFlag>
    <CompletionDate>1989-05-01T00:00:00</CompletionDate>
    <ProgramType>Pther</ProgramType>
    <EducationTypeName>Internship</EducationTypeName>
    <Country>
      <CountryName>United States</CountryName>
    </Country>
    </Education>
    </Provider>
    ');

    SELECT
      T.rows.value('(../NPI/text())[1]','varchar(40)') AS [NPI]
    , T.rows.value('(../LastName/text())[1] ','varchar(40)') AS [LastName]
    , T.rows.value('(InstitutionName/text())[1]','varchar(500)') AS [InstitutionName]
    , T.rows.value('(City/text())[1]','varchar(500)') AS [City]
    , T.rows.value('(State/text())[1]','varchar(500)') AS [State]
    , T.rows.value('(StartDate/text())[1]','date') AS [StartDate]
    , T.rows.value('(EndDate/text())[1]','date') AS [EndDate]
    , T.rows.value('(CompletionDate/text())[1]','date') AS [CompletionDate]
    , T.rows.value('(Degree/DegreeAbbreviation/text())[1]','varchar(40)') AS [DegreeAbbreviation]
    , T.rows.value('(ProgramCompletedFlag/text())[1]','varchar(40)') AS [ProgramCompletedFlag]
    , 1 AS [DegreeNumber]
    FROM @T
    CROSS APPLY FileInfo.nodes('Provider/Education') T(rows);

  • Robin Riversong - Thursday, February 23, 2017 1:11 PM

    drew.allen - Wednesday, February 22, 2017 3:50 PM

    It's because all of your values are defined using absolute paths instead of relative paths.  Only the nodes definition should use an absolute path.  Changing to relative paths would allow you to get rid of the loop and the dynamic sql.

    Also, your code is showing with emoticons in some places.  I believe it's because you did not use the SQL Code tags that are recommended.

    Drew

    Hi Drew, Thank you so much for reading and replying to my post.. I apologize for the in line code and weird emos, I used the SQL Code tags to box up the code, but can't seem to get the emos to go away..

    As I mentioned, I'm pretty dense about shredding..so I have some follow on questions for you..

    The dynamic code is so it will iterate through each instance of the Education node for a single provider record.. The attached files are an XML example of what this code is trying to shred, and the results of shredding that example xml.. Note that there are 5 instances of Education for this one provider record, we want to shred all 5 in to a SQL table, then move on to the next provider.

    Since I am querying a blob in a sql table rather than the xml doc itself, I'm not sure I understand your comment regarding absolute vs relative path in this context.. here is that same code in a static (not dynamic) version.

    When I run this code, as expected because of the static singleton, I get only the first instance of the education node.. no way to iterate through.  If I remove the singleton, the code breaks., can you show me how this code should look using relative path and without the singleton?

    Thanks again..
    R


    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' as hr
    ,DEFAULT'http:////www.w3.org/2001/XMLSchema')

    INSERT INTO ProviderEducationTest
    SELECT Distinct
      T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') AS [NPI]
      , T.rows.value('(/hr:Provider/hr:LastName)[1] ','varchar(40)') AS [LastName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:InstitutionName)[1]','varchar(500)') AS [InstitutionName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:City)[1]','varchar(500)') AS [City]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:State)[1]','varchar(500)') AS [State]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:StartDate)[1]','datetime') AS [StartDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:EndDate)[1]','datetime') AS [EndDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:CompletionDate)[1]','datetime') AS [CompletionDate]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:Degree/hr:DegreeAbbreviation)[1]','varchar(40)') AS [DegreeAbbreviation]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:ProgramCompletedFlag)[1]','varchar(40)') AS [ProgramCompletedFlag]
        , 1 AS [DegreeNumber]
    FROM
      [dbo].[CAQH_Return_XML]
    CROSS APPLY
      FileInfo.nodes('/hr:Provider/hr:Education/hr:InstitutionName') T(rows)
      where T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') = 'xxxxxxxxxx';

    The .nodes method gives you a node for every path specified, e.g., Education, so there is no need to iterate, and, thus, no need for dynamic sql.  Look at the code that DesNorton published.  It should give you a place to start, although, I might have had the Provider as the first path and then added another .nodes with a path for the education.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Robin Riversong - Thursday, February 23, 2017 1:11 PM

    drew.allen - Wednesday, February 22, 2017 3:50 PM

    It's because all of your values are defined using absolute paths instead of relative paths.  Only the nodes definition should use an absolute path.  Changing to relative paths would allow you to get rid of the loop and the dynamic sql.

    Also, your code is showing with emoticons in some places.  I believe it's because you did not use the SQL Code tags that are recommended.

    Drew

    Hi Drew, Thank you so much for reading and replying to my post.. I apologize for the in line code and weird emos, I used the SQL Code tags to box up the code, but can't seem to get the emos to go away..

    As I mentioned, I'm pretty dense about shredding..so I have some follow on questions for you..

    The dynamic code is so it will iterate through each instance of the Education node for a single provider record.. The attached files are an XML example of what this code is trying to shred, and the results of shredding that example xml.. Note that there are 5 instances of Education for this one provider record, we want to shred all 5 in to a SQL table, then move on to the next provider.

    Since I am querying a blob in a sql table rather than the xml doc itself, I'm not sure I understand your comment regarding absolute vs relative path in this context.. here is that same code in a static (not dynamic) version.

    When I run this code, as expected because of the static singleton, I get only the first instance of the education node.. no way to iterate through.  If I remove the singleton, the code breaks., can you show me how this code should look using relative path and without the singleton?

    Thanks again..
    R


    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' as hr
    ,DEFAULT'http:////www.w3.org/2001/XMLSchema')

    INSERT INTO ProviderEducationTest
    SELECT Distinct
      T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') AS [NPI]
      , T.rows.value('(/hr:Provider/hr:LastName)[1] ','varchar(40)') AS [LastName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:InstitutionName)[1]','varchar(500)') AS [InstitutionName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:City)[1]','varchar(500)') AS [City]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:State)[1]','varchar(500)') AS [State]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:StartDate)[1]','datetime') AS [StartDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:EndDate)[1]','datetime') AS [EndDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:CompletionDate)[1]','datetime') AS [CompletionDate]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:Degree/hr:DegreeAbbreviation)[1]','varchar(40)') AS [DegreeAbbreviation]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:ProgramCompletedFlag)[1]','varchar(40)') AS [ProgramCompletedFlag]
        , 1 AS [DegreeNumber]
    FROM
      [dbo].[CAQH_Return_XML]
    CROSS APPLY
      FileInfo.nodes('/hr:Provider/hr:Education/hr:InstitutionName') T(rows)
      where T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') = 'xxxxxxxxxx';

  • DesNorton - Thursday, February 23, 2017 3:09 PM

    Please post readily usable code.  A picture of the xml doesn't help. I have added the Provider root, as well as the NPI and LastName nodes to your sample data.

    The following code shreds your sample xml data

    declare @T table (FileInfo xml);

    insert into @T(FileInfo)
    values (N'
    <Provider>
    <NPI>123</NPI>
    <LastName>xxx</LastName>
    <Education ID="1000">
    <InstitutionName>Oregon Graduate School of Professional Psychology</InstitutionName>
    <Address>2043 College Way</Address>
    <City>Forest grove</City>
    <State>OR</State>
    <PostalCode>97116</PostalCode>
    <StartDate>1983-09-01T00:00:00</StartDate>
    <EndDate>1987-08-01T00:00:00</EndDate>
    <ProgramCompletedFlag>1</ProgramCompletedFlag>
    <CompletionDate>1987-08-01T00:00:00</CompletionDate>
    <PhoneNumber>5033526151</PhoneNumber>
    <EducationTypeName>Professional School</EducationTypeName>
    <Degree>
      <DegreeAbbreviation>PSYD</DegreeAbbreviation>
    </Degree>
    </Education>
    <Education ID="1001">
    <InstitutionName>University of Oregon</InstitutionName>
    <Address>1585 E 13th Avenue</Address>
    <City>Eugene</City>
    <State>OR</State>
    <PostalCode>97403</PostalCode>
    <StartDate>1977-09-01T00:00:00</StartDate>
    <EndDate>1979-06-01T00:00:00</EndDate>
    <ProgramCompletedFlag>0</ProgramCompletedFlag>
    <IncompleteExplanation>another school for Doctorate</IncompleteExplanation>
    <PhoneNumber>5413461000</PhoneNumber>
    <EducationTypeName>Professional School</EducationTypeName>
    <Degree>
      <DegreeAbbreviation>MS</DegreeAbbreviation>
    </Degree>
    </Education>
    <Education ID="1002">
    <InstitutionName>Psychological resident:Private Practice</InstitutionName>
    <Address>4320 S.W. 110th</Address>
    <City>Beaverton</City>
    <Province>Oregon</Province>
    <State>OR</State>
    <PostalCode>97225</PostalCode>
    <StartDate>1987-08-01T00:00:00</StartDate>
    <EndDate>1989-05-01T00:00:00</EndDate>
    <HospitalDepartmentName>Private practice</HospitalDepartmentName>
    <TrainingArea>Psychological resident:private practice</TrainingArea>
    <ProgramCompletedFlag>1</ProgramCompletedFlag>
    <CompletionDate>1989-05-01T00:00:00</CompletionDate>
    <ProgramType>Pther</ProgramType>
    <EducationTypeName>Internship</EducationTypeName>
    <Country>
      <CountryName>United States</CountryName>
    </Country>
    </Education>
    </Provider>
    ');

    SELECT
      T.rows.value('(../NPI/text())[1]','varchar(40)') AS [NPI]
    , T.rows.value('(../LastName/text())[1] ','varchar(40)') AS [LastName]
    , T.rows.value('(InstitutionName/text())[1]','varchar(500)') AS [InstitutionName]
    , T.rows.value('(City/text())[1]','varchar(500)') AS [City]
    , T.rows.value('(State/text())[1]','varchar(500)') AS [State]
    , T.rows.value('(StartDate/text())[1]','date') AS [StartDate]
    , T.rows.value('(EndDate/text())[1]','date') AS [EndDate]
    , T.rows.value('(CompletionDate/text())[1]','date') AS [CompletionDate]
    , T.rows.value('(Degree/DegreeAbbreviation/text())[1]','varchar(40)') AS [DegreeAbbreviation]
    , T.rows.value('(ProgramCompletedFlag/text())[1]','varchar(40)') AS [ProgramCompletedFlag]
    , 1 AS [DegreeNumber]
    FROM @T
    CROSS APPLY FileInfo.nodes('Provider/Education') T(rows);

    Hello Des ,  

    Your solution works brilliantly, thank you so much! Sorry about the image of xml.. wanted to use color coding to illustrate the issue.. Also although I troll forums from time to time, I hardly ever post, so still figuring out forum etiquette.

    I can see what I need to do next (load the Table Variable from the blob rather than from static xml)  but because I am new to shredding, I have no idea how to accomplish this. 

    The xml is VERY inconsistent, I can see that there is an opening tag that contains an ID attribute <Education ID="1000"> for each instance of this node. Most of time, but not all of the time, the first ID attribute is 1000.. but not always.. 

    So, as near as I can figure, I want to load the table variable with every instance of the xml that follows the Education ID attribute opening tag through the last closing tag </Education>  Alternately, every education node does appear to consistently have an <InstitutionName> element that appears to always have a value (you can see the guy I inherited this code from was using the CROSS APPLY to get the nodes associated with the <InstitiutionName> element in the loop through)

    Do you have any suggestions as to how to load the Table Variable from the blob stored in a SQL table (CAQH_Return_XML) with just the nodes I'm interested in.. in this case every node that starts with an education ID attribute  (ex: ) <Education ID="1000">? or that contains the <InstitutionName> element?

    Thanks again for your help on this.. invaluable..

  • Robin Riversong - Friday, February 24, 2017 12:16 PM

    Hello Des ,  

    Your solution works brilliantly, thank you so much! Sorry about the image of xml.. wanted to use color coding to illustrate the issue.. Also although I troll forums from time to time, I hardly ever post, so still figuring out forum etiquette.

    I can see what I need to do next (load the Table Variable from the blob rather than from static xml)  but because I am new to shredding, I have no idea how to accomplish this. 

    The xml is VERY inconsistent, I can see that there is an opening tag that contains an ID attribute <Education ID="1000"> for each instance of this node. Most of time, but not all of the time, the first ID attribute is 1000.. but not always.. 

    So, as near as I can figure, I want to load the table variable with every instance of the xml that follows the Education ID attribute opening tag through the last closing tag </Education>  Alternately, every education node does appear to consistently have an <InstitutionName> element that appears to always have a value (you can see the guy I inherited this code from was using the CROSS APPLY to get the nodes associated with the <InstitiutionName> element in the loop through)

    Do you have any suggestions as to how to load the Table Variable from the blob stored in a SQL table (CAQH_Return_XML) with just the nodes I'm interested in.. in this case every node that starts with an education ID attribute  (ex: ) <Education ID="1000">? or that contains the <InstitutionName> element?

    Thanks again for your help on this.. invaluable..

    You do not need to use a table variable. I simply used one to provide a sample in a similar format to your original question.

    Try using the code directly against your table.

    SELECT
      T.rows.value('(../NPI/text())[1]','varchar(40)') AS [NPI]
    , T.rows.value('(../LastName/text())[1] ','varchar(40)') AS [LastName]
    , T.rows.value('(InstitutionName/text())[1]','varchar(500)') AS [InstitutionName]
    , T.rows.value('(City/text())[1]','varchar(500)') AS [City]
    , T.rows.value('(State/text())[1]','varchar(500)') AS [State]
    , T.rows.value('(StartDate/text())[1]','date') AS [StartDate]
    , T.rows.value('(EndDate/text())[1]','date') AS [EndDate]
    , T.rows.value('(CompletionDate/text())[1]','date') AS [CompletionDate]
    , T.rows.value('(Degree/DegreeAbbreviation/text())[1]','varchar(40)') AS [DegreeAbbreviation]
    , T.rows.value('(ProgramCompletedFlag/text())[1]','varchar(40)') AS [ProgramCompletedFlag]
    , 1 AS [DegreeNumber]
    FROM dbo.CAQH_Return_XML
    CROSS APPLY FileInfo.nodes('Provider/Education') T(rows);

    If this doesn't work, please provide us with a complete section of XML.  It must contain enough for us to see the path into the specific nodes in question.

  • Hi Des

    First let me say, thank you for taking an interest in this.. there is a fine looking dent in my office wall from banging my head against this one.. 

    So.. 
    Using the SELECT without namespace directly against the table runs without error, but returns 0 rows 

    SELECT
    T.rows.value('(../NPI/text())[1]','varchar(40)') AS [NPI]
    , T.rows.value('(../LastName/text())[1] ','varchar(40)') AS [LastName]
    , T.rows.value('(InstitutionName/text())[1]','varchar(500)') AS [InstitutionName]
    , T.rows.value('(City/text())[1]','varchar(500)') AS [City]
    , T.rows.value('(State/text())[1]','varchar(500)') AS [State]
    , T.rows.value('(StartDate/text())[1]','date') AS [StartDate]
    , T.rows.value('(EndDate/text())[1]','date') AS [EndDate]
    , T.rows.value('(CompletionDate/text())[1]','date') AS [CompletionDate]
    , T.rows.value('(Degree/DegreeAbbreviation/text())[1]','varchar(40)') AS [DegreeAbbreviation]
    , T.rows.value('(ProgramCompletedFlag/text())[1]','varchar(40)') AS [ProgramCompletedFlag]
    , 1 AS [DegreeNumber]
    FROM CAQH_Return_XML
    CROSS APPLY FileInfo.nodes('Provider/Education') T(rows)
    WHERE T.rows.value('(Provider/NPI)[1]','varchar(40)') = @NPI;


    Using the SELECT with namespace directly against the table works, but only returns the first of many rows (how ever many instances of the Education Node that record holds)

    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' as hr
    ,DEFAULT'http:////www.w3.org/2001/XMLSchema')

    --INSERT INTO ProviderEducation
    SELECT Distinct
      T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') AS [NPI]
      , T.rows.value('(/hr:Provider/hr:LastName)[1] ','varchar(40)') AS [LastName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:InstitutionName)[1]','varchar(500)') AS [InstitutionName]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:City)[1]','varchar(500)') AS [City]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:State)[1]','varchar(500)') AS [State]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:StartDate)[1]','datetime') AS [StartDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:EndDate)[1]','datetime') AS [EndDate]
      ,T.rows.value('(/hr:Provider/hr:Education/hr:CompletionDate)[1]','datetime') AS [CompletionDate]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:Degree/hr:DegreeAbbreviation)[1]','varchar(40)') AS [DegreeAbbreviation]
        ,T.rows.value('(/hr:Provider/hr:Education/hr:ProgramCompletedFlag)[1]','varchar(40)') AS [ProgramCompletedFlag]
        , 1 AS [DegreeNumber]
    FROM
      [dbo].[CAQH_Return_XML]
    CROSS APPLY
      FileInfo.nodes('/hr:Provider/hr:Education') T(rows)
      where T.rows.value('(/hr:Provider/hr:NPI)[1]','varchar(40)') = @NPI;

    I'm attaching a full record per your request if the latest suggestion is not successful 

    The big issue for me is that if I use your method to interrogate a static explicit xml record it works great... but I have to parse through hundreds of xml documents a week, and the guy I inherited this from kindly put something in place to parse each record in to a blob in a sql table (CAQH_Return_XML,fileinfo), His method of iterating the singleton and looping through dynamically gets me all the instances of the nodes I'm interested in (in this case Education), but when it comes across a missing tag (and the source data is going to keep coming in with missing tags), it creatively goes and finds a value from the next available instance of that tag, which of course is from the next instance of that node where that tag occurs, creating a record that is conflated.

    When I try to interrogate the blob without the dynamic looping and resetting the singleton, I get only the first record, and the conflation still occurs, The only time so far that I have been able to get all the available node instances AND avoid the conflation problem is using your method against a single static explicit xml record that is presented as a Value, which of course is not a practical solution.

    Will gratefully accept all knowledge, wisdom, or best guesses at this point..

    thanks again..

  • I used the following code, and get 4 results from the sample XML that you posted.


    DECLARE @CAQH_Return_XML TABLE (FileInfo XML);

    INSERT INTO @CAQH_Return_XML(FileInfo)
    VALUES (N'Your Long XML here');

    DECLARE @NPI VARCHAR(40) = '1881697555';

    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' AS hr, DEFAULT'http:////www.w3.org/2001/XMLSchema')
    SELECT DISTINCT
      T.rows.value('(../hr:NPI/text())[1]', 'varchar(40)') AS NPI
    , T.rows.value('(../hr:LastName/text())[1] ', 'varchar(40)') AS LastName
    , T.rows.value('(hr:InstitutionName/text())[1]', 'varchar(500)') AS InstitutionName
    , T.rows.value('(hr:City/text())[1]', 'varchar(500)') AS City
    , T.rows.value('(hr:State/text())[1]', 'varchar(500)') AS State
    , T.rows.value('(hr:StartDate/text())[1]', 'datetime') AS StartDate
    , T.rows.value('(hr:EndDate/text())[1]', 'datetime') AS EndDate
    , T.rows.value('(hr:CompletionDate/text())[1]', 'datetime') AS CompletionDate
    , T.rows.value('(hr:Degree/hr:DegreeAbbreviation/text())[1]', 'varchar(40)') AS DegreeAbbreviation
    , T.rows.value('(hr:ProgramCompletedFlag/text())[1]', 'varchar(40)') AS ProgramCompletedFlag
    , 1 AS DegreeNumber
    FROM @CAQH_Return_XML
    CROSS APPLY FileInfo.nodes('/hr:Provider/hr:Education') T(rows)
    WHERE T.rows.value('(../hr:NPI/text())[1]', 'varchar(40)') = @NPI;

  • Hi Des, 

    4 results seem correct, I can get that too if I manually copy the xml in to the script, however the act of querying the long xml isn't the objective. Using this method would entail manually copying and pasting well formed xml one at a time from each of hundreds of xml document into this script for each xml record that I want to insert in to a relational db table, this is clearly not a workable solution. The objective is to be able to interrogate the xml which is stored as a SINGLE_BLOB in a sql table. My original (inherited) dynamic solution does that, but has an issue conflating data when an expected element is missing from a node. 

    So, I have to figure out how to interrogate the xml residing as a SINGLE_BLOB in the SQL table CAQH_Return_XML, with out conflating records when data elements are missing, 

    For your test bed, try creating a table with an npi field (varchar(10)), and a fileinfo field (xml). Then insert the npi into the npi field, and convert the long xml to a single_blob and stash it in the fileinfo field and try to interrogate it from there. 

    From that point, if you can demonstrate a method to parse that SINGLE_BLOB value that lives in the xml field (fileinfo) so that it will output all 4 rows, AND does not conflate data when one of the nodes is missing a tag/element, I will be a very grateful and happy camper!

  • Robin Riversong - Wednesday, March 1, 2017 2:02 PM

    For your test bed, try creating a table with an npi field (varchar(10)), and a fileinfo field (xml). Then insert the npi into the npi field, and convert the long xml to a single_blob and stash it in the fileinfo field and try to interrogate it from there. 

    From that point, if you can demonstrate a method to parse that SINGLE_BLOB value that lives in the xml field (fileinfo) so that it will output all 4 rows, AND does not conflate data when one of the nodes is missing a tag/element, I will be a very grateful and happy camper!

    Adding the extra npi field in the table has no bearing on this case. You are having difficulty schredding your xml.

    Based on the xml that you provided (which contains PI data), I was able to shred the data and get what appears to be correct results. Due to the PI data, I am not even going to take a second look at the provided sample data.

    Looking back at your original post, you are trying to inspect the values from the InstitutionName level. The other data is at levels above the starting point.
    CROSS APPLY FileInfo.nodes('/hr:Provider/hr:Education/hr:InstitutionName') T(rows)

    You need to start at a higher point, and work down
    CROSS APPLY FileInfo.nodes('/hr:Provider/hr:Education') T(rows)

    Below I have built some sample data to show the idea of multiple blobs with missing tags, and query them as a whole.  You can also add filters.

    DECLARE @CAQH_Return_XML TABLE (NPI VARCHAR(10), FileInfo XML);

    INSERT INTO @CAQH_Return_XML( NPI, FileInfo )
    VALUES
      ( '12345', N'<Provider xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:GeoAccess.PDE.Provider" ProviderID="11133923" ProviderAttestID="97210274" AttestDate="2016-11-21T14:06:39" ID="999">
      <LastName>Some Last Name</LastName>
      <NPI>12345</NPI>
      <Education ID="999">
      <InstitutionName>An Institution</InstitutionName>
      <CompletionDate>2003-06-01T00:00:00</CompletionDate>
      <City>Some City</City>
      </Education>
      <Education ID="998">
      <StartDate>2000-09-01T00:00:00</StartDate>
      <InstitutionName>Another Institution</InstitutionName>
      <City>Another City</City>
      <ProgramCompletedFlag>1</ProgramCompletedFlag>
      </Education>
      <Education ID="997">
      <StartDate>2004-03-01T00:00:00</StartDate>
      <InstitutionName>Another Institution</InstitutionName>
      <Degree>
       <DegreeAbbreviation>MD</DegreeAbbreviation>
      </Degree>
      </Education>
    </Provider>')
    , ( '987654', N'<Provider xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="urn:GeoAccess.PDE.Provider" ProviderID="11133923" ProviderAttestID="97210274" AttestDate="2016-11-21T14:06:39" ID="777">
      <LastName>Another Last Name</LastName>
      <NPI>987654</NPI>
      <Education ID="799">
      <InstitutionName>Some Other Institution</InstitutionName>
      <CompletionDate>2004-06-01T00:00:00</CompletionDate>
      <City>Some City</City>
      </Education>
      <Education ID="798">
      <StartDate>2001-09-01T00:00:00</StartDate>
      <InstitutionName>Yet Another Institution</InstitutionName>
      <City>Another City</City>
      <ProgramCompletedFlag>1</ProgramCompletedFlag>
      </Education>
      <Education ID="797">
      <StartDate>2003-03-01T00:00:00</StartDate>
      <InstitutionName>Another Institution</InstitutionName>
      <Degree>
       <DegreeAbbreviation>BA</DegreeAbbreviation>
      </Degree>
      </Education>
    </Provider>');

    DECLARE @NPI VARCHAR(10) = '987654';

    WITH XMLNAMESPACES('urn:GeoAccess.PDE.Provider' AS hr, DEFAULT'http:////www.w3.org/2001/XMLSchema')
    SELECT DISTINCT
      T.rows.value('(../hr:NPI/text())[1]', 'varchar(40)') AS NPI
      , T.rows.value('(../hr:LastName/text())[1] ', 'varchar(40)') AS LastName
      , T.rows.value('(hr:InstitutionName/text())[1]', 'varchar(500)') AS InstitutionName
      , T.rows.value('(hr:City/text())[1]', 'varchar(500)') AS City
      , T.rows.value('(hr:State/text())[1]', 'varchar(500)') AS State
      , T.rows.value('(hr:StartDate/text())[1]', 'datetime') AS StartDate
      , T.rows.value('(hr:EndDate/text())[1]', 'datetime') AS EndDate
      , T.rows.value('(hr:CompletionDate/text())[1]', 'datetime') AS CompletionDate
      , T.rows.value('(hr:Degree/hr:DegreeAbbreviation/text())[1]', 'varchar(40)') AS DegreeAbbreviation
      , T.rows.value('(hr:ProgramCompletedFlag/text())[1]', 'varchar(40)') AS ProgramCompletedFlag
      , 1 AS DegreeNumber

    FROM @CAQH_Return_XML
    /*** NOTE that the starting point is Provider/Education, NOT Provider/Education/InstitutionName ***/
    CROSS APPLY FileInfo.nodes('/hr:Provider/hr:Education') T(rows)

    /*** Choose whether to filter by the external column, or on the actual data from teh XML blob ***/
    --WHERE T.rows.value('(../hr:NPI/text())[1]', 'varchar(40)') = @NPI;
    --WHERE NPI = @NPI;

  • Hi Des, 

    Thanks for hanging in there with me, and for being sensitive to PI issues!

    Just to be clear, all PI data in the txt xml file I attached should be obscured .. I had one of our analysts review and approve it before I uploaded it.. you should see PI data looking like this.

    BirthDate>####-##-##T00:00:00</BirthDate>
    <SSN>#########</SSN>
    <DEAFlag>1</DEAFlag>
    <CDSFlag>0</CDSFlag>
    <UPIN>A#####</UPIN>

    Everything that remains visible should be information that is publicly available.. 

    I'm working with your most recent post right now which with a few mods seems to possibly be exactly what I needed.. but I wanted to get back to you ASAP about PI because we are very serious about that kind of thing..

  • Hey Des,  I meant to get back to you sooner.. your help got us over the hump, and things are working well.. thanks so much for the assist!  Closing this thread..

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply