Stumped trying to shred XML data into table rows

  • Hello experts,

    I have the following table:
    MyTable (
    MyKey  varchar(100) not null primary key,
    ColXml varchar(max) null
    )

    The ColXml column has xml data with the following kind of structure. I modified it to anonymize and included | and - symbols to approximate the hierarchy, so apologies for any errors. I'm trying to shred the XML data into tables and rows in order to extract the embedded Lookup data lists inside.

    /*
    <GetResponse>
    |- <GetResult>
        |-- <Result>
            |---<ResultMessages>
            |--- <DetailResults>
                    |---- <CustomCol>
                        |----- <ColName>Col Name</ColName>
                        |----- <DataType>Lookup</DataType
                        |----- <ColValues>
                            |------ <string>Val 1</string>
                            |------ <string>Val 2</string>
                    |---- <CustomCol>
                    |---- <CustomCol>
                        |----- <ColName>Col Name</ColName>
                        |----- <DataType>Lookup</DataType
                        |----- <ColValues>
                            |------ <string>Val 1</string>
                            |------ <string>Val 2</string>
                    |---- <CustomCol>
            |--- </DetailResults>
            |---</ResultMessages>
        |-- </Result>
    |- </GetResult>
    </GetResponse>
    */

    Each row of the table has xml like the above, stored as varchar(max) in the ColXml column, and is pretty large (>300000 characters). 

    Here is my problem:
    1. When I select the XmlCol as is, I get what looks like correct XML (with < and > brackets) but the value is truncated to about 164000 characters, far fewer than 300000 characters returned.
    ColXML 
    from MyTable
    where ColVarchar = 'Value' 

    2. But when I cast the XmlCol value according to a page that I googled, the XML is not truncated but returns with the < and > brackets converted to entities &lt; and &gt;
    select cast('<![CDATA[' + ColXML + ']]>' as xml)
    from MyTable
    where ColVarchar = 'Value'

    (Sample output)
    &lt;GetResponse xmlns="http://www.mysite.com/"&gt;&lt;GetResult&gt; ...

    I don't even want to get into exact syntax yet and waste everyone's time. But my questions just for starters are:

    1. Does anyone know why these two different things happen (truncated xml vs entitized xml)? 

    2. How should I go about building a t-sql query that returns all of the 300000+ xml characters but with proper XML brackets < and >  so I can load that XML into table rows? 

    Thanks for any help. Full disclosure: I haven't worked a lot with XML. So this is driving me nuts especially because I think there's an obvious solution that I just don't know how to build.

    -Thanks again,
    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner - Tuesday, December 11, 2018 2:23 PM

    Here is my problem:
    1. When I select the XmlCol as is, I get what looks like correct XML (with < and > brackets) but the value is truncated to about 164000 characters, far fewer than 300000 characters returned.
    ColXML 
    from MyTable
    where ColVarchar = 'Value' 

    2. But when I cast the XmlCol value according to a page that I googled, the XML is not truncated but returns with the < and > brackets converted to entities &lt; and &gt;
    select cast('<![CDATA[' + ColXML + ']]>' as xml)
    from MyTable
    where ColVarchar = 'Value'

    (Sample output)
    &lt;GetResponse xmlns="http://www.mysite.com/"&gt;&lt;GetResult&gt; ...

    I don't even want to get into exact syntax yet and waste everyone's time. But my questions just for starters are:

    1. Does anyone know why these two different things happen (truncated xml vs entitized xml)? 

    2. How should I go about building a t-sql query that returns all of the 300000+ xml characters but with proper XML brackets < and >  so I can load that XML into table rows? 

    Thanks for any help. Full disclosure: I haven't worked a lot with XML. So this is driving me nuts especially because I think there's an obvious solution that I just don't know how to build.

    -Thanks again,
    webrunner

    The truncation is due to a limit imposed by SSMS on non-XML data.  On my system, it will only return the first 65K characters, but this can be adjusted.  XML also has a limit, but it is 2GB.

    The reason that your XML conversion is causing your characters to be entitized, is because you are telling it that you are passing in DATA rather than an XML document.  The correct query is

    SELECT CAST( ColXML as XML)
    FROM MyTable
    WHERE ColVarChar = 'Value'

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Recommend that you use CONVERT with the style parameter matching the content when converting character string to XML, CAST is equal to the default (0)
    😎

  • Eirikur Eiriksson - Wednesday, December 12, 2018 1:39 AM

    Recommend that you use CONVERT with the style parameter matching the content when converting character string to XML, CAST is equal to the default (0)
    😎

    Thank you, Drew and Eirikur, for your help.

    I tried CAST and CONVERT and am getting the error below.
    Example query:
    declare @xml xml;
    select @xml = convert(xml,ColXML,0)
    FROM MyTable
    WHERE ColVarChar = 'Value'

    Error:
    Msg 9420, Level 16, State 1, Line 3
    XML parsing: line 1291, character 70, illegal xml character

    This error seems to happen because the CAST and CONVERT return only part of the XML data - what I denoted as problem 1  in my original post. So without the remaining XML characters the nodes are incomplete and what's left gets interpreted as illegal characters.

    Do you think this is because the source column of XML data is being stored as varchar(max)? It seems odd that the XML would be truncated this way in a CAST or CONVERT if the issue were only an SSMS setting as Drew had suggested.

    Thanks again!
    - webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner - Wednesday, December 12, 2018 11:11 AM

    Eirikur Eiriksson - Wednesday, December 12, 2018 1:39 AM

    Recommend that you use CONVERT with the style parameter matching the content when converting character string to XML, CAST is equal to the default (0)
    😎

    Thank you, Drew and Eirikur, for your help.

    I tried CAST and CONVERT and am getting the error below.
    Example query:
    declare @xml xml;
    select @xml = convert(xml,ColXML,0)
    FROM MyTable
    WHERE ColVarChar = 'Value'

    Error:
    Msg 9420, Level 16, State 1, Line 3
    XML parsing: line 1291, character 70, illegal xml character

    This error seems to happen because the CAST and CONVERT return only part of the XML data - what I denoted as problem 1  in my original post. So without the remaining XML characters the nodes are incomplete and what's left gets interpreted as illegal characters.

    Do you think this is because the source column of XML data is being stored as varchar(max)? It seems odd that the XML would be truncated this way in a CAST or CONVERT if the issue were only an SSMS setting as Drew had suggested.

    Thanks again!
    - webrunner

    Can you please post an example that fails in the conversion?
    😎

    Storing XML as varchar is a bad practice, even if the column uses compression of any kind.

  • Eirikur Eiriksson - Wednesday, December 12, 2018 11:19 AM

    webrunner - Wednesday, December 12, 2018 11:11 AM

    Eirikur Eiriksson - Wednesday, December 12, 2018 1:39 AM

    Recommend that you use CONVERT with the style parameter matching the content when converting character string to XML, CAST is equal to the default (0)
    😎

    Thank you, Drew and Eirikur, for your help.

    I tried CAST and CONVERT and am getting the error below.
    Example query:
    declare @xml xml;
    select @xml = convert(xml,ColXML,0)
    FROM MyTable
    WHERE ColVarChar = 'Value'

    Error:
    Msg 9420, Level 16, State 1, Line 3
    XML parsing: line 1291, character 70, illegal xml character

    This error seems to happen because the CAST and CONVERT return only part of the XML data - what I denoted as problem 1  in my original post. So without the remaining XML characters the nodes are incomplete and what's left gets interpreted as illegal characters.

    Do you think this is because the source column of XML data is being stored as varchar(max)? It seems odd that the XML would be truncated this way in a CAST or CONVERT if the issue were only an SSMS setting as Drew had suggested.

    Thanks again!
    - webrunner

    Can you please post an example that fails in the conversion?
    😎

    Storing XML as varchar is a bad practice, even if the column uses compression of any kind.

    Especially since there is a XML data type.

  • webrunner - Wednesday, December 12, 2018 11:11 AM

    Eirikur Eiriksson - Wednesday, December 12, 2018 1:39 AM

    Recommend that you use CONVERT with the style parameter matching the content when converting character string to XML, CAST is equal to the default (0)
    😎

    Thank you, Drew and Eirikur, for your help.

    I tried CAST and CONVERT and am getting the error below.
    Example query:
    declare @xml xml;
    select @xml = convert(xml,ColXML,0)
    FROM MyTable
    WHERE ColVarChar = 'Value'

    Error:
    Msg 9420, Level 16, State 1, Line 3
    XML parsing: line 1291, character 70, illegal xml character

    This error seems to happen because the CAST and CONVERT return only part of the XML data - what I denoted as problem 1  in my original post. So without the remaining XML characters the nodes are incomplete and what's left gets interpreted as illegal characters.

    Do you think this is because the source column of XML data is being stored as varchar(max)? It seems odd that the XML would be truncated this way in a CAST or CONVERT if the issue were only an SSMS setting as Drew had suggested.

    Thanks again!
    - webrunner

    No, the error is exactly what it says "illegal XML character."  If it were caused by truncation, the error would say "unexpected end of input."

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • aren't there supposed to be some more tags like </customcol> and </colvals> ? just inquiring minds and all!

  • Thanks, everyone, for the feedback.

    Regarding the questions/issues:

    Can you please post an example that fails in the conversion?


    Working on this. I will post a self-contained, runnable example once I have it ready.

    Storing XML as varchar is a bad practice, even if the column uses compression of any kind.


    Yes, I will look into this. I will discuss it with the application owner to ask if I can test changing the data type to xml.

    No, the error is exactly what it says "illegal XML character." If it were caused by truncation, the error would say "unexpected end of input."

    Drew


    Thanks, apologies for my mistaken assumption. Does that error not happen below because the data has been converted to CDATA text (even though cast as xml), which is not checked for XML correctness? (Rather than for end of input/closing tags)? (See problem 2 in my original post.)
    select cast('<![CDATA[' + ColXML + ']]>' as xml)
    from MyTable
    where ColVarchar = 'Value'

    Thanks again,
    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • webrunner - Wednesday, December 12, 2018 12:31 PM

    Thanks, everyone, for the feedback.

    Regarding the questions/issues:

    Can you please post an example that fails in the conversion?


    Working on this. I will post a self-contained, runnable example once I have it ready.

    Storing XML as varchar is a bad practice, even if the column uses compression of any kind.


    Yes, I will look into this. I will discuss it with the application owner to ask if I can test changing the data type to xml.

    No, the error is exactly what it says "illegal XML character." If it were caused by truncation, the error would say "unexpected end of input."

    Drew


    Thanks, apologies for my mistaken assumption. Does that error not happen below because the data has been converted to CDATA text (even though cast as xml), which is not checked for XML correctness? (Rather than for end of input/closing tags)? (See problem 2 in my original post.)
    select cast('<![CDATA[' + ColXML + ']]>' as xml)
    from MyTable
    where ColVarchar = 'Value'

    Thanks again,
    webrunner

    I'll say it again, the error means EXACTLY what it says "illegal XML character."  It does NOT mean that the message got truncated.  It does NOT mean that it contains invalid CDATA text.  It does NOT mean that it contains an unparseable XML fragment.  It means that it contains an illegal XML character. PERIOD.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I'll say it again, the error means EXACTLY what it says "illegal XML character." It does NOT mean that the message got truncated. It does NOT mean that it contains invalid CDATA text. It does NOT mean that it contains an unparseable XML fragment. It means that it contains an illegal XML character. PERIOD.

    I understand that is what must be the case, but do you know why the query does NOT throw that illegal XML character error with the query below? Should it not if the same illegal XML character is in that same column?

    select cast('<![CDATA[' + ColXML + ']]>' as xml)
    from MyTable
    where ColVarchar = 'Value'

    Thanks,
    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Lots of things that aren't valid XML are allowed within CDATA sections, that's kind of the point of CDATA. So by wrapping it in that, you're basically not really converting it to XML.

  • andycadley - Wednesday, December 12, 2018 3:17 PM

    Lots of things that aren't valid XML are allowed within CDATA sections, that's kind of the point of CDATA. So by wrapping it in that, you're basically not really converting it to XML.

    Thanks, Andy. I would prefer not to use CDATA, but for some reason when I don't cast the varchar(max) column as xml using that CDATA concatenation, the resulting output is cut off about halfway through.

    On the other hand, when I do use CDATA for the XML cast, I get entire output, but the output has "entitized" the XML brackets from < and > to &lt; and &gt;. 

    What I have not been able to do so far is (1) return the entire XML string, as XML, and (2) with the correct < and > brackets.

    Clearly I am missing something here, so I am going to post a self-contained runnable t-sql example by tomorrow that people can use to try to reproduce what I am seeing.

    Thanks again,
    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

  • Have you tried using TRY_CAST( ColXML AS XML) to determine which value(s) cannot be converted to XML, and then seeing if there are any obvious characters in those values that could be causing the problem?

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen - Thursday, December 13, 2018 10:46 AM

    Have you tried using TRY_CAST( ColXML AS XML) to determine which value(s) cannot be converted to XML, and then seeing if there are any obvious characters in those values that could be causing the problem?

    Drew

    Thanks, Drew, I had not considered that. Still working on an example to post here, but I'll use TRY_CAST to see what I get.

    Thanks again,
    webrunner

    -------------------
    A SQL query walks into a bar and sees two tables. He walks up to them and asks, "Can I join you?"
    Ref.: http://tkyte.blogspot.com/2009/02/sql-joke.html

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply