Legal / Illegal XML Characters list

  • I have an error complaining about an illegal XML character, so I've spent the entire morning on Google searching for the legal / illegal XML character list. I've learned that any character not on the "legal" list is illegal, which is good. Unfortunately, the only references to the "legal" list I find, I can't interpret.

    Here's the link all the articles keep referring back to: http://www.w3.org/TR/2004/REC-xml11-20040204/#NT-Char. But how do I interpret #x1-#xD7FF ? What characters are these?

    Is there an actual legal XML character list in ... well, Roman characters like the ones I'm currently typing?

    Or is there a T-SQL shortcut for translating that list so I can see what characters those references mean?

    I ask because I need to manually go through my expected data to find this illegal character, which is a bit difficult if I don't know what I'm looking for.

    Any help would be greatly appreciated. Thanks in advance!

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • What a nightmare of a document!!

    From what I can gather the #x???? is a Hex reference to a unicode/ascii character; with those illegal characters being non-printable or refering to characters used in XML such as etc.

    These are the hex references of the legal characters from the document:

    #x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF

    SELECT '#x9', CHAR(CAST(0x9 AS INT)) -- Tab

    SELECT '#xA', CHAR(CAST(0xA AS INT)) -- Line Feed

    SELECT '#xD', CHAR(CAST(0xD AS INT)) -- Carriage Return

    SELECT '#x20', CHAR(CAST(0x20 AS INT)) -- Space

    I currently can't think of a better way of checking for the illegal character than looping through each one in turn and checking that it's hex value falls within those hex references above?!?

    I'm also guessing at any reference over the normal ASCII character code range is for Unicode?!?



    Ade

    A Freudian Slip is when you say one thing and mean your mother.
    For detail-enriched answers, ask detail-enriched questions...[/url]

  • I've located a list of all Unicode references available from

    http://www.alanwood.net/unicode/#links

    to help to translate the characters outside of the ASCII range.



    Ade

    A Freudian Slip is when you say one thing and mean your mother.
    For detail-enriched answers, ask detail-enriched questions...[/url]

  • And remember - they're "illegal" unless you put them in a CDATA tag. At that point pretty much anything goes....

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks, Guys.

    One of my coworkers found the issue by pulling the fields into a XML datatype variable. It was a hidden character with some bizarre value that we'd never seen before.

    Strange... and invisible too.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I tripped over that issue as well;

    below I pasted a function i made for encoding "illegal" characters, it might help you out to clean up the data, or get the data in a format allowable for xml:

    Edited: this requires a Jeff Moden Tally table; performance is quick with it.

    snip-

    the forum doesn't like the pasted code, as it contains html characters like {ampersand}nbsp;

    I've attached instead.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • here's the function:

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • What is "Tally.N" in that function?

  • The Tally Table is a reference to a single column table from 1 to {some large number}

    the example shown only needs 8000 chars;

    this was the origianl definition from years ago:

    SELECT TOP 11000 --equates to more than 30 years of dates

    IDENTITY(INT,1,1) AS N

    INTO dbo.Tally

    FROM Master.dbo.SysColumns sc1, Master.dbo.SysColumns sc2

    --===== Add a Primary Key to maximize performance

    ALTER TABLE dbo.Tally

    ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N) WITH FILLFACTOR = 100

    --===== Allow the general public to use it

    GRANT SELECT ON dbo.Tally TO PUBLIC

    it's an old example, you could replace it with a cTE like this:

    ;with

    a1 as (select 1 as N union all select 1 union all

    select 1 union all select 1 union all

    select 1 union all select 1 union all

    select 1 union all select 1 union all

    select 1 union all select 1),

    a2 as (select 1 as N from a1 as a cross join a1 as b),

    a3 as (select 1 as N from a2 as a cross join a2 as b),

    a4 as (select 1 as N from a3 as a cross join a2 as b),

    Tally as (select top (11000) row_number() over (order by N) as N from a4)

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks. That (and some googling) showed me something very useful for future reference.

    Unfortunately, that function didn't help in this instance, as it only made things far worse (since it messed up all the XML by replacing all the angle brackets, etc. so that SQL Server couldn't even parse the first name/identifier).

    I'm having this same issue as the OP of this thread... I cannot find the "problem character" at all. It's at "line 554, character 18", but no matter how I do things, I can't find 550 lines in this XML. I pasted it into Notepad++ and formatted as XML with linebreaks, and Notepad++ thinks it's well formed with no errors, and there's only 520 lines.

    I'm pulling my hair out here... the XML is known to be good. It's rehydrated by C# code just fine. Notepad++ thinks it's fine when you use its XML validity checking functions. Only SQL Server is complaining, and it's not giving me ANY useful information about what the actual problem is. If it told me the Nth character position (from the beginning) and the actual character (or character code) that it was choking on, that would help.

    So far, nothing I've tried has gotten me any closer to a solution...

    EDIT: Worse... if I copy and paste what is in Notepad++ (which is just copied/pasted from selecting the data from the database) into a sample script and try to convert it... it converts fine. So the very act of copy/paste and/or NotePad++ doing something is "fixing" the problem. AUGH!

    EDIT: Took Notepad++ out of the equation... copying from the grid output directly into the test script, the XML works fine. SELECTING directly from the database yields the error. I have no idea what to do now.

  • Did you try pulling it into an XML data type?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • The data is stored in an image column as a varchar string (not my design, not something I can change).

    I pull it out, converting it to varchar, into a varchar string. If I copy the contents of that and paste it into a SQL Server Query Window, assigning it to a varchar(max) variable, I can convert that to XML.

    However, I cannot DIRECTLY convert the column to varchar(max) and then to xml, because I always get the error when I try that. Or even indirectly by putting it into a varchar(max) variable and then trying to convert that to assign to an xml variable.

    The act of copy/paste from the grid results window into a Query window (i.e. I type SET @vchar = '', and then paste the contents in between the two ' marks), suddenly everything works.

    I'm flummoxed.

  • Paul Bradshaw (3/19/2015)


    The data is stored in an image column as a varchar string (not my design, not something I can change).

    I pull it out, converting it to varchar, into a varchar string. If I copy the contents of that and paste it into a SQL Server Query Window, assigning it to a varchar(max) variable, I can convert that to XML.

    However, I cannot DIRECTLY convert the column to varchar(max) and then to xml, because I always get the error when I try that. Or even indirectly by putting it into a varchar(max) variable and then trying to convert that to assign to an xml variable.

    So have you tried creating a temp table or another perm table with a varchar column and shoving the string in there? Then selecting from that into the XML datatype?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Paul Bradshaw (3/18/2015)


    I'm having this same issue as the OP of this thread... I cannot find the "problem character" at all. It's at "line 554, character 18", but no matter how I do things, I can't find 550 lines in this XML. I pasted it into Notepad++ and formatted as XML with linebreaks, and Notepad++ thinks it's well formed with no errors, and there's only 520 lines.

    Shot in the dark here...does your source data contain an ASCII NULL (CHAR(0)) perhaps?

    You say that SQL Server is complaining about an error in your XML on a line that doesn't appear to exist, but maybe it actually does exist. ASCII NULL is often treated as an "end of file" or "end of string", so your text editor (and the SQL Server editor) may be seeing a NULL and assuming that the NULL is the "end of the file" and is truncating the value at that point.

    You might try looking at the source data in a binary editor instead of a text editor if you can. If your data contains an ASCII NULL then you should be able to see it that way. If it does have a NULL Then you'll have to escape it somehow, but I'm not sure exactly how you're going to accomplish that. If it were me and it was an option, I'd go back to whomever is giving me the data and ask them to encode the data in a format that's more XML friendly (e.g. base-64 maybe).

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply