XML File Encoding before insert into XML Column

  • Hello,

    I have multiple XML files that I am importing into xml columns. At one of the files I got the error of an invalid character, the space character, to be exact.

    Because I have multiple files I am looking for a way to add an encoding from T-SQL, from my stored procedure to every XML file, before importing it into my XML column.

    Also, after the information are imported into my XML column, they are after parsed so I can compare the information from an xml tag to some other information from another table. So I need the space character to be saved into the column in the same way.

    As an example:

    In one of my XML files, I have multiple lines that look like this:

    <image Path="folder1\img_name_year X- XIV_001.jpg" Name="img_name_year X- XIV_001.jpg" />

    The illegal character in this case is the space character after 'X-'

    That is another thing that I do not understand: I have a space character after 'year' also, but that seems to work fine. I get the error only after the 'X-'.

    What I need to do is to save every path and name for every image into a table in my database so it can be further processed and compared with other paths, and I need to do this from SQL. And I need to have the path, in its exact form, as it is in my XML : "folder1\img_name_year X- XIV_001.jpg"

    Thank you and any help would be much appreciated!

  • What is giving you the error? I can save the tag you show as XML just fine, so I think your error is likely elsewhere.

    I've often found that XML parsers can't always accurately pinpoint where the ACTUAL error is: a syntax problem early on in a file (tags not closed right, missing quotes, etc...) might not get picked up until the checker runs out of parsing options a bit later in the file.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Quick questions:

    A. how are you importing the XML files?

    B. Do the files have as XML declaration tag (<?xml version="1.0" encoding="UTF-8" standalone="no" ?>)?

    C. Can you post a more complete sample that causes an error?

    😎

  • Which way have you figured out the symbol in error? Note in the message type of

    XML parsing: line 1, character 77, illegal xml character

    reported position is the position in the XML string, contrary to the position in the SQL statement in other messages.

    If you specify XML encoding it should be consistent with the real encoding at the moment server starts XML parsing. For example, this will fail

    select x = CAST(

    N'<?xml version="1.0" encoding="UTF-8" standalone="no" ?> <myData myAttr="123" />' as XML)

    because server will first build UTF-16 string and then will try to parse XML respecting explicitly specified UTF-8 encoding. I.e. it will try to convert the UTF-16 string from UTF-8 to internal representation which is always UTF-16.

  • found the error 🙂

    The XML files are created with a Python script. In my path, the first &nbsp is recognised, but the second one is actually a \xa0 character(Unicode). I changed my encoding in the text editor to show all of the characters and found the specific char.

    I modified my script to import the file into a nvarchar(max) column. When I convert the nvarchar(max) column into an xml column I replace the char(160) character with a &nbsp (' '), because \xa0 character is a non-breaking space in Latin1 (ISO 8859-1).

    With the encoding I tried, different encodings, but that didn't work.

    Thank you so much for your answers!

  • OK.

    But when file is converted to NVARCHAR, you may live with CHAR(160) as well.

    select col.value('myData[1]/@myAttr', 'nvarchar(50)')

    from

    (select CAST(N'<myData myAttr="12'+char(160)+ N'34" />' as XML)) as t(col)

    where col.value('myData[1]/@myAttr', 'nvarchar(50)') like '%'+char(160) +'%'

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply