January 22, 2010 at 8:07 pm
Microsoft SQL Server 2008 (RTM) - 10.0.1600.22 (X64) Jul 9 2008 14:17:44 Copyright (c) 1988-2008 Microsoft Corporation Standard Edition (64-bit) on Windows NT 6.0 <X64> (Build 6002: Service Pack 2) (VM)
VM has 4 processors assigned
----------------------------------------
I have a table where XML docs are stored in a TEXT column. All xml is validated against an XSD before it is inserted or updated to the database (ASP.NET app).
We want to start using full text search, but the iFilter on a TEXT column cannot be overridden for full text indexing to recognize the column as XML data. There is significant development that would have to be done if we just changed the datatype to XML. Therefore, we decided just to duplicate the data as an XML type in another column or table. The data is sensitive, so I can't really publish it.
Table schema:
CREATE TABLE XmlReport
(
ReportId INT NOT NULL IDENTITY (1,1) PRIMARY KEY
, XmlReport TEXT NULL
)
When I run the query below, I qet the following error:
Msg 9420, Level 16, State 1, Line 1
XML parsing: line 13, character 959, illegal xml character
SELECT ReportId, CAST(XmlReport AS XML) TrueXml
FROM TF_XMLReports
WHERE XmlReport IS NOT NULL
I can click on the offending XML column in the results tab and SQL Server opens the XML doc in a new window without complaining. I can copy and paste the offending data in an online XML validator with no problem.
I can one by one filter out line 13 (row 13) and repeat the process. The error is always on line 13, character 959 even though the ReportId on line 13 is different than the one I filtered out. And, I get different rowcounts. Below is the filter I was using to one by one filter out offending rows, the commented numbers after them are the rowcounts after each was subsequently filtered. When I didn't use any filter, the query returned 16 rows:
AND ReportId NOT IN
(
112390 -- 13
--, 112313 --15
--, 112301 --14
--, 112299 -- 12
--, 112287 --11
--, 102123 -- 10
--, 95032 -- 9
)
I tried various filters on the ReportId, and got similar results.
I've used (MAXDOP 1) to make sure I only had one thread running, and got similar results.
Can someone tell me what the heck is going on here? Is this some obscure Win 2008 64/ SQL Server 64/ VM bug?
January 23, 2010 at 3:26 am
If you can't post the real data that's causing the problem you'd need to come up with some modified sample data that will show the same effect.
We need to be able to reproduce that effect on our side in order to be sure that the solution we might come up with will actually resolve your issue.
What you could do is to modify the data in line 13 before character 959 (e.g. replace "Smith" with "Smith1" and another test with "Smit") and check if the character that fails will move to 960 rsp. 958. In that case you'd need to get to that character and see what it is...
January 25, 2010 at 3:44 pm
Unfortunately, I think I will have to open a Microsoft incident, aaarrrgghhh.
The unfiltered query returns 16 rows, out of 3,800, before erroring out, but it tells me the offending XML is on row 13.
SELECT
ReportId
, CAST(XmlReport AS XML) TrueXml
FROM TF_XMLReports
WHERE XmlReport IS NOT NULL
ORDER BY ReportId
OPTION (MAXDOP 1)
This is a truncated version of the data:
RowReportIdTrueXML
183228<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
294159<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
394160<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
494161<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
594208<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
694211<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
795028<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
895030<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
995031<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
1095032<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
11102123<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
12112287<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
13112299<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
14112301<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
15112313<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
16112390<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
When I run this query which returns only the offending row, the resultset is retruned without error:
SELECT
ReportId
, CAST(XmlReport AS XML) TrueXml
FROM TF_XMLReports
WHERE XmlReport IS NOT NULL
AND ReportId = 112299
ORDER BY ReportId
OPTION (MAXDOP 1)
RowReportIdTrueXML
1112299<Report xmlns:xsi="http://www.w3.org/2001/XMLSchem
February 2, 2010 at 1:35 pm
Ok, I finally found out what was going on. The line number that was included with the message had no bearing on the real issue as we strip all whitespace from our XML which includes CR and LF. So, it was misleading. The real problem is the curly quotes that Word inserts through Autocorrect. For some reason, SQL Server doesn't like the curly quotes and considers them invalid XML, even though they validate in Visual Studio and online XML validators.
Viewing 4 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply