Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

How to Skip Invalid XML Expand / Collapse
Author
Message
Posted Monday, November 12, 2012 11:35 PM
SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Monday, April 29, 2013 11:38 PM
Points: 79, Visits: 251
Dear All,
I have a Prod table with a million on rows having an ntext column to store XML.
I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.

create table #temp (id int, xmldoc ntext)
insert into #temp values (1, '<parent><child></child></parent>')
insert into #temp values (2, '<parent><child>')
insert into #temp values (3, '<parent><child></child></parent>')

select * from #temp
where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1

I expect this to skip 2nd row instead an error.

Appreciating your helps.
Post #1383950
Posted Wednesday, November 14, 2012 11:47 AM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Tuesday, May 21, 2013 7:19 AM
Points: 283, Visits: 1,239
This is a cursor-based "RBAR" function so if this is going to be a task that runs often it may be necessary to rewrite it using a Tally Table for improved performance...but I'll leave that up to someone else. I also include an optional XML "cleanup" function (below) that removes tabs and spaces. If that is not necessary, then obviously leaving it out will also improve performance.


/* Test data */
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp

CREATE TABLE #temp (id INT,xmldoc NTEXT)
INSERT INTO #temp
VALUES
(1,'<parent><child1></child1></parent>')
INSERT INTO #temp
VALUES
(2,'<parent><child>')
INSERT INTO #temp
VALUES
(3,'<parent><child2></child2></parent>')


/* Process data */

SET NOCOUNT ON

IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
DROP TABLE #TempTable

CREATE TABLE #TempTable (
[id] INT NOT NULL,
[xmldoc] NVARCHAR(4000) NULL,
PRIMARY KEY (ID))

DECLARE
@idoc INT
,@rootnode NVARCHAR(255)
,@ID INT
,@strXML NVARCHAR(4000)
,@errMsg NVARCHAR(255)

SET @rootnode = '/parent'

DECLARE ValidateXML CURSOR
FOR
SELECT
id
,xmldoc
FROM
#temp

OPEN ValidateXML
FETCH NEXT FROM ValidateXML INTO
@ID
,@strXML

WHILE @@FETCH_STATUS = 0
BEGIN

/* Optional but recommended to remove excess spaces and tabs. */
/* However, this may slow down the process considerably so use with care. */
SET @strXML = dbo.svfRemoveExcessSpacesFromXML(@strXML)

BEGIN TRY

EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML

INSERT INTO #TempTable
SELECT
id
,xmldoc
FROM
#temp
WHERE
ID = @ID
AND CONVERT(XML,CONVERT(NVARCHAR(MAX),@strXML)).exist('(/parent)') = 1

END TRY
BEGIN CATCH

SELECT @errMsg = ERROR_MESSAGE()

/* Comment out this insert to just skip rows with invalid XML */

INSERT INTO #TempTable
SELECT
@ID AS id
,'Invalid XML: '+@errMsg AS xmldoc

END CATCH

SET @strXML = NULL

FETCH NEXT FROM ValidateXML INTO
@ID
,@strXML

END

CLOSE ValidateXML
DEALLOCATE ValidateXML

SELECT
id
,xmldoc
FROM
#TempTable


OUTPUT:

id	xmldoc
1 <parent><child1></child1></parent>
2 Invalid XML: The error description is 'The following tags were not closed: parent, child.'.
3 <parent><child2></child2></parent>


Optional XML cleanup function:


CREATE FUNCTION dbo.svfRemoveExcessSpacesFromXML
(
@strXML NVARCHAR(MAX)
)
RETURNS NVARCHAR(MAX)
BEGIN

DECLARE
@str NVARCHAR(MAX)
,@xml XML

SET @str = @strXML
SET @str = REPLACE(@str,CHAR(9),' ') -- convert tabs to spaces

SET @str =
REPLACE(
REPLACE(
REPLACE(
@str
,' ',' '+CHAR(7))
,CHAR(7)+' ','')
,CHAR(7),'')

SET @str = REPLACE(@str,'> <','><')

RETURN @str

/*

SELECT dbo.svfRemoveExcessSpacesFromXML(' hi this xxxx is test for onr jh1 lljkl l l jl l ')
SELECT dbo.svfRemoveExcessSpacesFromXML('<soapenv:Body> <BackgroundReports xmlns="http://www.cpscreen.com/schemas"> <BackgroundReportPackage type="report"> <ProviderReferenceId>WPS-6308077</ProviderReferenceId> <PackageInformation> <ClientReferences />')

*/

END


Post #1384788
Posted Wednesday, November 14, 2012 12:43 PM


SSCrazy Eights

SSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy EightsSSCrazy Eights

Group: General Forum Members
Last Login: Yesterday @ 2:33 PM
Points: 8,620, Visits: 8,261
SQL Mad Rafi (11/12/2012)
Dear All,
I have a Prod table with a million on rows having an ntext column to store XML.
I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.

create table #temp (id int, xmldoc ntext)
insert into #temp values (1, '<parent><child></child></parent>')
insert into #temp values (2, '<parent><child>')
insert into #temp values (3, '<parent><child></child></parent>')

select * from #temp
where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1

I expect this to skip 2nd row instead an error.

Appreciating your helps.


No offense but why in the world would you use ntext to hold xml data? We have an xml datatype and ntext is deprecated. You need to get this data into a table that will be supportable in the long run. The RBAR example is exactly what I would come up with for this. I would change the datatype to XML for your xml data. Run that to get your data in the proper datatypes, fix up the garbage that can't convert and change the process that loads this data.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Post #1384815
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse