|
|
|
SSC Journeyman
      
Group: General Forum Members
Last Login: Monday, April 29, 2013 11:38 PM
Points: 79,
Visits: 251
|
|
Dear All, I have a Prod table with a million on rows having an ntext column to store XML. I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.
create table #temp (id int, xmldoc ntext) insert into #temp values (1, '<parent><child></child></parent>') insert into #temp values (2, '<parent><child>') insert into #temp values (3, '<parent><child></child></parent>')
select * from #temp where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1
I expect this to skip 2nd row instead an error.
Appreciating your helps.
|
|
|
|
|
SSC Veteran
      
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 7:19 AM
Points: 283,
Visits: 1,239
|
|
This is a cursor-based "RBAR" function so if this is going to be a task that runs often it may be necessary to rewrite it using a Tally Table for improved performance...but I'll leave that up to someone else. I also include an optional XML "cleanup" function (below) that removes tabs and spaces. If that is not necessary, then obviously leaving it out will also improve performance.
/* Test data */ IF OBJECT_ID('tempdb..#Temp') IS NOT NULL DROP TABLE #Temp
CREATE TABLE #temp (id INT,xmldoc NTEXT) INSERT INTO #temp VALUES (1,'<parent><child1></child1></parent>') INSERT INTO #temp VALUES (2,'<parent><child>') INSERT INTO #temp VALUES (3,'<parent><child2></child2></parent>')
/* Process data */ SET NOCOUNT ON
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL DROP TABLE #TempTable
CREATE TABLE #TempTable ( [id] INT NOT NULL, [xmldoc] NVARCHAR(4000) NULL, PRIMARY KEY (ID))
DECLARE @idoc INT ,@rootnode NVARCHAR(255) ,@ID INT ,@strXML NVARCHAR(4000) ,@errMsg NVARCHAR(255)
SET @rootnode = '/parent'
DECLARE ValidateXML CURSOR FOR SELECT id ,xmldoc FROM #temp OPEN ValidateXML FETCH NEXT FROM ValidateXML INTO @ID ,@strXML
WHILE @@FETCH_STATUS = 0 BEGIN
/* Optional but recommended to remove excess spaces and tabs. */ /* However, this may slow down the process considerably so use with care. */ SET @strXML = dbo.svfRemoveExcessSpacesFromXML(@strXML) BEGIN TRY
EXEC sp_xml_preparedocument @idoc OUTPUT, @strXML
INSERT INTO #TempTable SELECT id ,xmldoc FROM #temp WHERE ID = @ID AND CONVERT(XML,CONVERT(NVARCHAR(MAX),@strXML)).exist('(/parent)') = 1
END TRY BEGIN CATCH SELECT @errMsg = ERROR_MESSAGE()
/* Comment out this insert to just skip rows with invalid XML */
INSERT INTO #TempTable SELECT @ID AS id ,'Invalid XML: '+@errMsg AS xmldoc END CATCH
SET @strXML = NULL FETCH NEXT FROM ValidateXML INTO @ID ,@strXML
END
CLOSE ValidateXML DEALLOCATE ValidateXML
SELECT id ,xmldoc FROM #TempTable
OUTPUT:
id xmldoc 1 <parent><child1></child1></parent> 2 Invalid XML: The error description is 'The following tags were not closed: parent, child.'. 3 <parent><child2></child2></parent>
Optional XML cleanup function:
CREATE FUNCTION dbo.svfRemoveExcessSpacesFromXML ( @strXML NVARCHAR(MAX) ) RETURNS NVARCHAR(MAX) BEGIN
DECLARE @str NVARCHAR(MAX) ,@xml XML SET @str = @strXML SET @str = REPLACE(@str,CHAR(9),' ') -- convert tabs to spaces
SET @str = REPLACE( REPLACE( REPLACE( @str ,' ',' '+CHAR(7)) ,CHAR(7)+' ','') ,CHAR(7),'') SET @str = REPLACE(@str,'> <','><')
RETURN @str /* SELECT dbo.svfRemoveExcessSpacesFromXML(' hi this xxxx is test for onr jh1 lljkl l l jl l ') SELECT dbo.svfRemoveExcessSpacesFromXML('<soapenv:Body> <BackgroundReports xmlns="http://www.cpscreen.com/schemas"> <BackgroundReportPackage type="report"> <ProviderReferenceId>WPS-6308077</ProviderReferenceId> <PackageInformation> <ClientReferences />') */
END
|
|
|
|
|
SSCrazy Eights
        
Group: General Forum Members
Last Login: Yesterday @ 2:33 PM
Points: 8,620,
Visits: 8,261
|
|
SQL Mad Rafi (11/12/2012) Dear All, I have a Prod table with a million on rows having an ntext column to store XML. I need to process something from that XML, but my query crashed in between as there are some invalid xml exists in some rows. So, i need to skip those. how can i?.
create table #temp (id int, xmldoc ntext) insert into #temp values (1, '<parent><child></child></parent>') insert into #temp values (2, '<parent><child>') insert into #temp values (3, '<parent><child></child></parent>')
select * from #temp where convert(xml,convert(nvarchar(max),xmldoc)).exist('(/parent)') = 1
I expect this to skip 2nd row instead an error.
Appreciating your helps.
No offense but why in the world would you use ntext to hold xml data? We have an xml datatype and ntext is deprecated. You need to get this data into a table that will be supportable in the long run. The RBAR example is exactly what I would come up with for this. I would change the datatype to XML for your xml data. Run that to get your data in the proper datatypes, fix up the garbage that can't convert and change the process that loads this data.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
|
|
|
|