I was working recently on the following problem. The value
stored in a number of different columns is an XML document which needed to be
replaced with a yet another XML data It was due to some unfinished nightly processes. The normal REPLACE
statement against TEXT type columns is not working, turning the whole thing in a
bit of a challenge. There was an additional complication - the tag may or may
not exist in the saved XML string. Because
the large number of rows the data should be updated only for the columns where
information is not matching with the data in master table. In this article I
will demonstrate how to parse, update partially or replace completely text
columns data, or to update text columns with the same or different values.
Let me point out that the whole idea of processing XML
changes by hand is really preposterous, and the approach of “parsing” for a
tag by simply treating an XML document as a string of symbols is dangerous and,
based on my experience, should never be allowed. But this is how company’s
processes were done, XML data processed and as a DBA I hardly had any other
choices to make a quick data fix. In generally, this article is how effectively
works with TEXT columns and XML document in my case provides simple and an
effective way for the illustration.
First, the data from the text column should be parsed. Both
SUBSTRING and PATINDEX functions can be used to parse data with less than 8000
bytes. This is the case for most XML tags or any other forms of parsable data.
SUBSTRING returns a part of a character, binary, text, or
image expression. For more information about the valid Microsoft® SQL Server™
data types that can be used with this function consult the manual. PATINDEX
returns the starting position of the first occurrence of a pattern in a
specified expression, or zeros if the pattern is not found, on all valid text
and character data types.
Create table T_XMLData (client_id int, XMLTXT text)
Create table MAINDATA (client_id int, name varchar(25), address varchar(100), rating int)
Let assume that XML was prepared by using the MAINDATA table, the XML tag is called <rating> and the end of the tag is </rating>. To find the rating for each client in XML:Declare @ratings table (client_id int, rating varchar(1000) ) Declare @tag varchar(50), @tagPAT varchar(50), @tagend varchar(50), @tagendPAT varchar(50) select @tag = '<rating>' , @tagPAT = '%<rating>%' , @tagend = '</rating>' , @tagendPAT = '%</rating>%' insert into @ratings ( client_id , rating) SELECT client_id, SUBSTRING (XMLTXT, -- where to start PATINDEX(@tagPAT, XMLTXT) + len(@tag) , -- how many characters to take PATINDEX(@tagendPAT, XMLTXT) - PATINDEX(@tagPAT, XMLTXT) - len(@tag) ) FROM T_XMLData where PATINDEX(@tagPAT, XMLTXT) > 0
Now we can do comparison between actual data and MAINDATA tables. All we need the records which have the rating in MAINDATA table and different from @ratings table.declare @difrating table (client_id int primary key, rating int) Insert into @difrating(client_id, rating) Select m.client_id, m.rating From MAINDATA m Left outer join @ratings r on r.client_id = m.client_id and cast(r.rating as int) <> m.rating Where m.rating is not null
Next step is to update text column with new XML tag value. To update one row the following information is necessary for UPDATETEXT function:
Text pointer value (returned by the TEXTPTR function) that points to the text data to be updated; zero-based starting position for the update; length of data to delete from the existing text column;and the data to be inserted into the existing text column at the specified location.
In addition, it is necessary to know the other tag name after which the <rating> tag will be inserted in case of <rating> tag is not exist in XML string.Declare @InsertAftertagName varchar(50), @InsertAftertagName_PAT varchar(50) , @minid int, @maxid int, @PtrVar varbinary(100) , @InsertPos int, @DeleteLen int, @replacetext varchar(6000) declare @tag1 table(tid int identity(1,1), client_id int primary key , ptr varbinary(100), InsertPos int, deletelen int , replacetext varchar(6000)) select @InsertAftertagName = '</name>', @InsertAftertagName_PAT = '%</name>%' Insert into @tag1(ptr, client_id, InsertPos, deletelen, replacetext) SELECT TEXTPTR(xd.XMLTXT), -- text pointer xd.client_id, -- insert position CASE WHEN PATINDEX(@tagPAT, xd.XMLTXT) = 0 THEN PATINDEX(@InsertAftertagName_PAT, XMLTXT) + len(@InsertAftertagName) - 1 -- no tag ELSE PATINDEX(@tagPAT, XMLTXT) - 1 -- tag exist END , -- delete length CASE WHEN PATINDEX(@tagPAT, XMLTXT) = 0 THEN 0 -- no tag ELSE PATINDEX(@tagendPAT, XMLTXT) + len(@tagend) - PATINDEX(@tagPAT, XMLTXT) END, -- value @tag + cast(r.Rating as varchar) + @tagend FROM T_XMLDATA xd inner join @ratings r on r.client_id = xd.client_id where -- When no @tag tag then another existing tag. Insertion will be done after the existing tag CASE WHEN PATINDEX(@tagPAT, XMLTXT) = 0 THEN PATINDEX(@InsertAftertagName_PAT, xd.XMLTXT) ELSE PATINDEX(@tagendPAT, XMLTXT) END > 0
As you can see from the script the update will be done to update the existing <rating> tag with data or to add <rating> tag to the XML string if tag <rating> is not exist. Now let’s update row by row
the XML statement.select @minid = min(tid),@maxid = max(tid) from @tag1 while (@minid <= @maxid) begin select @PtrVar = ptr , @InsertPos = InsertPos , @DeleteLen = DeleteLen , @replacetext = replacetext from @tag1 where tid = @minid UPDATETEXT T_xmlData --table.column @PtrVar --pointervalue @InsertPos --start @DeleteLen --length to delete data your replacing @replacetext set @minid = @minid + 1 end
Note that UPDATETEXT does not cause triggers to fire like a normal UPDATE statement would. It is a good and a bad news at the same time. The good news is that you don't have to disable a trigger, if there is a trigger based auditing mechanism. The bad news is that if the change is required to be audited then you have to create additional statement for the audit and place both UPDATETEXT and INSERT into an audit table in transactional batch or a stored procedure.
Another note that function SUBSTRING is taking only 8000 characters. This means, if you need to get information from text field and you don’t know if the same string may appeared multiple times, you need to parse the whole text field in 8000 bytes chunks. For example, the same tag can be repeated many times in XML string.Declare @datalen int, @parsedlen int, @str varchar(100) , @parsedval varchar(8000), @datalocation int Set @parsedlen = 1 Set @str = 'ratings' Select @datalen = DATALENGTH(XMLTXT) from T_XMLData where client_id = 100 WHILE (@parsedlen <= @datalen + 8000) Begin Select @parsedval = substring(XMLTXT, @parsedlen, 8000) from T_XMLData where client_id = 100 Set @datalocation = CHARINDEX(@str, @parsedval, 1) -- *********************************************************************************************** -- in this section you can save the position of the string in a temporary table and then use it -- for UPDATETEXT statement select @datalocation --************************************************************************************************ Set @parsedlen = @parsedlen + 8000 End
As you can see, it is possible to parse TEXT columns but it requires additional work to do. And the parsing can be done for the whole table not an individual row.Declare @datalen int, @parsedlen int, @str varchar(100) , @parsedval varchar(8000), @datalocation int Declare @datalentbl table (client_id int primary key, datalen int, XMLTXT text, tid int identity(1,1)) Declare @parsevaltbl table (client_id int, datalocation int) Create table #parsed8000 (client_id int , parsedval varchar(8000) ) Set @parsedlen = 1, @str = 'rating' -- get length of the text column for all rows Insert into @datalentbl (client_id, datalen, XMLTXT ) Select client_id, DATALENGTH(XMLTXT), XMLTXT from T_XMLData Select @datalen = max(datalen) from @datalentbl WHILE (@parsedlen <= @datalen) Begin Insert into #parsed8000 (client_id, parsedval) Select client_id, substring(XMLTXT, @parsedlen, 8000) from @datalentbl Insert into @parsevaltbl (client_id , datalocation ) Select client_id, CHARINDEX(@str, parsedval, 1) from #parsed8000 truncate table #parsed8000 Set @parsedlen = @parsedlen + 8000 End
At the end the table @parsevaltbl has all the positions for the specified string in a text field for all client id’s. Now, you can use it for the TEXTUPDATE or any other purposes.
What if an application or a back end process is required to update completely the value of a text field? There are few ways you can do it.
WRITETEXT function permits nonlogged, interactive updating of an existing text, ntext, or image
column. This statement completely overwrites any existing data in the column it affects. UPDATETEXT can be used as well to update the value completely from position 1 to position defined by function DATALENGTH. And the last choice is the simple UPDATE function. Modifying a text, ntext, or image column with UPDATE initializes the column, assigns a valid text pointer to it, and allocates at least one data page unless updating the column with NULL. The UPDATE statement is logged. If you are replacing or modifying large blocks of text, ntext, or image data, it is better to use the WRITETEXT or UPDATETEXT statement instead of the UPDATE statement. The WRITETEXT and UPDATETEXT statements are not logged.
Now, there is no way to define the variable type of text. So, how we can INSERT or UPDATE a string value with more than 8000 characters? You cannot declare or manipulate a variable of TEXT data-type in a function or stored-procedure. But you can have a TEXT variable as a parameter to the stored procedure. For example,Create procedure proc_1 @client_id int, @textvar text as Begin Insert into T_XMLDATA (client_id, XMLTXT) values (@client_id, @textvar) End
There is another method, even though it should be used with caution. For example, lets assume you need to create an XML document from a note stored in the following table :Create table notes (note_id int, note_seq_no int, note varchar(8000) )
As you can see, the note length can be over 8000 characters. You can create a temporary table with text column and update the notes in chunks of 8000 bytes. At the very end update or insert the values into permanent table with one UPDATE/INSERT statement.
There are many ways to process TEXT data. In many cases there are no big performance degradation in text vs varchar columns if data selected/updated by id column and update is done for one column only. But you can’t update multiple text columns altogether and should do it one by one. I would not recommend using text data type for the table column instead of varchar unless it is absolutely necessary. Let me point out again that the whole idea of processing XML changes by hand is really preposterous, and the approach of “parsing” for a tag by simply treating an XML document as a string of symbols is dangerous.