SQLServerCentral Article

Manipulating Data in TEXT Type Columns

,

I was working recently on the following problem. The value

stored in a number of different columns is an XML document which needed to be

replaced with a yet another XML data It was due to some unfinished nightly processes. The normal REPLACE

statement against TEXT type columns is not working, turning the whole thing in a

bit of a challenge. There was an additional complication - the tag may or may

not exist in the saved XML string. Because

the large number of rows the data should be updated only for the columns where

information is not matching with the data in master table. In this article I

will demonstrate how to parse, update partially or replace completely text

columns data, or to update text columns with the same or different values.

Let me point out that the whole idea of processing XML

changes by hand is really preposterous, and the approach of “parsing” for a

tag by simply treating an XML document as a string of symbols is dangerous and,

based on my experience, should never be allowed. But this is how company’s

processes were done, XML data processed and as a DBA I hardly had any other

choices to make a quick data fix. In generally, this article is how effectively

works with TEXT columns and XML document in my case provides simple and an

effective way for the illustration.

First, the data from the text column should be parsed. Both

SUBSTRING and PATINDEX functions can be used to parse data with less than 8000

bytes. This is the case for most XML tags or any other forms of parsable data.

SUBSTRING returns a part of a character, binary, text, or

image expression. For more information about the valid Microsoft® SQL Server™

data types that can be used with this function consult the manual. PATINDEX

returns the starting position of the first occurrence of a pattern in a

specified expression, or zeros if the pattern is not found, on all valid text

and character data types.

Create table T_XMLData (client_id int, XMLTXT text)

Create table MAINDATA (client_id int, name varchar(25), address varchar(100), rating int)

Let assume that XML was prepared by using the MAINDATA table, the XML tag is called <rating> and the end of the tag is </rating>. To find the rating for each client in XML:

Declare @ratings table (client_id int, rating varchar(1000) )
Declare @tag varchar(50), @tagPAT varchar(50), @tagend varchar(50), @tagendPAT varchar(50) 
select @tag = '<rating>'
    ,  @tagPAT = '%<rating>%' 
 , @tagend = '</rating>'
 ,  @tagendPAT = '%</rating>%' 
insert into @ratings ( client_id , rating)
SELECT client_id, SUBSTRING (XMLTXT,
                  -- where to start
                  PATINDEX(@tagPAT, XMLTXT) + len(@tag) ,
                  -- how many characters to take
                  PATINDEX(@tagendPAT, XMLTXT) - PATINDEX(@tagPAT,
                  XMLTXT)  - len(@tag) 
                  )
 FROM T_XMLData    
 where PATINDEX(@tagPAT, XMLTXT) > 0 

Now we can do comparison between actual data and MAINDATA tables. All we need the records which have the rating in MAINDATA table and different from @ratings table.

declare @difrating table (client_id int primary key, rating int) 
Insert into @difrating(client_id, rating)
Select m.client_id, m.rating
 From MAINDATA m  
      Left outer join @ratings r
    on r.client_id = m.client_id 
          and cast(r.rating as int) <> m.rating
 Where m.rating is not null 

Next step is to update text column with new XML tag value. To update one row the following information is necessary for UPDATETEXT function:

Text pointer value (returned by the TEXTPTR function) that points to the text data to be updated; zero-based starting position for the update; length of data to delete from the existing text column;and the data to be inserted into the existing text column at the specified location.

In addition, it is necessary to know the other tag name after which the <rating> tag will be inserted in case of <rating> tag is not exist in XML string.

Declare @InsertAftertagName varchar(50), @InsertAftertagName_PAT varchar(50)
      , @minid int, @maxid int, @PtrVar varbinary(100)
, @InsertPos int, @DeleteLen int, @replacetext varchar(6000)
declare @tag1 table(tid int identity(1,1), client_id int primary key
    , ptr varbinary(100), InsertPos int, deletelen int
 , replacetext varchar(6000)) 
select @InsertAftertagName = '</name>',  @InsertAftertagName_PAT = '%</name>%' 
Insert into @tag1(ptr, client_id, InsertPos, deletelen, replacetext)
  SELECT  TEXTPTR(xd.XMLTXT),  -- text pointer
          xd.client_id,                     
          -- insert position
          CASE  
             WHEN PATINDEX(@tagPAT, xd.XMLTXT) = 0 
       THEN PATINDEX(@InsertAftertagName_PAT, XMLTXT) + len(@InsertAftertagName) - 1  
 -- no tag
             ELSE PATINDEX(@tagPAT, XMLTXT) - 1           
             -- tag exist           
          END ,
          -- delete length
 CASE
             WHEN PATINDEX(@tagPAT, XMLTXT) = 0 
    THEN 0 
                -- no tag
             ELSE PATINDEX(@tagendPAT, XMLTXT) + len(@tagend) - PATINDEX(@tagPAT, XMLTXT)
    END,
          -- value
          @tag + cast(r.Rating as varchar) + @tagend          
 FROM T_XMLDATA xd    
      inner join @ratings r   
        on r.client_id = xd.client_id
 where
    -- When no @tag tag then another existing tag. Insertion will be done after the existing tag 
    CASE
       WHEN PATINDEX(@tagPAT, XMLTXT) = 0 
   THEN PATINDEX(@InsertAftertagName_PAT, xd.XMLTXT) 
 ELSE PATINDEX(@tagendPAT, XMLTXT)      
 END > 0 

As you can see from the script the update will be done to update the existing <rating> tag with data or to add <rating> tag to the XML string if tag <rating> is not exist. Now let’s update row by row

the XML statement.

select @minid = min(tid),@maxid = max(tid) from @tag1 
while (@minid <= @maxid)
 begin
   select @PtrVar = ptr
   ,   @InsertPos = InsertPos
,  @DeleteLen = DeleteLen
,  @replacetext = replacetext
    from @tag1   where tid = @minid
   UPDATETEXT T_xmlData  --table.column
      @PtrVar          --pointervalue
      @InsertPos      --start
      @DeleteLen         --length to delete data your replacing
      @replacetext     
set @minid = @minid + 1
 end

Note that UPDATETEXT does not cause triggers to fire like a normal UPDATE statement would. It is a good and a bad news at the same time. The good news is that you don't have to disable a trigger, if there is a trigger based auditing mechanism. The bad news is that if the change is required to be audited then you have to create additional statement for the audit and place both UPDATETEXT and INSERT into an audit table in transactional batch or a stored procedure.

Another note that function SUBSTRING is taking only 8000 characters. This means, if you need to get information from text field and you don’t know if the same string may appeared multiple times, you need to parse the whole text field in 8000 bytes chunks. For example, the same tag can be repeated many times in XML string.

Declare @datalen int, @parsedlen int, @str varchar(100)
      , @parsedval varchar(8000), @datalocation int
Set @parsedlen = 1
Set @str = 'ratings' 
 
Select @datalen = DATALENGTH(XMLTXT) from T_XMLData where client_id = 100 
WHILE (@parsedlen <= @datalen + 8000)
 Begin
   Select @parsedval = substring(XMLTXT, @parsedlen, 8000)
 from T_XMLData 
 where client_id = 100 
   Set @datalocation = CHARINDEX(@str, @parsedval, 1) 
   -- ***********************************************************************************************
   -- in this section you can save the position of the string in a temporary table and then use it 
   -- for UPDATETEXT statement
   select @datalocation
   --************************************************************************************************
   Set @parsedlen = @parsedlen + 8000
 End 

As you can see, it is possible to parse TEXT columns but it requires additional work to do. And the parsing can be done for the whole table not an individual row.

Declare @datalen int, @parsedlen int, @str varchar(100)
      , @parsedval varchar(8000), @datalocation int
Declare @datalentbl table (client_id int primary key, datalen int, XMLTXT text, tid int identity(1,1))
Declare @parsevaltbl table (client_id int, datalocation int)
Create table #parsed8000 (client_id int , parsedval varchar(8000) ) 
Set @parsedlen = 1,  @str =  'rating'
 -- get length of the text column for all rows
Insert into @datalentbl (client_id, datalen, XMLTXT )
 Select client_id, DATALENGTH(XMLTXT), XMLTXT
  from T_XMLData
Select @datalen = max(datalen) from @datalentbl 
WHILE (@parsedlen <= @datalen)
 Begin
   Insert into #parsed8000 (client_id, parsedval)  
Select client_id, substring(XMLTXT, @parsedlen, 8000) 
 from @datalentbl 
   Insert into @parsevaltbl (client_id , datalocation )  
Select client_id, CHARINDEX(@str, parsedval, 1) 
 from #parsed8000    
 
   truncate table #parsed8000
   
   Set @parsedlen = @parsedlen + 8000 
 End 

At the end the table @parsevaltbl has all the positions for the specified string in a text field for all client id’s. Now, you can use it for the TEXTUPDATE or any other purposes.

What if an application or a back end process is required to update completely the value of a text field? There are few ways you can do it.

WRITETEXT function permits nonlogged, interactive updating of an existing text, ntext, or image

column. This statement completely overwrites any existing data in the column it affects. UPDATETEXT can be used as well to update the value completely from position 1 to position defined by function DATALENGTH. And the last choice is the simple UPDATE function. Modifying a text, ntext, or image column with UPDATE initializes the column, assigns a valid text pointer to it, and allocates at least one data page unless updating the column with NULL. The UPDATE statement is logged. If you are replacing or modifying large blocks of text, ntext, or image data, it is better to use the WRITETEXT or UPDATETEXT statement instead of the UPDATE statement. The WRITETEXT and UPDATETEXT statements are not logged.

Now, there is no way to define the variable type of text. So, how we can INSERT or UPDATE a string value with more than 8000 characters? You cannot declare or manipulate a variable of TEXT data-type in a function or stored-procedure. But you can have a TEXT variable as a parameter to the stored procedure. For example,

Create procedure proc_1  
    @client_id int, @textvar text
as
Begin
   Insert into T_XMLDATA (client_id, XMLTXT) values (@client_id, @textvar)
End

There is another method, even though it should be used with caution. For example, lets assume you need to create an XML document from a note stored in the following table :

 
Create table notes (note_id int, note_seq_no int, note varchar(8000) )

As you can see, the note length can be over 8000 characters. You can create a temporary table with text column and update the notes in chunks of 8000 bytes. At the very end update or insert the values into permanent table with one UPDATE/INSERT statement.

Conclusion

There are many ways to process TEXT data. In many cases there are no big performance degradation in text vs varchar columns if data selected/updated by id column and update is done for one column only. But you can’t update multiple text columns altogether and should do it one by one. I would not recommend using text data type for the table column instead of varchar unless it is absolutely necessary. Let me point out again that the whole idea of processing XML changes by hand is really preposterous, and the approach of “parsing” for a tag by simply treating an XML document as a string of symbols is dangerous.

Rate

5 (3)

You rated this post out of 5. Change rating

Share

Share

Rate

5 (3)

You rated this post out of 5. Change rating