Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

Manipulating Data in TEXT Type Columns

By Leo Peysakhovich,

I was working recently on the following problem. The value stored in a number of different columns is an XML document which needed to be replaced with a yet another XML data It was due to some unfinished nightly processes. The normal REPLACE statement against TEXT type columns is not working, turning the whole thing in a bit of a challenge. There was an additional complication - the tag may or may not exist in the saved XML string. Because the large number of rows the data should be updated only for the columns where information is not matching with the data in master table. In this article I will demonstrate how to parse, update partially or replace completely text columns data, or to update text columns with the same or different values.

Let me point out that the whole idea of processing XML changes by hand is really preposterous, and the approach of “parsing” for a tag by simply treating an XML document as a string of symbols is dangerous and, based on my experience, should never be allowed. But this is how company’s processes were done, XML data processed and as a DBA I hardly had any other choices to make a quick data fix. In generally, this article is how effectively works with TEXT columns and XML document in my case provides simple and an effective way for the illustration.

First, the data from the text column should be parsed. Both SUBSTRING and PATINDEX functions can be used to parse data with less than 8000 bytes. This is the case for most XML tags or any other forms of parsable data.

SUBSTRING returns a part of a character, binary, text, or image expression. For more information about the valid Microsoft® SQL Server™ data types that can be used with this function consult the manual. PATINDEX returns the starting position of the first occurrence of a pattern in a specified expression, or zeros if the pattern is not found, on all valid text and character data types.

Create table T_XMLData (client_id int, XMLTXT text)
Create table MAINDATA (client_id int, name varchar(25), address varchar(100), rating int) 

Let assume that XML was prepared by using the MAINDATA table, the XML tag is called <rating> and the end of the tag is </rating>. To find the rating for each client in XML:

Declare @ratings table (client_id int, rating varchar(1000) )
Declare @tag varchar(50), @tagPAT varchar(50), @tagend varchar(50), @tagendPAT varchar(50) 

select @tag = '<rating>'
    ,  @tagPAT = '%<rating>%' 
	 , @tagend = '</rating>'
	 ,  @tagendPAT = '%</rating>%' 

insert into @ratings ( client_id , rating)
SELECT client_id, SUBSTRING (XMLTXT,
                  -- where to start
                  PATINDEX(@tagPAT, XMLTXT) + len(@tag) ,
                  -- how many characters to take
                  PATINDEX(@tagendPAT, XMLTXT) - PATINDEX(@tagPAT,
                  XMLTXT)  - len(@tag) 
                  )
 FROM T_XMLData    
 where PATINDEX(@tagPAT, XMLTXT) > 0 

Now we can do comparison between actual data and MAINDATA tables. All we need the records which have the rating in MAINDATA table and different from @ratings table.

declare @difrating table (client_id int primary key, rating int) 

Insert into @difrating(client_id, rating)
Select m.client_id, m.rating
 From MAINDATA m  
      Left outer join @ratings r
		    on r.client_id = m.client_id 
          and cast(r.rating as int) <> m.rating
 Where m.rating is not null 

Next step is to update text column with new XML tag value. To update one row the following information is necessary for UPDATETEXT function:

Text pointer value (returned by the TEXTPTR function) that points to the text data to be updated; zero-based starting position for the update; length of data to delete from the existing text column;and the data to be inserted into the existing text column at the specified location.

In addition, it is necessary to know the other tag name after which the <rating> tag will be inserted in case of <rating> tag is not exist in XML string.

Declare @InsertAftertagName varchar(50), @InsertAftertagName_PAT varchar(50)
      , @minid int, @maxid int, @PtrVar varbinary(100)
		, @InsertPos int, @DeleteLen int, @replacetext varchar(6000)

declare @tag1 table(tid int identity(1,1), client_id int primary key
	    , ptr varbinary(100), InsertPos int, deletelen int
		 , replacetext varchar(6000)) 

select @InsertAftertagName = '</name>',  @InsertAftertagName_PAT = '%</name>%' 

Insert into @tag1(ptr, client_id, InsertPos, deletelen, replacetext)
  SELECT  TEXTPTR(xd.XMLTXT),  -- text pointer
          xd.client_id,                     
          -- insert position
          CASE  
             WHEN PATINDEX(@tagPAT, xd.XMLTXT) = 0 
   			    THEN PATINDEX(@InsertAftertagName_PAT, XMLTXT) + len(@InsertAftertagName) - 1  
					 -- no tag
             ELSE PATINDEX(@tagPAT, XMLTXT) - 1           
             -- tag exist           
          END ,
          -- delete length
			 CASE
             WHEN PATINDEX(@tagPAT, XMLTXT) = 0 
				    THEN 0 
                -- no tag
             ELSE PATINDEX(@tagendPAT, XMLTXT) + len(@tagend) - PATINDEX(@tagPAT, XMLTXT)
		    END,
          -- value
          @tag + cast(r.Rating as varchar) + @tagend          
 FROM T_XMLDATA xd    
      inner join @ratings r   
        on r.client_id = xd.client_id
 where
    -- When no @tag tag then another existing tag. Insertion will be done after the existing tag 
    CASE
       WHEN PATINDEX(@tagPAT, XMLTXT) = 0 
		   THEN PATINDEX(@InsertAftertagName_PAT, xd.XMLTXT) 
		 ELSE PATINDEX(@tagendPAT, XMLTXT)      
	 END > 0 

As you can see from the script the update will be done to update the existing <rating> tag with data or to add <rating> tag to the XML string if tag <rating> is not exist. Now let’s update row by row the XML statement.

select @minid = min(tid),@maxid = max(tid) from @tag1 

while (@minid <= @maxid)
 begin
   select @PtrVar = ptr
	   ,   @InsertPos = InsertPos
		,  @DeleteLen = DeleteLen
		,  @replacetext = replacetext
    from @tag1   where tid = @minid

   UPDATETEXT T_xmlData  --table.column
      @PtrVar      	    --pointervalue
      @InsertPos     	 --start
      @DeleteLen         --length to delete data your replacing
      @replacetext     

	set @minid = @minid + 1
 end

Note that UPDATETEXT does not cause triggers to fire like a normal UPDATE statement would. It is a good and a bad news at the same time. The good news is that you don't have to disable a trigger, if there is a trigger based auditing mechanism. The bad news is that if the change is required to be audited then you have to create additional statement for the audit and place both UPDATETEXT and INSERT into an audit table in transactional batch or a stored procedure.

Another note that function SUBSTRING is taking only 8000 characters. This means, if you need to get information from text field and you don’t know if the same string may appeared multiple times, you need to parse the whole text field in 8000 bytes chunks. For example, the same tag can be repeated many times in XML string.

Declare @datalen int, @parsedlen int, @str varchar(100)
      , @parsedval varchar(8000), @datalocation int

Set @parsedlen = 1
Set @str = 'ratings' 
 
Select @datalen = DATALENGTH(XMLTXT) from T_XMLData where client_id = 100 

WHILE (@parsedlen <= @datalen + 8000)
 Begin
   Select @parsedval = substring(XMLTXT, @parsedlen, 8000)
	 from T_XMLData 
	 where client_id = 100 

   Set @datalocation = CHARINDEX(@str, @parsedval, 1) 

   -- ***********************************************************************************************
   -- in this section you can save the position of the string in a temporary table and then use it 
   -- for UPDATETEXT statement
   select @datalocation
   --************************************************************************************************
   Set @parsedlen = @parsedlen + 8000
 End 

As you can see, it is possible to parse TEXT columns but it requires additional work to do. And the parsing can be done for the whole table not an individual row.

Declare @datalen int, @parsedlen int, @str varchar(100)
      , @parsedval varchar(8000), @datalocation int

Declare @datalentbl table (client_id int primary key, datalen int, XMLTXT text, tid int identity(1,1))
Declare @parsevaltbl table (client_id int, datalocation int)
Create table #parsed8000 (client_id int , parsedval varchar(8000) ) 

Set @parsedlen = 1,  @str =  'rating'
 -- get length of the text column for all rows
Insert into @datalentbl (client_id, datalen, XMLTXT )
 Select client_id, DATALENGTH(XMLTXT), XMLTXT
  from T_XMLData
Select @datalen = max(datalen) from @datalentbl 

WHILE (@parsedlen <= @datalen)
 Begin
   Insert into #parsed8000 (client_id, parsedval)  
	Select client_id, substring(XMLTXT, @parsedlen, 8000) 
	 from @datalentbl 

   Insert into @parsevaltbl (client_id , datalocation )  
	Select client_id, CHARINDEX(@str, parsedval, 1) 
	 from #parsed8000		    
 
   truncate table #parsed8000
   
   Set @parsedlen = @parsedlen + 8000 
 End 

At the end the table @parsevaltbl has all the positions for the specified string in a text field for all client id’s. Now, you can use it for the TEXTUPDATE or any other purposes.

What if an application or a back end process is required to update completely the value of a text field? There are few ways you can do it.

WRITETEXT function permits nonlogged, interactive updating of an existing text, ntext, or image column. This statement completely overwrites any existing data in the column it affects. UPDATETEXT can be used as well to update the value completely from position 1 to position defined by function DATALENGTH. And the last choice is the simple UPDATE function. Modifying a text, ntext, or image column with UPDATE initializes the column, assigns a valid text pointer to it, and allocates at least one data page unless updating the column with NULL. The UPDATE statement is logged. If you are replacing or modifying large blocks of text, ntext, or image data, it is better to use the WRITETEXT or UPDATETEXT statement instead of the UPDATE statement. The WRITETEXT and UPDATETEXT statements are not logged.

Now, there is no way to define the variable type of text. So, how we can INSERT or UPDATE a string value with more than 8000 characters? You cannot declare or manipulate a variable of TEXT data-type in a function or stored-procedure. But you can have a TEXT variable as a parameter to the stored procedure. For example,

Create procedure proc_1  
    @client_id int, @textvar text
as
Begin
   Insert into T_XMLDATA (client_id, XMLTXT) values (@client_id, @textvar)
End

There is another method, even though it should be used with caution. For example, lets assume you need to create an XML document from a note stored in the following table :

 
Create table notes (note_id int, note_seq_no int, note varchar(8000) )

As you can see, the note length can be over 8000 characters. You can create a temporary table with text column and update the notes in chunks of 8000 bytes. At the very end update or insert the values into permanent table with one UPDATE/INSERT statement.

Conclusion

There are many ways to process TEXT data. In many cases there are no big performance degradation in text vs varchar columns if data selected/updated by id column and update is done for one column only. But you can’t update multiple text columns altogether and should do it one by one. I would not recommend using text data type for the table column instead of varchar unless it is absolutely necessary. Let me point out again that the whole idea of processing XML changes by hand is really preposterous, and the approach of “parsing” for a tag by simply treating an XML document as a string of symbols is dangerous.

Total article views: 16337 | Views in the last 30 days: 8
 
Related Articles
FORUM

Updating varchar(max) column

Updating varchar(max) column

FORUM

SQL Native Client issue with varchar(max)

SQL Native Client issue with varchar(max)

SCRIPT

Aggregrating varchar columns

Concatenate columns grouping on Index column (approach for summing varchar columns by grouping on in...

FORUM

updating a column

updating a column

FORUM

Altering column from varchar(2048) to varchar(max) performance

Altering column datatype to varchar(max)

Tags
advanced querying    
t-sql    
 
Contribute

Join the most active online SQL Server Community

SQL knowledge, delivered daily, free:

Email address:  

You make SSC a better place

As a member of SQLServerCentral, you get free access to loads of fresh content: thousands of articles and SQL scripts, a library of free eBooks, a weekly database news roundup, a great Q & A platform… And it’s our huge, buzzing community of SQL Server Professionals that makes it such a success.

Join us!

Steve Jones
Editor, SQLServerCentral.com

Already a member? Jump in:

Email address:   Password:   Remember me: Forgotten your password?
Steve Jones