March 31, 2009 at 10:29 am
I have a table in a SQL Server 2000 DB that contains emails in HTML stored in as TEXT data. I am trying to parse out the HTML tags to leave only the text. I've tried various methods to no avail, and came up with the below code. I appreciate that I'm using a cursor (which is probably bad coding on my behalf) but the construct works when used with one record. What I am seeing is twofold; first the process seems to run fine for thousands of records, then for no apparent reason, it completely stops working, yet the query continues on through the records (but not removing the HTML tags). What is particulalry odd with this is that if I go back to the row that stopped working, and rerun the process for that emailID only, it may take a few attempts to actually have any impact on the text. The other problem is that on a few rows, the whole procedure errors out. The error is:
Server: Msg 7135, Level 16, State 1, Line 57
Deletion length -7908 is not in the range of available text, ntext, or image data.
The statement has been terminated.
The strange thing about the error is that (in the above case) that 7908 is the value of the position of the next pattern match and there IS a valid '>' after that match.
SET NOCOUNT ON
DECLARE cur_emails CURSOR
READ_ONLY
FOR
SELECT emailID
FROM emails_NOHTML
ORDER BY emailID
DECLARE @emailID int
DECLARE @ptrval binary(16)
DECLARE @startPos int
DECLARE @strLen int
OPEN cur_emails
FETCH NEXT FROM cur_emails INTO @emailID
WHILE (@@fetch_status <> -1)
BEGIN
IF (@@fetch_status <> -2)
BEGIN
SELECT @ptrval = TEXTPTR(EmailMessage)
FROM emails_NoHTML
WHERE EmailID = @emailID
SET @startPos = 0
SET @strLen = 0
SELECT
@startPos = patindex('% %', EmailMessage)
FROM emails_NoHTML
WHERE EmailID = @emailID
WHILE @startPos > 0
BEGIN
SELECT
@startPos = patindex('% %', EmailMessage) -1 ,
@strLen = charindex('>', EmailMessage, @startPos) - @startPos
FROM emails_NoHTML
WHERE EmailID = @emailID
UPDATETEXT
emails_NoHTML.EmailMessage
@ptrval
@startPos
@strLen
SELECT
@startPos = patindex('% %', EmailMessage) -1
FROM emails_NoHTML
WHERE EmailID = @emailID
END
END
FETCH NEXT FROM cur_emails INTO @emailID
END
CLOSE cur_emails
DEALLOCATE cur_emails
GO
Any help, or alternate approaches would be great appreciated.
Steve
March 31, 2009 at 9:41 pm
I figured it out. CHARINDEX is subject to the 8k limit (even though is works on the TEXT data type). Once my PATINDEX was finding pattern matches after 8k, the CHARINDEX was returning 0, so the UPDATETEXT was getting a minus value for one of the parameters.
I've rewritten the code to check for 0 results from the CHARINDEX and to move on if it does.
Anyone know of a way to have PATINDEX start searching at a certain spot (like CHARINDEX does)...?
Thanks,
Steve
Viewing 2 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply