UPDATETEXT giving inconsistent results

  • I have a table in a SQL Server 2000 DB that contains emails in HTML stored in as TEXT data. I am trying to parse out the HTML tags to leave only the text. I've tried various methods to no avail, and came up with the below code. I appreciate that I'm using a cursor (which is probably bad coding on my behalf) but the construct works when used with one record. What I am seeing is twofold; first the process seems to run fine for thousands of records, then for no apparent reason, it completely stops working, yet the query continues on through the records (but not removing the HTML tags). What is particulalry odd with this is that if I go back to the row that stopped working, and rerun the process for that emailID only, it may take a few attempts to actually have any impact on the text. The other problem is that on a few rows, the whole procedure errors out. The error is:

    Server: Msg 7135, Level 16, State 1, Line 57

    Deletion length -7908 is not in the range of available text, ntext, or image data.

    The statement has been terminated.

    The strange thing about the error is that (in the above case) that 7908 is the value of the position of the next pattern match and there IS a valid '>' after that match.

    SET NOCOUNT ON

    DECLARE cur_emails CURSOR

    READ_ONLY

    FOR

    SELECT emailID

    FROM emails_NOHTML

    ORDER BY emailID

    DECLARE @emailID int

    DECLARE @ptrval binary(16)

    DECLARE @startPos int

    DECLARE @strLen int

    OPEN cur_emails

    FETCH NEXT FROM cur_emails INTO @emailID

    WHILE (@@fetch_status <> -1)

    BEGIN

    IF (@@fetch_status <> -2)

    BEGIN

    SELECT @ptrval = TEXTPTR(EmailMessage)

    FROM emails_NoHTML

    WHERE EmailID = @emailID

    SET @startPos = 0

    SET @strLen = 0

    SELECT

    @startPos = patindex('% %', EmailMessage)

    FROM emails_NoHTML

    WHERE EmailID = @emailID

    WHILE @startPos > 0

    BEGIN

    SELECT

    @startPos = patindex('% %', EmailMessage) -1 ,

    @strLen = charindex('>', EmailMessage, @startPos) - @startPos

    FROM emails_NoHTML

    WHERE EmailID = @emailID

    UPDATETEXT

    emails_NoHTML.EmailMessage

    @ptrval

    @startPos

    @strLen

    SELECT

    @startPos = patindex('% %', EmailMessage) -1

    FROM emails_NoHTML

    WHERE EmailID = @emailID

    END

    END

    FETCH NEXT FROM cur_emails INTO @emailID

    END

    CLOSE cur_emails

    DEALLOCATE cur_emails

    GO

    Any help, or alternate approaches would be great appreciated.

    Steve

  • I figured it out. CHARINDEX is subject to the 8k limit (even though is works on the TEXT data type). Once my PATINDEX was finding pattern matches after 8k, the CHARINDEX was returning 0, so the UPDATETEXT was getting a minus value for one of the parameters.

    I've rewritten the code to check for 0 results from the CHARINDEX and to move on if it does.

    Anyone know of a way to have PATINDEX start searching at a certain spot (like CHARINDEX does)...?

    Thanks,

    Steve

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply