String occurence in a ntext column

  • I'm trying to find a way to count how many times a string 'xyz' occurs in a ntext column. The column is full-text indexed but that just gives whether the string is present or not. Also, I can't use CHARINDEX to evaluate the length as it doesn't work with this datatype. Any suggestions ?

    We use this column to store MS Word documents in a SQL Server 7 database (alas ... no iFilter for Word before SQL 2000).

  • One way

    SELECT COUNT(*)

    FROM

    a

    INNER JOIN [Number] n

    ON SUBSTRING(a.[ntextcolumn],n.N,3) = N'xyz'

    AND n.N < (DATALENGTH(a.x) / 2)

    providing number table contains numbers upto max len of column

    not sure how efficient it would be on large data

    Far away is close at hand in the images of elsewhere.
    Anon.

  • I'm confused when you say Charindex can't be used on this datatype. The following script returns 15.

    CREATE TABLE tblNText

     (

     ntxColumn ntext

      )

    DECLARE

     @vchSearchString varchar(50)

     ,@bitIsDone   bit

     ,@intPosition  int

     ,@intOccurrences int

    SET @vchSearchString = 'SQL'

    INSERT INTO

     tblNText

    VALUES

     (

     'This SQL insert, which is a SQL command, into a SQL column in a SQL table is an SEO gaming SQL string. SQL is as SQL does, which is something that all SQL experts, which are people who know SQL well, and even SQL moderates, as well as SQL neophytes knows. SQL SQL SQL, I can''t seem to get enough SQL.'

      )

    SET @bitIsDone = 0

    SET @intOccurrences = 0

    SET @intPosition = 0

    WHILE @bitIsDone = 0

    BEGIN

     

     SET @intPosition = (SELECT Charindex(@vchSearchString,ntxColumn,@intPosition + 1) FROM tblNText)

     IF @intPosition > 0

      SET @intOccurrences = @intOccurrences + 1

     ELSE

      SET @bitIsDone = 1

    END

    SELECT @intOccurrences

  • quoteI'm confused when you say Charindex can't be used on this datatype

    Because CHARINDEX only works with short 'character data types' (char,varchar,nchar,nvarchar) and SQL will do an implicit CAST to nvarchar and truncate the data at 8000 bytes (4000 chars)

    Far away is close at hand in the images of elsewhere.
    Anon.

  • Ah, I wasn't aware that that conversion takes place (and that's one reason that I hate implicit conversions), and BOL doesn't note that fact in the CHARINDEX entry. I did do testing, and you are indeed correct. My count stops at 199 no matter how many times I repeat my sample string.

     

  • Thank you all for your help. I think I'm going to abandon the idea of implementing this new criteria in our search engine as it proves to be to slow (the search runs through 80'000 word documents which can contain up to 10 pages) and would, in conjunction with the other criteria, cause a timeout when the query runs.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply