Full-Text query gotcha's

  • ckempste

    SSCoach

    Points: 17983

    Hi all

    Some gotchas I have come accross to date:

    a) Server: Msg 7619, Level 16, State 1, Procedure SearchOne_BusinessName, Line 106 A clause of the query contained only ignored words.

    This is a classic, the CONTAINSTABLE etc clauses will fail with this

    error when the noise word analysis applies its fix to the query clause.

    I like to use t-sql stored procs to run the queries, unfortunatly though, with

    its total lack of exception handling (unlike oracles pl-sql) its hard to trap such errors elegantly. In the mean time, I have used a table of my own noise words, and so something like:

    exec ApplyNoiseWordFix @what, @what output

    where @what is what the user was searching for. I then do a simple check to determine if the string is null or blank and apply appropriate business logic.

    b) Using CONTAINS or C0NTAINSTABLE

    I have a stored proc: sp_SearchOverKeywords @what

    if @what => cafes

    then this works all well and fine, but

    if @what => baby clothes

    then the query will crash and burn. You need to simple wrapper the @what in "". This brings about some interesting questions, such as:

    is @what meaning "baby AND clothes"

    or "baby NEAR clothes"

    or "baby OR clothes"

    and typically you end up writing an "advanced query" function on your web-site/app or thorough help to allow more experienced users to play.

    I apply a range of string manipulation functions to alter the search to the following:

    "'baby*' NEAR 'cloth*'"

    apparently the inflectional clause can be applied by using ** rather than the INFLECTIONAL clause, but it doesnt seem to work. I have yet to work out how to effectively use it and have the above query flexibility. Any help on this area would be great.

    b) I would recommend CONTAINS and CONTAINSTABLE as the key method of quering in virtually all cases. FULLTEXT is ok for sentence type queries or long one-liners, but for keyword searches its poor. I have had a variety of cases where unless you spend the time (like above) with * etc, you can get some strange results back with unusually high rankings.

    More to come as I explore Full-text

    Cheers

    Chris


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • Hilary

    Grasshopper

    Points: 11

    a) to get around the noise word problem you have to parse at the client. I have a javascript search page that does this. Please contact me off line if you are interested. The interesting thing is that the noise word lists that Microsoft gives you are really inadequate for any industrial strenght search solution. If you run a perl script against your content and count the number for words that occur in your table and then remove all words that your customers might ever search on your get 1) faster response 2) smaller indexes 3) you get hundereds more noise words problems. My noise word lists are in excess of 14,000 words for larger tables, so I parse the search string on the server as well. Again contant me off line if you are interested in this solution.

    The prefix predicate is not equal to the inflectional predicate. The inflectional predicate will return matches to all declanations of the verb, or noun forms of the noun, or forms of the adjective. Prefix will match word fragments - ie run will match with runt using a wildcard, but not with an inflectional search as runt is not a part of the very run.

    b) for phrases you have to do this

    contains(*,'"baby" and "clothes"'). To get the inflectional query working you have to use FormsOf(inflectional, search_term) or you can simply use a Freetext which under the covers does an inflectional search. Keep in mind of course that the FreeText Search many times slower than the contains. However with a more efficient index - adding huge amounts of noise words to your noise list, you will find that the freetext predicate can approach the speed of the contains predicate.

    I recommend only using FreeTextTable or ContainsTable as the single item killing performance on FTS is the number of rows returned. The fewer rows you return the faster the response. FreetextTable and Containstable allow you to limit your results set.

    Another thing that will increase performance is if you use compression on your indexes. I find I get a 10% increase doing this.

  • jwiner

    SSCrazy

    Points: 2241

    Full text indexing uses a file called noise.enu. You can edit this file to remove or add noise words.

    Check out the article by Jon Winer about full-text indexing tips here on our site.

Viewing 3 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply