Full Text Search query and noise words

  • Hi coders (and administrators 😀 ),

    I am working on query with CONTAINSTABLE. Everything works well if search terms doesn't contain any noise word.

    If user searches for terms: Tom Jerry, it will return rows where these two terms are existing. So far so good.

    But, if user searches for, let say: Tom and Jerry, the CONTAINSTABLE will return 0 results.

    I understand why it happens, word "and" is so called noise word and not indexed in full text index and thus query cannot find it.

    But, user obviously expects to see something in results and query should return some rows.

    It would be much better IMHO :-), if noise words are ignored if visitor writes some in search text box, rather than they destroy search and returns 0 results.

    I am aware that there is an option to change noise words on SQL Server. I am trying to avoid that.

    Is there any way to make query which ignores noise words?

    How this problem is solved commonly?

    Currently, I have .Net code which removes noise words before query is built. I wonder if it is possible in T-SQL directly. :unsure:

    Thanks!

  • "Commonly" in SQL 2005 it's done by editing content of the Noise-word files.

    You should find these files in $SQL_Server_Install_Path\Microsoft SQL Server\MSSQL.1\MSSQL\FTDATAI'm not sure if you read this one yet:

    http://msdn.microsoft.com/en-us/library/ms142551(v=sql.90).aspx

    It's also quite common when these files are emptied, so you can perform full-text searches on, let say, Company names in combination with company's contacts for example (seen this kind of implementation in few CRM systems).

    In SQL2008 and 2012, it's managed via full-text search stoplists...

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • Eugene Elutin (11/28/2012)


    "Commonly" in SQL 2005 it's done by editing content of the Noise-word files.

    Hi Eugene, thanks for response!

    It looks like noise words are practically useless.

    I can't ask website visitors to not insert any noise words in search terms. Visitors don't know even that noise words exist. They just place some search terms and expect to get correct results.

    But, if visitor possibly writes some noise word, the search will return 0 results. That's crazy and confusing for user. 🙂

    Currently, I have a code which removes noise words from search terms before query is built, and that works well.

    On SQL Server side, it looks like only solution is to remove noise words completely and on that way add them to full text index.

    Ok, that's fine too, but I wonder is there any use of noise words. Logically, they should be ignored. Those are common words which are probably best to ignore. Because they are ignored, full text index will not index them. That part is fine.

    But on the other side, they are not ignored by CONTAINSTABLE queries. That is illogical and makes complete idea useless.

  • I just posted a parser procedure and full-text search procedure in another thread that may be relevant:

    http://www.sqlservercentral.com/Forums/Topic1389626-391-1.aspx#bm1391439

     

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply