FULL TEXT query - specifying multiple words to search using variable

  • Hi,

    I've been reading up on how to use full-text indexing in SQL Server 2005 following on from my previous post earlier today because I was using LIKE '%' + @Param1 + '%' to perform searching and I'm looking for ways to improve performance.

    After following Microsoft's documentation, I've managed to enable full text indexing on the database, create the full text catalogue and create a full text index on the table/field that I'm searching. I've changed my stored procedure WHERE clause from:

    [font="Courier New"]WHERE Field1 LIKE '%' + @Param1 + '%'[/font]

    to:

    [font="Courier New"]WHERE CONTAINS(Field1, @Param1)[/font]

    This worked great until @Param1 contained more than 1 word - then I get an 'incorrect syntax' error. Full text indexing seems to be really clever in the way it performs searching, but in this instance I want to be really simple and just say find me all records with this string in. I've found examples in the documentation when not using variables as follows:

    [font="Courier New"]WHERE CONTAINS(Field1, ' "My search words" ')[/font]

    but in order to do this with my parameter, I'll need to concatenate a string and execute it won't I?

    I'd really appreciate it if anyone could spare a couple of moments to impart their superior knowledge on this subject.

    Kind regards,

    James

  • Sorry, just had a thought; should I be formatting the search string before calling the stored procedure? So in the code that calls the stored proc, I could change 'My search text' to 'My AND search AND text'

  • Hi,

    If you want use multiple words inside contains keyword then your string should be like

    where contains(field1,' "firstword" or "secondword"')

    So if you want to use parameterized value then you have to concatenate the string using your logic before using inside the contains keyword.

    If you are passing the value from external program before pass it to procedure please do your logic and send the concatenate string to the procedure...

    So your passed value should be "firstword" or "secondword". Otherwise you can do your logic inside the procedure also

  • Hi,

    Thanks for your reply! I'll format the string before I pass it into the procedure then.

    One other question if you wouldn't mind answering it; Some of the strings I'm searching contain numbers, for example '... 5 day ...'. If I search using the following:

    WHERE CONTAINS(Field1, ' "5 AND day" ')

    I don't get any matches back. Does the full text index discard numeric information in the same way as puncuation, and if so, is there any way to include it?

    Many thanks in advance,

    James

  • Hi,

    If you want to use contains then you can't use like what you did it.

    where contains(@field1,'"5 AND day"') -- This is wrong, because in full text search every word should be within double quotes.

    So it should be like

    where contains(@field1,' "5" and "day" ') -- This is correct way to match the record for your scenario...

    Please keep in mind when ever if you use contains then every word should be within double quotes and based on the requirement you can " OR or AND" between the keywords.

    And one more thing if you want to search against multiple columns

    you can use

    where contains((@field1,@field2),' "5" and "day" ') -- Columns should in the bracket. It will search keyword matches in both columns.

  • saravanrp (6/2/2008)


    Hi,

    If you want to use contains then you can't use like what you did it.

    where contains(@field1,'"5 AND day"') -- This is wrong, because in full text search every word should be within double quotes.

    So it should be like

    where contains(@field1,' "5" and "day" ') -- This is correct way to match the record for your scenario...

    Please keep in mind when ever if you use contains then every word should be within double quotes and based on the requirement you can " OR or AND" between the keywords.

    And one more thing if you want to search against multiple columns

    you can use

    where contains((@field1,@field2),' "5" and "day" ') -- Columns should in the bracket. It will search keyword matches in both columns.

    Hi,

    Thanks for taking time to respond. Unfortunately I'm getting the same results by placing the "5" and "day" in quotes. If I search for just "Day" I get several records back, for example:

    ...3 day...

    ...4 day...

    ...5 day...

    If I change the search to CONTAINS(Field1, ' "3" AND "day" ') then I don't get any results back. It really doesn't seem to like those numbers. Any ideas?

    Thanks,

    James

  • Hi,

    The reason why you are not getting the result is that the record which matches "day", doesnt have "5" in that record and vice-versa.

    So if you try with OR instead of AND you will get the records both matches the keywords.

    If you your noise file contains "5" in that case it will never match the record.

    You check it inside the noise file

    C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTDataoiseENG.txt

    (Check the root folder where you hv installed SQL server)

    If "5" is present in that file remove it and repopulate the fulltext catalog and then fire the query with

    contains(@field1,'"5" and "day"')

    Am i anwsered to your question?

  • Hi,

    Thanks for taking the time to reply to this - the noise file definitely makes sense, I just didn't know it existed! I'll give this a try when I'm next in work.

    Thanks again,

    James

  • Several things (from experience).

    1. Note that the CONTAINS function can utilize a variable. See the BOL. So you can build the search string in your application and then pass it into the stored procedure. This will help prevent SQL injection, etc.

    2. I strongly suggest that you enable the "transform noise words option" and understand what the option does. See the BOL. You will get results from your queries, but you may also get what is known as "false positives".

    3. You need to be aware of what is contained in the "noise word" list for each language. If the words become problematic, then eliminating them may be your only choice.

    4. In order to invoke stemming (finding plurals, etc.) which is controlled through the INFLECTIONAL word in the FORMSOF option, you'll need to provide a language (LCID is best). But the language will need to match what has been indexed. For example, if you are building a system that will be multi-national, then you'll need to provide a way of passing the language of the client into your stored procedure and then using that with your CONTAINS query.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • JohnG (6/3/2008)


    Several things (from experience).

    1. Note that the CONTAINS function can utilize a variable. See the BOL. So you can build the search string in your application and then pass it into the stored procedure. This will help prevent SQL injection, etc.

    2. I strongly suggest that you enable the "transform noise words option" and understand what the option does. See the BOL. You will get results from your queries, but you may also get what is known as "false positives".

    3. You need to be aware of what is contained in the "noise word" list for each language. If the words become problematic, then eliminating them may be your only choice.

    4. In order to invoke stemming (finding plurals, etc.) which is controlled through the INFLECTIONAL word in the FORMSOF option, you'll need to provide a language (LCID is best). But the language will need to match what has been indexed. For example, if you are building a system that will be multi-national, then you'll need to provide a way of passing the language of the client into your stored procedure and then using that with your CONTAINS query.

    John,

    The transform noise words option is a top tip; thanks for that! Thanks for your other comments too - very helpful.

    Cheers,

    James

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply