A Google-like Full Text Search

  • Comments posted to this topic are about the item A Google-like Full Text Search

  • This is a gr8 stuff.....

  • Hi Mike,

    very useful article, especially code - thank you.

    I tested code and found one problem.

    If I entered phrase:

    president -aluminium

    it wokrs fine.

    If phrase is:

    -aluminium president

    it generates syntax error, phrase cannot start from negation. I checked goole it understand correctly this kind of phrases.

    How can I edit grammar to avoid this kind of errors?

    Thank you again.

    Regards,

    Oleg.

  • More time talking about a c# compiler than the SQL. No talk of the limitations of the FTS.

    Bit of a waste of time really.

    From title of piece I was hoping for something about combining site useage stats stored in SQL with the FTS results to actually make something Google like. Google is not defined by its syntax, which few users use, but by the relevancy of its results.

  • Good article.

    Actually, with Nautilus [/url]you can find a table just writing part of its name and once located you can find a record writing any word which is compared with like if it's a string and with "=" if its an integer.

    The query is built for all the fields:

    select top 10 *

    from dbo.Customers (nolock)

    where CustomerID like '%ana%' or CompanyName like '%ana%' or ContactName like '%ana%'...

    Once the record is located, you can see it's related records (following the FK links).

    There is also a "google like" search where you write a word or number and looks for it in the whole database.

    Not much like google, but it works for developers.

    Thanks, Marcos (http://sourceforge.net/projects/nautilus/[/url])

  • Bit of a waste of time really.

    That's fairly rude of you. The description of the article seems to fit and there are plenty of people that would like to use Google syntax to build search on FTS.

    Good job, Michael, in showing how FTS can be used in conjunction with other tools to help add search to your applications.

  • I 2nd that. I think there will be a demand for this sort of thing. I'm in a hurry nowadays, grappling for a wire to hold on to, so I wasn't able to download the source and test it. But you can bet that I am going to bookmark it. db:)

  • While I am still trying to digest this excellent article, I think I found a slight error.

    In The Grammar section, the article lists 7 rules. One of them is...

    PrimaryExpression ::= Term

    | '-' Term

    | Phrase

    | '(' OrExpression ')'

    | '&lt' ( Term | Phrase )+ '&gt'

    To more closely match the code in the Conversion Engine section, I think it should be more like the following.

    Note the addition of the ExactExpression option.

    PrimaryExpression ::= Term

    | '-' Term

    | ExactExpression

    | '(' OrExpression ')'

    | Phrase

    | '&lt' ( Term | Phrase )+ '&gt'

    A minor point, I know. I just want to make sure that I am not missing something here. 😀

    Paul DB

  • Steve Jones - Editor (10/7/2008)


    Bit of a waste of time really.

    That's fairly rude of you. The description of the article seems to fit and there are plenty of people that would like to use Google syntax to build search on FTS.

    Good job, Michael, in showing how FTS can be used in conjunction with other tools to help add search to your applications.

    Yes Steve, you are right.

    I work with FTS some years, and for me it is important because some customers would like to have this kind of syntx. Thanks.

  • marcosc (10/7/2008)


    Good article.

    Actually, with Nautilus [/url]you can find a table just writing part of its name and once located you can find a record writing any word which is compared with like if it's a string and with "=" if its an integer.

    The query is built for all the fields:

    select top 10 *

    from dbo.Customers (nolock)

    where CustomerID like '%ana%' or CompanyName like '%ana%' or ContactName like '%ana%'...

    Once the record is located, you can see it's related records (following the FK links).

    There is also a "google like" search where you write a word or number and looks for it in the whole database.

    Not much like google, but it works for developers.

    Thanks, Marcos (http://sourceforge.net/projects/nautilus/[/url])

    Sorry, but Full-Text search is another feature than LIKE operator.

  • Oleg (10/7/2008)


    Hi Mike,

    very useful article, especially code - thank you.

    I tested code and found one problem.

    If I entered phrase:

    president -aluminium

    it wokrs fine.

    If phrase is:

    -aluminium president

    it generates syntax error, phrase cannot start from negation. I checked goole it understand correctly this kind of phrases.

    How can I edit grammar to avoid this kind of errors?

    Thank you again.

    Regards,

    Oleg.

    Hi Oleg

    I fought with this issue myself and decided to "keep it simple" by defining a rule that the first token can't be preceded by a minus sign. The issue is that, owing to the simple nature of the recursive AST walker, the grammar maps fairly closely to the iFTS grammar. In IFTS grammar you have to use the AND NOT operator (there's no NOT without AND). Starting an iFTS query string with AND NOT ("AND NOT aluminum AND president" is not allowed in iFTS). You could use a more complex algorithm to rearrange the nodes of the AST to deal with this issue in a more user-friendly way, but I really wanted to keep the sample code simple for the article. I also considered using a more advanced algorithm like applying the visitor pattern to the AST, but again to keep it simple I decided to stick with the simple recursive tree walker.

    Thanks

    Mike C

  • Confucius247 (10/7/2008)


    More time talking about a c# compiler than the SQL. No talk of the limitations of the FTS.

    The article is not about the limitations of FTS, of which information abounds. A lot of these limitations are overcome in SQL 2008 iFTS, however. I didn't feel a strong need to rehash FTS limitations to demonstrate how to provide your users with a simpler interface to FTS.

    Bit of a waste of time really.

    I'm sorry you feel that way.

    From title of piece I was hoping for something about combining site useage stats stored in SQL with the FTS results to actually make something Google like. Google is not defined by its syntax, which few users use, but by the relevancy of its results.

    Combining site usage states stored in SQL with FTS results? I'm not sure what led you to believe this particular article addressed that, and I'm sorry you're disappointed. However, if relevance of results is important (and of course it always is) there are several opportunities to increase relevance in FTS. For instance you can use the proximity search to increase relevance (implemented in the grammar) or weighted seraches (not implemented in the grammar). FTS also provides relevance ranking via CONTAINSTABLE and FREETEXTTABLE. Unfortunately in a corporate intranet setting there's little chance you can duplicate Google's ability to count links back to a specific document or page to calculate a Google-style relevance score.

    As for "few users" using Google syntax, I strongly disagree. Google is the world's most popular search engine, and I would expect that nearly everyone who has used a Web-based search engine has at some point used Google. I would further venture that most users are familiar with the basics of Google syntax, which is why I recommend using it rather than forcing users to learn iFTS syntax to manually enter queries like the following:

    "aluminum" AND (FORMSOF(INFLECTIONAL, fish) OR FORMSOF(THESAURUS, money))

    The reasons for providing users with a simple syntax are discussed in the article.

    Thanks

    Mike C

  • marcosc (10/7/2008)


    Good article.

    Actually, with Nautilus [/url]you can find a table just writing part of its name and once located you can find a record writing any word which is compared with like if it's a string and with "=" if its an integer.

    The query is built for all the fields:

    select top 10 *

    from dbo.Customers (nolock)

    where CustomerID like '%ana%' or CompanyName like '%ana%' or ContactName like '%ana%'...

    Once the record is located, you can see it's related records (following the FK links).

    There is also a "google like" search where you write a word or number and looks for it in the whole database.

    Not much like google, but it works for developers.

    Thanks, Marcos (http://sourceforge.net/projects/nautilus/[/url])

    FTS actually builds an inverted index structure to search text fields, and allows some pretty complex searches that are more efficient than LIKE with leading wildcards, which can't take advantage of any indexes really. For tables with a large number of columns and a large number of rows, most of which aren't indexed at all, I would imagine the LIKE with leading wildcards could lead to serious performance issues.

    This actually sounds like it might be closer to the Google Base Data API Query Language (http://code.google.com/apis/base/query-lang-spec.html). The Base Data API Query Language actually gives you a lot of flexibility in performing searches, similar to what you're suggesting above, but would also have to be translated to SQL code at some point.

    Thanks

    Mike C

  • Paul DB (10/7/2008)


    While I am still trying to digest this excellent article, I think I found a slight error.

    In The Grammar section, the article lists 7 rules. One of them is...

    PrimaryExpression ::= Term

    | '-' Term

    | Phrase

    | '(' OrExpression ')'

    | '&lt' ( Term | Phrase )+ '&gt'

    To more closely match the code in the Conversion Engine section, I think it should be more like the following.

    Note the addition of the ExactExpression option.

    PrimaryExpression ::= Term

    | '-' Term

    | ExactExpression

    | '(' OrExpression ')'

    | Phrase

    | '&lt' ( Term | Phrase )+ '&gt'

    A minor point, I know. I just want to make sure that I am not missing something here. 😀

    I believe you're right. I made some slight changes to the code after writing the article and I probably wiped that line out by mistake. Good catch, and apologies!

    Mike C

  • Thanks for the great article. As soon as I read it. I started to implement full text search on our most used table.

    I have altered the code to use CONTAINSTABLE(tblsite,*,@ftsQuery) because I want to be able to search in many fields.

    But it is not working.

    I have this field list with full text index on (Site (PK), SiteName, Address, HouseNumber, Zip, City)

    I can search for "Vejle" (danish city) and I get result from columns SiteName and Address, so that works.

    Now I want to search for "Vejle" and "17". I want to find the city "vejle" and all streets with housenumber "17" but the result set is empty. Can any one help me out on this? I think I have tried all search combination's

    Best Regards Martin Nyborg - Perfection is a process, not an end-point.

Viewing 15 posts - 1 through 15 (of 166 total)

You must be logged in to reply to this topic. Login to reply