A Google-like Full Text Search

  • mbrading (5/17/2011)


    Hi Mike,

    Very sorry ... a different bookmark for a different download.

    This was an asp script doing a similar conversion job.

    Sorry to waste your time.

    Regards

    Matt

    No problem Matt. I'm still interested in seeing the download you're talking about if you have a link to it.

    Thanks

    Mike C

  • Quite out of date, but *could* have done all I needed if I'd been able to get it to work on my server.

    http://www.15seconds.com/issue/010423.htm

    Cheers

  • Been a while since I've done anything with Classic ASP/VB, but it appears the only thing the DLL you were looking at does is store a "table" of noise words. If you can eliminate that and the references to it you should be good (or convert it to an array, hash table, or other structure) -- there's no need to eliminate noise words in the client/front end, since the server has its own noise word lists for FTS ("stopwords" as of SQL 2008 iFTS).
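    For illustration only, replacing that DLL's noise-word table with an in-memory set might look like the sketch below (Python used as a neutral sketch language; the word list is hypothetical and would in practice mirror the server's stoplist):

```python
# Minimal sketch: strip noise words from a raw search string before
# building the full-text query. The noise-word set is illustrative,
# not the server's actual stoplist.
NOISE_WORDS = {"a", "an", "and", "the", "of", "to", "in"}

def strip_noise_words(query: str) -> str:
    """Remove noise words, preserving the order of the remaining terms."""
    terms = query.split()
    kept = [t for t in terms if t.lower() not in NOISE_WORDS]
    return " ".join(kept)

print(strip_noise_words("the history of the internet"))  # history internet
```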

    If you're stuck with Classic ASP, you might still want to look at the code in this article and look through some of the previous comments on this page. One really smart developer posted a message indicating that he's converted it to a SQL CLR function, which might work for you as well -- you could do it all server-side.

    Thanks

    Mike C

  • Thanks Mike,

    Afraid it's all a bit over my head but I was in need of a quick fix to return relevance-ranked results.

    I've been using a stand-alone indexing/search app -- textdb -- but that's starting to struggle with the index creation process, due to the database being a whole lot bigger than it used to be and my server being in need of an upgrade ...

    I looked at some commercial solutions and they were even further out of my price range, so I thought I'd try full-text searching ... still over my head but getting there. I've got a 'simple' search working, but there seems to be a major trade-off between slow and accurate (10+ seconds to execute the query using multiple joins, weighting columns, etc.).

    Anyway, once I get the basic version working I'll revisit to check out the options for 'advanced' searches.

    Cheers

    Matt

  • I was very interested in this article for my current project.

    However, I have two concerns about the approach taken. First, it relies on the Irony project, which is a large project capable of many functions not really needed here. Second, I don't really like the way it chokes on syntax errors. I don't think syntax errors would be acceptable on sites like Google.

    I ended up writing my own version, very much influenced by this article, which I've posted at http://www.blackbeltcoder.com/Articles/data/easy-full-text-search-queries. My version does not rely on any third-party libraries, and will do the best it can with malformed queries.

  • laptop (6/26/2011)


    I was very interested in this article for my current project.

    However, I have two concerns about the approach taken. First, it relies on the Irony project, which is a large project capable of many functions not really needed here. Second, I don't really like the way it chokes on syntax errors. I don't think syntax errors would be acceptable on sites like Google.

    I ended up writing my own version, very much influenced by this article, which I've posted at http://www.blackbeltcoder.com/Articles/data/easy-full-text-search-queries. My version does not rely on any third-party libraries, and will do the best it can with malformed queries.

    Glad you found it useful as a starting point.

    The sample provided is not "production-ready"; as you point out I kept it simple for purposes of the article, and that means it doesn't include advanced error-handling necessary in a production scenario. In fact, there are several ways to handle errors in user query strings and the feedback I've received was that different people choose different approaches to error handling.

    As for the Irony project, it comes with a lot of code samples that aren't necessary, to be sure (they can be easily removed), but it simplifies creation of LALR parsers and is very efficient.

    I just read your article, and I like what you've done with it. I haven't gone through all your source code yet, but at first glance it looks like you chose L-R parsing over LALR? As for the stopwords, in SQL 2008 you can retrieve a list of stopwords (stoplists) from the database and create and modify custom stoplists.

    Thanks

    Michael

  • Mike C (6/26/2011)

    Glad you found it useful as a starting point.

    Hi Michael,

    Actually, we've discussed this issue in the past and you were kind enough to send me a copy of your FTS book, which I appreciated.

    People do have different ideas about error handling but, for searching a website, I think Google's approach is a good example to look to.

    My parser is very simple--as simple as possible to get the job done. It does parse left to right, but uses a simple expression tree to allow parentheses to easily affect operator precedence.
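    The left-to-right approach described above might be sketched roughly like this (a hedged illustration, not the posted code; the grammar and function names are invented for the example):

```python
# Minimal left-to-right parser sketch: AND/OR with parentheses.
# Tokens are words, "|" for OR, "(" and ")"; adjacent words are
# implicitly AND-ed, mirroring the Google-style grammar discussed
# in the article.
import re

def tokenize(query):
    return re.findall(r'\(|\)|\||[^\s()|]+', query)

def parse(tokens):
    """Parse an OR-expression: and_expr ('|' and_expr)*."""
    node = parse_and(tokens)
    while tokens and tokens[0] == '|':
        tokens.pop(0)
        node = ('OR', node, parse_and(tokens))
    return node

def parse_and(tokens):
    """Adjacent terms are implicitly AND-ed."""
    node = parse_term(tokens)
    while tokens and tokens[0] not in ('|', ')'):
        node = ('AND', node, parse_term(tokens))
    return node

def parse_term(tokens):
    tok = tokens.pop(0)
    if tok == '(':
        node = parse(tokens)
        tokens.pop(0)  # consume ')'
        return node
    return ('TERM', tok)

def to_fts(node):
    """Render the expression tree as a CONTAINS-style predicate string."""
    kind = node[0]
    if kind == 'TERM':
        return '"%s"' % node[1]
    return '(%s %s %s)' % (to_fts(node[1]), kind, to_fts(node[2]))

print(to_fts(parse(tokenize('classical (music | jazz)'))))
# ("classical" AND ("music" OR "jazz"))
```

    Parentheses affect precedence simply by where the recursive call happens, which is the "simple expression tree" idea.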

    Regarding stop words, the ability to read the current stop list would be a good way to pull those words from the query. However, I still think it would be nice (and more efficient) if I could simply tell SQL Server to do that for me.

    Thanks.

  • Hi Jonathan,

    I recall our conversation, but I can't find our old email exchange (I recently had to restore my system and lost a big chunk of emails). I definitely like what you've done with it; I was originally designing this one as an L-R parser after reviewing the functionality in solutions like YACC/LEX, Bison, and Gold Parser. Then I ran across Irony and decided it provided the simplest method of implementing an LALR grammar/lexer/parser. L-R parsing is a little less efficient than LALR, but for a grammar this simple, and considering the simplicity of most search strings users will provide, I don't think the difference will be noticeable to any degree.

    Another nice thing about the Irony functionality is that the grammar is easily extended to encompass more functionality (like recognizing mathematical expressions). I was working on adding some of that type of functionality at one point, but got sidetracked by other projects.

    If you wrapped your function in a SQL CLR function wrapper you could create the query string server-side and use the DMVs/DMFs locally to eliminate stopwords from the query. You could even execute the query locally using the context connection. Might still be less efficient than a more optimized native solution like the one you've requested, but for simple solutions like these the performance difference will probably be negligible.

    Another alternative might be to read the entire stoplist from SQL Server in advance and persist it in memory locally. That way you eliminate the burden of supplying another stoplist to the function, and also the issue of keeping it in sync with the stoplist on the server.
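    That caching idea could be sketched like this (Python used as a neutral illustration; `fetch_stoplist` is a hypothetical stand-in for whatever actually queries the server's stoplist, injected so the cache logic is visible on its own):

```python
# Sketch: load the server's stoplist once and keep it in memory.
# fetch_stoplist is a placeholder callable for the real database
# query; injecting it keeps the caching logic self-contained.
_cached_stoplist = None

def get_stoplist(fetch_stoplist):
    """Return the cached stoplist, loading it only on first use."""
    global _cached_stoplist
    if _cached_stoplist is None:
        _cached_stoplist = frozenset(w.lower() for w in fetch_stoplist())
    return _cached_stoplist

# Usage with a stubbed fetch:
words = get_stoplist(lambda: ["The", "and", "of"])
print("and" in words)  # True
```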

    Thanks

    Michael

  • Hi Michael,

    Your last suggestion sounds the best to me as long as a native option is not forthcoming. The other suggestions sound interesting but would require a bit of research and more work.

    I also published some code that evaluates expressions, although I'm not sure if you were talking about incorporating that into FTS. It's not terribly slick but may be interesting if you ever go back to that project.

    Cheers!

  • Hi Jonathan,

    I'll take a look at it. I've actually created an expression evaluator using Irony, but I'm interested in taking a look at how you approached it. One thing I'm very interested in is expression evaluation optimizations. I've built a few in like caching the abstract syntax tree to eliminate multiple parsings of the same expression. I'm looking to add some more features like constant folding and multithreading, but haven't had time to address it fully yet. Maybe I can run some ideas past you and get your opinion?

    Thanks

    Michael

  • Does anyone know of an unmanaged C++ version of this code?

    Thanks

  • Hello Mike, thanks for the great sample; it's really useful.

    I see it's a known problem that a Google-like sentence can't start with a negative token.

    When trying to convert a "-key1 -key2" sentence, an exception is thrown.

    As you mentioned a few times earlier in this thread, there is a workaround for this.

    Could you please provide a more detailed explanation of this workaround, or maybe some sample code?

    Thanks in advance.
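    As a sketch of one common workaround for this (illustrative only, not the article's code): detect that every term is negated and reject or rewrite the query up front, since a SQL Server full-text predicate can't consist solely of NOT terms.

```python
def check_query_terms(query: str):
    """Split a Google-style query into positive and negated terms.
    Raises if nothing positive remains, since a CONTAINS predicate
    cannot be built from NOT terms alone."""
    positives, negatives = [], []
    for term in query.split():
        if term.startswith('-') and len(term) > 1:
            negatives.append(term[1:])
        else:
            positives.append(term)
    if not positives:
        raise ValueError("query must contain at least one non-negated term")
    return positives, negatives

print(check_query_terms("music -polka"))  # (['music'], ['polka'])
```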

  • Did you find out how to solve this?

  • Repost from irony.codeplex.com http://irony.codeplex.com/discussions/389099

    I have a question about TermType in the SearchGrammar sample. If this is the wrong forum for this, please forgive.

    It seems to me that TermType will always be Inflectional for AND'ed terms, no matter what TermType you hand ConvertQuery().

    Example with FTS TermType=Exact:

    AND Query: classical music

    Fts: ( FORMSOF (INFLECTIONAL, classical) AND FORMSOF (INFLECTIONAL, music) )

    OR Query: classical | music

    Fts: (classical OR music)

    In line 121 of SearchGrammar.cs, the TermType is set explicitly to Inflectional for AND but not for negation or OR.

    Is there a reason for this, am I missing something?
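    To make the asymmetry concrete, here is a minimal sketch of what consistent TermType handling would produce (function names are illustrative, not from SearchGrammar.cs):

```python
def render_term(term, term_type):
    """Render one term of the FTS predicate according to TermType."""
    if term_type == 'Inflectional':
        return 'FORMSOF(INFLECTIONAL, %s)' % term
    return '"%s"' % term  # Exact

def render_and(terms, term_type):
    """AND-ed terms honoring the requested TermType, rather than
    forcing Inflectional as the quoted line 121 appears to do."""
    return '(' + ' AND '.join(render_term(t, term_type) for t in terms) + ')'

print(render_and(['classical', 'music'], 'Exact'))
# ("classical" AND "music")
```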

  • EDIT: Think I might have just answered my own question! Post removed!

