A Google-like Full Text Search
Posted Tuesday, May 17, 2011 7:09 PM


Ten Centuries


Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
mbrading (5/17/2011)
Hi Mike,

Very sorry ... a different bookmark for a different download.

This was an asp script doing a similar conversion job.

Sorry to waste your time.

Regards

Matt


No problem Matt. I'm still interested in seeing the download you're talking about if you have a link to it.

Thanks
Mike C
Post #1110684
Posted Tuesday, May 17, 2011 7:30 PM
Forum Newbie


Group: General Forum Members
Last Login: Saturday, February 9, 2013 3:39 PM
Points: 6, Visits: 21
Quite out of date, but *could* have done all I needed if I'd been able to get it to work on my server.

http://www.15seconds.com/issue/010423.htm

Cheers
Post #1110687
Posted Tuesday, May 17, 2011 8:01 PM


Ten Centuries


Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Been a while since I've done anything with Classic ASP/VB, but it appears the only thing the DLL you were looking at does is store a "table" of noise words. If you can eliminate that and the references to it you should be good (or convert it to an array, hash table or other structure) -- there's no need to eliminate noise words in the client/front end since the server has its own noise word lists for FTS ("stopwords" as of SQL 2008 iFTS).

If you're stuck with Classic ASP, you might still want to look at the code in this article and look through some of the previous comments on this page. One really smart developer posted a message indicating that he's converted it to a SQL CLR function, which might work for you as well -- you could do it all server-side.

Thanks
Mike C
Post #1110695
Posted Tuesday, May 17, 2011 8:23 PM
Forum Newbie


Group: General Forum Members
Last Login: Saturday, February 9, 2013 3:39 PM
Points: 6, Visits: 21
Thanks Mike,

Afraid it's all a bit over my head but I was in need of a quick fix to return relevance-ranked results.

I've been using a stand-alone indexing/search app -- textdb -- but that's starting to struggle with the index creation process due to the database being a whole lot bigger than it used to be and my server in need of an upgrade ...

I looked at some commercial solutions and they were even further out of my price range, so I thought I'd try full text searching ... still over my head but getting there. I've got a 'simple' search working, but there seems to be a major trade-off between speed and accuracy (10+ seconds to execute the query when using multiple joins, column weighting, etc.).

Anyway, once I get the basic version working I'll revisit to check out the options for 'advanced' searchers.

Cheers

Matt


Post #1110696
Posted Sunday, June 26, 2011 3:52 PM
Forum Newbie


Group: General Forum Members
Last Login: Sunday, June 26, 2011 6:42 PM
Points: 3, Visits: 5
I was very interested in this article for my current project.

However, I have two concerns about the approach taken. First, it relies on the Irony project, which is a large project capable of many functions not really needed here. Second, I don't really like the way it chokes on syntax errors. I don't think syntax errors would be acceptable on sites like Google.

I ended up writing my own version, very much influenced by this article, which I've posted at http://www.blackbeltcoder.com/Articles/data/easy-full-text-search-queries. My version does not rely on any third-party libraries, and will do the best it can with malformed queries.
Post #1131824
Posted Sunday, June 26, 2011 4:10 PM


Ten Centuries


Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
laptop (6/26/2011)
I was very interested in this article for my current project.

However, I have two concerns about the approach taken. First, it relies on the Irony project, which is a large project capable of many functions not really needed here. Second, I don't really like the way it chokes on syntax errors. I don't think syntax errors would be acceptable on sites like Google.

I ended up writing my own version, very much influenced by this article, which I've posted at http://www.blackbeltcoder.com/Articles/data/easy-full-text-search-queries. My version does not rely on any third-party libraries, and will do the best it can with malformed queries.


Glad you found it useful as a starting point.

The sample provided is not "production-ready"; as you point out I kept it simple for purposes of the article, and that means it doesn't include advanced error-handling necessary in a production scenario. In fact, there are several ways to handle errors in user query strings and the feedback I've received was that different people choose different approaches to error handling.

As for the Irony project, it comes with a lot of code samples that aren't necessary, to be sure (they can be easily removed), but it simplifies creation of LALR parsers and is very efficient.

I just read your article, and I like what you've done with it. I haven't gone through all your source code yet, but at first glance it looks like you chose L-R parsing over LALR? As for the stopwords, in SQL 2008 you can retrieve a list of stopwords (stoplists) from the database and create and modify custom stoplists.

Thanks
Michael
Post #1131828
Posted Sunday, June 26, 2011 5:24 PM
Forum Newbie


Group: General Forum Members
Last Login: Sunday, June 26, 2011 6:42 PM
Points: 3, Visits: 5
Mike C (6/26/2011)
Glad you found it useful as a starting point.


Hi Michael,

Actually, we've discussed this issue in the past and you were kind enough to send me a copy of your FTS book, which I appreciated.

People do have different ideas about error handling but, for searching a website, I think Google's approach is a good example to look to.

My parser is very simple--as simple as possible to get the job done. It does parse left to right, but uses a simple expression tree to allow parentheses to easily affect operator precedence.
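As a rough illustration of the approach described above (not the poster's actual code, which is in the linked article), here is a minimal Python sketch of a left-to-right parser that builds a small expression tree, lets parentheses group terms, and does the best it can with malformed input instead of raising a syntax error. The function names, the tuple-based tree, the implicit AND between adjacent terms, and the CONTAINS-style output are all assumptions made for the example.

```python
import re

# A token is a quoted phrase (closing quote optional), a parenthesis, or a bare word.
TOKEN = re.compile(r'"([^"]*)"?|([()])|([^\s()"]+)')

def tokenize(query):
    tokens = []
    for phrase, paren, word in TOKEN.findall(query):
        if paren:
            tokens.append(paren)
        elif word:
            tokens.append(word)
        elif phrase.strip():
            tokens.append('"' + phrase.strip() + '"')  # empty phrases are dropped
    return tokens

def parse(query):
    """Return a tree of ("AND"|"OR", left, right) / ("TERM", text), or None."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        node = parse_and()
        while peek() is not None and peek().lower() == "or":
            advance()
            right = parse_and()
            if node is None:
                node = right                      # dangling OR on the left: ignore it
            elif right is not None:
                node = ("OR", node, right)
        return node

    def parse_and():
        node = None
        while True:
            tok = peek()
            if tok is None or tok == ")" or tok.lower() == "or":
                return node
            if tok.lower() == "and":              # explicit AND; adjacency also means AND
                advance()
                continue
            right = parse_factor()
            if right is not None:
                node = right if node is None else ("AND", node, right)

    def parse_factor():
        tok = peek()
        advance()
        if tok == "(":
            node = parse_or()
            if peek() == ")":                     # a missing ")" is tolerated
                advance()
            return node
        return ("TERM", tok)

    tree = None
    while pos < len(tokens):
        node = parse_or()
        if node is not None:
            tree = node if tree is None else ("AND", tree, node)
        if peek() == ")":                         # stray ")" at top level: skip it
            advance()
    return tree

def render(node):
    if node[0] == "TERM":
        text = node[1]
        return text if text.startswith('"') else '"' + text + '"'
    return "(" + render(node[1]) + " " + node[0] + " " + render(node[2]) + ")"

def to_contains(query):
    tree = parse(query)
    return render(tree) if tree is not None else ""
```

For example, `to_contains('cat (dog or "big bird')` gives `("cat" AND ("dog" OR "big bird"))` even though the quote and the parenthesis are never closed. The resulting string would normally be passed to the full-text predicate as a query parameter rather than concatenated into SQL.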

Regarding stop words, the ability to read the current stop list would be a good way to pull those words from the query. However, I still think it would be nice (and more efficient) if I could simply tell SQL Server to do that for me.

Thanks.
Post #1131842
Posted Sunday, June 26, 2011 5:58 PM


Ten Centuries


Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Hi Jonathan,

I recall our conversation, but I can't find our old email exchange (I recently had to restore my system, lost a big chunk of emails). I definitely like what you've done with it; I was originally designing this one as an L-R parser after reviewing the functionality in YACC/LEX, Bison, Gold Parser, etc., solutions. Then I ran across Irony and decided it provided the simplest method of implementing an LALR grammar/lexer/parser. L-R parsing is a little less efficient than LALR, but for a grammar this simple and considering the simplicity of most search strings users will provide, I don't think it's going to be noticeable to any degree.

Another nice thing about the Irony functionality is the grammar is easily extended to encompass more functionality (like recognizing mathematical expressions, etc.) I was working on adding some of that type of functionality at one point, but got sidetracked on other projects.

If you wrapped your function in a SQL CLR function wrapper you could create the query string server-side and use the DMVs/DMFs locally to eliminate stopwords from the query. You could even execute the query locally using the context connection. Might still be less efficient than a more optimized native solution like the one you've requested, but for simple solutions like these the performance difference will probably be negligible.

Another alternative might be to read the entire stoplist from SQL Server in advance and persist it in memory locally. That way you can eliminate the burden of supplying another stoplist to the function - also the issue of keeping it in sync with the stoplist on the server.
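The "read the stoplist in advance and keep it in memory" idea could look something like the following Python sketch. It is only an illustration: `StoplistCache`, its `loader` callable, and the timed refresh are hypothetical names and choices made for this example, and the query assumes the system stoplist for US English (language_id 1033) in `sys.fulltext_system_stopwords`; a custom stoplist would be read from `sys.fulltext_stopwords` instead.

```python
import time

# Assumed query for the system stoplist; 1033 is the LCID for US English.
STOPWORD_SQL = """
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033;
"""

class StoplistCache:
    """Keeps the server's stopwords in memory and reloads them when they get old."""

    def __init__(self, loader, max_age_seconds=3600.0, clock=time.monotonic):
        # loader is a hypothetical callable returning an iterable of stopwords,
        # e.g. a function that runs STOPWORD_SQL over an open connection.
        self._loader = loader
        self._max_age = max_age_seconds
        self._clock = clock
        self._words = frozenset()
        self._loaded_at = None

    def words(self):
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._max_age:
            self._words = frozenset(w.lower() for w in self._loader())
            self._loaded_at = now
        return self._words

    def strip(self, terms):
        """Return the search terms with stopwords removed (case-insensitive)."""
        stop = self.words()
        return [t for t in terms if t.lower() not in stop]
```

Because the words always come from the server, there is no second list to maintain by hand; the refresh interval only bounds how long a change to the server's stoplist can go unnoticed.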

Thanks
Michael
Post #1131847
Posted Sunday, June 26, 2011 6:22 PM
Forum Newbie


Group: General Forum Members
Last Login: Sunday, June 26, 2011 6:42 PM
Points: 3, Visits: 5
Hi Michael,

Your last suggestion sounds the best to me as long as a native option is not forthcoming. The other suggestions sound interesting but would require a bit of research and more work.

I also published some code that evaluates expressions, although I'm not sure if you were talking about incorporating that into FTS. It's not terribly slick but may be interesting if you ever go back to that project.

Cheers!
Post #1131850
Posted Sunday, June 26, 2011 6:34 PM


Ten Centuries


Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Hi Jonathan,

I'll take a look at it. I've actually created an expression evaluator using Irony, but I'm interested in seeing how you approached it. One thing I'm very interested in is expression evaluation optimizations. I've built in a few, such as caching the abstract syntax tree to avoid parsing the same expression more than once. I'm looking to add more features like constant folding and multithreading, but haven't had time to address them fully yet. Maybe I can run some ideas past you and get your opinion?
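To make the two optimizations mentioned above concrete, here is a small Python sketch. It does not use Irony; Python's built-in `ast` module stands in for the parser, and `fold`, `compile_expression` and `evaluate` are names invented for this example. Parsed trees are cached per expression string, and sub-expressions made only of numeric constants are folded once at parse time.

```python
import ast
import functools
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _is_number(node):
    return isinstance(node, ast.Constant) and isinstance(node.value, (int, float))

def fold(node):
    """Constant folding: replace a binary op on two numeric constants with its result."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left, right = fold(node.left), fold(node.right)
        if _is_number(left) and _is_number(right):
            try:
                return ast.Constant(value=_OPS[type(node.op)](left.value, right.value))
            except ZeroDivisionError:
                pass  # leave it unfolded so the error surfaces at evaluation time
        return ast.BinOp(left=left, op=node.op, right=right)
    return node

@functools.lru_cache(maxsize=256)
def compile_expression(text):
    """Parse and fold once per distinct expression string; later calls hit the cache."""
    return fold(ast.parse(text, mode="eval").body)

def _eval(node, variables):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left, variables),
                                   _eval(node.right, variables))
    raise ValueError("unsupported expression element: " + type(node).__name__)

def evaluate(text, variables):
    return _eval(compile_expression(text), variables)
```

With this sketch, `evaluate("x * (2 + 3)", {"x": 4})` walks a tree in which `2 + 3` has already been reduced to the constant 5, and evaluating the same string again with different variables reuses the cached tree instead of reparsing.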

Thanks
Michael
Post #1131852