Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Implementing FULL-TEXT Search in a Google fashion! Expand / Collapse
Author
Message
Posted Saturday, March 29, 2008 5:49 AM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, April 9, 2008 7:03 AM
Points: 2, Visits: 3
OK, my first post on a subject I startd learning a couple days ago! SQL2K5 Full Text Search!

Right....creating the FTS indexes and catalogs was a no-brainer! I even have a SP that I use to query the indexed table. So my question...

What is the best way to search, using FTS, for multiple words?

For example:

In my table I have the columns Title, Description and Detail. If I run...

SELECT Title, Description, Detail
FROM MyTable
WHERE FREETEXT (*, 'computer programming')
GO

it returns (what I think) are all of the records I need. Is this the best way?

I'm not really concerned about server resources for the moment. The SQL machine is seriously under-tasked. I'm more concerned about retreiving everything which contains the search text, in all it's combinations.

TIA


...the man in black fled across the desert...and the gunslinger followed...
Post #476575
Posted Monday, March 31, 2008 5:20 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, July 10, 2014 6:56 AM
Points: 1,176, Visits: 878
Depends. Google does a lot more than phrase matching and ranking.
See also CONTAINS predicate and ISABOUT term.
Post #476884
Posted Tuesday, April 1, 2008 3:11 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 5:34 AM
Points: 2,859, Visits: 3,187
Getting the most out of Full Text can be like walking in a wilderness without a map, compass or satnav. There are some details in BOL, and Google can find some good articles, but the more you know about Full Text the more you realise ow little good advice exists.

A few tips...

1) Work out what queries you need in a query window before you put anything into a SP or CLR code. Your first, second, third, etc attempts are unlikely to get the best out of Full Text.
2) Using CONTAINSTABLE allows you to fine tune your results more than the other access methods. If a simple FREETEXT or CONTAINS does not give you what you need then you need CONTAINSTABLE and probably the ISABOUT clause with weightings.
3) The documentation does not make it clear that ISABOUT('value1', 'value2') really means ISABOUT('value1') OR ISABOUT('value2'). If a match is found with 'value1' then do not expect FTS to also search for a match on 'value2'.
4) If you apply weightings in ISABOUT, the best results are given if the order of the weighting is descending. e.g. if you have a series: ISABOUT('value1' WEIGHT (0.001)) AND ISABOUT('value2' WEIGHT (0.9)) AND ISABOUT('value3' WEIGHT (0.5)) will give a different result to the final rank than if the weights were given in descending order 0.9, 0.5, 0.001. Giving the weights in descending order helps give a higher rank to the clause with the heighest weight.
5) To get good performance, you need to minimise the number of Full Text accesses you make in a single T-SQL query. Aim to get everything you need in a single FREETEXT or CONTAINSTABLE clause.
6) To get good performance, you need to avoid filtering the results from a FTS search by joins to other tables. e.g. SELECT model FROM CONTAINSTABLE(trucks, colour, FORMSOF('red')) as ft JOIN trucks on ft.[KEY] = trucks.key WHERE trucks.wheels = 4 will first select all types of truck where the colour is a shade of red, then filter the list to just give you 4-wheeled trucks that are a shade of red. This filtering can take a long time to complete. You need to look at how you can search for both shades of red and 4 wheels in a single CONTAINSTABLE so FTS only gives back the rows you really need.

Finally, FTS can give fantastic results, but it can take a while to work out how to use it most effectively.


Original author: SQL Server FineBuild 1-click install and best practice configuration of SQL Server 2014, 2012, 2008 R2, 2008 and 2005. 28 July 2014: now over 30,000 downloads.
Disclaimer: All information provided is a personal opinion that may not match reality.
Concept: "Pizza Apartheid" - the discrimination that separates those who earn enough in one day to buy a pizza if they want one, from those who can not.
Post #477473
Posted Tuesday, April 1, 2008 7:27 AM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, July 24, 2014 11:03 AM
Points: 1,182, Visits: 1,970
Read Hilary Cotter's articles first before just "jumping in".
http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

Be very careful as to what features you're looking for and how you implement them. You will never be able to get to 100% of what Google does.

I've also worked extensively with Oracle's full-text feature, which has more features and functionality (along with better performance) than SQL Server. Although SQL Server 2008 is reportedly better (per Microsoft) -- we'll see.



(PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.
Post #477621
Posted Tuesday, April 1, 2008 7:31 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Today @ 5:34 AM
Points: 2,859, Visits: 3,187
I agree, Hilary Cotter is one of the few sources of good FTS advice on the web.

Original author: SQL Server FineBuild 1-click install and best practice configuration of SQL Server 2014, 2012, 2008 R2, 2008 and 2005. 28 July 2014: now over 30,000 downloads.
Disclaimer: All information provided is a personal opinion that may not match reality.
Concept: "Pizza Apartheid" - the discrimination that separates those who earn enough in one day to buy a pizza if they want one, from those who can not.
Post #477624
Posted Tuesday, April 1, 2008 7:53 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, May 30, 2014 6:27 PM
Points: 2,808, Visits: 7,175
Also be aware of the noise word list and how this affects the search results when searching for text containing common words.

I also agree that producing the sort of results that Google search has will be very difficult to achieve, especially using SQL exclusively.


Post #477643
Posted Wednesday, April 2, 2008 7:13 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Tuesday, January 15, 2013 11:58 AM
Points: 1, Visits: 28
to make it kind of like google...

distinct lowercase words in one field
I put all "distinct" lowercase words I'm interested in 1 field and build a fulltext index on that. I do this because I don't like inflated ranks based on repeated words or caps. Note: I also modify the noise words because a lot of noise words are important for my searches.

add "filters" into the fulltext field with all the rest of the words
say you only want to get things that are in-stock, add "instock" to the words you're fulltext indexing and then add "instock and..." to your containstable search clause; this performs a lot better than post-results joining and filtering on another table's instock bit. note: i add it first in the field, so I can easily search for it with a like statement as well; "like" performs decent when you're searching from the start, or the end, but not the middle (e.g. TheText LIKE 'instock%' is good).

containstable - and containstable with formsof(inflectional,"word") syntax
I use containstable for more precise matches; if containstable doesn't work straight up, or I need more results, I use contains table with inflectional formsof syntax around appropriate words. freetexttable just performs like crap and returns weird results if you ask me (it uses OR between every word and formsof() variations for everything; it's just too slow).

if variations of containstable yield nothing, use "grams" and "sounds like" strategy.
add another field to your fulltext indexed table that is all n-grams of your words. let's say they are sizes of 3 characters each; here's an example:
"michael crichton"
becomes
"_mi mic ich cha hto ton on_ n_c _cr cri ric ich cht..."
if you want, you can also add the SOUNDEX codes, or even the double-metaphone codes into your grams and call it GramsAndCodes. then take grams and sounds like codes and search against the fulltext grams and sounds like codes with freetexttable; it doesn't perform well, but will get you fuzzy results that are pretty decent. with some text similarity code analysis of the results, often you can predict what the user meant to type and do a "did you mean ???"

use top n rank in clause appropriately
using the top n rank setting can speed things up, but test the results, sometimes I've noticed less than predictable results with using top in ranks with freetexttable. There's also some pre-compute rank setting that you can try to speed up freetext searches, but it returns crappy results shortly after a master-merge ages.

other performance things
if you want to get crazy, you can put the fulltext index on it's own drive, and even limit sql server's ram consumption to ensure fulltext has room to operate.

disclaimer
this works for me, and I combine it with some client side caching to make things faster, but I have to mention that I'm not fulltext indexing a massive amount of text here, so what I do might not work that great for bigger text loads. also, so you know, I use the SQL CLR to make function like GetDistinctWords() and GetGrams() and GetSoundExCodes() and GetDoubleMetaphoneCodes()

Lucene.NET
another option is Lucene.NET, and it's free. if your data you want to fulltext search is in a database, I'd stick with SQL FullText though, you just can beat the load speed and automatic updating for sql text data.
Post #478448
Posted Wednesday, April 9, 2008 7:15 AM


Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, April 9, 2008 7:03 AM
Points: 2, Visits: 3

Fantastic advise folks! :D

Thanks a million. Definately have enough to REALLY do dome damage!

Noise words are causing some minor issues. Seems some people are searching for IT (Information Technology) and it's getting ignored as "it"!!! Nifty!

Also getting some weird results on fields containing XML data. it's correctly returning the searched values, but the XML markup (which should remain invisible to users) sometimes appears after a FTS! Super-Nifty!!

I've already streamlined the SP to remove all joins. I'm now loading "related" data as an array and attaching what's needed in the code instead of in SQL

Thanks again for the resource links everyone. Great help!!


...the man in black fled across the desert...and the gunslinger followed...
Post #482282
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse