RE: Key Word Searches – SQLServerCentral

SSC Enthusiast

Points: 151

February 28, 2011 at 11:16 am

#1 This will be faster than full text search for large loads (hundreds of queries per minute) on a db with millions of records/documents to search.

I did this exact thing in Oracle in 1999. We loaded the entire database of U.S. - Books In Print from the Library of Congress and built a search engine on that for a website and a cash register slash inventory management system for small book sellers.

Full text search was too slow for a high load environment, and I'd bet money on the fact that SQL Server is the same. You just can't push hundreds of full text searches per minute with a high number of documents. The method here will be faster.

------------------------------------------------

#2 This method doesn't allow wildcards

We dynamically built a SQL query for each keyword. This allows for the wildcards, and for special processing to allow wildcard suffixes. (See #3)

------------------------------------------------

#3 By reversing the kewords in the keywords table, you get a new feature no database has (without a full table scan)

One additional thing I did is to reverse all the keywords in the keywords table. Since Wildcard LIKE searches were allowed on the keywords table (like in full text search), having each keyword spelled backwards allowed a wonderful feature:

Search keywords: "hold her %ly"

[search for cheesy love novels: "hold her gently", "hold her gingerly", "hold her tightly"]

Normally for a wildcard search to use an index, the wildcard can't be at the begining (ex: giv%, matching give, giving, giver), otherwise a full table scan results on the keyword table. With a keyword table containing hundreds of thousands of rows that slows things down considerably. Reversing the keywords in the keyword table allows the keyword parser to detect wildcards and construct a where clause that includes the keyword reversed:

select <cols> from document where docid in (

select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'hold')

union

select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'her')

union

select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where reversekeyword LIKE 'yl%' --note keyword spelled backwards)

)

There are number of different ways to do the query above. I just presented the easiest to grok. One other way that is interesting is to do a group by:

select <cols> from document d where docid in (

select docid from dockeyword dkw

inner join keyword kw on dkw.keywordid = kw.keywordid

where keyword = 'hold' or keyword = 'her' or reversekeyword like 'yl%'

group by docid

having count(docid) = 3

select