﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / Article Discussions / Article Discussions by Author / Discuss content posted by Todd Fifield  / Key Word Searches / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Sat, 18 May 2013 13:55:10 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>One reason for doing this instead of built-in Full-Text Search, is that once you have isolated your keywords, you can suggest alternatives when a user has misspelled one. See [url=http://www.sqlservercentral.com/articles/Fuzzy+Match/92822/]my article[/url] on fuzzy-string matching. (I linked this article from the discussion on mine.)</description><pubDate>Tue, 25 Sep 2012 16:29:47 GMT</pubDate><dc:creator>Thomas Keller</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Another technique to try ...On tables where you have columns that you'd otherwise build a FTS index, instead create a trigger that populates a single table where you string together exactly what you want. I.e. if searching has a set of filter criteria as well as the text, add in 'keywords' for the filters. E.g. you want all people whose name are like 'Fred Smith' that live in California and were born in 1969. You'd create a FTS string like 'Fred Smith zClfrnz zDOB19690501' where zClfrnz and zDOB19690501 are the keywords created by the trigger from attributes that are probably already in memory. When you search your FTS contains clause looks like this: 'Fred AND Smith AND zClfrnz AND zDOB1969*'. This is blisteringly fast.Obviously, this technique falls down when a range search is also required. But not anything like as bad as you'd think. Say you want to search on 'Fred Smith's in California whose age is between 45 and 50. You'd add a column to your new FTS table that inlcuded the DOB, possibly with a specific index or a covering index that has some of the most common range value searches if they are commonly used in conjunction. Another approach where the number of items in a rang is small is simply to string together the range of interest, e.g. 'Fred AND Smith AND zClfrnz AND (zDOB1967* or zDOB1968* or zDOB1969* or zDOB1970* or zDOB1971*)'Another thing that works well is 'joining' FTS searches. In this situation you still have just one FTS table but have triggers on multiple entity tables populating it. I.e. you want to find all sales people whose name is 'Joe Patel' who work in California and who are servicing a 'Fred Smith'. The salesPerson table creates a 'Joe Patel zslsprsnz zClfrnz' and the fred smith row is extended to include this relationship, i.e. red Smith zClfrnz zDOB19690501 zslsprsnzJoeSmith'.At any time to want to add in new search keywords or searchable attributes, you run a dummy update on the source table and the trigger regenerates the search string value. Note you need to get a bit funky finding the existing search value if you're doing updates rather than deletes and inserts.A 2 vCPU, 8GB SQL Server was easily handling 100 AJAX trns per second over a DB with 750k rows in the search table. Two different search application types too. One for building products and one for real estate.The above won't work for all situations but if your searchable data is in database columns it generally works a treat as you can combine the best of relational DB with the best of FTS.</description><pubDate>Sun, 09 Sep 2012 17:19:09 GMT</pubDate><dc:creator>belgarion</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>There is a lot of reinvention of the wheel.  I've worked with full-text in both Oracle (9i, 10g) and SQL Server (2000, 2005, 2008) as well as external, i.e., not-integrated, full-text systems such as AltaVista, etc.Now that SQL Server, in 2008, has an integrated full-text engine similar to what Oracle had in 9i (circa 2000), full-text in SQL Server is pretty robust.  However, it still lacks a lot of Oracle's functionality. We use it for plain text, HTML, and binary (Microsoft Office files, PDFs, etc.). Eliminate the Stop (noise) word filtering as it is so 90's.  I.e., index everything. As for performance and capacity, just look at what Iron Mountain is doing.</description><pubDate>Fri, 07 Sep 2012 13:27:18 GMT</pubDate><dc:creator>Mauve</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>I liked the article. A new technique to add to my toolset. I did play around with it to remove the Temp Tables and at least on my dev server, there was quite a performance increase.[code="sql"]DECLARE @Keywords VARCHAR(8000)SET @Keywords = 'The Red Rider Carbon Action 200 shot range model air rifle' ;SELECT           DISTINCT            P.ProductID           ,P.Name           ,P.Color           ,P.Size           ,P.ListPrice        FROM            -- convert the keywords to a table. Replaces #KW            dbo.DelimitedSplit8KNEW(REPLACE(REPLACE(@Keywords, '/', ' '), ',',                                            ''), ' ') searchterms        LEFT JOIN dbo.KeyWordsToExclude xcl            ON xcl.KeyWord = searchterms.ItemValue        INNER JOIN dbo.ProductKeyWordItems prodkeys            ON [prodkeys].[KeyWord] = searchterms.ItemValue        INNER JOIN [Production].[Product] P            ON P.[ProductID] = prodkeys.[ProductID]        WHERE            xcl.KeyWord IS NULL -- include only records where the keyword search hits on a non-excluded word.[/code]Here are the stats:Table 'KeyWordsToExclude'. Scan count 1, logical reads 79, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'ProductKeyWordItems'. Scan count 11, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Product'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 9 ms.SQL Server Execution Times:   CPU time = 0 ms,  elapsed time = 0 ms.</description><pubDate>Fri, 07 Sep 2012 13:02:32 GMT</pubDate><dc:creator>cliffb</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>I liked the article, but if I have to search docments, I do not use SQL (gasp!). It is like cutting wood with a screwdriver and not a saw. Tables are not documetns and do not store bulk data in a format that searches easily, thre is no semantic web support, etc. Lexis, Nexis and West Law ar enot in SQL.  In addition to full text searches, I like KWIC (Keyword in Context) and keyword lists from a limited vocabulary.</description><pubDate>Fri, 07 Sep 2012 11:52:18 GMT</pubDate><dc:creator>CELKO</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>We loaded something like 150 million records from Library of Congress and had a co-located 2 post rack running a quad processor something-or-other with a RAID 5 setup on Oracle 8i. Users ranged into the thousands online at once doing queries. RPS peaked over 50 per second if memory serves (this was 1999).We filtered stop words, and plurals, but not misspellings nor synonyms.Using materialized views is very smart. Be aware of disk space with lots of records. I'm on a project now that has over a billion records in one table. We have to plan very carefully for index growth and maintenance (rebuilds). Adding a materialized view that copies and concatenates text columns (and then builds FTS indexes on that, which might be in addition to FTS indexes on individual columns) could take up considerable space. For 15 million records with an average row length of 65 characters you're talking close to a gigabyte of space for the materialized view. Not a problem for an in-house server, but for a hosted server or a cloud database, that could cost you actual coin per month.My point there isn't to talk you out of materialized views. I love that idea. Just look at the whole picture when designing things, and plan for growth/scale.</description><pubDate>Fri, 07 Sep 2012 09:13:47 GMT</pubDate><dc:creator>quickdraw</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>While the discussion has been about comparing this technique to words and text, it is also a way to handle leading wildcard searches on part numbers.  I have done something like this by building an index of all reasonable suffix substrings of a part number, clipping off one leading character at a time until there's nothing "significant" left.  Then if a user wants to search for, say, '%74%138%', I clip off the leading wild characters and do a LIKE on the suffix index.  (I could further optimize by guessing that '138%' would be a shorter scan than '74%', and applying the original  '%74%138%' pattern to the original part number.)  I quickly get all the matches like 'MC74AC138N', 'SN74LS138D', etc. I don't imagine that FTS would be applicable to this problem.While it sounds like I'd be multiplying the size of my index by the number of characters in a part number, it's really not so bad.  Short part numbers have fewer suffix substrings, and you can add some restrictions if you disallow putting a wildcard character in the middle of certain sequences, like a series of digits.  For about 125,000 part numbers I have less than 600,000 suffixes to part number mappings.</description><pubDate>Fri, 07 Sep 2012 07:29:08 GMT</pubDate><dc:creator>Mark Hill-285393</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Sharepoint Search service uses SQL Server FTS. Based on results listedhttp://technet.microsoft.com/en-us/library/cc262574(office.12).aspxyou should be able to do 20 requests per second on a 15 million item library with one server.20 RPS is quite a bit, even for a public website. If your getting that many, you probably got a datacenter or two and a team of DBAs</description><pubDate>Wed, 09 Mar 2011 14:18:37 GMT</pubDate><dc:creator>KermitTheRock</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]quickdraw (3/1/2011)[/b][hr]@ Antonio, Just so I can file this away in my brain for later:What is your max sustained throughput (total users and max queries per minute)? How many CPUs?Thanks.[/quote]trying to track just full text queries is tough for me since our app performs a lot of caching.  i'll see if a dba can expose something via tracing.  however, we typically have 150+ users on at any given time and searching is a common task.</description><pubDate>Wed, 09 Mar 2011 13:31:48 GMT</pubDate><dc:creator>antonio.collins</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]KermitTheRock (3/3/2011)[/b][hr]Todd,This is why FTS is hard. splitting some text by whitespace, building a list of "words" and pointers back to the text isn't difficult.Knowing when and how to group a some characters as a word is hard, detecting mispellings is hard. MSSQL handles the first using its parsing methods, but does not handle the second at all. Perhaps all queries need to be spell checked before going to the index. [/quote]Kermit,For the applications I was doing this for, misspellings and plural/singular weren't an issue.Take wine snobs ordering on-line.  If they type in blanc (French) they don't want to see blanco (Spanish).  If they had a typo, they wouldn't get anything back in some cases and realize that they had misspelled the word and re-enter the search criteria.  Since the search is so bloody fast, they haven't complained yet about not finding misspelled words.Stores ordering auto parts.  Some items are very different if plural.  They type in brake drum (singular) and that's what they want to see - all items with brake drum in the description, which would be packaged as a single brake drum.  This item is also sold as a set under a different item number.  They type in brake drums and they see all items sold as a set.In the auto parts application especially the users were very pleased since the query using the LIKE operator used to take over 30 seconds to complete and they would see both single items and kits if they typed brake drum.  Using the technique in the article the average time was 1.5 seconds to return the item list for them to pick from.Todd Fifield</description><pubDate>Fri, 04 Mar 2011 12:59:16 GMT</pubDate><dc:creator>tfifield</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Todd,This is why FTS is hard. splitting some text by whitespace, building a list of "words" and pointers back to the text isn't difficult.Knowing when and how to group a some characters as a word is hard, detecting mispellings is hard. MSSQL handles the first using its parsing methods, but does not handle the second at all. Perhaps all queries need to be spell checked before going to the index. </description><pubDate>Thu, 03 Mar 2011 14:23:57 GMT</pubDate><dc:creator>KermitTheRock</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]sqlnoobie (2/28/2011)[/b][hr][quote][b]tfifield (2/28/2011)[/b][hr]I'm still interested to know how you're handling plurals, tradenames, abbreviations, etc.?[/quote]sqlnoobie,I wouldn't use this technique if I had to handle abbreviations, trademarks, etc.  This technique works very well for item/product descriptions in a controlled environment.  The first case was item descriptions for wine and the second one was for item descriptions for auto parts.  None of these had abbreviations or other shortened words.The users would enter something like 'cabernet 2001' and expect to get all wines from 2001 that had the word cabernet in the descripton.  Likewise a user would enter 'brake pads' and expect to see a list of all items with brake pads in the description.Todd Fifield</description><pubDate>Thu, 03 Mar 2011 13:48:33 GMT</pubDate><dc:creator>tfifield</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Actually, Indexed Views ARE available in the Standard edition of SQL Server.  You just have to manage them yourself, using explicit query hints.From [url]http://msdn.microsoft.com/en-us/library/ms187864.aspx[/url]:"Indexed views can be created in any edition of SQL Server 2008. In SQL Server 2008 Enterprise, the query optimizer automatically considers the indexed view. To use an indexed view in all other editions, the NOEXPAND table hint must be used."Kevin</description><pubDate>Wed, 02 Mar 2011 15:39:56 GMT</pubDate><dc:creator>knechod</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>@ KermitTheRock,Indexed views rock! Keep in mind though, you need SS Enterprise to get this feature. Most ISP licences are Standard so unless you paid the big nut for Enterprise you'll have to do something else like a computed column in the table itself. </description><pubDate>Tue, 01 Mar 2011 19:56:00 GMT</pubDate><dc:creator>quickdraw</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>@ Antonio, Just so I can file this away in my brain for later:What is your max sustained throughput (total users and max queries per minute)? How many CPUs?Thanks.</description><pubDate>Tue, 01 Mar 2011 19:52:42 GMT</pubDate><dc:creator>quickdraw</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Thanks antonio. You've given some great info. Currently I'm using dynamic SQL, but intended to convert it one day. Now I know it can be done. Awesome!!!Your use of FTS on a calculated field in a view is a new concept to me. Contains can query multiple separate fields or a whole table, but requires more setup. Did you test performace?Any chance youd share your normailze query function. I do the same but simplistically. All punctuation except numberics and hyphens are converted to space, then split on spaces to join all the terms with AND. My users search for "IT", so I expand it to "information technology" since its a stop word otherwise.</description><pubDate>Tue, 01 Mar 2011 15:36:06 GMT</pubDate><dc:creator>KermitTheRock</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]quickdraw (2/28/2011)[/b][hr]#1 This will be faster than full text search for large loads (hundreds of queries per minute) on a db with millions of records/documents to search....Full text search was too slow for a high load environment, and I'd bet money on the fact that SQL Server is the same. You just can't push hundreds of full text searches per minute with a high number of documents. The method here will be faster. ------------------------------------------------#2 This method doesn't allow wildcardsWe dynamically built a SQL query for each keyword. This allows for the wildcards, and for special processing to allow wildcard suffixes. (See #3)------------------------------------------------#3 By reversing the kewords in the keywords table, you get a new feature no database has (without a full table scan)One additional thing I did is to reverse all the keywords in the keywords table. Since Wildcard LIKE searches were allowed on the keywords table (like in full text search), having each keyword spelled backwards allowed a wonderful feature:Search keywords: "hold her %ly"[search for cheesy love novels: "hold her gently", "hold her gingerly", "hold her tightly"][/quote]re: #1: We've been using FTS in SS2005 and SS2008 on moderate sized tables (about 4M items) with high concurrency and it performs very well.  The key (for us) was to isolate the search results from the rest of query.  For example:  [code]SELECT ... FROM A    JOIN (SELECT key FROM A WHERE CONTAINS( search )) as S on S.key = A.key[/code]re: #2: The search predicate syntax is a bit involved so we normalize the user supplied text before using it as a search expression.  For example, 'IRON MAN' becomes '("IRON*" and "MAN*") or "IRONMAN*"' and that meets most of business needs.  We also strip out any punctuation and support special prefixes that the user app can send to modify the search behavior (e.g.: '&amp;gt;IRON MAN' becomes "IRON MAN*" which looks for an item with a word starting with IRON and any [i]following word[/i] starting with MAN).  We also let "power users" enter raw search expression.re: #3: Our full text indexes are based on indexed views which merge several columns together and also clean up the indexed text.  [code]CREATE VIEW IndexedConsultant WITH SCHEMA_BINDINGas SELECT *, replace(firstName+ ' '+ lastName+ ' '+ roleTitle+ ' '+ contractReviews + placementNotes,'+',' ')) as combinedText    FROM Consultant ...[/code]This improved search behavior and performance for our app since we only needed to search one column and could locate items where the matches are not necessarily from a single column.  In the following example any consultant's named Jones who are Engineers will be found along with any consultants who might have been placed at Jones Engineering.[code]SELECT [KEY] as consultantId FROM CONTAINSTABLE(IndexedConsultant,dbo.fNomalizeSearch('Jones Engineer'))[/code]You could add your reversing logic to the column definition of the indexed text.All of our full text searches are handled by UDFs with no dynamic SQL required.  The results can then be joined back to their base tables and the logic easily called by user app, included in stored procedures, or included in more complex searches.  So, I suggest you should take another look at the built-in FTS capabilities because it would be difficult to match it's flexibility, speed, and code-clarity.</description><pubDate>Tue, 01 Mar 2011 14:21:12 GMT</pubDate><dc:creator>antonio.collins</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Useful solution if FTS is not enabled or or is inaccessible for other reasons. Many ISPs in the beginning did not offer FTS, for example. When designed properly, FTS can be the fastest solution with stemming, stop words, synonyms, etc., that can be made into a sophisticated natural language feature. FTS indexes can be offloaded to faster SSDs or similar fast access devices. Lastly, FTS can be fully automated. The only problem I had was migrating an FTS from SQL 2005 to SQL 2008. Since I had all the scripts, I just recreated that part.</description><pubDate>Mon, 28 Feb 2011 13:37:53 GMT</pubDate><dc:creator>sjsubscribe</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>We added a synonyms table that cross referenced common abbreviations, common misspellings, and plural forms. We also built our own table of stop words to strip from the list of keywords before beginning the search.</description><pubDate>Mon, 28 Feb 2011 12:07:03 GMT</pubDate><dc:creator>quickdraw</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>The product I support actually works somewhat similar to the way the article says to do it.  One of the major difference is that the list of excluded words is coded into the proc, not stored in a table.  Another is that we do a sequential check on the keywords.  IE - Get a list of items with keyword 1 then compare to a list of items with keyword2 etc.  It'll be interesting to see what kind of performance bump happens from doing the check in a set based manner instead of procedurally.  We do have a benefit in that the list of terms is delivered to us on a quarterly basis so we get updates already split out so no work is required for us to do that.</description><pubDate>Mon, 28 Feb 2011 11:41:36 GMT</pubDate><dc:creator>cfradenburg</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]quickdraw (2/28/2011)[/b][hr]#3 By reversing the kewords in the keywords table, you get a new feature no database has (without a full table scan)One additional thing I did is to reverse all the keywords in the keywords table. Since Wildcard LIKE searches were allowed on the keywords table (like in full text search), having each keyword spelled backwards allowed a wonderful feature:...Normally for a wildcard search to use an index, the wildcard can't be at the begining (ex: giv%, matching give, giving, giver), otherwise a full table scan results on the keyword table. With a keyword table containing hundreds of thousands of rows that slows things down considerably. Reversing the keywords in the keyword table allows the keyword parser to detect wildcards and construct a where clause that includes the keyword reversed:select &amp;lt;cols&amp;gt; from document where docid in (select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'hold')unionselect docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'her')unionselect docid from dockeyword where keywordid IN (SELECT keywordid from keyword where reversekeyword LIKE 'yl%' --note keyword spelled backwards))[/quote]That is a great suggestion, reversing keywords. Nice.</description><pubDate>Mon, 28 Feb 2011 11:34:23 GMT</pubDate><dc:creator>Koen Verbeeck</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]tfifield (2/28/2011)[/b][hr]Thanks everyone for your comments.  I should have mentioned in the article about Full Text searches in SQL Server.As one poster mentioned, it's kind of a pain in the neck to start with.  It's a can of worms that doesn't necessarily need to be opened for just one simple application where you know exactly the type of user input you're going to get  - i.e. very structured user input.  In 12 years of programming SQL Server I have only had 1 client that even had Full Text enabled.  This auto parts dealer is really huge and the didn't have it.It certainly won't handle every kind of search and wouldn't be appropriate for many applications.  For one, if the data has a lot of noise punctuation and you don't really know what all is there so you can't predict it.As I mentioned in the article, this is for item/product type descriptions.  This would be in a very controlled environment where most everything is known about the noise words ahead of time.In any case, I'm grateful for the discussion.Todd Fifield[/quote]I'm still interested to know how you're handling plurals, tradenames, abbreviations, etc.?</description><pubDate>Mon, 28 Feb 2011 11:33:55 GMT</pubDate><dc:creator>sqlnoobie</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Thanks everyone for your comments.  I should have mentioned in the article about Full Text searches in SQL Server.As one poster mentioned, it's kind of a pain in the neck to start with.  It's a can of worms that doesn't necessarily need to be opened for just one simple application where you know exactly the type of user input you're going to get  - i.e. very structured user input.  In 12 years of programming SQL Server I have only had 1 client that even had Full Text enabled.  This auto parts dealer is really huge and the didn't have it.This technique certainly won't handle every kind of search and wouldn't be appropriate for many applications.  For one, if the data has a lot of noise punctuation and you don't really know what all is there so you can't predict it.As I mentioned in the article, this is for item/product type descriptions.  This would be in a very controlled environment where most everything is known about the noise words ahead of time.In any case, I'm grateful for the discussion.Todd Fifield</description><pubDate>Mon, 28 Feb 2011 11:28:48 GMT</pubDate><dc:creator>tfifield</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>#1 This will be faster than full text search for large loads (hundreds of queries per minute) on a db with millions of records/documents to search.I did this exact thing in Oracle in 1999. We loaded the entire database of U.S. - Books In Print from the Library of Congress and built a search engine on that for a website and a cash register slash inventory management system for small book sellers.Full text search was too slow for a high load environment, and I'd bet money on the fact that SQL Server is the same. You just can't push hundreds of full text searches per minute with a high number of documents. The method here will be faster. ------------------------------------------------#2 This method doesn't allow wildcardsWe dynamically built a SQL query for each keyword. This allows for the wildcards, and for special processing to allow wildcard suffixes. (See #3)------------------------------------------------#3 By reversing the kewords in the keywords table, you get a new feature no database has (without a full table scan)One additional thing I did is to reverse all the keywords in the keywords table. Since Wildcard LIKE searches were allowed on the keywords table (like in full text search), having each keyword spelled backwards allowed a wonderful feature:Search keywords: "hold her %ly"[search for cheesy love novels: "hold her gently", "hold her gingerly", "hold her tightly"]Normally for a wildcard search to use an index, the wildcard can't be at the begining (ex: giv%, matching give, giving, giver), otherwise a full table scan results on the keyword table. With a keyword table containing hundreds of thousands of rows that slows things down considerably. Reversing the keywords in the keyword table allows the keyword parser to detect wildcards and construct a where clause that includes the keyword reversed:select &amp;lt;cols&amp;gt; from document where docid in (select docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'hold')unionselect docid from dockeyword where keywordid IN (SELECT keywordid from keyword where keyword LIKE 'her')unionselect docid from dockeyword where keywordid IN (SELECT keywordid from keyword where reversekeyword LIKE 'yl%' --note keyword spelled backwards))There are number of different ways to do the query above. I just presented the easiest to grok. One other way that is interesting is to do a group by:select &amp;lt;cols&amp;gt; from document d where docid in (select docid from dockeyword dkw inner join keyword kw on dkw.keywordid = kw.keywordid where keyword = 'hold' or keyword = 'her' or reversekeyword like 'yl%'group by docidhaving count(docid) = 3select </description><pubDate>Mon, 28 Feb 2011 11:16:27 GMT</pubDate><dc:creator>quickdraw</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>about 400GB or so for 30 days of dataevery day i have vb scripts dump security and application log data from domain controllers, sql and other servers into a central database. there are three tables with an average of 100 million rows for each one. i tried having one table per server but it was a major PITA with joins so i use one table for DC's, one for SQL servers and one for everything else. every morning there are SSRS reports that get fired off to people with app log errors, locked accounts, any data about anyone doing any kind of account manipulation in AD, etc. </description><pubDate>Mon, 28 Feb 2011 09:16:14 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]alen teplitsky (2/28/2011)[/b][hr]thxgoing to have to play with this. i have a huge database on a SQL 2005 server with full text indexing enabled and it can really be a PITA a lot of times. sometimes simple searches are extremely slow for some reason and inserting large amounts of data into a FT enabled table is a major FAIL at least with 2005. i have to disable FT, run my inserts and then enable it again. and maintaining the FTI is also a PITA as i've had a few corruptions due to the backup running during the FT maintenance times[/quote]Just curious, how big is "huge"? I haven't had any of the above problems on 2005/8.</description><pubDate>Mon, 28 Feb 2011 09:10:24 GMT</pubDate><dc:creator>sqlnoobie</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>thxgoing to have to play with this. i have a huge database on a SQL 2005 server with full text indexing enabled and it can really be a PITA a lot of times. sometimes simple searches are extremely slow for some reason and inserting large amounts of data into a FT enabled table is a major FAIL at least with 2005. i have to disable FT, run my inserts and then enable it again. and maintaining the FTI is also a PITA as i've had a few corruptions due to the backup running during the FT maintenance times</description><pubDate>Mon, 28 Feb 2011 08:56:17 GMT</pubDate><dc:creator>alen teplitsky</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]KermitTheRock (2/28/2011)[/b][hr]Actually, [url=http://www.sqlservercentral.com/Forums/Topic1052731-391-1.aspx]I wrote[/url] of this just a week ago. The Contains function is often not sufficient for end user search. It doesn't handle misspellings, suggested search, implied "AND", and crashes on use of a noise word.  FTS is not simply an inverted index of words, but is dependent on the query pipelining that happens before the index is accessed. [/quote]Hmmm...well I've worked some time on getting it to work reasonably well for a somewhat specialized database with lots of weird product names like "M&amp;Ms". Not sure I understood what the consensus was on the previous thread, that "only google has the resources to do it", or "we only will do it when we're paid to do it"?</description><pubDate>Mon, 28 Feb 2011 08:43:47 GMT</pubDate><dc:creator>sqlnoobie</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Actually, [url=http://www.sqlservercentral.com/Forums/Topic1052731-391-1.aspx]I wrote[/url] of this just a week ago. The Contains function is often not sufficient for end user search. It doesn't handle misspellings, suggested search, implied "AND", and crashes on use of a noise word.  FTS is not simply an inverted index of words, but is dependent on the query pipelining that happens before the index is accessed. </description><pubDate>Mon, 28 Feb 2011 08:11:18 GMT</pubDate><dc:creator>KermitTheRock</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>I've been using FTS on a view composed of several tables, and this method works pretty well, as I can return ranked results using "contains table". Not sure if the method above allows for ranked results?Also, you don't mention what happens if someone searches for "Hex Nuts" as opposed to "Hex Nut"...how do you account for that? I've resorted to stemming the search terms when the last characters are "s", "es", etc.Does anyone have any links for "predictive" search solutions in MSSQL, to catch misspellings, etc?</description><pubDate>Mon, 28 Feb 2011 07:47:31 GMT</pubDate><dc:creator>sqlnoobie</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Or in a lower, cheaper edition.</description><pubDate>Mon, 28 Feb 2011 07:28:51 GMT</pubDate><dc:creator>AndrewJacksonZA</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>I was going to post that the result table and the related methods for this code article are nearly identical to what full-text-search (FTS) does but I see that folks beat me to it. :-)I've not worked with FTS before so I don't know if this article brings anything extra to the table (yet) and I haven't tested the code in this article (yet) BUT... the article appears to be a reasonable study of how FTS may work behind the scenes to build the "index table" and how it uses special tables for "noise words", etc.Why is that good?  Heh... because SQL Server isn't the only database in town.  :-D  This article shows how it could be done either in other database engines which may not have an FTS capability or in a "custom database".</description><pubDate>Mon, 28 Feb 2011 06:34:53 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]AndrewJacksonZA (2/28/2011)[/b][hr]HiPlease include a link to where Jeff Moden posted his function.[b][u][url=http://www.google.com/search?q=jeff%20moden%20DelimitedSplit8KNEW]This search[/url][/u][/b] turned up [u][b][url=http://www.sqlservercentral.com/Forums/FindPost1063955.aspx]this result[/url][/b][/u].Is that where you got the function from?ThanksAndrew[/quote]That's likely it.  I've only posted it a couple of times so far.  As I said in the post, I'm still doing some testing on it for a couple of things including a performance curve in different environments.</description><pubDate>Mon, 28 Feb 2011 06:18:59 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>HiPlease include a link to where Jeff Moden posted his function.[b][u][url=http://www.google.com/search?q=jeff%20moden%20DelimitedSplit8KNEW]This search[/url][/u][/b] turned up [u][b][url=http://www.sqlservercentral.com/Forums/FindPost1063955.aspx]this result[/url][/b][/u].Is that where you got the function from?ThanksAndrew</description><pubDate>Mon, 28 Feb 2011 03:32:21 GMT</pubDate><dc:creator>AndrewJacksonZA</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>All this have sense if your are going to search in non English text, or in the text with mixed language's words, and whant to support your own search rules: for example if you need to support synonyms: Baltic, Boltic, Boltik, Baltijske..</description><pubDate>Mon, 28 Feb 2011 03:04:57 GMT</pubDate><dc:creator>roman.pokrovskij</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>[quote][b]KermitTheRock (2/28/2011)[/b][hr]Does this make sense. Why would you not use the built in full text search. Its good enough to power sharepoints document search. its built in from 2005 on and can easily index any text field.[/quote]My thoughts exactly, why re-invent the wheel if full text search is readily available?</description><pubDate>Mon, 28 Feb 2011 01:38:04 GMT</pubDate><dc:creator>Koen Verbeeck</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>I agree with Kermit--does this technique offer any benefits over just using full-text search? You can even get SQL Express with full-text search capabilities now!</description><pubDate>Mon, 28 Feb 2011 01:37:05 GMT</pubDate><dc:creator>paul.knibbs</dc:creator></item><item><title>RE: Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Does this make sense. Why would you not use the built in full text search. Its good enough to power sharepoints document search. its built in from 2005 on and can easily index any text field.</description><pubDate>Mon, 28 Feb 2011 00:15:58 GMT</pubDate><dc:creator>KermitTheRock</dc:creator></item><item><title>Key Word Searches</title><link>http://www.sqlservercentral.com/Forums/Topic1070308-2740-1.aspx</link><description>Comments posted to this topic are about the item [B]&lt;A HREF="/articles/String+Manipulation/72540/"&gt;Key Word Searches&lt;/A&gt;[/B]</description><pubDate>Sun, 27 Feb 2011 21:44:51 GMT</pubDate><dc:creator>tfifield</dc:creator></item></channel></rss>