Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

Key Word Searches Expand / Collapse
Author
Message
Posted Monday, February 28, 2011 12:07 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 2:56 PM
Points: 12, Visits: 107
We added a synonyms table that cross referenced common abbreviations, common misspellings, and plural forms. We also built our own table of stop words to strip from the list of keywords before beginning the search.
Post #1070740
Posted Monday, February 28, 2011 1:37 PM
SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, December 20, 2012 1:03 PM
Points: 265, Visits: 589
Useful solution if FTS is not enabled or or is inaccessible for other reasons. Many ISPs in the beginning did not offer FTS, for example.

When designed properly, FTS can be the fastest solution with stemming, stop words, synonyms, etc., that can be made into a sophisticated natural language feature. FTS indexes can be offloaded to faster SSDs or similar fast access devices. Lastly, FTS can be fully automated. The only problem I had was migrating an FTS from SQL 2005 to SQL 2008. Since I had all the scripts, I just recreated that part.
Post #1070803
Posted Tuesday, March 1, 2011 2:21 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Wednesday, August 13, 2014 12:06 PM
Points: 438, Visits: 907
quickdraw (2/28/2011)
#1 This will be faster than full text search for large loads (hundreds of queries per minute) on a db with millions of records/documents to search.
...
Full text search was too slow for a high load environment, and I'd bet money on the fact that SQL Server is the same. You just can't push hundreds of full text searches per minute with a high number of documents. The method here will be faster.

------------------------------------------------
#2 This method doesn't allow wildcards

We dynamically built a SQL query for each keyword. This allows for the wildcards, and for special processing to allow wildcard suffixes. (See #3)

------------------------------------------------
#3 By reversing the kewords in the keywords table, you get a new feature no database has (without a full table scan)

One additional thing I did is to reverse all the keywords in the keywords table. Since Wildcard LIKE searches were allowed on the keywords table (like in full text search), having each keyword spelled backwards allowed a wonderful feature:

Search keywords: "hold her %ly"
[search for cheesy love novels: "hold her gently", "hold her gingerly", "hold her tightly"]


re: #1: We've been using FTS in SS2005 and SS2008 on moderate sized tables (about 4M items) with high concurrency and it performs very well. The key (for us) was to isolate the search results from the rest of query. For example:
SELECT ... FROM A 
JOIN (SELECT key FROM A WHERE CONTAINS( search )) as S on S.key = A.key

re: #2: The search predicate syntax is a bit involved so we normalize the user supplied text before using it as a search expression. For example, 'IRON MAN' becomes '("IRON*" and "MAN*") or "IRONMAN*"' and that meets most of business needs. We also strip out any punctuation and support special prefixes that the user app can send to modify the search behavior (e.g.: '>IRON MAN' becomes "IRON MAN*" which looks for an item with a word starting with IRON and any following word starting with MAN). We also let "power users" enter raw search expression.

re: #3: Our full text indexes are based on indexed views which merge several columns together and also clean up the indexed text.
CREATE VIEW IndexedConsultant WITH SCHEMA_BINDING
as SELECT *, replace(firstName+ ' '+ lastName+ ' '+ roleTitle+ ' '+ contractReviews + placementNotes,'+',' ')) as combinedText
FROM Consultant ...

This improved search behavior and performance for our app since we only needed to search one column and could locate items where the matches are not necessarily from a single column. In the following example any consultant's named Jones who are Engineers will be found along with any consultants who might have been placed at Jones Engineering.
SELECT [KEY] as consultantId FROM CONTAINSTABLE(IndexedConsultant,dbo.fNomalizeSearch('Jones Engineer'))

You could add your reversing logic to the column definition of the indexed text.

All of our full text searches are handled by UDFs with no dynamic SQL required. The results can then be joined back to their base tables and the logic easily called by user app, included in stored procedures, or included in more complex searches. So, I suggest you should take another look at the built-in FTS capabilities because it would be difficult to match it's flexibility, speed, and code-clarity.
Post #1071592
Posted Tuesday, March 1, 2011 3:36 PM


Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, July 30, 2013 8:55 AM
Points: 21, Visits: 96
Thanks antonio. You've given some great info.

Currently I'm using dynamic SQL, but intended to convert it one day. Now I know it can be done. Awesome!!!

Your use of FTS on a calculated field in a view is a new concept to me. Contains can query multiple separate fields or a whole table, but requires more setup. Did you test performace?

Any chance youd share your normailze query function. I do the same but simplistically. All punctuation except numberics and hyphens are converted to space, then split on spaces to join all the terms with AND. My users search for "IT", so I expand it to "information technology" since its a stop word otherwise.





Post #1071644
Posted Tuesday, March 1, 2011 7:52 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 2:56 PM
Points: 12, Visits: 107
@ Antonio,

Just so I can file this away in my brain for later:

What is your max sustained throughput (total users and max queries per minute)? How many CPUs?

Thanks.
Post #1071698
Posted Tuesday, March 1, 2011 7:56 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Wednesday, March 12, 2014 2:56 PM
Points: 12, Visits: 107
@ KermitTheRock,

Indexed views rock! Keep in mind though, you need SS Enterprise to get this feature. Most ISP licences are Standard so unless you paid the big nut for Enterprise you'll have to do something else like a computed column in the table itself.
Post #1071699
Posted Wednesday, March 2, 2011 3:39 PM


SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Thursday, June 12, 2014 6:43 AM
Points: 160, Visits: 266
Actually, Indexed Views ARE available in the Standard edition of SQL Server. You just have to manage them yourself, using explicit query hints.

From http://msdn.microsoft.com/en-us/library/ms187864.aspx:
"Indexed views can be created in any edition of SQL Server 2008. In SQL Server 2008 Enterprise, the query optimizer automatically considers the indexed view. To use an indexed view in all other editions, the NOEXPAND table hint must be used."

Kevin


--
Please upgrade to .sig 2.0
Post #1072254
Posted Thursday, March 3, 2011 1:48 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, June 19, 2014 6:44 PM
Points: 959, Visits: 2,884
sqlnoobie (2/28/2011)
[quote]tfifield (2/28/2011)

I'm still interested to know how you're handling plurals, tradenames, abbreviations, etc.?


sqlnoobie,
I wouldn't use this technique if I had to handle abbreviations, trademarks, etc. This technique works very well for item/product descriptions in a controlled environment. The first case was item descriptions for wine and the second one was for item descriptions for auto parts. None of these had abbreviations or other shortened words.

The users would enter something like 'cabernet 2001' and expect to get all wines from 2001 that had the word cabernet in the descripton. Likewise a user would enter 'brake pads' and expect to see a list of all items with brake pads in the description.
Todd Fifield
Post #1072898
Posted Thursday, March 3, 2011 2:23 PM


Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Tuesday, July 30, 2013 8:55 AM
Points: 21, Visits: 96
Todd,

This is why FTS is hard. splitting some text by whitespace, building a list of "words" and pointers back to the text isn't difficult.

Knowing when and how to group a some characters as a word is hard, detecting mispellings is hard. MSSQL handles the first using its parsing methods, but does not handle the second at all. Perhaps all queries need to be spell checked before going to the index.



Post #1072931
Posted Friday, March 4, 2011 12:59 PM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, June 19, 2014 6:44 PM
Points: 959, Visits: 2,884
KermitTheRock (3/3/2011)
Todd,

This is why FTS is hard. splitting some text by whitespace, building a list of "words" and pointers back to the text isn't difficult.

Knowing when and how to group a some characters as a word is hard, detecting mispellings is hard. MSSQL handles the first using its parsing methods, but does not handle the second at all. Perhaps all queries need to be spell checked before going to the index.


Kermit,
For the applications I was doing this for, misspellings and plural/singular weren't an issue.

Take wine snobs ordering on-line. If they type in blanc (French) they don't want to see blanco (Spanish). If they had a typo, they wouldn't get anything back in some cases and realize that they had misspelled the word and re-enter the search criteria. Since the search is so bloody fast, they haven't complained yet about not finding misspelled words.

Stores ordering auto parts. Some items are very different if plural. They type in brake drum (singular) and that's what they want to see - all items with brake drum in the description, which would be packaged as a single brake drum. This item is also sold as a set under a different item number. They type in brake drums and they see all items sold as a set.

In the auto parts application especially the users were very pleased since the query using the LIKE operator used to take over 30 seconds to complete and they would see both single items and kits if they typed brake drum. Using the technique in the article the average time was 1.5 seconds to return the item list for them to pick from.
Todd Fifield
Post #1073546
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse