SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Owning the Spelling Suggestion Feature


Owning the Spelling Suggestion Feature

Author
Message
Bill Nicolich
Bill Nicolich
SSC-Addicted
SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)

Group: General Forum Members
Points: 497 Visits: 543
Comments posted to this topic are about the item Owning the Spelling Suggestion Feature

Bill Nicolich: www.SQLFave.com.
Daily tweet of what's new and interesting: AppendNow
tcnolan
tcnolan
SSC Eights!
SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)

Group: General Forum Members
Points: 877 Visits: 164
Not a single reference to SOUNDEX? That doesn't seem right in a spelling suggestion article.
oldgoat
oldgoat
SSChasing Mays
SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)SSChasing Mays (604 reputation)

Group: General Forum Members
Points: 604 Visits: 49
So add it!
SOUNDEX doesn't cover, particularly, the typos angle that Bill seems to have concentrated on. Probably a lot better for the search results, given its sometimes 'strange' output.

Seriously though, if you incorporated this with some Ajax, and used that for dynamic spelling correction, like Word does, it would be very nifty.

I recently built an Ajax menu that works on the very simple concept of 'update as you type'. Given folks' ability for typos and fat-finger, this might just get added to second-guess them.



Andy Warren
Andy Warren
SSC-Dedicated
SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)SSC-Dedicated (30K reputation)

Group: Moderators
Points: 30307 Visits: 2759
My concern would be the number of round trips that might get generated. Other than that, I've always used a 3rd party spell checker (including the one in Word at times) and while they weren't cheap, they typically aren't hugely expensive either.

Andy
SQLAndy - My Blog!
Connect with me on LinkedIn
Follow me on Twitter
Mike Dougherty
Mike Dougherty
Old Hand
Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)Old Hand (380 reputation)

Group: General Forum Members
Points: 380 Visits: 952
We went with the premise that people would rather find what they're looking for than be prompted with spelling corrections, so we wrote the search feature to join the product keywords to the misspellings as synonyms of the correct term, and match on whatever the customer entered. We monitor the misses between requested keywords and those already in the search table, so new misspellings are added from those seen in the wild. The programmatic permutations could be easily incorporated.

BTW, an added benefit of the "synonyms" notion is that we can make the plural form of the keyword a synonym of the singular so the content editors need only associate the singular keyword with the product. Also those keywords that are similar in concept can be aliased together so only one of that concept group needs to be applied to the product to allow any of the group to match. (different rank/weightings can be used to resolve exact matches higher in the results than alias matches)
GeorgeCopeland
GeorgeCopeland
SSCrazy
SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)SSCrazy (2.1K reputation)

Group: General Forum Members
Points: 2124 Visits: 914
I think a hash table would be a much better implementation. It would be hugely faster, and if the word doesn't hash, it would be easy to grab all the words near the hash as suggestions.
Mike C
Mike C
SSCertifiable
SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)

Group: General Forum Members
Points: 7795 Visits: 1172
tnolan (7/21/2008)
Not a single reference to SOUNDEX? That doesn't seem right in a spelling suggestion article.


Soundex is horrible. And the SQL Server implementation doesn't match with the NARA Soundex standard anyway. If you want to do phonetic matching, I'd recommend starting with a better phonetic match algorithm. For spell-checking you'll get better accuracy using LCS, edit distance, or n-gram matching. Set-based operations work well with n-gram matching.
Mike C
Mike C
SSCertifiable
SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)SSCertifiable (7.8K reputation)

Group: General Forum Members
Points: 7795 Visits: 1172
gcopeland (7/21/2008)
I think a hash table would be a much better implementation. It would be hugely faster, and if the word doesn't hash, it would be easy to grab all the words near the hash as suggestions.


A ternary search tree might work even better (although it would be hard to develop in SQL without using CLR). In the TST words that are similar tend to "bunch" together as well. BTW, just wondering but what type of hash function would you use to get similar words to group together? If the function is too simple you're going to pay a penalty in access costs.
Bill Nicolich
Bill Nicolich
SSC-Addicted
SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)SSC-Addicted (497 reputation)

Group: General Forum Members
Points: 497 Visits: 543
Regarding number of round trips... In my implementation of spell suggestion, I use a single stored procedure call which uses a single T-SQL statement using a set-based approach even though the search entry is often multiple words. The search entry goes in, empty or filled spelling suggestion message goes out.

Although the performance seems great, I'd like to take the chance later to go into the specifics of the stored procedure including the performance angle and see if Andy and some of the other performance gurus around here will share their optimization tips.

Thank you everybody for taking the time to read. Let me know if you find other helper table techniques!

Bill

Bill Nicolich: www.SQLFave.com.
Daily tweet of what's new and interesting: AppendNow
tcnolan
tcnolan
SSC Eights!
SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)SSC Eights! (877 reputation)

Group: General Forum Members
Points: 877 Visits: 164
Sorry guys, my soundex suggestion was not meant entirely seriously. The typo function provided here is pretty well done, but I think adding phonetic search would add to its overall use. I also do not like SQL SOUNDEX very much. Wink http://microsoft.apress.com/index.php?id=72 or even a conversion of http://everything2.com/node/459981 could easily be worked in to the code here for phonetic search.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search