SQLServerCentral is supported by Redgate
 
At what point do stoplists actually start improving performance?



isuckatsql
I have two identical tables, each holding 4.5 million resumes with a full-text index (FTI) and a clustered unique ID.

One table uses the standard Microsoft stoplist of 154 English stopwords.

The other table uses a stoplist of 10000 stopwords.

Before running each query, I ran

UPDATE STATISTICS Tablename WITH FULLSCAN

followed by DBCC FREEPROCCACHE.

Then I ran the following query:

SET STATISTICS TIME ON;
-- Run once per table; the commented-out line queries the other copy
--SELECT COUNT(*) FROM Profiles91313bb
SELECT COUNT(*) FROM Profiles
WHERE CONTAINS(doccontent, 'resume');

The results were essentially identical for both tables: about 1.05 seconds.

Is this because 4.5 million records, with up to 10000 stoplist words per resume, is simply not enough data to cause index bloat?

Thanks
Erland Sommarskog
I am not sure why you would add 10000 stopwords. Stopwords are common words that are useless to search on, for instance "then", "for" and "and". If you add a word like "chickenpox" to the stoplist, users cannot search for "chickenpox".

Putting a word like "résumé" into a stoplist may make sense if you have a recruiting database and about every document is a résumé anyway. But 10000 such words?
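For anyone who wants to try the narrow approach described above (stoplisting only a word that appears in nearly every document), here is a minimal T-SQL sketch. It assumes the table is dbo.Profiles and already has a full-text index; the stoplist name RecruitingStoplist is hypothetical. It starts from a copy of the system stoplist and adds a single domain word rather than thousands.

```sql
-- Sketch only: RecruitingStoplist is a hypothetical name, and dbo.Profiles
-- is assumed to already have a full-text index.

-- Start from a copy of the built-in system stoplist ("then", "for", "and", ...)
CREATE FULLTEXT STOPLIST RecruitingStoplist FROM SYSTEM STOPLIST;

-- Add the one domain word that appears in nearly every document.
-- Whether "résumé" is also covered depends on the catalog's accent sensitivity.
ALTER FULLTEXT STOPLIST RecruitingStoplist ADD N'resume' LANGUAGE 'English';

-- Attach the stoplist to the index (unless WITH NO POPULATION is given,
-- this is expected to start a repopulation of the index)
ALTER FULLTEXT INDEX ON dbo.Profiles SET STOPLIST RecruitingStoplist;

-- List the stopwords the new stoplist now contains
SELECT sw.stopword, sw.language
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'RecruitingStoplist';
```

Starting from the system stoplist keeps the usual noise words excluded, so the only behavioural change is the one word you chose to add.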

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
isuckatsql
The words that would typically be searched are IT-related, such as Java, J2EE, jQuery, Livelink, etc.

Since I don't know every new technology being developed, it is easier to remove the words that are unlikely to be searched.

What I find interesting is that even with 10k stopwords over 4.5 million multi-page resumes, I did not get any improvement in performance!

Thanks for your feedback!
Erland Sommarskog
isuckatsql (9/18/2013)
Since I don't know every new technology being developed, it is easier to remove the words that are unlikely to be searched.


I don't think this is a very good strategy. It can only reduce the usefulness of the index: the one user who searches for "caramel" will not find the one résumé that includes that word. Meanwhile, you have made only a minuscule reduction in the size of the index.
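One way to check how much a candidate stopword could shrink the index is to see how many documents actually contain it. The query below is a sketch that assumes dbo.Profiles is the copy whose full-text index still uses the system stoplist, so the candidate words are still indexed. sys.dm_fts_index_keywords returns one row per indexed term with a document_count; on 4.5 million rows it may take a while to run.

```sql
-- Sketch: assumes dbo.Profiles still uses the system stoplist,
-- so the words under consideration are present in the full-text index.
SELECT TOP (50)
       display_term,      -- the indexed word in readable form
       document_count     -- number of documents that contain the word
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID(N'dbo.Profiles'))
ORDER BY document_count DESC;
```

Only words near the top of this list are common enough that stoplisting them could noticeably reduce the index size; a word found in a handful of résumés, like "caramel", barely registers.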

Performance should not really depend on whether the stopwords are there or not. First, I don't think many of these 10000 words are frequent enough to reduce the size noticeably. More importantly, an index is an index: its size does not matter much when you seek into it, since you follow the B-tree (or whatever organisation a full-text index has).
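To put a rough number on this point, assume (for illustration only; the thread does not say how a full-text index is organised internally) a balanced B-tree in which each page holds about f = 500 entries. The number of levels d that a seek must traverse for N entries is then:

```latex
d = \left\lceil \log_{f} N \right\rceil, \qquad f = 500:\quad
500^{2} = 2.5 \times 10^{5}, \quad 500^{3} = 1.25 \times 10^{8}
\;\Longrightarrow\; d = 3 \quad \text{for all } 2.5 \times 10^{5} < N \le 1.25 \times 10^{8}.
```

Under that assumption, removing even a sizeable share of the entries leaves the seek depth unchanged unless N drops below a power of f.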

Erland Sommarskog, SQL Server MVP, www.sommarskog.se
isuckatsql
Thanks!