Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

At what point do stoplists actually start improving performance ? Expand / Collapse
Author
Message
Posted Monday, September 16, 2013 12:33 PM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Saturday, July 19, 2014 9:15 PM
Points: 537, Visits: 1,110
I have two exactly the same tables with 4.5 Million resumes indexed with FTI, and a clustered unique ID.

One table has the standard Microsoft 154 English stopwords.

The other table has 10000 stopwords.

Prior to running each query, i ran an

UPDATE STATISTICS Tablename WITH FULLSCAN

Then i ran a DBCC Freeproccache.

Then i ran the following query:

set statistics time on
--select count(*) from Profiles91313bb
select count(*) from Profiles
where contains(doccontent, 'resume')

and the results are basically identical, around 1.05 seconds.

Is this because 4.5 Million records, with up to 10000 stoplist words per resume, is just not enough data to cause index bloat ?

Thanks
Post #1495236
Posted Tuesday, September 17, 2013 3:49 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:07 AM
Points: 803, Visits: 720
I am not sure why you would add 10000 stopwords. Stopwords are common words that are useless to search on. For instance words like "then", "for", "and". If you add a word like "chickenpox" to the stoplist, then users cannot search for "chickenpox".

Putting a word like "résumé" into a stoplist may make sense if you have a recruiting database and about every document is a résumé anyway. But 10000 such words?


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1495707
Posted Wednesday, September 18, 2013 5:14 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Saturday, July 19, 2014 9:15 PM
Points: 537, Visits: 1,110
The words that would typically be searched are IT related, such as Java, J2ee, JQuery, Livelink etc...

Since i don't know every new technology being developed, it is easier to remove the words that are unlikely to be searched.

What i find interesting is that even with 10k stopwords over 4.5 Million multiple page resumes, i did not get any improvement in performance !

Thanks for your feedback !
Post #1495852
Posted Wednesday, September 18, 2013 6:30 AM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Today @ 7:07 AM
Points: 803, Visits: 720
isuckatsql (9/18/2013)
Since i don't know every new technology being developed, it is easier to remove the words that are unlikely to be searched.


I don't think that this is a very good strategy. It can only lead to reduced benefit of the index. The single user that searches for "caramel" will not find that single résumé that includes that word. But you have only made a miniscule reduction of the index size.

The performance should not really matter whether the stopwords are there or not. First, I don't think many of these 10000 words are frequent enough to reduce the size. But more importantly, an index is an index, which means that the size of the index does not matter much when you seek it, since you follow the B-tree (or whatever organisation a fulltext index has.)


Erland Sommarskog, SQL Server MVP, www.sommarskog.se
Post #1495888
Posted Wednesday, September 18, 2013 10:48 AM
Mr or Mrs. 500

Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500Mr or Mrs. 500

Group: General Forum Members
Last Login: Saturday, July 19, 2014 9:15 PM
Points: 537, Visits: 1,110
Thanks !
Post #1496021
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse