Thomas Abraham (8/22/2011)
Thanks for the question!
I'm curious about the terminology, though. The previously used "noise words" did a good job of conveying what was being represented. "Stopwords" seems less clear as to purpose. Can anyone enlighten me as to why this terminology was chosen?
I totally agree on the term... "Noise words" makes sense to me, "stop words" not so much...
Ditto. The terminology is a bit confusing. I guess it was meant as "stop creating the full-text index when you hit these words". Of course, index creation does not actually stop; it just discards the words that appear in the stoplist. I like "noise words" better because I liken it to cleaning the noise out of a recording. For instance, if you were transferring an LP to CD, you would want to remove as much of the noise from the recording as possible (pops, fuzziness, scratches). The result is a cleaner recording, just as a full-text index is better optimized for searches when the noise words are removed.
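To make the "discard, not stop" point concrete, here is a minimal Python sketch of a toy inverted index. It is only an illustration of the idea, not how SQL Server builds a full-text index; the `STOPLIST` contents and the `build_index` helper are made up for this example.

```python
import re

# Hypothetical stoplist for illustration only; not SQL Server's system stoplist.
STOPLIST = {"a", "an", "and", "of", "the", "to"}

def build_index(documents):
    """Map each non-stoplist term to the sorted list of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            if token in STOPLIST:
                # Discard the word and keep going; indexing does not stop here.
                continue
            index.setdefault(token, set()).add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {1: "The sound of the LP", 2: "Transfer an LP to CD"}
index = build_index(docs)
```

Words after a stoplist hit ("sound", "lp", "cd") still end up in the index, which is why "noise" arguably describes the behavior better than "stop".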