Full-Text Search – Stoplists in SQL Server

Steve Jones, 2013-01-07 (first published: 2012-12-31)

Full-text search is an interesting subsystem in SQL Server. It allows you to implement searches through a variety of text formats stored in SQL Server. This is one of a series of posts that look at different facets of full-text search.

What is a Stoplist?

A stoplist is a list of stopwords that SQL Server should not include in a full-text index. These are words that are seen as not adding any value to the full-text index. We typically see these words as important in language for structure, but not for content. Examples of stopwords are:

the
a
an
is
are
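If you want to see the complete list rather than a few examples, the system stopwords are exposed through a catalog view. A minimal sketch, assuming you are interested in US English (LCID 1033):

```sql
-- List the stopwords that ship with SQL Server for one language.
-- 1033 is the LCID for US English; change it to inspect another language.
SELECT stopword
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
ORDER BY stopword;
```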

In SQL Server 2005 and earlier, stopwords were known as noise words, and the list was called a noise word list. You can read about the topic in Books Online.

How are they used?

Stoplists are used when building the index. Words that appear in both the text and the stoplist are ignored and are not populated into the index. This makes for a smaller index, and it also means that the stopwords are not available as search terms.

The position of these words in the text being indexed, however, still does matter. This ensures that searches using proximity are still carried out correctly.
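One way to see both effects is to run a phrase through the full-text parser. This is only a sketch: sys.dm_fts_parser parses a query string rather than building an index, the sample phrase is made up, and the function needs elevated permissions (sysadmin membership, as far as I know). A stoplist_id of 0 asks for the system stoplist.

```sql
-- Parse a sample phrase with the English word breaker (LCID 1033),
-- the system stoplist (stoplist_id = 0), and accent sensitivity off (0).
-- Stopwords should be flagged in special_term, while the occurrence
-- column keeps counting, so the remaining words keep their positions.
SELECT occurrence,
       display_term,
       special_term
FROM sys.dm_fts_parser(N'"the index is a list of words"', 1033, 0, 0)
ORDER BY occurrence;
```

Words such as "the" and "is" should come back marked as noise words, but the occurrence numbers for "index", "list", and "words" still reflect where they sat in the original phrase.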

Each index can have a specific stoplist associated with it. You can associate a stoplist at index creation time, or alter the index later to add or change the stoplist.
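As a rough sketch, both options look like this. The table dbo.WhitePapers, its Body column, the unique key index PK_WhitePapers, the catalog WhitePaperCatalog, and the stoplist WhitePaperStoplist are all hypothetical names that I assume already exist.

```sql
-- Option 1: choose the stoplist when the index is created.
-- STOPLIST accepts SYSTEM, OFF, or the name of a custom stoplist.
CREATE FULLTEXT INDEX ON dbo.WhitePapers (Body LANGUAGE 1033)
    KEY INDEX PK_WhitePapers
    ON WhitePaperCatalog
    WITH (STOPLIST = SYSTEM);

-- Option 2: switch an existing index to a different stoplist later.
ALTER FULLTEXT INDEX ON dbo.WhitePapers
    SET STOPLIST WhitePaperStoplist;
```

Changing the stoplist on a populated index normally triggers a repopulation, so on a large index this is something to schedule rather than run casually.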

Creating a Stoplist

For each language supported in the full-text system, there is a stoplist installed with SQL Server. These are the commonly used words that should be ignored for each language.

You can, however, create your own stoplist of words with the CREATE FULLTEXT STOPLIST command. The creation can be for a new, empty stoplist, or you can copy an existing stoplist. This includes the system stoplist, which you can use as a basis for your custom stoplist. The commands are simple, and they are well documented in Books Online.
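The three forms can be sketched as follows. The stoplist names are hypothetical, chosen only for illustration.

```sql
-- A new, empty stoplist: nothing is treated as a stopword until you add words.
CREATE FULLTEXT STOPLIST EmptyStoplist;

-- A copy of the system stoplist, used as a starting point for customization.
CREATE FULLTEXT STOPLIST WhitePaperStoplist FROM SYSTEM STOPLIST;

-- A copy of an existing custom stoplist in the current database.
CREATE FULLTEXT STOPLIST WhitePaperStoplistCopy FROM WhitePaperStoplist;
```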

To add or remove words from a stoplist, the ALTER FULLTEXT STOPLIST command is used with the ADD or DROP parameters. Alterations to a stoplist must be for a specific language, which is specified with the LCID or name of the language.
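A short sketch of both operations, assuming a custom stoplist named WhitePaperStoplist already exists (the name and the words are hypothetical). The language can be given either as an LCID or as a language name.

```sql
-- Add a domain-specific word for US English, identified by LCID.
ALTER FULLTEXT STOPLIST WhitePaperStoplist
    ADD N'sql' LANGUAGE 1033;

-- Add another word, this time identifying the language by name.
ALTER FULLTEXT STOPLIST WhitePaperStoplist
    ADD N'server' LANGUAGE 'English';

-- Remove a word again; the language must match the one it was added for.
ALTER FULLTEXT STOPLIST WhitePaperStoplist
    DROP 'server' LANGUAGE 'English';

-- Check what the stoplist now contains for that language.
SELECT sw.stopword, sw.language_id
FROM sys.fulltext_stopwords AS sw
JOIN sys.fulltext_stoplists AS sl
    ON sl.stoplist_id = sw.stoplist_id
WHERE sl.name = N'WhitePaperStoplist'
  AND sw.language_id = 1033
ORDER BY sw.stopword;
```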

Practical Points

Stoplists are important for limiting the size of the index and making it more efficient. Full-text indexes are very efficient and scalable in SQL Server, but the less data that needs to be indexed and searched, the more efficiently the system will operate.

System stoplists work well for many natural language searches, but are not necessarily adequate for domain-specific searches. For example, if I were indexing all white papers on SQL Server, I might want to ignore extremely common words or phrases that appear in all documents. I might consider “SQL” to be so common as to be useless in searches. Rather than bloat the index with this word, I may add it to a stoplist for the full-text index and treat it like “the”, a word I would not use when searching these documents.

I haven’t necessarily found a reason to use custom stoplists in the past, but if my full-text index were extremely large or I had a large volume of searches, I might consider using stoplists to prune down my indexes.

If you have used these in your system, I’d be interested in knowing the reasons and effects.

