• You should be able to full-text index varbinary(max) data that is pure "text". Just specify a value of "txt" for the companion file type column.

    I don't know if the documents you are trying to index are all of a single language/culture (e.g., "en-US") or multi-language. When you extract out the "text" from a document and then index it as plain text, you will lose any language-specific tokenization as the language "meta" tag will be lost. In other words, you cannot have a table containing rows of plain text in mixed languages indexed under the language-specific tokenization rules for the language of the row. This is a very bad SQL Server limitation. BTW, in Oracle, you can as you specify a column that contains the 3-character ISO language code. Granted, you can have separate columns for each language, but that is an ugly mess.

    As for space savings, why not also store the extracted text column of varbinary(max) as FILESTREAM?


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]