• I have a reason why I might want to do this.

    My company specializes in storing and full-text indexing PDF files. I am, however, a little suspicious of Microsoft's use of Adobe's iFilter. (It is so encapsulated that there is little I can do when something goes wrong.) If my program were to use PDFBox to extract the text instead, that would be great. In that case, I could simply add the file to the database in a varbinary(max) FILESTREAM column and add the text in a varchar(max) field or something similar.

    Another issue arises, however, and that is database size. I currently have a customer whose single database of scanned PDF files exceeds 100 gigabytes. I know that FILESTREAM data doesn't count against the 4 GB "Express Edition" database size limitation, but the varchar(max) field does. For this database to be usable under SQL Server 2008 Express Edition (SS2008EE), I would somehow have to get around the 4 GB limit for the space taken up by the textual contents of the PDF files. If SQL Server were able to take text converted to binary and index it, then I could create a full-text index on that varbinary field.

    So, the flow would be: BLOB --> varbinary(max) (actual file) --> text contents --> varbinary(max) (text contents)
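
    To make the last step concrete, here is a minimal sketch of what I mean by "text converted to binary". The class and method names (TextAsBinary, toBinary, toText) are made up for illustration, the sample string stands in for whatever PDFBox would return, and UTF-8 is just an assumed encoding. Whether SQL Server can full-text index bytes stored this way is exactly what I am asking.

    ```java
    import java.nio.charset.StandardCharsets;

    // Hypothetical helper: names and encoding choice are assumptions, not an established API.
    class TextAsBinary {

        // Encode extracted text as UTF-8 bytes, ready to be bound to a varbinary(max) parameter.
        static byte[] toBinary(String text) {
            return text.getBytes(StandardCharsets.UTF_8);
        }

        // Decode the stored bytes back into text, to check that nothing is lost on the round trip.
        static String toText(byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }

        public static void main(String[] args) {
            // Stand-in for the text that the PDF extraction step would produce.
            String extracted = "scanned invoice 42";
            byte[] payload = toBinary(extracted);
            System.out.println(payload.length + " bytes");
            System.out.println(toText(payload));
        }
    }
    ```

    The resulting byte array would go into the second varbinary(max) column in place of the varchar(max) field.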

    Is there a way to make this work?

    Thanks!