• I understand this to be a limitation of the iFilter setup in general. My assertion, though, is that if I move away from the iFilter setup (or at least use the plain-text iFilter, which has been much faster and more accurate in my testing) and instead use, say, PDFBox for text extraction from a PDF, then that process becomes much more granular and controlled by me. The problem with that solution is that it would require inserting the text output into the database for indexing. If, however, I can get that text into a binary format yet still run the text-only iFilter against it, then I could again circumvent the SQL Server 2008 Express Edition (SS2008EE) database size limitations.

    Am I being clear with my question?

    Current setup:

    PDF Files (on filesystem) --> Database BLOBs (as varbinary(max) FILESTREAM) --> Adobe iFilter --> gives me a database size of roughly (Total Size of PDF files / 20)

    Theoretical setup:

    PDF Files (on filesystem) --> extract text using PDFBox --> Now have related PDF file and text contents --> Database BLOB (1 for each) as varbinary(max) FILESTREAM --> Text iFilter (on TEXT varbinary(max), NOT PDF varbinary(max)) --> results in unknown database size
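    To make that concrete, here's a rough sketch of the table I have in mind for the theoretical setup. The table and column names are placeholders of my own, and it assumes FILESTREAM is enabled on the instance and a default full-text catalog already exists:

    ```sql
    -- One row per source PDF: the original PDF and its PDFBox-extracted
    -- text are both stored as varbinary(max) FILESTREAM blobs.
    CREATE TABLE dbo.Documents (
        DocId       UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName    NVARCHAR(260)  NOT NULL,
        PdfExt      NVARCHAR(8)    NOT NULL DEFAULT N'.pdf', -- type column for the PDF blob
        TextExt     NVARCHAR(8)    NOT NULL DEFAULT N'.txt', -- type column for the text blob
        PdfContent  VARBINARY(MAX) FILESTREAM NULL,
        TextContent VARBINARY(MAX) FILESTREAM NULL,
        CONSTRAINT PK_Documents PRIMARY KEY (DocId)
    );

    -- Full-text index only on the extracted text. The '.txt' value in the
    -- TYPE COLUMN should steer indexing to the built-in plain-text filter,
    -- so the Adobe PDF iFilter is never involved.
    CREATE FULLTEXT INDEX ON dbo.Documents (TextContent TYPE COLUMN TextExt)
        KEY INDEX PK_Documents;
    ```

    The point of the two separate type columns is that only the text blob gets full-text indexed; the PDF blob is just along for the ride so I can still serve the original file.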

    Basically, I'd like to get around using Adobe's iFilter (I don't mind Microsoft's text iFilter) and still maximize the number of records I can store in a database while still coming in under the 4 GB size limit.
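    To illustrate what I'm after: a search would hit only the indexed text column but could still return the original PDF blob alongside it (again, the names here are just placeholders):

    ```sql
    -- Search the plain-text-indexed column, return the matching PDFs.
    SELECT FileName, PdfContent
    FROM dbo.Documents
    WHERE CONTAINS(TextContent, N'"search term"');
    ```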

    Thanks!