Indexing with iFilters

  • I've scoured all the posts on this and followed all the advice and examples and still can't seem to get SQL to index pdfs or office documents so I figure I must be missing something really basic!

    I have set up a database table for the documents and checked the various filters are installed and enabled (see code below).

    I know that full text is installed and working, as if I upload a text document via a webpage it indexes fine and a CONTAINSTABLE query picks up the indexed words. If I do the same with a pdf or word doc then there are no errors, and the full-text properties say that the document has been added, but no index terms appear (using SELECT display_term, column_id, document_count FROM sys.dm_fts_index_keywords (DB_ID('test'), OBJECT_ID('documents')) ).

    Any help at all greatly appreciated as I'm losing marbles over this!

    Ta,

    Jeff

    /* code so far*/

    /*not sure of the order some of these statements should appear in but have tried various permutations.. clearly not the right one! */

    CREATE TABLE [dbo].[Documents]
    (
        [ID] INT IDENTITY(1000000,1) ,
        [Extension] [VARCHAR] (10) NOT NULL ,
        [Content] [VARBINARY] (MAX) NOT NULL ,
        [FileSize] [INT] NOT NULL ,
        [FileName] [NVARCHAR] (500) NOT NULL ,
        [Stamp] [TIMESTAMP] NOT NULL
    )
    ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[Documents] WITH NOCHECK
    ADD CONSTRAINT [PK_Documents] PRIMARY KEY CLUSTERED ([ID])
    GO

    EXEC sp_fulltext_service 'load_os_resources',1
    EXEC sp_fulltext_service 'verify_signature',0
    EXEC sp_fulltext_service 'update_languages'
    reconfigure with override

    CREATE FULLTEXT CATALOG testcatalog
    GO

    CREATE FULLTEXT INDEX ON [dbo].[Documents]
    (
        content TYPE COLUMN extension Language 1033
    )
    KEY INDEX pk_documents
    ON testcatalog;
    GO

    if (select DATABASEPROPERTY(DB_NAME(), N'IsFullTextEnabled')) <> 1
        exec sp_fulltext_database N'enable'
    GO

    if not exists (select * from dbo.sysfulltextcatalogs where name = N'Documents')
    BEGIN
        SELECT 'Creating new FT Catalogue'
        exec sp_fulltext_catalog N'Documents', N'create'
    end
    GO

    exec sp_fulltext_table N'[dbo].[Documents]', N'activate'
    GO

    /*
    check adobe filter installed

    EXEC sp_help_fulltext_system_components 'filter' --pdf and doc filters show up paths correct!!

    SELECT * from sys.fulltext_document_types
    */

    EXEC sp_fulltext_service 'restart_all_fdhosts' --tried out of desperation - no luck!
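
    When a document is reported as added but no terms appear, the population counters can help show whether the crawl finished or whether rows were rejected. A minimal sketch, assuming the database is 'test', the table is dbo.Documents and the catalog is testcatalog as in the code above:

    ```sql
    -- Assumption: run in the 'test' database from the post.
    USE test;
    GO

    SELECT
        -- 0 means the catalog is idle, i.e. no population is running
        FULLTEXTCATALOGPROPERTY('testcatalog', 'PopulateStatus') AS CatalogPopulateStatus,
        FULLTEXTCATALOGPROPERTY('testcatalog', 'ItemCount')      AS CatalogItemCount,
        -- rows processed since the population started
        OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextDocsProcessed')  AS DocsProcessed,
        -- rows that full-text search failed to index
        OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextFailCount')      AS FailCount,
        -- tracked changes still waiting to be crawled
        OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextPendingChanges') AS PendingChanges;
    ```

    A non-zero FailCount with an idle catalog would suggest the filter is being invoked but cannot read the file, rather than the row never being crawled.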

  • ok, slight clarification..

    when running the command:

    SELECT display_term, column_id, document_count FROM sys.dm_fts_index_keywords (DB_ID('test'), OBJECT_ID('documents'))

    There are initially no entries as expected but after the first pdf / doc or whatever file is added there is a single entry - Display Term - 'END OF FILE'.

    When adding a txt or csv file then the display terms get populated as expected.

    ..any ideas?

    Thanks!
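
    To see which individual files produced no terms, the per-document keyword view can be joined back to the table. This is only a sketch, assuming it is run in the 'test' database and that [ID] is the full-text key (it is the KEY INDEX column in the code above):

    ```sql
    -- Count indexed terms per stored file, ignoring the 'END OF FILE' marker row.
    SELECT d.ID,
           d.FileName,
           d.Extension,
           COUNT(k.display_term) AS IndexedTerms
    FROM dbo.Documents AS d
    LEFT JOIN sys.dm_fts_index_keywords_by_document(DB_ID('test'), OBJECT_ID('dbo.Documents')) AS k
           ON k.document_id = d.ID
          AND k.display_term <> 'END OF FILE'
    GROUP BY d.ID, d.FileName, d.Extension
    ORDER BY IndexedTerms, d.ID;
    ```

    Files listed with IndexedTerms = 0 are the ones whose filter returned no text, which makes it easier to see whether the problem follows a particular extension.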

  • Are the extensions visible in the output of the following:

    SELECT * FROM sys.fulltext_document_types

    If not, the filters need installing.
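
    It can also be worth checking that the values stored in the type column match a registered document type exactly. A rough sketch, assuming the [Extension] column from the table above and that registered types are listed with a leading dot (for example '.pdf'):

    ```sql
    -- For each stored extension, show the filter DLL (if any) registered for it.
    SELECT d.Extension,
           COUNT(*)    AS Documents,
           MAX(t.path) AS FilterPath   -- NULL means no registered filter matches this value
    FROM dbo.Documents AS d
    LEFT JOIN sys.fulltext_document_types AS t
           ON t.document_type = d.Extension COLLATE DATABASE_DEFAULT
    GROUP BY d.Extension;
    ```

    A NULL FilterPath for a row such as 'pdf' (no dot) would point to the stored extension rather than the filter installation.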

  • Hi Howard..

    Yes, the iFilters appear to be installed.. The iFilter for pdfs is C:\Program Files\Adobe\Adobe PDF iFilter 11 for 64-bit platforms\bin\PDFFilter.dll.

    I've added this to the path environment list and also checked that the dll actually exists in that location.. All fine apart from the fact it doesn't work.

    Thanks for the reply, had almost given up!

    Jeff

    Not sure if this is useful information or not, but it appears that 'doc' files ARE being indexed.. just not docX files. Hadn't noticed before as all the previous attempts I used docX files..

    All a bit more to type into Google I guess.

  • Aha..!!

    Just checked the docX filter a bit more carefully (using EXEC sp_help_fulltext_system_components 'filter' ) and it reads:

    C:\Windows\system32\"C:\Program Files\Windows NT\Accessories\WordpadFilter.dll"

    ..instead of the correct path:

    C:\Program Files\Windows NT\Accessories\WordpadFilter.dll

    If I can work out how to change this then I hope the docX problem may be solved.. any ideas!?

    Sadly no such fix for the pdf issue as the paths, version numbers etc are all correct.
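
    A quick way to look for this kind of mangled registration is to filter the document types view on the path. This is a sketch only, and it assumes the view reports the same path that sp_help_fulltext_system_components shows:

    ```sql
    -- List the filters of interest plus any registration whose path contains a stray quote.
    SELECT document_type, path, version, manufacturer
    FROM sys.fulltext_document_types
    WHERE document_type IN ('.doc', '.docx', '.pdf')
       OR path LIKE '%"%';
    ```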

    For anyone stumbling upon this post, the fix for the 'docX' problem is to download the latest Office iFilters from here

    Then reload the filters and restart the service..

    EXEC sp_fulltext_service 'load_os_resources',1

    EXEC sp_fulltext_service 'verify_signature',0

    EXEC sp_fulltext_service 'restart_all_fdhosts'

    GO

    All docX (and presumably xlsX etc) now indexing ok..

    Still no pdfs though!

    Grr!
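
    One thing to keep in mind after reloading filters: rows that were crawled while the filter was broken are not necessarily crawled again on their own, since change tracking only picks up rows that are modified afterwards. A minimal sketch of forcing a re-crawl, assuming the dbo.Documents table from the first post:

    ```sql
    -- Re-crawl every row so previously uploaded files go through the new filters.
    ALTER FULLTEXT INDEX ON dbo.Documents START FULL POPULATION;
    GO

    -- 0 means idle; a non-zero value means a population is still in progress.
    SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.Documents'), 'TableFulltextPopulateStatus') AS PopulateStatus;
    ```

    On a small test table this should finish quickly; re-uploading the test files would have the same effect.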

  • After finding this post:

    http://social.technet.microsoft.com/Forums/sqlserver/en-US/e3e09cd5-ff38-4f9b-9724-832f9ba824df/full-text-search-of-pdf-files-in-a-file-table

    I followed the advice and uninstalled Adobe iFilter 11 and installed version 9 instead.. after updating the path variable (to point to the iFilter) and a quick restart, all is well and pdfs are now indexing fine!

    The post above is from the beginning of April and as yet no fix from Adobe, in fact not even an acknowledgement there may be a compatibility issue.

    If you're having problems I'd recommend installing V.9 and waiting for a new version of 11 or 12 to come out.

    Hope this ends up being useful to someone!!

    Jeff
