In this weeks blog post we will have a look at how easy it is to combine FileTables and FullText Search. Last week we had a look at some of the basics of the new FileTable feature – if you missed out on last week blog post, you can read it here – FileTable Part 1.
To get the full effect of this powerful combination, it is highly recommended that you install “Microsoft Office 2010 Filter packs”. to check if you have that installed take a look at your installed programs and look after something like this:
if not, you can download and install it from here – Microsoft Filter Pack 2.0. By installing this feature Full Text Search will now have the ability to index the content of Microsoft Word and many other file types. To se a complete list of the file types supported in your setup, run the following query:
select * from sys.fulltext_document_types
After installing the Microsoft filter pack, you should have support for more than 150 document_types.
Ok, let’s get started with the proper demo stuff, I’ll be using the same table structure as I did in Part 1 of this series. I am not going to post the code once again, so you’ll have to go back to Part 1 and get it from there. I start out with an empty table and an empty FileTable folder. Before I do anything else I’ll create a text document with the following text in it: “AGF please buy back Morten Duncan Rasmussen” – I save the document and name it Duncan.txt.
First of all, we need a FullText catalog, I ‘ll create that like this:
CREATE FULLTEXT CATALOG MyFullTextCatalog AS DEFAULT GO
Next step is to create a FullText Index on the FileTable table that we have, the interesting column is the file_stream column, this is where the content of the document is stored. As with any other FullText Index we need a Unique key index to build the index with, this is where a quick lookup in the sys.indexes is handy.
Another thing that you might notice is that I use language 1030, this is the code for Danish word breaker. Change tracking is set to auto and I’m using the system stop list. This is not a detailed FullText blog post, so I’ll not go into further details with that right now.
Here is the code to create the FullText Index:
CREATE FULLTEXT INDEX ON dbo.MyDocumentStore ( name LANGUAGE 1030, file_stream TYPE COLUMN file_type LANGUAGE 1030 ) KEY INDEX PK__MyDocume__5A5B77D54AAAEEEA ON MyFullTextCatalog WITH CHANGE_TRACKING AUTO, STOPLIST = SYSTEM GO
Now everything is configured, and we are ready to write some FullText queries against the documents in our FileTable folder. Here is some examples:
-- My Surname should not be found in the text SELECT stream_id, name, file_stream.GetFileNamespacePath(1,1) as FileLocation FROM dbo.MyDocumentStore WHERE CONTAINS(file_stream, 'Schmidt') -- Geniiius APS or http://www.Geniiius.com/blog/? SELECT stream_id, name, file_stream.GetFileNamespacePath(1,1) as FileLocation FROM dbo.MyDocumentStore WHERE CONTAINS(file_stream, '"Geniiius Aps" OR "http://www.Geniiius.com/blog/"') -- The best team on the planet SELECT stream_id, name, file_stream.GetFileNamespacePath(1,1) as FileLocation FROM dbo.MyDocumentStore WHERE CONTAINS(file_stream, 'AGF')
No surprise that only the last query return some rows. Now let’s do some more testing, and because that Change Tracking is set to auto this should be very easy. I’ll create a Microsoft PowerPoint document, copy it to my FileTable folder and seconds after I should be able to query the content from SQL Server. Let’s try.
This is how my PowerPoint document looks:
Now let’s us try to rerun the three queries again, and wupaa all three of them return results. Why is the fist one returning rows you might ask? The first query return rows because of the meta data on the file, where my name in the auther.
Is this a cool feature or what Happy Searching.