Optimizing Full Text Search

  • I have a rather complex database for which I want to provide a global search option in a SQL Server 2008 database.

    Basically I'd like to know what the best practices are for this.

    I have tried two different approaches to indexing the tables

    1. Index every nvarchar column

    2. Create two extra columns for each indexed table ([IndexedContent] varbinary(MAX), [IndexedType] nvarchar(128)) - the former contains a concatenated string of all nvarchar columns, the latter contains '.html'. Then indexing [IndexedContent] with [IndexedType] as type column.

    The search queries (stored procedures) look like this:

    1. Indexing every nvarchar column

    SELECT

    'Client' AS [Type], [Clients].[ClientID] AS [ID], NULL AS [ParentID], [Clients].[Number],

    ([Clients].[GivenName] + ' ' + [Clients].[FamilyName]) AS [Name]

    FROM

    [Clients]

    LEFT OUTER JOIN [Patients] ON [Clients].[ClientID] = [Patients].[ClientID]

    WHERE

    @Clients = 1 AND

    (FREETEXT([Clients].*, @Query) OR FREETEXT([Patients].*, @Query))

    UNION

    SELECT

    'Patient' AS [Type], [Patients].[PatientID] AS [ID], [Patients].[ClientID] AS [ParentID], [Patients].[Number],

    ([Patients].[Name] + ' ' + CASE WHEN [Breeds].[Name] IS NOT NULL THEN '(' + [Breeds].[Name] +')' ELSE CASE WHEN [Species].[Name] IS NOT NULL THEN '(' + [Species].[Name] + ')' ELSE '' END END) AS [Name]

    FROM

    [Patients]

    LEFT OUTER JOIN [Clients] ON [Patients].[ClientID] = [Clients].[ClientID]

    LEFT OUTER JOIN [Species] ON [Patients].[SpeciesID] = [Species].[SpeciesID]

    LEFT OUTER JOIN [Breeds] ON [Patients].[BreedID] = [Breeds].[BreedID]

    WHERE

    @Patients = 1 AND

    (FREETEXT([Patients].*, @Query) OR FREETEXT([Breeds].*, @Query) OR FREETEXT([Clients].*, @Query))

    UNION

    SELECT

    'Referal' AS [Type], [Referals].[ReferalID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([Referals].[Name] + CASE WHEN [Referals].[Contact] IS NOT NULL THEN ' (' + [Referals].[Contact] + ')' ELSE '' END) AS [Name]

    FROM

    [Referals]

    WHERE

    @Referals = 1 AND FREETEXT([Referals].*, @Query)

    UNION

    SELECT

    'User' AS [Type], [UserID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([GivenName] + ' ' + [FamilyName]) AS [Name]

    FROM

    [Users]

    WHERE

    @Users = 1 AND FREETEXT([Users].*, @Query)

    ORDER BY

    [Name]

    2. Indexing one varbinary(MAX) column

    SELECT

    'Client' AS [Type], [Clients].[ClientID] AS [ID], NULL AS [ParentID], [Clients].[Number],

    ([Clients].[GivenName] + ' ' + [Clients].[FamilyName]) AS [Name]

    FROM

    [Clients]

    LEFT OUTER JOIN [Patients] ON [Clients].[ClientID] = [Patients].[ClientID]

    WHERE

    @Clients = 1 AND

    (FREETEXT([Clients].[IndexedContent], @Query) OR FREETEXT([Patients].[IndexedContent], @Query))

    UNION

    SELECT

    'Patient' AS [Type], [Patients].[PatientID] AS [ID], [Patients].[ClientID] AS [ParentID], [Patients].[Number],

    ([Patients].[Name] + ' ' + CASE WHEN [Breeds].[Name] IS NOT NULL THEN '(' + [Breeds].[Name] +')' ELSE CASE WHEN [Species].[Name] IS NOT NULL THEN '(' + [Species].[Name] + ')' ELSE '' END END) AS [Name]

    FROM

    [Patients]

    LEFT OUTER JOIN [Clients] ON [Patients].[ClientID] = [Clients].[ClientID]

    LEFT OUTER JOIN [Species] ON [Patients].[SpeciesID] = [Species].[SpeciesID]

    LEFT OUTER JOIN [Breeds] ON [Patients].[BreedID] = [Breeds].[BreedID]

    WHERE

    @Patients = 1 AND

    (FREETEXT([Patients].[IndexedContent], @Query) OR FREETEXT([Clients].[IndexedContent], @Query))

    UNION

    SELECT

    'Referal' AS [Type], [Referals].[ReferalID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([Referals].[Name] + CASE WHEN [Referals].[Contact] IS NOT NULL THEN ' (' + [Referals].[Contact] + ')' ELSE '' END) AS [Name]

    FROM

    [Referals]

    WHERE

    @Referals = 1 AND FREETEXT([Referals].[IndexedContent], @Query)

    UNION

    SELECT

    'User' AS [Type], [UserID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([GivenName] + ' ' + [FamilyName]) AS [Name]

    FROM

    [Users]

    WHERE

    @Users = 1 AND FREETEXT([Users].[IndexedContent], @Query)

    ORDER BY

    [Name]

    The perfomance for 2 is marginally faster than 1 in some cases.

    The tables contain up to 50,000 rows each but I will be implementing search on another table [Journal] which will contain maybe up to 1,000,000 rows.

    Presently a search takes anywhere between 15 seconds and a full minute on my development machine (Windows 7 Ultimate 64-bit, SQL Server 2008 SP1 64-bit, 2,33 GHz Core 2 Duo, 4 GB RAM).

    I'm currently looking to optimize the database and the queries, not looking for suggestions about bigger hardware or moving the full-text catalog to a different server as that is not an option for the time being.

    Any suggestions will be greatly appreciated.

  • I'm not sure that the full-text search is the main culprit for the lack of performance.

    Can you eliminate the "UNION" and ORs from your code?

  • Hi dmoldovan,

    Thanks a billion for your suggestions.

    Removing the UNIONs did nothing but removing the ORs and replacing them with two SELECT statements is an entirely different case: the execution time for the longest query drops down to 0.01 seconds!

    Sometimes it's just the simple things.

    Final code:

    SET NOCOUNT ON

    SELECT

    'Client' AS [Type], [Clients].[ClientID] AS [ID], NULL AS [ParentID], [Clients].[Number],

    ([Clients].[GivenName] + ' ' + [Clients].[FamilyName]) AS [Name]

    FROM

    [Clients]

    LEFT OUTER JOIN [Patients] ON [Clients].[ClientID] = [Patients].[ClientID]

    WHERE

    @Clients = 1 AND

    FREETEXT([Clients].[IndexedContent], @Query)

    UNION

    SELECT

    'Client' AS [Type], [Clients].[ClientID] AS [ID], NULL AS [ParentID], [Clients].[Number],

    ([Clients].[GivenName] + ' ' + [Clients].[FamilyName]) AS [Name]

    FROM

    [Clients]

    LEFT OUTER JOIN [Patients] ON [Clients].[ClientID] = [Patients].[ClientID]

    WHERE

    @Clients = 1 AND

    FREETEXT([Patients].[IndexedContent], @Query)

    UNION

    SELECT

    'Patient' AS [Type], [Patients].[PatientID] AS [ID], [Patients].[ClientID] AS [ParentID], [Patients].[Number],

    ([Patients].[Name] + ' ' + CASE WHEN [Breeds].[Name] IS NOT NULL THEN '(' + [Breeds].[Name] +')' ELSE CASE WHEN [Species].[Name] IS NOT NULL THEN '(' + [Species].[Name] + ')' ELSE '' END END) AS [Name]

    FROM

    [Patients]

    LEFT OUTER JOIN [Clients] ON [Patients].[ClientID] = [Clients].[ClientID]

    LEFT OUTER JOIN [Species] ON [Patients].[SpeciesID] = [Species].[SpeciesID]

    LEFT OUTER JOIN [Breeds] ON [Patients].[BreedID] = [Breeds].[BreedID]

    WHERE

    @Patients = 1 AND

    FREETEXT([Patients].[IndexedContent], @Query)

    UNION

    SELECT

    'Patient' AS [Type], [Patients].[PatientID] AS [ID], [Patients].[ClientID] AS [ParentID], [Patients].[Number],

    ([Patients].[Name] + ' ' + CASE WHEN [Breeds].[Name] IS NOT NULL THEN '(' + [Breeds].[Name] +')' ELSE CASE WHEN [Species].[Name] IS NOT NULL THEN '(' + [Species].[Name] + ')' ELSE '' END END) AS [Name]

    FROM

    [Patients]

    LEFT OUTER JOIN [Clients] ON [Patients].[ClientID] = [Clients].[ClientID]

    LEFT OUTER JOIN [Species] ON [Patients].[SpeciesID] = [Species].[SpeciesID]

    LEFT OUTER JOIN [Breeds] ON [Patients].[BreedID] = [Breeds].[BreedID]

    WHERE

    @Patients = 1 AND

    FREETEXT([Clients].[IndexedContent], @Query)

    UNION

    SELECT

    'Referal' AS [Type], [Referals].[ReferalID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([Referals].[Name] + CASE WHEN [Referals].[Contact] IS NOT NULL THEN ' (' + [Referals].[Contact] + ')' ELSE '' END) AS [Name]

    FROM

    [Referals]

    WHERE

    @Referals = 1 AND FREETEXT([Referals].[IndexedContent], @Query)

    UNION

    SELECT

    'User' AS [Type], [UserID] AS [ID], NULL AS [ParentID], NULL AS [Number],

    ([GivenName] + ' ' + [FamilyName]) AS [Name]

    FROM

    [Users]

    WHERE

    @Users = 1 AND FREETEXT([Users].[IndexedContent], @Query)

    ORDER BY

    [Name]

    Thanks again - you've saved my day

  • You're welcome...

    Now examine the actual execution plan and see if you can still improve the code and / or indexing. For example, UNION adds a "sort" step to the execution plan, as you can see here: http://www.sqlshare.com/media.aspx?vid=389&

    However, watch out - better can be sometimes the enemy of the plain "good"...:-)

  • My guess is that the code corresponds to a search form - if you give more details, you can get help if you need further changes...

  • Indeed it's to be used on a search form where the user can select to search in clients, patients, referals and users (hence the @Client = 1 etc.)

    I have removed the index columns and am now indexing on just the nvarchar columns but that poses some problems:

    Is it possible to search int columns (Client number) as well - or do I need for instance BEGIN TRY CAST(@Query AS int)... for that?

    I have split the clients names in two columns GivenName and FamilyName but this means that if a guy is called 'Jens Martin' 'Olson' I don't get all results with a query like 'Jens Olson'.

    Also I have replaced the FREETEXT(...) with INNER JOIN CONTAINSTABLE which works pretty cool with ranking but using INNER JOIN FREETEXTTABLE totally kills the running time again.

  • Similarly a query like '"Jens" OR "Martin" OR "Olsen"' also kill the running time completely.

  • Examining the actual execution plan would be really helpful - it shows you what to change (indexing, table structure), if you need any change.

    Please see

    http://www.sommarskog.se/dyn-search-2005.html

    for solutions to challenges very similar to yours.

    In your case, I think I'd choose the dynamic SQL. I'd construct a query string depending on what the user chooses on the form. And yes, you can search the IDs...

Viewing 8 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply