Limiting FTS ContainsTable search query to a smaller set of data motivated by trying to fix some FT GATHERER blocking

  • SQL Version: SQL 2008 R2 10.50.2550

    There's a few parts to this so not sure if I should break this out to separate topics... I'm a little new to Full Text Searches so this first part I'm just generally curious about and then I'll elaborate on motive...

    The current structure of the FTS is as follows

    SELECT TOP 200 c.EmployerId, c.EmployeeId

    FROM dbo.EmployeeFTS c

    INNER JOIN CONTAINSTABLE

    (

    dbo.EmployeeFTS

    ,SearchText

    ,@query

    ) results

    ON c.EmployeeId = results.[Key]

    WHERE

    c.EmployerId= @EmployerId

    Order by results.Rank DESC

    The idea is that it's a multi-tenant DB so all searches only need to occur over a given EmployerId. But the way this is written seems a little inefficient because it's searching over all employers and then after that filtering out other employer matches... It seems like it'd be better to be able to utilize top_n_rank of ContainsTable but in order to do that ContainsTable needs to only operate over a specific Employer's set of data... Is there a recommended way to go about this?

    Search Text is a computed column which is concatenation of other columns in the EmployeeFTS table...

    So what I've done so far is to add "EmployerId:<EmployerId>" to the concatenation of that computed column and then append that onto the query... so the search becomes

    set @query = 'EmployerId:' + cast(@EmployerId as varchar(10)) + ' AND ' + @query

    SELECT c.EmployerId, c.EmployeeId

    FROM dbo.EmployeeFTS c

    INNER JOIN CONTAINSTABLE

    (

    dbo.EmployeeFTS

    ,SearchText

    ,@query

    ,200

    ) results

    ON c.EmployeeId= results.[Key]

    WHERE

    c.EmployerId= @EmployerId --I left this on just to be extra sure

    From my initial tests the results returned are always the same, the logical reads from statistics is way lower, and the cost of the execution plan is a lot lower as well...

    It FEELS like sort of a hack to do it this way though so I was curious if there's a better recommendation...

    -I read this

    http://technet.microsoft.com/en-us/library/cc721269%28SQL.100%29.aspx#_Toc202506246

    Where it talks about being able to limit by using contains, but then I don't get ranking. And from the tests I did the containstable version I wrote above with the concatenated EmployerId performs better...

    Additional thing I'm doing and some notes:

    -I've enabled Trace Flag 7646 on staging and I'm going to move that to production here soon to see if that alleviates the issue. I'm not really that worried about the delay in accuracy this is going to cause.

    -We're applying the latest SP to 2008 R2 later this month not really sure if there's something in there that will help

    -There are about a million records in the table and it gains about 500 new records a day

    Motivation:

    On a daily basis the Employee searches are getting blocked by this

    <command_type>FT GATHERER</command_type>

    <block_info>

    <lock_type>objectlock</lock_type>

    <object_name>fulltext_docidfilter_2104498676</object_name>

    </block_info>

    Which is in turn blocked by some longer running Employee Search... So the blocking chain goes

    -Long search with large number of reads blocks FT GATHERER

    -FT GATHERER which is waiting for (1x: 56572ms)LCK_M_IX

    -So now a bunch of other Employee searches start to get backed up behind the FT GATHERER...

    Brent Ozar talking about stuff that seemed related

    http://www.brentozar.com/archive/2008/11/stackoverflows-sql-2008-fts-issue-solved/

    Which is why I'm wanting to optimize the search... There's a good chance I'll just leave Trace Flag 7646 enabled because I don't really see it causing any issues... There's also plans to move to Lucene so this might be a good catalyst for doing that as well 🙂

    Thanks in advance for your time and thoughts

  • Here's a script I whipped up that illustrates basically what I'm saying

    use master

    CREATE DATABASE FT_TEST

    GO

    use FT_TEST

    GO

    CREATE TABLE [dbo].[EmployeeFTS](

    [EmployeeID] [int] NOT NULL,

    [EmployerID] [int] NOT NULL,

    [Names] [nvarchar](150) NOT NULL,

    [PhoneNumbers] [nvarchar](max) NOT NULL,

    [Address] [nvarchar](250) NOT NULL,

    [SearchText] AS ((((([Names]+'; '))+([PhoneNumbers]+'; '))+([Address]))+((('|EmployerId:'+CONVERT([varchar](100),[EmployerId],0))))),

    CONSTRAINT [PK_EmployeeFTS] PRIMARY KEY CLUSTERED

    (

    [EmployeeID] ASC

    )

    )

    GO

    CREATE NONCLUSTERED INDEX [EmployeeFTS_EmployerId]

    ON [dbo].[EmployeeFTS] ([EmployerID])

    INCLUDE ([EmployeeID])

    GO

    CREATE FULLTEXT CATALOG [FT_TEST_EmployeeFT] WITH ACCENT_SENSITIVITY = ON

    GO

    CREATE FULLTEXT STOPLIST [StopList_EmployeeFTS]

    ;

    CREATE FULLTEXT INDEX ON [dbo].[EmployeeFTS] KEY INDEX [PK_EmployeeFTS] ON ([FT_TEST_EmployeeFT], FILEGROUP [PRIMARY]) WITH (CHANGE_TRACKING AUTO, STOPLIST [StopList_EmployeeFTS])

    GO

    ALTER FULLTEXT INDEX ON [dbo].[EmployeeFTS] ADD ([SearchText] LANGUAGE [English])

    GO

    ALTER FULLTEXT INDEX ON [dbo].[EmployeeFTS] ENABLE

    GO

    truncate table [EmployeeFTS]

    GO

    declare @i as int = 0

    declare @iStr as varchar(10)

    WHILE @i < 100000

    BEGIN

    set @iStr = cast(@i as varchar(10))

    insert into [EmployeeFTS] (EmployeeId, EmployerId, Names, PhoneNumbers, [Address])

    VALUES(@i, @i %100, 'NAme ' + @iStr, 'PhoneNumber ' + @iStr, 'Address ' + @iStr)

    set @i = @i + 1

    END

    GO

    SET STATISTICS IO ON

    GO

    declare @EmployerId as int = 2

    declare @query as varchar(4000)

    set @query = '(FORMSOF(INFLECTIONAL, NAme)) AND (FORMSOF(INFLECTIONAL, address))'

    print @query

    SELECT TOP 200 c.EmployerId, c.EmployeeId

    FROM dbo.EmployeeFTS c

    INNER JOIN CONTAINSTABLE

    (

    dbo.EmployeeFTS

    ,SearchText

    ,@query

    ) results

    ON c.EmployeeId = results.[Key]

    WHERE

    c.EmployerId= @EmployerId

    Order by results.Rank DESC

    GO

    declare @EmployerId as int = 2

    declare @query as varchar(4000)

    set @query = '(FORMSOF(INFLECTIONAL, NAme)) AND (FORMSOF(INFLECTIONAL, address))'

    set @query = '\|EmployerId:' + cast(@EmployerId as varchar(10)) + ' AND ' + @query

    print @query

    SELECT c.EmployerId, c.EmployeeId

    FROM dbo.EmployeeFTS c

    INNER JOIN CONTAINSTABLE

    (

    dbo.EmployeeFTS

    ,SearchText

    ,@query

    ,200

    ) results

    ON c.EmployeeId= results.[Key]

    WHERE

    c.EmployerId= @EmployerId

  • I realize now I probably did a bad job of creating a narrow question 🙂

    I ended up just going with

    "what I've done so far is to add "EmployerId:<EmployerId>" to the concatenation of that computed column and then append that onto the query"

    As a solution

    So I added the EmployerID to the computed column that's used for the full text index...

    And then all queries coming in I prefix that on the search string so that it filters down by employer first and then returns relevant results..

    Worked great

    Haven't had a single block on gatherer ever since and search is running faster then ever...

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply