Full Text Search Slowness

  • We have a table containing open Job positions, and want to use Full-Text search on the job descriptions, to look for "finance", "accounting" etc, so I set up full-text searching, but it seems much slower than I would expect. The data being searched is in a varchar(max) column, there are 330,000 rows, the longest string is 64,000 chracters, and only 370 rows greater than 10,000 characters, so seems to me it shouldn't be a long running query..

    What I did:

    1) Enabled the database for full text search.

    2) Define Full-Text Index, select the column JOB_Combined_Title_Description

    3) create a new catalog on the F drive

    4) Populate the catalog

    5) run query such as:

    Takes 10 seconds returning 18,000 rows

    select job_id, job_title

    from Job_TextSearchTesting

    WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"');

    Table: Full text search is built on the 2nd to last column, JOB_Combined_Title_Description

    It had HTML code, but I stripped that out first.

    USE [My_Database]

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    SET ANSI_PADDING ON

    GO

    CREATE TABLE [dbo].[Job_TextSearchTesting](

    [JOB_ID] [int] IDENTITY(125000,1) NOT NULL,

    [JOB_origMemID] [int] NULL,

    [JOB_memID] [int] NULL,

    [JOB_memRecID] [varchar](20) NULL,

    [JOB_entryMemID] [int] NULL,

    [JOB_entryRecID] [varchar](20) NULL,

    [JOBS_ID] [int] NULL,

    [JOB_PrivateOnly] [bit] NOT NULL,

    [JOB_jobType] [varchar](50) NULL,

    [JOB_jobType11] [int] NULL,

    [JOB_jobType21] [int] NULL,

    [JOB_refNum] [varchar](40) NULL,

    [JOB_releaseDate] [datetime] NULL,

    [FIRMT_ID] [int] NULL,

    [TRANSTYPE_ID] [int] NULL,

    [JOB_title] [varchar](80) NULL,

    [JOB_funcCode1] [varchar](5) NULL,

    [JOB_funcCode2] [varchar](5) NULL,

    [JOB_srMgmt] [bit] NOT NULL,

    [JOB_reportsTo] [varchar](40) NULL,

    [JOB_coSizeNum] [decimal](18, 1) NULL,

    [JOB_coSize] [varchar](15) NULL,

    [JOB_coSizeCode] [bigint] NULL,

    [JOB_indCode1] [varchar](5) NULL,

    [JOB_indCode2] [varchar](5) NULL,

    [JOB_indName1] [varchar](50) NULL,

    [JOB_indName2] [varchar](50) NULL,

    [JOB_locCode1] [varchar](6) NULL,

    [JOB_locName1] [varchar](50) NULL,

    [JOB_locCity] [varchar](50) NULL,

    [JOB_actLocCode] [varchar](6) NULL,

    [JOB_PI_company] [text] NULL,

    [JOB_salaryFormat] [bit] NOT NULL,

    [JOB_salaryDMin] [money] NULL,

    [JOB_salaryDMax] [money] NULL,

    [JOBCUR_ID] [int] NULL,

    [JOB_bonus] [int] NULL,

    [JOB_bonusPrMin] [float] NULL,

    [JOB_bonusPrMax] [float] NULL,

    [JOB_bonusDMin] [money] NULL,

    [JOB_bonusDMax] [money] NULL,

    [JOB_options] [int] NULL,

    [JOB_commission] [int] NULL,

    [JOB_car] [int] NULL,

    [JOB_incentives] [int] NULL,

    [RELOC_ID] [int] NULL,

    [JOB_PI_salary1] [text] NULL,

    [JOB_salaryMid] [money] NULL,

    [JOB_bonusMid] [money] NULL,

    [JOB_totalComp] [money] NULL,

    [JOB_salaryReport] [varchar](50) NULL,

    [JOB_PI_salary2] [text] NULL,

    [JOB_PI_specReq] [text] NULL,

    [JOB_PI_companyDesc] [text] NULL,

    [JOB_PI_posDesc] [text] NULL,

    [JOB_PI_RemClose] [varchar](255) NULL,

    [JOB_PI_posUpdated] [text] NULL,

    [JOBRMV_ID] [int] NULL,

    [JOB_replyDirect] [bit] NOT NULL,

    [JOB_postNameDisp] [bit] NOT NULL,

    [JOB_postName] [varchar](150) NULL,

    [JOB_postAddressDisp] [bit] NOT NULL,

    [JOB_postAddress] [text] NULL,

    [JOB_postPhoneDisp] [bit] NOT NULL,

    [JOB_postPhone] [varchar](30) NULL,

    [JOB_postFaxDisp] [bit] NOT NULL,

    [JOB_postFax] [varchar](30) NULL,

    [JOB_postEmailDisp] [bit] NOT NULL,

    [JOB_postEmail] [varchar](100) NULL,

    [JOB_confidentialEmail] [varchar](100) NULL,

    [JOB_PI_contact] [text] NULL,

    [JOB_PI_contactEdit] [bit] NOT NULL,

    [JOB_staffComments] [text] NULL,

    [JOB_operator] [varchar](30) NULL,

    [JOB_lastUpdated] [datetime] NOT NULL,

    [JOB_creationDate] [datetime] NOT NULL,

    [JOB_locCode2] [varchar](6) NULL,

    [JOB_locName2] [varchar](50) NULL,

    [JOB_statusNum] [int] NULL,

    [JOB_emailTest] [int] NOT NULL,

    [JOB_memSpecType] [varchar](10) NULL,

    [JOB_Coupon] [varchar](6) NULL,

    [job_ShowRecProfile] [bit] NOT NULL,

    [JOB_AcceptReferrals] [bit] NOT NULL,

    [JOB_BasicMemApply] [char](1) NULL,

    [JOB_BasicMemApplyDate] [datetime] NULL,

    [JOB_JobFeed] [char](1) NULL,

    [JOB_JobFeedDate] [datetime] NULL,

    [JOB_postApplyLinkDisp] [bit] NULL,

    [JOB_postApplyLink] [varchar](max) NULL,

    [JOB_JobLevel] [int] NULL,

    [JOB_EmailSubject] [varchar](max) NULL,

    [JOB_specinstr] [varchar](max) NULL,

    [JOB_FeedCity] [varchar](50) NULL,

    [JOB_FeedState] [varchar](20) NULL,

    [JOB_FeedCountry] [varchar](50) NULL,

    [JOB_Combined_Title_Description] [varchar](max) NULL,

    [JOB_Title_Description_Updated] [char](1) NULL,

    CONSTRAINT [PK_Job_TextSearchTesting] PRIMARY KEY CLUSTERED

    (

    [JOB_ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

    GO

    SET ANSI_PADDING OFF

    GO

    EXEC sys.sp_addextendedproperty @name=N'MS_Description', @value=N'Combine JOB_title JOB_PI_posDesc for Full text searching' , @level0type=N'SCHEMA',@level0name=N'dbo', @level1type=N'TABLE',@level1name=N'Job_TextSearchTesting', @level2type=N'COLUMN',@level2name=N'JOB_Combined_Title_Description'

  • A few things:

    Can you post the query plan that's used for this query?

    Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (1/11/2016)


    A few things:

    Can you post the query plan that's used for this query?

    Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?

    There are NULLs in the column, not sure why though. Could be old, inactive records.

    StmtText

    ------------------------------------------------------------------------------

    select job_id, job_title

    from Job_TextSearchTesting

    WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"')

    StmtText

    ---------------------------------------------------------------------------------------

    |--Hash Match(Inner Join, HASH: ([Full-text Search Engine].)=([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[JOB_ID]))

    |--Remote Scan(OBJECT: (CONTAINS))

    |--Clustered Index Scan(OBJECT: ([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[PK_Job_TextSearchTesting]))

  • homebrew01 (1/11/2016)


    Alan.B (1/11/2016)


    A few things:

    Can you post the query plan that's used for this query?

    Also, does that column need to be NULLable? If not can you set it to NOT NULL, run the query and see if that helps?

    There are NULLs in the column, not sure why though. Could be old, inactive records.

    StmtText

    ------------------------------------------------------------------------------

    select job_id, job_title

    from Job_TextSearchTesting

    WHERE CONTAINS(JOB_Combined_Title_Description, '"finance" OR "accounting"')

    StmtText

    ---------------------------------------------------------------------------------------

    |--Hash Match(Inner Join, HASH: ([Full-text Search Engine].)=([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[JOB_ID]))

    |--Remote Scan(OBJECT: (CONTAINS))

    |--Clustered Index Scan(OBJECT: ([execunet_DataMaint].[dbo].[Job_TextSearchTesting].[PK_Job_TextSearchTesting]))

    Ok, I'm buried at the moment with work but I think I have a good alternative way of getting what you need. I'll be back online in a few hours and post what I have.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Ok, I have a nasty fast alternative for your under performing full-text index. We're going to create what I'll call a Nindex (N-Grams Index); it involves a Tally table and a couple indexed views. I may update this post with a solution that only uses one indexed view but, for now, this will be a huge improvement. The code below can be copied/pasted and ran as is.

    In summary we're going to:

    1) Create a Tally table with 1,000,000 rows and do some pre-cleanup

    2) Create 330,000 rows of sample data similar to what you are working with 5.5% of the records containing the text "accounting" or "finance" in your JOB_Combined_Title_Description column

    3) Create 2 indexed views

    Lastly, I'll show you how to utilize those index views to get your 18,000 records in ~200ms. Note the comments in my code:

    USE tempdb -- somewhere safe for testing

    GO

    /****************************************************************************************

    STEP 1: Prep and required tally table (a CTE Tally table will not do)

    ****************************************************************************************/;

    IF OBJECT_ID('dbo.Job_TextContainsA') IS NOT NULL DROP VIEW dbo.Job_TextContainsA;

    IF OBJECT_ID('dbo.Job_TextContainsF') IS NOT NULL DROP VIEW dbo.Job_TextContainsF;

    IF OBJECT_ID('dbo.Job_TextSearch') IS NOT NULL DROP TABLE dbo.Job_TextSearch;

    IF OBJECT_ID('dbo.Tally') IS NOT NULL DROP TABLE dbo.Tally;

    CREATE TABLE dbo.Tally(N int NOT NULL)

    INSERT dbo.Tally

    SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

    FROM sys.all_columns a, sys.all_columns b;

    ALTER TABLE dbo.Tally ADD CONSTRAINT pk_tally_N PRIMARY KEY CLUSTERED(N);

    ALTER TABLE dbo.Tally ADD CONSTRAINT uq_tally_N UNIQUE(N);

    GO

    /****************************************************************************************

    STEP 2: Use the tally table to create the sample data

    This runs for about 10 seconds on my PC and creates a simplified version of your

    table with 330,000 records. ~5.5% of the JOB_Combined_Title_Description fields will

    contain the text "accounting" or "finance".

    ****************************************************************************************/;

    WITH

    base AS

    (

    SELECT TOP(330000)

    job_id = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),

    x = ABS(CHECKSUM(newid())%1000)+1,

    fa = ABS(CHECKSUM(newid())%2)

    FROM dbo.tally a, dbo.tally b

    ),

    txt AS

    (

    SELECT

    job_id,

    job_Combined_Title_Description = REPLICATE(NEWID(),ABS(ABS(CHECKSUM(newid())%200)-100)),

    x, fa

    FROM base

    )

    SELECT

    job_id,

    job_title = ISNULL(CAST('Some Job Title...' AS varchar(100)),'x'),

    job_combined_title_description =

    ISNULL

    ((

    CASE

    WHEN x < 55

    THEN STUFF

    (

    JOB_Combined_Title_Description,

    ABS(CHECKSUM(newid())%100)+1,0,

    CASE fa WHEN 1 THEN 'Accounting' ELSE 'Finance' END

    )

    ELSE JOB_Combined_Title_Description

    END),CAST(NEWID() AS varchar(36))+'Finance')

    INTO dbo.Job_TextSearch

    FROM txt;

    GO

    ALTER TABLE dbo.Job_TextSearch ALTER COLUMN job_id int NOT NULL;

    GO

    ALTER TABLE dbo.Job_TextSearch ADD CONSTRAINT pk_strings PRIMARY KEY(job_id);

    GO

    /****************************************************************************************

    STEP 3: Create a couple indexed views that filter for the text "accounting" of "finance"

    The first view will contain all of the job_ids where job_Combined_Title_Description

    contains "accounting" and another for "finance".

    Notes:

    1. The indexes will take about 2 minutes to build but you would only need to build them

    once.

    2. This could all be done in one indexed view but at a slight performance cost; for now

    we'll use two.

    3. Instead of a view you could do this in a table but an indexed view will be much easier

    to maintain.

    4. If you will need to search for more text than just "accounting" or "finance" we can

    come back revise this solution.

    ****************************************************************************************/;

    -- View for "accounting"

    CREATE VIEW dbo.Job_TextContainsA

    WITH SCHEMABINDING AS

    SELECT job_id, AF='A'

    FROM dbo.Job_TextSearch

    CROSS JOIN dbo.Tally

    WHERE N <= LEN(job_Combined_Title_Description)

    AND LEN(JOB_Combined_Title_Description) >= 10

    AND SUBSTRING(JOB_Combined_Title_Description,N,10) = 'Accounting';

    GO

    -- View for "finance"

    CREATE VIEW dbo.Job_TextContainsF

    WITH SCHEMABINDING AS

    SELECT job_id, AF='F'

    FROM dbo.Job_TextSearch

    CROSS JOIN dbo.Tally

    WHERE N <= LEN(JOB_Combined_Title_Description)

    AND LEN(JOB_Combined_Title_Description) >= 7

    AND SUBSTRING(JOB_Combined_Title_Description,N,7) = 'Finance';

    GO

    -- Create indexes on the views

    CREATE UNIQUE CLUSTERED INDEX pk_ContainsA ON dbo.Job_TextContainsA (job_id, AF);

    GO

    CREATE UNIQUE CLUSTERED INDEX pk_ContainsF ON dbo.Job_TextContainsF (job_id, AF);

    GO

    Here's see how to retrieve the data and run a performance test:

    SET STATISTICS TIME ON;

    SELECT s.job_id, Job_title

    FROM dbo.Job_TextSearch s

    JOIN dbo.Job_TextContainsA a ON s.job_id = a.job_id

    UNION ALL

    SELECT s.job_id, Job_title

    FROM dbo.Job_TextSearch s

    JOIN dbo.Job_TextContainsA a ON s.job_id = a.job_id;

    SET STATISTICS TIME OFF;

    Test Results:

    SQL Server Execution Times:

    CPU time = 218 ms, elapsed time = 236 ms.

    ... and the query plan:

    The one warning is: if this is a table that get's inserted and updated a lot or you're inserting/updating 1000's of rows at a time my solution will add some overhead (not too much but there is a cost there).

    Edit: couple errors in my comments

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply