More efficient way to perform a lookup?

  • I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

    UPDATE [Table1]

    SET [Field2] = 'Yes'

    FROM [Table1] A

    JOIN [Lookup] as B

    ON (A.Field1 like '% ' + B.LookupValue + ' %')

    Thanks!

  • r_noob (7/15/2013)


    I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

    UPDATE [Table1]

    SET [Field2] = 'Yes'

    FROM [Table1] A

    JOIN [Lookup] as B

    ON (A.Field1 like '% ' + B.LookupValue + ' %')

    Thanks!

    I think this is going to be a stunningly bad performer regardless. But an index might help. What might be even faster, believe it or not, is to do an EXISTS query instead of a JOIN. Hell, even a cursor approach could be faster than the join, which could hit millions of times on a 45000X800000 join of that type.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I agree with Kevin. A looped search would probably be faster than a Join because you could then add a WHERE Field2 <> 'Yes" to the code which would speed things up quite a bit especially if there were some intelligent indexing around that column (at least on a temporary basis).

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • You may want to try full text search here. Together with using WHERE EXISTS instead of JOIN, should be considerably faster than JOIN with non sargable LIKE. For some reason I think it will win over loop-based aproch too.

    _____________________________________________
    "The only true wisdom is in knowing you know nothing"
    "O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!":-D
    (So many miracle inventions provided by MS to us...)

    How to post your question to get the best and quick help[/url]

  • I am not sure full text indexing will help with the '%' + column + '%' join ...

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • r_noob (7/15/2013)


    I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

    UPDATE [Table1]

    SET [Field2] = 'Yes'

    FROM [Table1] A

    JOIN [Lookup] as B

    ON (A.Field1 like '% ' + B.LookupValue + ' %')

    Thanks!

    Try this:

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL

    DROP TABLE #TempTable

    CREATE TABLE #TempTable (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [Field1] INT NULL,

    [Field2] NVARCHAR(3) NULL,

    PRIMARY KEY (ID))

    INSERT INTO #TempTable

    SELECT 22,'No' UNION ALL

    SELECT 31,'No' UNION ALL

    SELECT 42,'No' UNION ALL

    SELECT 55,'No'

    IF OBJECT_ID('tempdb..#LookupTable') IS NOT NULL

    DROP TABLE #LookupTable

    CREATE TABLE #LookupTable (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [LookupValue] INT NULL,

    [LookDesc] NVARCHAR(30) NULL,

    PRIMARY KEY (ID))

    INSERT INTO #LookupTable

    SELECT 22,'Apple' UNION ALL

    SELECT 33,'Orange' UNION ALL

    SELECT 44,'Banana' UNION ALL

    SELECT 55,'Pear'

    UPDATE #TempTable

    SET Field2 = 'Yes'

    FROM

    #TempTable AS A

    INNER JOIN

    #LookupTable AS B

    ON A.ID = B.ID

    WHERE

    A.Field1 = B.LookupValue

    SELECT * FROM #TempTable AS tt

     

  • r_noob (7/15/2013)

    ... ON (A.Field1 like '% ' + B.LookupValue + ' %')

    That looks like words in a sentence. Try Jeff's splitter: split the sentence on space, remove any leading blanks and seek into your lookup table.

    Like this:

    SELECT *

    FROM #Table1 a

    CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

    WHERE EXISTS (SELECT 1 FROM #Lookup b

    WHERE b.LookDesc = split.Item)

    SELECT *

    FROM #Table1 a

    CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

    INNER LOOP JOIN #Lookup b

    ON b.LookDesc = split.Item

    Prep:

    IF OBJECT_ID('tempdb..#Table1') IS NOT NULL

    DROP TABLE #Table1

    CREATE TABLE #Table1 (

    [ID] INT IDENTITY(1,1) NOT NULL,

    [Field1] VARCHAR(100),

    [Field2] VARCHAR(3) NULL)

    INSERT INTO #Table1

    SELECT 'The quick brown fox','No' UNION ALL

    SELECT 'jumped over','No' UNION ALL

    SELECT 'the','No' UNION ALL

    SELECT 'lazy dog','No' UNION ALL

    SELECT 'One day in the life of Ivan Denisovitch', 'No'

    INSERT INTO #Table1 select a.Field1, 'No' AS Field2 from #Table1 a, sys.columns b

    CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #Table1 (ID)

    IF OBJECT_ID('tempdb..#Lookup') IS NOT NULL

    DROP TABLE #Lookup

    CREATE TABLE #Lookup ([LookDesc] VARCHAR(30) NOT NULL)

    INSERT INTO #Lookup

    SELECT 'dog' UNION ALL

    SELECT 'fox' UNION ALL

    SELECT TOP 45000 x = RIGHT(NEWID(),5)+RIGHT(NEWID(),5) FROM SYSCOLUMNS A, SYSCOLUMNS B

    CREATE UNIQUE CLUSTERED INDEX ucx_LookDesc ON #Lookup (LookDesc)


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • The performance of Chris' solution is fantastic.

    If we use his setup script, we can use the following code to perform the update:

    IF OBJECT_ID('tempdb..#t2') IS NOT NULL drop table #t2

    SELECT DISTINCT ID

    INTO #t2

    FROM #Table1 a

    CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split

    WHERE EXISTS (SELECT 1 FROM #Lookup b WHERE b.LookDesc = split.Item)

    UPDATE #Table1

    SET Field2 = 'Yes'

    WHERE ID in (

    select ID from #t2

    )

    This code executes in less than 100ms on my machine. I expect similar results for the real data. Not bad to go from 45 minutes to sub-second performance!

    Well done!

    /SG

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply