Fastest way to look for a word from a string matching a list of words in a lookup table

  • Hi All
    I have a table currently holding 5.2 mil rows with a total of 50 + columns in about 10 of the columns could have a string which I need to check against my lookup table with about 15000 rows of the single or pattern of words i need to find the first matching value and insert the DataID from my lookup table into a staging table for later use I have some code that works but is' very slow doing about 10000 rows in about 2 hours and I have 5.2 mil to look through 10 times so any ideas on how I could speed this up please I have put the code and examples of the lookup table and the strings it could be looking up in

    Lookup Table "Dim_InMem_ReplaceSize" 15488 rows sorted ascending by len of the value I'm looking for in the string in the products table
    LookupValue                 LookupLenSort    DataID
    48K                               3                           15072
    50A                               3                           15073
    32 | K                            6                           13817
    34 - A                            6                           13818
    34DD|36D|38DD3       14                           7499
    30/B,30/C,32/C           14                           5622

    Products Table with 5.2 mil rows but this will be bigger
    One of the columns has the following value
    model_number
    76576U-Generic Font-Cleverley/18-24 Months
    7418BSUSBADUN-- Inter Font-Bobby Boswell/M
    Body Sculpture BE-6120GHW Elliptical Cross Trainer CLICK TO VIEW PRODUCT VIDEO! 12 Program ComputerMagnetic Brake System4 user settingsHand Pulse SensorsAdjustable SaddleDual HandlebarsFREE NEXT DAY DELIVERY
    Big Air Universal Octagonal 10ft Trampoline + Safety EnclosureFor all the Family56 All New Extreme Bounce Springs2.0mm Gauge Top Frame5 Year Frame WarrantyHOT Galvanised FrameFREE NEXT DAY DELIVERY AVAILABLE

    The code I have currently is a function wrapping the column i'm looking up from one at a time in stages 

    this is the function
    (
        @Value VARCHAR(8000)
    )
    RETURNS TABLE AS
    RETURN
        WITH ColourList AS ( -- replace this section with your Colour table when the time comes.
        SELECT SCT.LookupValue,SCT.DataID
        ----,SCT.ReplaceBackSize
        FROM dbo.Dim_InMem_ReplaceSize SCT
            ),
            LookupList
            AS
            (
                SELECT
                    ROW_NUMBER() OVER (ORDER BY
                    --LEN(cl.LookupValue
                    cl.LookupLenSort
                    --)
                    --DESC
                    ) AS rn,    -- prevents "Yellow" from butchering up "LightGoldenRodYellow"..
                    cl.LookupValue,DataID
                    ----,cl.ReplaceBackSize

                FROM
                    dbo.Dim_InMem_ReplaceSize cl
                WHERE
                    @Value LIKE '%' + cl.LookupValue + '%'
            ), RecursiveReplace AS (
                -- -- My select 1
                SELECT cis.DataID AS ReturnString,
                ----cis.ReplaceBackSize AS ReturnString,
                    cis.rn
                FROM
                    LookupList cis
                WHERE
                    cis.rn = 1

                UNION ALL

                -- -- My select 2
                SELECT cis.DataID
                ------cis.ReplaceBackSize
                AS ReturnString,
                    cis.rn
                FROM
                    LookupList cis
                    JOIN RecursiveReplace rr
                        ON cis.rn = rr.rn + 1
            )
        SELECT TOP 1
            rr.ReturnString
        FROM
            RecursiveReplace rr
        ORDER BY
            rr.rn DESC;

    And this is how I'm using it in 10k chunks the lookup is the one taking the time

    DECLARE        @DateNow    DATE    =    CONVERT(DATE,GETDATE())
    -- -- -- -- -- Large batch processing
    ---- -- -- -- -- Lookup from column model_number

    WHILE EXISTS (
    SELECT TOP 1
    Fact_ProviderProductID        =    FSM.Fact_ProviderProductID
    ,Fact_MasterProductID        =    FSM.Fact_MasterProductID
    ,FlagProcessed                =    NULL
    FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductID FSM
    LEFT OUTER JOIN dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMS
        ON FSM.Fact_ProviderProductID = FSMS.Fact_ProviderProductID
    WHERE FSMS.Fact_ProviderProductID IS NULL
    )

    BEGIN

    INSERT INTO dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch
    SELECT TOP 10000
    Fact_ProviderProductID        =    FSM.Fact_ProviderProductID
    ,Fact_MasterProductID        =    FSM.Fact_MasterProductID
    ,FlagProcessed                =    0
    FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductID FSM
    LEFT OUTER JOIN dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMS
        ON FSM.Fact_ProviderProductID = FSMS.Fact_ProviderProductID
    WHERE FSMS.Fact_ProviderProductID IS NULL

    INSERT INTO dbo.Staging_Feed_AffW_MasterSizeLookup
    SELECT
    aw_product_id                =    SFA.aw_product_id
    ,[Value]                    =    (SELECT lc.ReturnString FROM dbo.UTfn_LikeSizeTableDataID(REPLACE(SFA.model_number + ' ','/',' ')) lc)
    ,[Source]                    =    'model_number'
    ,Etl_DateFirstLoad            =    @DateNow
    ,Etl_DateUpdated            =    @DateNow
    FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMPP
    INNER JOIN dbo.Staging_Feed_AffW SFA
        ON FSMPP.Fact_ProviderProductID = SFA.aw_product_id
    WHERE FSMPP.FlagProcessed = 0
    ----(SELECT lc.ReturnString FROM dbo.UTfn_LikeSizeTableDataID(REPLACE(SFA.model_number + ' ','/',' ')) lc) IS NOT NULL

    UPDATE dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch
    SET FlagProcessed = 1
    WHERE FlagProcessed = 0

    END

    The above was something that was recommended to me on I think the SQL 2014 forum
    Could anyone recommend a faster way to do this please as having to wait for about 2 months for some look up's to complete is not my idea of best pratice

    PS I have no control over the source of the data

    Kind regards
    The Northern Monkey

  • Not sure this is ever going to go very fast.   Also, using the function that way, while it is an inline table-valued function, the problem is that it's used in a way that I'm not sure is really going to work very well.   You also are performing a REPLACE function on long string data, so that's not helping either.   You might be able to use FULL TEXT INDEXING, because your search is a CONTAINS kind of search, and it would have to be on the smaller table.  I don't know enough about FTI to help much, so Google will be your best friend in that regard.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • maybe a set up script that creates the tables and inserts some simple sample data and also the expected results would help......you have had many views and only one reply.
    just saying.

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply