Need Help on Fastest Search Logic

  • born2achieve (10/1/2013)


    Hi Livingston,

    Thanks for your time on this. I have tried everything you showed a couple of days ago; the problem is that it is time consuming, and the question is how to speed up the search process.

    hello...the SQL code was provided as a setup for others who might like to get engaged in this thread...it is something that you have not provided so far.

    Can you please confirm that, as in your original post, you want to match "corn" with "cornil"?

    good luck

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi Livingston,

    My bad, I didn't look closely at your setup data. The data is perfect, except it shouldn't be "cornil"; it should be "corn" in the last insert row.

    Sorry about that... I was going to create a large amount of test data to test the performance of various methods, but I got caught up in another discussion.

    --Jeff Moden



    I haven't abandoned this and am still working on a test data generator that everyone could use on this problem. Is the Product_Name in Table1 UNIQUE in Table1? Or can it be duplicated with multiple different Qty numbers?

    Also, this is part of the reason why we ask for "the DDL and some readily consumable test data". It answers so many questions. Please see the first "Helpful Link" in my signature line below.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks a lot for your time on this; product_name in table1 is unique.

    This makes use of Moden's DelimitedSplit8K, which you can download in the Scripts section.

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE FUNCTION tvf_ChildRowNames
            (@rowName VARCHAR(250))
    RETURNS TABLE
    AS
     RETURN
    (
     SELECT name
       FROM table2
      WHERE EXISTS (SELECT 1
                      FROM [DelimitedSplit8K](name, ',')
                     WHERE Item = @rowName)
    )
    GO

    You use it like this:

    SELECT name FROM tvf_ChildRowNames('canola')

    Generating random CSVs with a random number of elements from the list in Table1 was a bit of a chore. The items wouldn't come out random, so I had to play with it a bit to trick it. It takes a while to build the test tables (02:17 mm:ss) on my old machine, but it would appear that I've been able to generate test data not unlike your real data for everyone to play with. It's a bit late here, so I'll play with it tomorrow night. In the meantime, here's the code to generate the data for folks to play with and test for performance.

    --===== Conditionally drop the test tables to make reruns in SSMS easier
    IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;
    IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;
    GO
    --===== Create Table1
    WITH cteGenProduct AS
    (
     SELECT DISTINCT TOP 1000
            ProductName = REPLACE(SUBSTRING(LTRIM(NEWID()),ABS(CHECKSUM(NEWID()))%10+1,ABS(CHECKSUM(NEWID()))%10+5),'-','')
       FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2
    )
    SELECT ID  = IDENTITY(BIGINT,1,1)
          ,ProductName
          ,Qty = ISNULL(ABS(CHECKSUM(NEWID()))%1000,0)
      INTO #Table1
      FROM cteGenProduct
    ;
    SELECT * FROM #Table1
    ;
    --===== Create Table 2.
         -- The ProductNames from Table 1 will randomly appear as CSV data
         -- with 4 to 10 "elements"
    WITH cteRandomData AS
    (
     SELECT TOP 25000
            ID          = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
           ,Details     = CAST(NEWID() AS NVARCHAR(1000))
           ,Description = CAST(NULL AS NVARCHAR(1000))
       FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2
    )
    SELECT ID
          ,ProductName =
           (SELECT STUFF(CAST(
            ( --=== Concatenate N number of random t1.ProductName's
             SELECT TOP (ABS(CHECKSUM(NEWID()))%7+4)
                    N','+t1.ProductName
               FROM #Table1 t1
              WHERE t2.ID > 0 --This nonsense is to randomize the CSV
              ORDER BY NEWID()
                FOR XML PATH('')
            ) AS NVARCHAR(MAX)),1,1,N'')
           )
          ,Details
          ,Description
      INTO #Table2
      FROM cteRandomData t2
    ;
    SELECT * FROM #Table2
    ;

    --Jeff Moden



    I have the desired process down to 11 seconds using my two favorite tools of "Divide'n'Conquer" and the DelimitedSplit8K function on my old 32-bit, single 1.8 GHz CPU, 11-year-old machine. I need to clean the code up, test it on a slightly more modern machine, and then I'll post my solution. Going to bed... finally.

    --Jeff Moden



  • Again, I apologize for this taking so long. Real life had me hopping and it took a little thought on how to generate the test data and still have it finish sometime in the same week :-D. I hope you'll find it worth the wait.

    Just to keep everything all in one place, here's the code I used to set up the test. The CSVs in Table 2 all have 4 to 10 elements, and ALL Product Names in the CSVs are available in Table 1. The Product Names in Table 1 are unique. The Product Names in Table 2 are not.

    I used random lengths of random GUIDs and replaced the dashes with spaces to simulate multi-word Product Names. The code will certainly work with single word Product Names, as well. I also used a random GUID as a unique value in the "Details" column of Table 2.

    As cited in the original post, there are 1,000 rows in Table 1 (#Table1) and 25,000 rows in Table 2 (#Table2).

    Here's the code to generate the test tables according to the requirements above. On a decent machine, this code takes approximately 22 seconds to execute. It's a bit long-winded because of the sort by NEWID() to randomize the Product Names in the CSVs.

    --===== Conditionally drop the test tables to make reruns in SSMS easier
    IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;
    IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;
    GO
    --===== Create Table1
    WITH cteGenProduct AS
    (
     SELECT DISTINCT TOP 1000
            ProductName = LTRIM(REPLACE(SUBSTRING(LTRIM(NEWID()),ABS(CHECKSUM(NEWID()))%10+1,ABS(CHECKSUM(NEWID()))%10+5),'-',' '))
       FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2
    )
    SELECT ID  = IDENTITY(BIGINT,1,1)
          ,ProductName
          ,Qty = ISNULL(ABS(CHECKSUM(NEWID()))%1000,0)
      INTO #Table1
      FROM cteGenProduct
    ;
    SELECT * FROM #Table1
    ;
    --===== Create Table 2.
         -- The ProductNames from Table 1 will randomly appear as CSV data
         -- with 4 to 10 "elements"
    WITH cteRandomData AS
    (
     SELECT TOP 25000
            ID          = ROW_NUMBER() OVER (ORDER BY (SELECT 1))
           ,Details     = CAST(NEWID() AS NVARCHAR(1000))
           ,Description = CAST(NULL AS NVARCHAR(1000))
       FROM master.sys.all_columns ac1
      CROSS JOIN master.sys.all_columns ac2
    )
    SELECT ID
          ,ProductName =
           (SELECT STUFF(CAST(
            ( --=== Concatenate N number of random t1.ProductName's
             SELECT TOP (ABS(CHECKSUM(NEWID()))%7+4)
                    N','+t1.ProductName
               FROM #Table1 t1
              WHERE t2.ID > 0 --This nonsense is to randomize the CSV
              ORDER BY NEWID()
                FOR XML PATH('')
            ) AS NVARCHAR(MAX)),1,1,N'')
           )
          ,Details
          ,Description
      INTO #Table2
      FROM cteRandomData t2
    ;
    SELECT * FROM #Table2
    ;

    Next, we need the quintessential function to normalize Table 2. This is the "next" version of a very well tested and proven function. Please read the header of the code for more information. To be safe during this proof of concept, all of this code is being executed in TempDB.

    As a bit of a sidebar, you're using NVARCHAR(MAX) for your Product Name CSVs. The function below will only handle up to 8K bytes, including the delimiters, and you could get some unwanted changes in the data if you're really using NVARCHAR for its intended purpose. If your data exceeds 8K bytes or you truly need NVARCHAR, then we'll have to whittle on all the code a bit.
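    Before committing to the 8K splitter, a quick sanity check against the real data is worthwhile. Here's a minimal sketch (written against the test table above; swap in the real table name) that reports the longest CSV and counts any rows that an 8000-byte splitter would truncate:

    --===== Sanity check: will any CSV rows overflow an 8000-byte splitter?
         -- DATALENGTH returns bytes, which is what matters for NVARCHAR data.
    SELECT MaxBytes   = MAX(DATALENGTH(ProductName))
          ,RowsOver8K = SUM(CASE WHEN DATALENGTH(ProductName) > 8000 THEN 1 ELSE 0 END)
      FROM #Table2
    ;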

    USE tempdb;
    GO
    CREATE FUNCTION [dbo].[DelimitedSplit8K]
    /**********************************************************************************************************************
     Purpose:
     Given a string containing multiple elements separated by a single character delimiter and that single character
     delimiter, this function will split the string and return a table of the single elements (Item) and the element
     position within the string (ItemNumber).

     Notes:
     1. Performance of this function approaches that of a CLR.
     2. Note that this code implicitly converts NVARCHAR to VARCHAR and that conversion may NOT be faithful.

     Revision History:
     Note that this code is a modification of a well proven function created as a community effort and initially
     documented at the following URL (http://www.sqlservercentral.com/articles/Tally+Table/72993/). This code is still
     undergoing tests. Although every care has certainly been taken to ensure its accuracy, you are reminded to do your
     own tests to ensure that this function is suitable for whatever application you might use it for.
     --Jeff Moden, 01 Sep 2013
    **********************************************************************************************************************/
    --===== Define I/O parameters
            (@pString VARCHAR(8000), @pDelimiter CHAR(1)) --DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
    RETURNS TABLE WITH SCHEMABINDING AS
     RETURN
    --===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
         -- enough to cover VARCHAR(8000).
      WITH E1(N) AS (
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                     SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                    ), --10E+1 or 10 rows
           E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), --10E+4 or 10,000 rows max
     cteTally(N) AS ( --=== This provides the "base" CTE and limits the number of rows right up front
                      -- for both a performance gain and prevention of accidental "overruns"
                     SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                    ),
    cteStart(N1) AS ( --=== This returns N+1 (starting position of each "element" just once for each delimiter)
                     SELECT 1 UNION ALL
                     SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter COLLATE Latin1_General_BIN
                    ),
    cteLen(N1,L1) AS ( --=== Return start and length (for use in SUBSTRING).
                       -- The ISNULL/NULLIF combo handles the length for the final (or only) element.
                      SELECT s.N1,
                             ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString COLLATE Latin1_General_BIN,s.N1),0)-s.N1,8000)
                        FROM cteStart s
                     )
    --===== Do the actual split.
    SELECT ItemNumber = ROW_NUMBER() OVER (ORDER BY l.N1),
           Item       = SUBSTRING(@pString, l.N1, l.L1)
      FROM cteLen l
    ;
    GO

    I did try the LIKE and the CHARINDEX methods. They each took over 3 minutes to run, and they both have a hidden problem. For example, if the value in Table 1 is "Shot", both methods will also find "Shot Glass" and "Shotgun", which doesn't seem correct to me.
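    To see the problem concretely, here's a minimal sketch using the test tables above and a hypothetical search value of 'Shot'. The plain double-ended wildcard matches partial words, while wrapping both the CSV and the search value in delimiters restricts matches to whole CSV elements; note that neither version is SARGable, which is part of why both are slow:

    --===== Naive wildcard: would also match "Shot Glass" and "Shotgun"
    SELECT t2.ID, t2.ProductName
      FROM #Table2 t2
     WHERE t2.ProductName LIKE '%Shot%'
    ;
    --===== Delimiter-anchored: only matches the whole CSV element "Shot"
    SELECT t2.ID, t2.ProductName
      FROM #Table2 t2
     WHERE ','+t2.ProductName+',' LIKE '%,Shot,%'
    ;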

    Since the overwhelming recommendation is to "normalize" the data in Table 2, and that apparently can't be done at this point in the project, the following code creates a normalized result set for Table 2 as a CTE and then runs against that. This is proof positive that data should never be stored in a CSV column because, even with the overhead of having to normalize Table 2 as a first step, it blows the other methods away, coming in at only 2.1 to 2.5 seconds on my laptop.

    The code below produces the desired result with some additions. The result is stored in a temporary table (#MyHead) as requested. I also took the liberty of combining the output of Table1 and Table2 so that you know what uses what. I also included an "ItemNumber" column so you know which element in the CSV matched the Product Name from Table1, and I included the IDs of both just in case your "additional processing" might need them.

    Since the code is all in one query, it is possible to convert it to either a view or a high performance Inline Table Valued Function (iTVF); a rough sketch of the latter follows the query below.

    WITH cteNormalize AS
    ( --=== This normalizes T2
     SELECT t2.ID, split.ItemNumber, split.Item
       FROM #Table2 t2
      CROSS APPLY dbo.DelimitedSplit8K(t2.ProductName,',') split
    )
    SELECT Table1_ID          = t1.ID
          ,Table1_ProductName = t1.ProductName
          ,norm.ItemNumber
          ,Table2_ID          = t2.ID
          ,ProductNameCSV     = t2.ProductName
          ,t2.Details
          ,t2.Description
      INTO #MyHead
      FROM #Table1 t1
      JOIN cteNormalize norm ON norm.Item = t1.ProductName
      JOIN #Table2 t2        ON t2.ID     = norm.ID
    ;
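    As a rough sketch of the iTVF idea (the function name is hypothetical, and since a T-SQL function can't reference temp tables, this assumes permanent dbo.Table1/dbo.Table2 tables with the same columns), the normalize-and-join could be parameterized by product name like this:

    CREATE FUNCTION dbo.tvf_ProductUsage --hypothetical name
            (@pProductName VARCHAR(250))
    RETURNS TABLE AS
     RETURN
    SELECT Table2_ID      = t2.ID
          ,split.ItemNumber
          ,ProductNameCSV = t2.ProductName
          ,t2.Details
          ,t2.Description
      FROM dbo.Table2 t2
     CROSS APPLY dbo.DelimitedSplit8K(t2.ProductName,',') split
     WHERE split.Item = @pProductName --whole-element match only
    ;
    GO
    --===== Usage: all Table 2 rows whose CSV contains the given product
    SELECT * FROM dbo.tvf_ProductUsage('canola');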

    Please send beer... I already have enough pretzels. 😛

    --Jeff Moden



  • Hi Jeff,

    WOW!!!!

    Thanks a lot for your precious time on this post; this is really going to help many users. I just got back to work from my sick days, so I will have to try your implementation in my actual scenario. I will post once I do. Please stay tuned....

    I'm thinking of saying more than thanks, but I'm still finding the words!!!

  • Hi Jeff,

    I'm back.

    I got one more tricky situation from my client, and I was frightened to hear about this requirement from them. The idea is that I will have to take the product name from Table 1 and search with a %product name% condition, not whole-word matching.

    In your example, after we split the comma-separated values into the temp table, we fetch each item from Table 1 and have to match it as %table1.productname% against the temp table. I am worried about this ugly approach because it will kill the time. Do you have any suggestions on this? Sample below:

    If the product name in Table 1 is "milk" and in the temp table we have "milk with fat", "milk without fat", and "milk with less fat", then we have to fetch these three product names. For this, I expect we would have to use %table1.productname%.

    could you please

  • At this point, you're better off just creating a full text index on table2 and searching it. You can join it to table2 and perform LIKE '%a%' on the result set returned.
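    For reference, here's a minimal sketch of that suggestion. The catalog and key-index names are hypothetical, and it assumes a permanent dbo.Table2 with a unique index on ID (full-text indexes can't be created on temp tables). Note that full-text handles trailing wildcards ("milk*") but not leading ones, which the next reply picks up on:

    CREATE FULLTEXT CATALOG ftcProducts; --hypothetical catalog name
    CREATE FULLTEXT INDEX ON dbo.Table2 (ProductName)
        KEY INDEX PK_Table2 --assumes a unique index named PK_Table2 on ID
        ON ftcProducts;
    GO
    --===== Word-prefix search via the full-text index, then a residual LIKE
    SELECT t2.ID, t2.ProductName
      FROM CONTAINSTABLE(dbo.Table2, ProductName, '"milk*"') ft
      JOIN dbo.Table2 t2
        ON t2.ID = ft.[KEY]
     WHERE t2.ProductName LIKE '%milk%'
    ;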

  • lnardozi 61862 (10/13/2013)


    At this point, you're better off just creating a full text index on table2 and searching it. You can join it to table2 and perform LIKE '%a%' on the result set returned.

    You could be right, but the FTI wouldn't be on Table 2 according to the OP's latest post; it would only be needed on Table 1. Fortunately, I don't see any requirement to join Table 1 to Table 2 under such conditions.

    --Jeff Moden



  • born2achieve (10/13/2013)


    Hi Jeff,

    I'm back.

    I got one more tricky situation from my client, and I was frightened to hear about this requirement from them. The idea is that I will have to take the product name from Table 1 and search with a %product name% condition, not whole-word matching.

    In your example, after we split the comma-separated values into the temp table, we fetch each item from Table 1 and have to match it as %table1.productname% against the temp table. I am worried about this ugly approach because it will kill the time. Do you have any suggestions on this? Sample below:

    If the product name in Table 1 is "milk" and in the temp table we have "milk with fat", "milk without fat", and "milk with less fat", then we have to fetch these three product names. For this, I expect we would have to use %table1.productname%.

    could you please

    Would you also want to pick up "Milk Glass", "Milk Paint", "Milky Way Candy Bar", "Dried Milk", "Milky Ammonia", "Milk Stain Remover", "Chocolate Milk Syrup", etc.? Even FTS (Full Text Search) is going to have a problem with leading wildcards, because leading wildcards either aren't SARGable (can't use a seek) or you have to generate huge amounts of word parts to facilitate SARGable leading-wildcard searches. And are they going to want to be able to search for a multi-word entry regardless of the order of the search words entered?

    Fortunately, you only have a thousand or so products. It's not like you're trying to build a catalog for Grainger (which sells thousands and thousands of different parts and sizes of parts) or a document search like Google. Just do the normal double-ended wildcard search on Table 1 and see how it pans out. Even though it won't be SARGable, a nice narrow unique index that includes the product name and the product ID may help a bit here simply because the index data is narrower than the original Table 1 (assuming there are more than just the 3 columns you posted... I could be wrong). Maybe not.
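    As a rough sketch of that suggestion (the index name and search value are illustrative only): the leading wildcard still forces a scan, but scanning a narrow index on the test table above costs less than scanning the whole table:

    --===== Narrow covering index on the product name
    CREATE UNIQUE INDEX IX_Table1_ProductName --hypothetical name
        ON #Table1 (ProductName) INCLUDE (ID);

    --===== The double-ended wildcard search against Table 1
    DECLARE @Search VARCHAR(250) = 'milk'; --hypothetical search term
    SELECT t1.ID, t1.ProductName
      FROM #Table1 t1
     WHERE t1.ProductName LIKE '%'+@Search+'%'
    ;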

    If they want something faster, then it's going to take a basic redesign of tables and data. For example, there should be a table of categories that should be searched for "Milk" related products and they should search for categories instead of individual products. That's likely the original intent of Table 2 but they fell way short on that. Such further design is also way beyond what should be solved in a forum like this one.

    --Jeff Moden



  • Yes Jeff,

    I totally agree with your statement, and thanks a lot for all your time on this thread. It really helped me in many ways, and I learned about different concepts.

    Once again, many thanks, gentleman!!!
