Need Help on Fastest Search Logic

  • Hi,

    I have two tables named "Table1" and "Table2".

    Table1 Details:

    id bigint, product_name nvarchar(1000),quantity int

    records count on Table1 : 25000( may increase in future)

    Table2 Details:

    id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

    Note: Table2 productName column will have comma separated values

    records count on Table2 : 186289( may increase in future)

    sample data:

    Table1:

    1 canola 120

    2 bread 130

    3 sauce 140

    4 corn 120

    Table2:

    1 canola,tea,muffin,cheese jellyproducts null

    2 vinegar,canola,sunflower oliproducts null

    3. cornil,vegoil,canola,sesameoil oilproducts null

    my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

    i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

    if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.

    Try1 :

    select T2.*

    from dbo.Table1 t1

    inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'

    Try2:

    select T2.*

    from dbo.Table2 T2

    inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0

    Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.

  • born2achieve (9/28/2013)


    Hi,

    I have two tables named "Table1" and "Table2".

    Table1 Details:

    id bigint, product_name nvarchar(1000),quantity int

    records count on Table1 : 25000( may increase in future)

    Table2 Details:

    id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

    Note: Table2 productName column will have comma separated values

    records count on Table2 : 186289( may increase in future)

    sample data:

    Table1:

    1 canola 120

    2 bread 130

    3 sauce 140

    4 corn 120

    Table2:

    1 canola,tea,muffin,cheese jellyproducts null

    2 vinegar,canola,sunflower oliproducts null

    3. cornil,vegoil,canola,sesameoil oilproducts null

    my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

    i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

    if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.

    Try1 :

    select T2.*

    from dbo.Table1 t1

    inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'

    Try2:

    select T2.*

    from dbo.Table2 T2

    inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0

    Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.

    Is there any duplication of single product names in the product_name column of Table 2?

    Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.

    Also, for future posts, it's best if you post readily consumable test data. Please see the article at the first link under "Helpful Links" in my signature line below.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (9/29/2013)


    Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.

    This. A thousand times over, this. Unless completely impossible, redesign that table, it will make life soooo much easier.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.

  • born2achieve (9/29/2013)


    Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.

    No problem although Gail is one fine lady with a manly avatar. 😛

    Do you have an answer to my question about duplicate single products (see my previous post, please)?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Oh that's my bad and sorry Gila.

    Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

    The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

    any more suggestions....

  • born2achieve (9/30/2013)


    Oh that's my bad and sorry Gila.

    Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

    The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

    any more suggestions....

    I guess I need to ask the question a different way... can any individual product in the CSV column show up in more than one row? In other words, can an individual product name be assigned to more than one category?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • To be crystal clear, would you ever have data like the following where a given single product appeared in the CSV of more than 1 product category? I ask because this WILL cause a partial Cartesian Product and THAT will determine the "best" way to handle the tables as they currently exist.

    1 canola,tea,muffin,cheese jellyproducts null

    2 vinegar,canola,sunflower oliproducts null

    3 cornil,vegoil,canola,sesameoil oilproducts null

    4 margarine,butter,muffin,bagel butterproducts null

    5 muffin,bagel,wheatbread breadproducts null

    6 tea,coffee,hotcocoa hotdrinkproducts null

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....

  • born2achieve (9/30/2013)


    Hi Jeff,

    your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....

    Not sure yet. I know this is taking a while to wrangle out but getting it right is always important and, because of the huge amount of data the partial Cartesian Products generate, so is performance.

    I know your requirement is to return each row in Table2 for every match in Table 1 and to put those matches in a Temp Table for "later processing" but perhaps the underlying problem to this all is, what will that later process actually do with that data? We might be able to streamline what we actually need to return if we knew just a bit more about that process.

    I'm off to work. I probably won't be able to reply until I get home tonight.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi Jeff,

    Thanks a lot for your time on this and from temp table i am taking data and doing some other process. that will not be a deal. the actual issues is what you are looking on. if Cartesian product helps that would be great.

    My eyes are rolling over to see your reply always.

    Many thanks

  • Cartesian products aren't things that help 😉 This is a cartesian product:

    SELECT <stuff> FROM LargeTable1 CROSS JOIN LargeTable2

    You get a huge resultset with a number of row equal to the product of the row counts in each table (hence cartesian product). It's a great way to kill a server.

    Is a table redesign not an option? It really, really would make stuff like this far easier.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi Gila,

    Thanks for your time,

    The problem now is that production rolled out. so at this point of time the table design cannot be changed. looking for some temp solution.

    Any hope.....

  • you appear to want to match "corn" with "cornil"...is this correct?

    some set up data to play with,,,,,

    CREATE TABLE [dbo].[Table1] (

    [id] [bigint] NULL,

    [product_name] [nvarchar](1000) NULL,

    [quantity] [int] NULL

    ) ON [PRIMARY]

    CREATE TABLE [dbo].[Table2] (

    [id] [bigint] NULL,

    [product_name] [nvarchar](1000) NULL,

    [details] [nvarchar](1000) NULL,

    [description] [nvarchar](1000) NULL

    ) ON [PRIMARY]

    INSERT INTO [Table1]([id],[product_name],[quantity])

    VALUES(1,'canola',120)

    INSERT INTO [Table1]([id],[product_name],[quantity])

    VALUES(2,'bread',130)

    INSERT INTO [Table1]([id],[product_name],[quantity])

    VALUES(3,'sauce',140)

    INSERT INTO [Table1]([id],[product_name],[quantity])

    VALUES(4,'corn',120)

    INSERT INTO [Table2]([id],[product_name],[details],[description])

    VALUES(1,'canola,tea,muffin,cheese ','jellyproducts',NULL)

    INSERT INTO [Table2]([id],[product_name],[details],[description])

    VALUES(2,'vinegar,canola,sunflower ','oilproducts',NULL)

    INSERT INTO [Table2]([id],[product_name],[details],[description])

    VALUES(3,'cornil,vegoil,canola,sesameoil ','oilproducts',NULL)

    GO

    /*Try1 : */

    SELECT t2.id,

    t2.product_name,

    t2.details,

    t1.id AS T1_row

    FROM Table1 AS t1

    INNER JOIN Table2 AS t2 ON t2.product_name LIKE '%' + t1.product_name + '%'

    /*Try2: */

    SELECT T2.id,

    T2.product_name,

    T2.details,

    T1.id AS T1_row

    FROM Table2 AS T2

    INNER JOIN Table1 AS T1 ON CHARINDEX(T1.product_name, T2.product_name) > 0

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • Hi Livingston,

    thanks for your time on this and i have tried whatever you have shown couple of days before, the problem is time consuming and how to fasten the search process.

Viewing 15 posts - 1 through 15 (of 32 total)

You must be logged in to reply to this topic. Login to reply