Need Help on Fastest Search Logic
Posted Thursday, October 3, 2013 6:08 PM
SSC-Enthusiastic

This makes use of Jeff Moden's DelimitedSplit8K, which you can download from the Scripts section of this site.


SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE FUNCTION dbo.tvf_ChildRowNames
(
@rowName VARCHAR(250)
)
RETURNS TABLE
AS
RETURN
(
--===== CROSS APPLY is used so the splitter can take table2's column
     -- as a correlated argument.
 SELECT t2.name
   FROM dbo.table2 t2
  CROSS APPLY dbo.DelimitedSplit8K(t2.name, ',') split
  WHERE split.Item = @rowName
)
GO

You use it like this:

SELECT name FROM dbo.tvf_ChildRowNames('canola');

Post #1501388
Posted Friday, October 4, 2013 11:42 PM


SSC-Dedicated

Generating random CSVs with a random number of elements from the list in Table1 was a bit of a chore. The items wouldn't come out random, so I had to play with it a bit to trick it. It takes a while to build the test tables (2:17 mm:ss) on my old machine, but it would appear that I've been able to generate test data not unlike your real data for everyone to play with. It's a bit late here, so I'll play with it tomorrow night. In the meantime, here's the code to generate the data for folks to play with and test for performance.

--===== Conditionally drop the test tables to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;
IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;
GO
--===== Create Table1
WITH
cteGenProduct AS
(
SELECT DISTINCT TOP 1000
ProductName = REPLACE(SUBSTRING(LTRIM(NEWID()),ABS(CHECKSUM(NEWID()))%10+1,ABS(CHECKSUM(NEWID()))%10+5),'-','')
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT ID = IDENTITY(BIGINT,1,1)
,ProductName
,Qty = ISNULL(ABS(CHECKSUM(NEWID()))%1000,0)
INTO #Table1
FROM cteGenProduct
;
SELECT * FROM #Table1
;
--===== Create Table 2.
-- The ProductNames from Table 1 will randomly appear as CSV data
-- with 4 to 10 "elements"
WITH
cteRandomData AS
(
SELECT TOP 25000
ID = ROW_NUMBER()OVER(ORDER BY (SELECT 1))
,Details = CAST(NEWID() AS NVARCHAR(1000))
,Description = CAST(NULL AS NVARCHAR(1000))
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT ID
,ProductName =
(SELECT STUFF(CAST(
( --=== Concatenate N number of random t1.ProductName's
SELECT TOP (ABS(CHECKSUM(NEWID()))%7+4)
N','+t1.ProductName
FROM #Table1 t1
WHERE t2.ID > 0 --This nonsense is to randomize the CSV
ORDER BY NEWID()
FOR XML PATH('')
)
AS NVARCHAR(MAX)),1,1,N'')
)
,Details
,Description
INTO #Table2
FROM cteRandomData t2
;
SELECT * FROM #Table2
;
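A quick sanity check on the generated data (this query is an addition, not part of the original generator): counting delimiters should show that every CSV in #Table2 landed in the expected 4-to-10 element range.

```sql
--===== Sanity check: element count = comma count + 1 for each CSV.
 SELECT MinElements = MIN(v.ElementCount)
       ,MaxElements = MAX(v.ElementCount)
   FROM (
         SELECT ElementCount = LEN(ProductName) - LEN(REPLACE(ProductName,N',',N'')) + 1
           FROM #Table2
        ) v
;
```

MinElements should come back as 4 and MaxElements as 10 if the TOP (ABS(CHECKSUM(NEWID()))%7+4) trick worked as intended.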




--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1501816
Posted Saturday, October 5, 2013 12:25 AM


SSC-Dedicated

I have the desired process down to 11 seconds, using my two favorite tools of "Divide'n'Conquer" and the DelimitedSplit8K function, on my 11-year-old machine with a single 32-bit 1.8 GHz CPU. I need to clean the code up and test it on a slightly more modern machine, and then I'll post my solution. Going to bed... finally.

Post #1501817
Posted Saturday, October 5, 2013 10:41 AM


SSC-Dedicated

Again, I apologize for this taking so long. Real life had me hopping, and it took a little thought to figure out how to generate the test data and still have it finish sometime in the same week. I hope you'll find it worth the wait.

Just to keep everything in one place, here's the code I used to set up the test. The CSVs in Table 2 all have 4 to 10 elements, and ALL Product Names in the CSVs are available in Table 1. The Product Names in Table 1 are unique; the Product Names in Table 2 are not.

I used random lengths of random GUIDs and replaced the dashes with spaces to simulate multi-word Product Names. The code will certainly work with single word Product Names, as well. I also used a random GUID as a unique value in the "Details" column of Table 2.

As cited in the original post, there are 1,000 rows in Table 1 (#Table1) and 25,000 rows in Table 2 (#Table2).

Here's the code to generate the test tables according to the requirements above. On a decent machine, it takes approximately 22 seconds to execute. It's a bit long-winded because of the ORDER BY NEWID() used to randomize the Product Names in the CSVs.


--===== Conditionally drop the test tables to make reruns in SSMS easier
IF OBJECT_ID('tempdb..#Table1','U') IS NOT NULL DROP TABLE #Table1;
IF OBJECT_ID('tempdb..#Table2','U') IS NOT NULL DROP TABLE #Table2;
GO
--===== Create Table1
WITH
cteGenProduct AS
(
SELECT DISTINCT TOP 1000
ProductName = LTRIM(REPLACE(SUBSTRING(LTRIM(NEWID()),ABS(CHECKSUM(NEWID()))%10+1,ABS(CHECKSUM(NEWID()))%10+5),'-',' '))
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT ID = IDENTITY(BIGINT,1,1)
,ProductName
,Qty = ISNULL(ABS(CHECKSUM(NEWID()))%1000,0)
INTO #Table1
FROM cteGenProduct
;
SELECT * FROM #Table1
;
--===== Create Table 2.
-- The ProductNames from Table 1 will randomly appear as CSV data
-- with 4 to 10 "elements"
WITH
cteRandomData AS
(
SELECT TOP 25000
ID = ROW_NUMBER()OVER(ORDER BY (SELECT 1))
,Details = CAST(NEWID() AS NVARCHAR(1000))
,Description = CAST(NULL AS NVARCHAR(1000))
FROM master.sys.all_columns ac1
CROSS JOIN master.sys.all_columns ac2
)
SELECT ID
,ProductName =
(SELECT STUFF(CAST(
( --=== Concatenate N number of random t1.ProductName's
SELECT TOP (ABS(CHECKSUM(NEWID()))%7+4)
N','+t1.ProductName
FROM #Table1 t1
WHERE t2.ID > 0 --This nonsense is to randomize the CSV
ORDER BY NEWID()
FOR XML PATH('')
)
AS NVARCHAR(MAX)),1,1,N'')
)
,Details
,Description
INTO #Table2
FROM cteRandomData t2
;
SELECT * FROM #Table2
;


Next, we need the quintessential function to normalize Table 2. This is the "next" version of a very well-tested and proven function. Please read the header of the code for more information. To be safe during this proof of concept, all of this code is being executed in tempdb.

As a bit of a sidebar, you're using NVARCHAR(MAX) for your Product Name CSVs. The splitter below will only handle up to 8K bytes, including the delimiters, and you could get some unwanted changes in data if you're really using NVARCHAR for its intended purpose. If your data exceeds 8K bytes, or you truly need NVARCHAR, then we'll have to whittle on all the code a bit.
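One quick way to check whether the current data is at risk (a sketch against the test tables above; real table and column names may differ):

```sql
--===== Check the widest CSV against the splitter's VARCHAR(8000) limit.
 SELECT MaxChars = MAX(LEN(ProductName))        -- characters in the CSV
       ,MaxBytes = MAX(DATALENGTH(ProductName)) -- bytes (2 per character for NVARCHAR)
   FROM #Table2
;
-- If MaxChars approaches 8000, or the data genuinely needs Unicode,
-- the 8K splitter below is the wrong tool as-is.
```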

USE tempdb;
GO
CREATE FUNCTION [dbo].[DelimitedSplit8K]
/**********************************************************************************************************************
Purpose:
Given a string containing multiple elements separated by a single character delimiter and that single character
delimiter, this function will split the string and return a table of the single elements (Item) and the element
position within the string (ItemNumber).

Notes:
1. Performance of this function approaches that of a CLR.
2. Note that this code implicitly converts NVARCHAR to VARCHAR and that conversion may NOT be faithful.

Revision History:
Note that this code is a modification of a well proven function created as a community effort and initially documented
at the following URL (http://www.sqlservercentral.com/articles/Tally+Table/72993/). This code is still undergoing
tests. Although every care has certainly been taken to ensure its accuracy, you are reminded to do your own tests to
ensure that this function is suitable for whatever application you might use it for.
--Jeff Moden, 01 Sep 2013
**********************************************************************************************************************/
--===== Define I/O parameters
(@pString VARCHAR(8000) , @pDelimiter CHAR(1)) --DO NOT USE MAX DATA-TYPES HERE! IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
-- enough to cover VARCHAR(8000).
WITH E1(N) AS (
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), --10E+1 or 10 rows
E4(N) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), --10E+4 or 10,000 rows max
cteTally(N) AS ( --=== This provides the "base" CTE and limits the number of rows right up front
-- for both a performance gain and prevention of accidental "overruns"
SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
),
cteStart(N1) AS ( --=== This returns N+1 (starting position of each "element" just once for each delimiter)
SELECT 1 UNION ALL
SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter COLLATE Latin1_General_BIN
),
cteLen(N1,L1)AS ( --=== Return start and length (for use in substring).
-- The ISNULL/NULLIF combo handles the length for the final (or only) element.
SELECT s.N1,
ISNULL(NULLIF(CHARINDEX(@pDelimiter ,@pString COLLATE Latin1_General_BIN,s.N1) ,0)-s.N1,8000)
FROM cteStart s
)
--===== Do the actual split.
SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
Item = SUBSTRING(@pString, l.N1, l.L1)
FROM cteLen l
;
GO


I did try the LIKE and CHARINDEX methods. They each took over 3 minutes to run, and both have a hidden problem: if the value in Table 1 is "Shot", both methods will also find "Shot Glass" and "Shotgun", which doesn't seem correct to me.

Since the overwhelming recommendation is to "normalize" the data in Table 2, and that, apparently, can't be done at this point in the project, the following code creates a normalized result set for Table 2 as a CTE and then runs against that. This is proof positive that data should never be stored in a CSV column: even with the overhead of having to normalize Table 2 as a first step, this method blows the other methods away, coming in at only 2.1 to 2.5 seconds on my laptop.

The code below produces the desired result with some additions. The result is stored in a temporary table (#MyHead), as requested. I took the liberty of combining the output of Table1 and Table2 so that you know what uses what. I also included an "ItemNumber" column so you know which element in the CTE matched the Product Name from Table1, and included the IDs of both just in case your "additional processing" might need them.

Since the code is all in one query, it is possible to convert the code to either a view or a high performance Inline Table Valued Function (iTVF).

WITH
cteNormalize AS
( --=== This normalizes T2
SELECT t2.ID, split.ItemNumber, Split.Item
FROM #Table2 t2
CROSS APPLY dbo.DelimitedSplit8k(t2.ProductName,',') split
)
SELECT Table1_ID = t1.ID
,Table1_ProductName = t1.ProductName
,norm.ItemNumber
,Table2_ID = t2.ID
,ProductNameCSV = t2.ProductName
,t2.Details
,t2.Description
INTO #MyHead
FROM #Table1 t1
JOIN cteNormalize norm
ON norm.Item = t1.ProductName
JOIN #Table2 t2
ON t2.ID = norm.ID
;
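As noted above, the query can be converted to a view or an iTVF. A minimal sketch of the iTVF form follows; the function name tvf_MatchedProducts is hypothetical, and since SCHEMABINDING can't reference temp tables, it assumes permanent dbo.Table1/dbo.Table2 tables with the same columns.

```sql
--===== Sketch only: the same query wrapped as an inline TVF.
     -- tvf_MatchedProducts is an assumed name; SCHEMABINDING requires
     -- permanent, two-part-named tables, so temp tables won't work here.
CREATE FUNCTION dbo.tvf_MatchedProducts ()
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
 WITH cteNormalize AS
( --=== This normalizes Table2
 SELECT t2.ID, split.ItemNumber, split.Item
   FROM dbo.Table2 t2
  CROSS APPLY dbo.DelimitedSplit8K(t2.ProductName,',') split
)
 SELECT Table1_ID          = t1.ID
       ,Table1_ProductName = t1.ProductName
       ,norm.ItemNumber
       ,Table2_ID          = t2.ID
       ,ProductNameCSV     = t2.ProductName
       ,t2.Details
       ,t2.Description
   FROM dbo.Table1 t1
   JOIN cteNormalize norm ON norm.Item = t1.ProductName
   JOIN dbo.Table2 t2     ON t2.ID = norm.ID
;
GO
```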


Please send beer... I already have enough pretzels.


Post #1501842
Posted Monday, October 7, 2013 10:15 AM
SSC-Enthusiastic

Hi Jeff,

WOW!!!!

Thanks a lot for your time on this post; it is really going to help many users. I just got back to work after some sick days, so I will have to try your implementation in my actual scenario. I will post back once I do. Please stay tuned.

I'm trying to think of a way to say more than thanks, but I can't find the words!
Post #1502247
Posted Sunday, October 13, 2013 7:11 AM
SSC-Enthusiastic

Hi Jeff,

I'm back.

I have one more tricky situation from my client, and I was alarmed when I heard about it. I will have to take the product name from Table 1 and search with a %product name% condition, not whole-word matching.

In your example, after we split the comma-separated values into the temp table, we would fetch each item from Table 1 and match it against the temp table with %table1.productname%. I'm worried about this ugly approach because it will kill performance. Do you have any suggestions? A sample is below.

If the product name in Table 1 is "milk" and the temp table has "milk with fat", "milk without fat", and "milk with less fat", then we have to fetch all three product names. For this, I assume we would have to use %table1.productname%.

Could you please advise?
Post #1504260
Posted Sunday, October 13, 2013 7:45 AM
SSC-Enthusiastic

At this point, you're better off just creating a full text index on table2 and searching it. You can join it to table2 and perform LIKE '%a%' on the result set returned.
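A minimal sketch of that idea (the catalog name, key-index name, and table names are assumptions; a full-text index requires a unique, non-nullable, single-column key index on the table):

```sql
--===== Sketch only: full-text index on the CSV column of Table2.
     -- ftProducts and PK_Table2 are assumed names.
CREATE FULLTEXT CATALOG ftProducts AS DEFAULT;
CREATE FULLTEXT INDEX ON dbo.Table2 (ProductName)
    KEY INDEX PK_Table2;  -- must be a unique, non-nullable, single-column index
GO
--===== Word and prefix searches are fast with CONTAINS...
 SELECT t2.*
   FROM dbo.Table2 t2
  WHERE CONTAINS(t2.ProductName, N'"milk*"')
;
```

Note that full-text indexes match on word boundaries, so CONTAINS can't do a true leading wildcard; the LIKE '%a%' pass would still run, but only against the (smaller) result set returned, as described above.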
Post #1504264
Posted Sunday, October 13, 2013 8:31 AM


SSC-Dedicated

lnardozi 61862 (10/13/2013)
At this point, you're better off just creating a full text index on table2 and searching it. You can join it to table2 and perform LIKE '%a%' on the result set returned.


You could be right, but the FTI wouldn't be on Table 2 according to the OP's latest post. It would only be needed on Table 1. Fortunately, I don't see any requirement to join Table 1 to Table 2 under such conditions.


Post #1504268
Posted Sunday, October 13, 2013 9:37 AM


SSC-Dedicated

born2achieve (10/13/2013)
Hi Jeff,

I'm back.

I have one more tricky situation from my client, and I was alarmed when I heard about it. I will have to take the product name from Table 1 and search with a %product name% condition, not whole-word matching.

In your example, after we split the comma-separated values into the temp table, we would fetch each item from Table 1 and match it against the temp table with %table1.productname%. I'm worried about this ugly approach because it will kill performance. Do you have any suggestions? A sample is below.

If the product name in Table 1 is "milk" and the temp table has "milk with fat", "milk without fat", and "milk with less fat", then we have to fetch all three product names. For this, I assume we would have to use %table1.productname%.

Could you please advise?


Would you also want to pick up "Milk Glass", "Milk Paint", "Milky Way Candy Bar", "Dried Milk", "Milky Ammonia", "Milk Stain Remover", "Chocolate Milk Syrup", etc.? Even FTS (Full Text Search) is going to have a problem with leading wildcards, because leading wildcards either aren't SARGable (can't use a seek) or you have to generate huge amounts of word parts to facilitate SARGable leading-wildcard searches. And are they going to want to be able to search for a multi-word entry regardless of the order of the search words entered?

Fortunately, you only have a thousand or so products. It's not like you're trying to build a catalog for Grainger (which sells thousands and thousands of different parts and sizes of parts) or a document search like Google. Just do the normal double-ended wildcard search on Table 1 and see how it pans out. Even though it won't be SARGable, a nice narrow unique index that includes the product name and the product ID may help a bit here, simply because the index is narrower than the original Table 1 (assuming there are more than just the 3 columns you posted... I could be wrong). Maybe not.

If they want something faster, then it's going to take a basic redesign of tables and data. For example, there should be a table of categories that should be searched for "Milk" related products and they should search for categories instead of individual products. That's likely the original intent of Table 2 but they fell way short on that. Such further design is also way beyond what should be solved in a forum like this one.
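A sketch of the double-ended wildcard search described above, using the test tables and the same Divide'n'Conquer split as before (non-SARGable by design, so expect a scan):

```sql
--===== Sketch only: match Table1 names anywhere inside the split Table2 items.
 WITH cteNormalize AS
( --=== Normalize Table 2 first, as before
 SELECT t2.ID, split.Item
   FROM #Table2 t2
  CROSS APPLY dbo.DelimitedSplit8K(t2.ProductName,',') split
)
 SELECT t1.ProductName
       ,MatchedItem = norm.Item
       ,Table2_ID   = norm.ID
   FROM #Table1 t1
   JOIN cteNormalize norm
     ON norm.Item LIKE '%' + t1.ProductName + '%' --double-ended wildcard; no seek possible
;
```

With "milk" in Table 1, this would return the "milk with fat" style items the OP described, one row per matching element.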


Post #1504271
Posted Sunday, October 13, 2013 10:58 AM
SSC-Enthusiastic

Yes, Jeff,

I totally agree with your statement, and thanks a lot for all your time on this thread. It has really helped me in many ways, and I've learned about several different concepts.

Once again, many thanks, gentleman!
Post #1504280