Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Need Help on Fastest Search Logic


Need Help on Fastest Search Logic

Author
Message
born2achieve
born2achieve
SSC Veteran
SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)

Group: General Forum Members
Points: 230 Visits: 474
Hi,

I have two tables named "Table1" and "Table2".

Table1 Details:

id bigint, product_name nvarchar(1000),quantity int

records count on Table1 : 25000( may increase in future)

Table2 Details:

id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)

sample data:


Table1:

1 canola 120
2 bread 130
3 sauce 140
4 corn 120

Table2:

1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null

my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.




Try1 :

select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'


Try2:

select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0






Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45100 Visits: 39912
born2achieve (9/28/2013)
Hi,

I have two tables named "Table1" and "Table2".

Table1 Details:

id bigint, product_name nvarchar(1000),quantity int

records count on Table1 : 25000( may increase in future)

Table2 Details:

id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)

sample data:


Table1:

1 canola 120
2 bread 130
3 sauce 140
4 corn 120

Table2:

1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null

my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.




Try1 :

select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'


Try2:

select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0






Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.



Is there any duplication of single product names in the product_name column of Table 2?

Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.

Also, for future posts, it's best if you post readily consumable test data. Please see the article at the first link under "Helpful Links" in my signature line below.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
GilaMonster
GilaMonster
SSC-Forever
SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)SSC-Forever (47K reputation)

Group: General Forum Members
Points: 47254 Visits: 44380
Jeff Moden (9/29/2013)
Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.


This. A thousand times over, this. Unless completely impossible, redesign that table, it will make life soooo much easier.


Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass


born2achieve
born2achieve
SSC Veteran
SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)

Group: General Forum Members
Points: 230 Visits: 474
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45100 Visits: 39912
born2achieve (9/29/2013)
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.


No problem although Gail is one fine lady with a manly avatar. :-P

Do you have an answer to my question about duplicate single products (see my previous post, please)?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
born2achieve
born2achieve
SSC Veteran
SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)

Group: General Forum Members
Points: 230 Visits: 474
Oh that's my bad and sorry Gila.

Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

any more suggestions....
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45100 Visits: 39912
born2achieve (9/30/2013)
Oh that's my bad and sorry Gila.

Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

any more suggestions....


I guess I need to ask the question a different way... can any individual product in the CSV column show up in more than one row? In other words, can an individual product name be assigned to more than one category?

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45100 Visits: 39912
To be crystal clear, would you ever have data like the following where a given single product appeared in the CSV of more than 1 product category? I ask because this WILL cause a partial Cartesian Product and THAT will determine the "best" way to handle the tables as they currently exist.


1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3 cornil,vegoil,canola,sesameoil oilproducts null
4 margarine,butter,muffin,bagel butterproducts null
5 muffin,bagel,wheatbread breadproducts null
6 tea,coffee,hotcocoa hotdrinkproducts null


--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
born2achieve
born2achieve
SSC Veteran
SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)SSC Veteran (230 reputation)

Group: General Forum Members
Points: 230 Visits: 474
Hi Jeff,

your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)SSC-Forever (45K reputation)

Group: General Forum Members
Points: 45100 Visits: 39912
born2achieve (9/30/2013)
Hi Jeff,

your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....


Not sure yet. I know this is taking a while to wrangle out but getting it right is always important and, because of the huge amount of data the partial Cartesian Products generate, so is performance.

I know your requirement is to return each row in Table2 for every match in Table 1 and to put those matches in a Temp Table for "later processing" but perhaps the underlying problem to this all is, what will that later process actually do with that data? We might be able to streamline what we actually need to return if we knew just a bit more about that process.

I'm off to work. I probably won't be able to reply until I get home tonight.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search