Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 1234»»»

Need Help on Fastest Search Logic Expand / Collapse
Author
Message
Posted Saturday, September 28, 2013 6:19 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 11:31 AM
Points: 175, Visits: 351
Hi,

I have two tables named "Table1" and "Table2".

Table1 Details:

id bigint, product_name nvarchar(1000),quantity int

records count on Table1 : 25000( may increase in future)

Table2 Details:

id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)

sample data:


Table1:

1 canola 120
2 bread 130
3 sauce 140
4 corn 120

Table2:

1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null

my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.



Try1 :

select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'


Try2:

select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0




Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.

Post #1499679
Posted Sunday, September 29, 2013 5:06 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:09 PM
Points: 36,765, Visits: 31,221
born2achieve (9/28/2013)
Hi,

I have two tables named "Table1" and "Table2".

Table1 Details:

id bigint, product_name nvarchar(1000),quantity int

records count on Table1 : 25000( may increase in future)

Table2 Details:

id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)

Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)

sample data:


Table1:

1 canola 120
2 bread 130
3 sauce 140
4 corn 120

Table2:

1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null

my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.

i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.

if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.



Try1 :

select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'


Try2:

select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0




Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.



Is there any duplication of single product names in the product_name column of Table 2?

Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.

Also, for future posts, it's best if you post readily consumable test data. Please see the article at the first link under "Helpful Links" in my signature line below.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1499795
Posted Sunday, September 29, 2013 5:09 PM


SSC-Forever

SSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-ForeverSSC-Forever

Group: General Forum Members
Last Login: Today @ 3:46 PM
Points: 42,462, Visits: 35,525
Jeff Moden (9/29/2013)
Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.


This. A thousand times over, this. Unless completely impossible, redesign that table, it will make life soooo much easier.



Gail Shaw
Microsoft Certified Master: SQL Server 2008, MVP
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Post #1499796
Posted Sunday, September 29, 2013 8:05 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 11:31 AM
Points: 175, Visits: 351
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.
Post #1499820
Posted Sunday, September 29, 2013 8:23 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:09 PM
Points: 36,765, Visits: 31,221
born2achieve (9/29/2013)
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.


No problem although Gail is one fine lady with a manly avatar.

Do you have an answer to my question about duplicate single products (see my previous post, please)?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1499822
Posted Monday, September 30, 2013 5:01 AM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 11:31 AM
Points: 175, Visits: 351
Oh that's my bad and sorry Gila.

Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

any more suggestions....
Post #1499925
Posted Monday, September 30, 2013 11:42 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:09 PM
Points: 36,765, Visits: 31,221
born2achieve (9/30/2013)
Oh that's my bad and sorry Gila.

Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.

The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table

any more suggestions....


I guess I need to ask the question a different way... can any individual product in the CSV column show up in more than one row? In other words, can an individual product name be assigned to more than one category?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1500135
Posted Monday, September 30, 2013 6:00 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:09 PM
Points: 36,765, Visits: 31,221
To be crystal clear, would you ever have data like the following where a given single product appeared in the CSV of more than 1 product category? I ask because this WILL cause a partial Cartesian Product and THAT will determine the "best" way to handle the tables as they currently exist.

1   canola,tea,muffin,cheese        jellyproducts    null
2 vinegar,canola,sunflower oliproducts null
3 cornil,vegoil,canola,sesameoil oilproducts null
4 margarine,butter,muffin,bagel butterproducts null
5 muffin,bagel,wheatbread breadproducts null
6 tea,coffee,hotcocoa hotdrinkproducts null


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1500230
Posted Monday, September 30, 2013 6:21 PM
SSC-Enthusiastic

SSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-EnthusiasticSSC-Enthusiastic

Group: General Forum Members
Last Login: Wednesday, April 9, 2014 11:31 AM
Points: 175, Visits: 351
Hi Jeff,

your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....
Post #1500232
Posted Tuesday, October 1, 2013 7:22 AM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 7:09 PM
Points: 36,765, Visits: 31,221
born2achieve (9/30/2013)
Hi Jeff,

your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....


Not sure yet. I know this is taking a while to wrangle out but getting it right is always important and, because of the huge amount of data the partial Cartesian Products generate, so is performance.

I know your requirement is to return each row in Table2 for every match in Table 1 and to put those matches in a Temp Table for "later processing" but perhaps the underlying problem to this all is, what will that later process actually do with that data? We might be able to streamline what we actually need to return if we knew just a bit more about that process.

I'm off to work. I probably won't be able to reply until I get home tonight.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1500396
« Prev Topic | Next Topic »

Add to briefcase 1234»»»

Permissions Expand / Collapse