Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

More efficient way to perform a lookup? Expand / Collapse
Author
Message
Posted Monday, July 15, 2013 8:17 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, July 17, 2014 7:27 AM
Points: 1, Visits: 102
I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')

Thanks!
Post #1473950
Posted Monday, July 15, 2013 9:30 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: 2 days ago @ 8:36 AM
Points: 4,319, Visits: 6,112
r_noob (7/15/2013)
I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')

Thanks!


I think this is going to be a stunningly bad performer regardless. But an index might help. What might be even faster, believe it or not, is to do an EXISTS query instead of a JOIN. Hell, even a cursor approach could be faster than the join, which could hit millions of times on a 45000X800000 join of that type.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1473961
Posted Monday, July 15, 2013 10:22 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 8:38 AM
Points: 36,724, Visits: 31,174
I agree with Kevin. A looped search would probably be faster than a Join because you could then add a WHERE Field2 <> 'Yes" to the code which would speed things up quite a bit especially if there were some intelligent indexing around that column (at least on a temporary basis).

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1473969
Posted Tuesday, July 16, 2013 10:00 AM
SSCrazy

SSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazySSCrazy

Group: General Forum Members
Last Login: Friday, July 4, 2014 3:55 AM
Points: 2,836, Visits: 5,062
You may want to try full text search here. Together with using WHERE EXISTS instead of JOIN, should be considerably faster than JOIN with non sargable LIKE. For some reason I think it will win over loop-based aproch too.


_____________________________________________
"The only true wisdom is in knowing you know nothing"
"O skol'ko nam otkrytiy chudnyh prevnosit microsofta duh!"
(So many miracle inventions provided by MS to us...)

How to post your question to get the best and quick help
Post #1474202
Posted Tuesday, July 16, 2013 10:48 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: 2 days ago @ 8:36 AM
Points: 4,319, Visits: 6,112
I am not sure full text indexing will help with the '%' + column + '%' join ...

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #1474226
Posted Tuesday, July 16, 2013 12:12 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Sunday, September 29, 2013 1:24 AM
Points: 429, Visits: 1,721
r_noob (7/15/2013)
I've spent a fair bit of time searching, but can't seem to find a succinct answer...is there a more efficient/faster way of performing the following query? I simply have a lookup table with 45k records and want to search a string field for the presence of any of these lookup values. Currently this query takes 45 min on 800k records. Would it help to index [Table1] on [Field1]? Would a subquery with patindex() be faster?

UPDATE [Table1]
SET [Field2] = 'Yes'
FROM [Table1] A
JOIN [Lookup] as B
ON (A.Field1 like '% ' + B.LookupValue + ' %')

Thanks!


Try this:


IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
DROP TABLE #TempTable

CREATE TABLE #TempTable (
[ID] INT IDENTITY(1,1) NOT NULL,
[Field1] INT NULL,
[Field2] NVARCHAR(3) NULL,
PRIMARY KEY (ID))

INSERT INTO #TempTable
SELECT 22,'No' UNION ALL
SELECT 31,'No' UNION ALL
SELECT 42,'No' UNION ALL
SELECT 55,'No'

IF OBJECT_ID('tempdb..#LookupTable') IS NOT NULL
DROP TABLE #LookupTable

CREATE TABLE #LookupTable (
[ID] INT IDENTITY(1,1) NOT NULL,
[LookupValue] INT NULL,
[LookDesc] NVARCHAR(30) NULL,
PRIMARY KEY (ID))

INSERT INTO #LookupTable
SELECT 22,'Apple' UNION ALL
SELECT 33,'Orange' UNION ALL
SELECT 44,'Banana' UNION ALL
SELECT 55,'Pear'

UPDATE #TempTable
SET Field2 = 'Yes'
FROM
#TempTable AS A
INNER JOIN
#LookupTable AS B
ON A.ID = B.ID
WHERE
A.Field1 = B.LookupValue


SELECT * FROM #TempTable AS tt



 
Post #1474248
Posted Tuesday, July 16, 2013 2:11 PM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Today @ 12:38 AM
Points: 1,072, Visits: 6,338
r_noob (7/15/2013)


... ON (A.Field1 like '% ' + B.LookupValue + ' %')



That looks like words in a sentence. Try Jeff's splitter: split the sentence on space, remove any leading blanks and seek into your lookup table.

Like this:

	SELECT *
FROM #Table1 a
CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split
WHERE EXISTS (SELECT 1 FROM #Lookup b
WHERE b.LookDesc = split.Item)

SELECT *
FROM #Table1 a
CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split
INNER LOOP JOIN #Lookup b
ON b.LookDesc = split.Item

Prep:
IF OBJECT_ID('tempdb..#Table1') IS NOT NULL
DROP TABLE #Table1

CREATE TABLE #Table1 (
[ID] INT IDENTITY(1,1) NOT NULL,
[Field1] VARCHAR(100),
[Field2] VARCHAR(3) NULL)

INSERT INTO #Table1
SELECT 'The quick brown fox','No' UNION ALL
SELECT 'jumped over','No' UNION ALL
SELECT 'the','No' UNION ALL
SELECT 'lazy dog','No' UNION ALL
SELECT 'One day in the life of Ivan Denisovitch', 'No'

INSERT INTO #Table1 select a.Field1, 'No' AS Field2 from #Table1 a, sys.columns b

CREATE UNIQUE CLUSTERED INDEX ucx_ID ON #Table1 (ID)



IF OBJECT_ID('tempdb..#Lookup') IS NOT NULL
DROP TABLE #Lookup

CREATE TABLE #Lookup ([LookDesc] VARCHAR(30) NOT NULL)

INSERT INTO #Lookup
SELECT 'dog' UNION ALL
SELECT 'fox' UNION ALL
SELECT TOP 45000 x = RIGHT(NEWID(),5)+RIGHT(NEWID(),5) FROM SYSCOLUMNS A, SYSCOLUMNS B

CREATE UNIQUE CLUSTERED INDEX ucx_LookDesc ON #Lookup (LookDesc)




Low-hanging fruit picker and defender of the moggies





For better assistance in answering your questions, please read this.




Understanding and using APPLY, (I) and (II) Paul White

Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Post #1474283
Posted Thursday, August 1, 2013 4:02 AM


Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Tuesday, November 26, 2013 8:40 AM
Points: 316, Visits: 908
The performance of Chris' solution is fantastic.

If we use his setup script, we can use the following code to perform the update:

IF OBJECT_ID('tempdb..#t2') IS NOT NULL drop table #t2

SELECT DISTINCT ID
INTO #t2
FROM #Table1 a
CROSS APPLY dbo.DelimitedSplit8K(a.Field1, ' ') split
WHERE EXISTS (SELECT 1 FROM #Lookup b WHERE b.LookDesc = split.Item)

UPDATE #Table1
SET Field2 = 'Yes'
WHERE ID in (
select ID from #t2
)

This code executes in less than 100ms on my machine. I expect similar results for the real data. Not bad to go from 45 minutes to sub-second performance!

Well done!

/SG

Post #1479854
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse