January 11, 2012 at 3:27 pm
I have a question about what are the alternatives for LIKE '%%' and Full-Text Search when it comes to searching organisation names, firstnames, surnames, etc.
The LIKE '%%' causes table scan, however returns the exact results for names containing . The FTS keeps returning results that are "sort of same". FTS is usually offered as an alternative, but I don't understand why. Isn't this meant for unstructured data search? In my opinion it's no alternative for structured data like names.
I've also tried implementations of SOUNDEX, Levenshtein, etc., etc. But nothing gives me the desired performance.
I'm really curious how the experts handle this!
Thank you in advance for your answer.
Best regards,
Peter
January 11, 2012 at 3:33 pm
Hi.
Could you please post and example that shows how're you using your FullText code?
[EDIT] AFAIK you'd be fine using CONTAINS or CONTAINSTABLE.
Best regards,
Best regards,
Andre Guerreiro Neto
Database Analyst
http://www.softplan.com.br
MCITPx1/MCTSx2/MCSE/MCSA
January 12, 2012 at 1:42 am
Hi Andre,
This is the code producing the results I want:
SELECT
*
FROM
[dbo].[companies] a
WHERE
a.[organisationname] LIKE '%FEK%'
60 rows affected.
I've tried to find a FTS alternative:
SELECT
*
FROM
[dbo].[companies] a
WHERE
CONTAINS(a.[organisationname], '"*FEK*"')
11 rows affected (the * wildcard before the term FEK seams to be ignored).
SELECT
*
FROM
[dbo].[companies] a
WHERE
CONTAINS(a.[organisationname], 'FORMSOF(INFLECTIONAL, "FEK")')
2 rows affected (same result as doing a "non-inflectional search on FEK")
I can't find the syntax that gives me (at least almost) similar results to the LIKE.
Hope you can help me out.
Best regards,
Peter
January 12, 2012 at 9:10 am
I know virtually nothing about FTS, so I can't comment on how to get that to work the way you want.
I do know a few things about T-SQL though, and I haven't found a good answer as to how to implement LIKE '%data%' that performs well on a large dataset because, as you mention, that will always do a scan. I usually recommend FTS for these scenarios as the best option, always stating that I'm not a FTS person.
I'm sure that someone much more intelligent and experienced than I am has come up with some kind of magic for doing this well, but I haven't found it anywhere yet.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 12, 2012 at 9:27 am
Hello Jack,
Thank you for your response.
As you are a T-SQL person, I was thinking about the following approach I would like to discuss. As the problem lies in % prior to the search term, than this is the target of elimination.
What if I would create a simple lookup table with a term and key. For every name I create all the terms the name can begin with when using a % and add the key as reference to the specific record. So for the name of John Smith with key 1 the search terms will be:
John Smith
ohn Smith
hn Smith
n Smith
Smith
mith
ith
th
h
I eliminated spaces as trimming the search term makes spaces useless.
In this case the search term '%ohn smit%' on the original table becomes a 'ohn smit%' on the lookup table with a join on the original table.
Of course the table needs to be updated when modifying a name, however keeping it up2date using a table valued function will be quite easy.
What is your opinion on this solution?
Best regards,
Peter
January 12, 2012 at 9:31 am
Unfortunately the use of wild cards before and after a string using like will force a table scan which can be problematic.
What I have done in the past is to set up a full text index using the field which we are looking for, such as telephone number.
In the where clause I have used:
WHEREcid.id_code = 'PS'
ANDcid.Linked_ID = 1
AND (
home_phone LIKE '%' + @phonenum + '%'
ORcell_phone LIKE '%' + @phonenum + '%'
ORbusiness_phone LIKE '%' + @phonenum + '%'
)
In this case the full text index is made up of the 3 telephone fields. I have then changed teh where clause to be:
WHEREcid.id_code = 'PS'
ANDcid.Linked_ID = 1
AND FREETEXT(ci.*, @phonenum)
Another option would have been
AND CONTAINS((ci.home_phone,ci.cell_phone,ci.business_phone), @phonenum)
Performance is pretty consistent, with over 1000 hits per hr the time went from several seconds to just about 2 against a table with 2.8M rows. (The CONTAINS statement did not perform as well as the FREETEXT, but it is an option.)
Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
January 12, 2012 at 9:39 am
Hi Steve,
Thank you for your solution. However I can't seem to get this working in the same way as LIKE '%<term>%' !
When I use the FREETEXT or CONTAINS only 2 rows are returned (not using *). Only the rows where the <term> forms a whole word, but not the words where <term> forms the start (this can be done using * of course), part or end.
Are the results for your phonenumber similar to the LIKE search? Does it really return start, part and end numbers?
What is the contents of the @phonenum variable?
Best regards,
Peter
January 12, 2012 at 9:47 am
poostwoud (1/12/2012)
Hello Jack,Thank you for your response.
As you are a T-SQL person, I was thinking about the following approach I would like to discuss. As the problem lies in % prior to the search term, than this is the target of elimination.
What if I would create a simple lookup table with a term and key. For every name I create all the terms the name can begin with when using a % and add the key as reference to the specific record. So for the name of John Smith with key 1 the search terms will be:
John Smith
ohn Smith
hn Smith
n Smith
Smith
mith
ith
th
h
I eliminated spaces as trimming the search term makes spaces useless.
In this case the search term '%ohn smit%' on the original table becomes a 'ohn smit%' on the lookup table with a join on the original table.
Of course the table needs to be updated when modifying a name, however keeping it up2date using a table valued function will be quite easy.
What is your opinion on this solution?
Best regards,
Peter
I had thought of the same thing, but in reality isn't that an awful lot like what FTS does, and then you have to maintain it?
It would probably perform better for queries, but how would it affect overall server performance? You'd have the overhead of maintaining the table of partial names for every name row. I'm not sure it is worth it.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
January 12, 2012 at 9:58 am
Peter
What is the contents of the @phonenum variable?
The variable contains a telephone number. In this case, it could be a cell, home or work phone. The reason for the like statement is that a person has entered the telephone number. The unknown is whether or not the area code has been used or whether the D/E person has actually put the phone number in the correct place, ie home phone in the homephone field, etc.
We use the same type of system in our price checker system, where the customer can start typing the first 3 characters of the item and the search will start returning data. The more characters typed, the more refined the select.
This actually mimics the idea you outlined above, but more efficiently.
Steve Jimmo
Sr DBA
“If we ever forget that we are One Nation Under God, then we will be a Nation gone under." - Ronald Reagan
January 12, 2012 at 11:18 am
sjimmo (1/12/2012)
Peter
What is the contents of the @phonenum variable?
The variable contains a telephone number. In this case, it could be a cell, home or work phone. The reason for the like statement is that a person has entered the telephone number. The unknown is whether or not the area code has been used or whether the D/E person has actually put the phone number in the correct place, ie home phone in the homephone field, etc.
We use the same type of system in our price checker system, where the customer can start typing the first 3 characters of the item and the search will start returning data. The more characters typed, the more refined the select.
This actually mimics the idea you outlined above, but more efficiently.
There's an article here on ssc about something very similar. A calculated column using REVERSE(phonenum), indexed, using LIKE with REVERSE(@phonenum)+'%'.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
January 12, 2012 at 12:12 pm
poostwoud (1/11/2012)
I have a question about what are the alternatives for LIKE '%%' and Full-Text Search when it comes to searching organisation names, firstnames, surnames, etc.The LIKE '%%' causes table scan, however returns the exact results for names containing . The FTS keeps returning results that are "sort of same". FTS is usually offered as an alternative, but I don't understand why. Isn't this meant for unstructured data search? In my opinion it's no alternative for structured data like names.
I've also tried implementations of SOUNDEX, Levenshtein, etc., etc. But nothing gives me the desired performance.
I'm really curious how the experts handle this!
Peter,
There is no general solution to the %match% problem for all requirements, but there are specific solutions to specific problems. To take a somewhat related example, consider the task of populating a drop-down list of search suggestions as the user begins typing into a field on a website (think Google Instant Search). Generally this is done by pre-processing the data source and applying heuristics to map possible user keystroke sequences to useful suggestions. Clearly it is not practical to store all possible keystrokes and all possible matches, and it wouldn't be useful to the end-user anyway. So a trade-off is made between latency, accuracy, resource usage and probably many other factors. My point is that there is probably a specific solution to your specific problem, but it would help a great deal if you could go into a little more detail. Who is the end-user? Do they require live results? Is this for a specific SQL solution or an application? Questions like that.
Viewing 11 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply