December 15, 2004 at 2:43 pm
I have a table called tblTeamName it consists out of 3 columns:
In tblTeamName there can be no duplicate records.
My problem/opurtunity is this:
A team name can be spelled in many different ways, sometimes up to 30 ways. If I’m looking up a teamID I just take the input string like “FC Barcelona” and try to match it with a simple “where teamID = ….” Statement.
If the team-name isn’t there I just add the new version of that team-name manually.
I read a lot about searching techniques on the forum, my opinion is that the “double-metaphone” function from Lawrence Philips is the best recommended technique.
How can I make my TeamID-lookup search as efficient (and fast!?!?) as possible?
Should I start with a hard “where teamID = ….” Statement.
And then a “where TeamName Like ‘%.....%’ statement
and or at last the double-metaphone function?!?!
I have all these question unaswered this is getting me stuck in the developing process like:
- But how should I handle results if it returns more then 1 different teamID?!?
- Should I have a TeamNameMetaphone column?!?
All the tips, example code and hints are welcome!
December 16, 2004 at 3:23 am
This answer will mean a significant amount of work, but will produce the best results in the long term:
Clean your data before you let it into your database
December 16, 2004 at 8:06 am
I don't think that the data isn't clean... maybe I don't get thepoint correct but the TeamName's are all valid in the way that "AFC Ajax" is just as correct as "Ajax Amsterdam".
But if I'm missing your point please let me know, I guess you mean that before I compare any TeamName I have to know the Team already?
I would just like to know what is the best way to know that:
"Ajax Amst." isn't in the table yet and that it probably is a (name)version of ClubID 2
Select * from tblTeam
where TeamId = 2
returns:
TeamID TeamName
----------- ---------------
2 Ajax Amsterdam
Select * from tblTeamName
where TeamId = 2
reurns:
TeamId TeamName CompetitionID
----------- ------------- ------------
2 A.Ajax 5
2 Ajax 5
2 Ajax Amsterdam 5
2 Ajax Amsterdam (NED) 5
December 16, 2004 at 9:04 am
Well, I'd say that a lot depends on how you use the table, and who is (or will be) searching it. Is it a part of some utility that will be used by webpage visitors? Or is it something to be used primarily by administrator and other people, who know at least search basics? Then you can decide, how much of the variations you have to code to be performed automatically, and how much you can rely on the user to enter the data correctly.
BTW, the table tblTeamName would be probably better called tblTeamAliases, because IMHO that's what it is. I think that AKM's remark about clearing the data first was entirely correct - unless you really have a reason to store various aliases of the name, which seems to be what you are doing with this table . Unfortunately the name of the table does not really reflect this intention and can be a bit confusing.
I'm afraid I'm not able to answer your questions (especially regarding the metaphone), but obviously when using such system of aliases, you will be receiving several team ID's quite often - e.g. for Rangers, Real, and many others. Probably you should display all returned names and allow the user to select the correct one.
HTH, Vladan
December 16, 2004 at 9:08 am
Oh, one more thing... are you sure you need to store the aliases physically in a table? Maybe it could be accomplished without it, just implementing various ways of searching in the original name? What is the main reason for such table?
Vladan
Viewing 5 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply