May 15, 2010 at 4:26 pm
I'm trying to find values in our account table where the account names are similiar. I've already found the exact duplicates but now I'm trying to find like values i.e. Thompson Tools and Thompson Tools, Inc.
I've tried several different ways and nothing seems to work. Here are 2 ways I've tried and no success:
select a1.account, a2.account
from sysdba.ACCOUNT a1
INNER JOIN sysdba.ACCOUNT a2
on a1.accountid = a2.accountid
Where substring(a1.account,1,6) like substring(a2.account ,1,6)
Group by a1.account, a2.account
Having count(*) > 1
This is bringing back only the exact matches and not the like values.
I've also tried something like this:
select a1.account, a2.account
from sysdba.ACCOUNT a1
INNER JOIN sysdba.ACCOUNT a2
on a1.accountid = a2.accountid
Where a1.account like substring(a2.account ,1,6)
Group by a1.account, a2.account
This brings back all the same records just showing the substring value from both fields.
I've tried using a subselect but the same issue as above.
If anyone has any idea what I'm missing, I would appreciate the help.
May 15, 2010 at 4:57 pm
Maybe you can use SOUNDEX or DIFFERENCE to find similar rows. But I guess you'd manually need to verify the results, since there is no logic know of that can tell you whether Mayer's Inc. is the same company as Maier's Inc. or Meyer's.
I'm not sure how your table structure looks like but I hope you don't have only the name in there. Maybe you can use SOUNDEX together with the zip code...
May 15, 2010 at 7:57 pm
another thing I've done is use a Tally table to split the company name into single words....then try to match the single word against another copy of the table;
that way, one of the searches would compare 'Thompson' against all columns, which might help me with duplicates.
here's an example, you could use charindex instead of like as well to test for matches.
select
MyQuery.account,
a1.account
from sysdba.ACCOUNT a1
INNER JOIN (select myAlias.item as account
from sysdba.ACCOUNT
CROSS APPLY [fn_split](account,' ') myAlias --space as a delimiter
) MyQuery
on a1.accountid like '%' + MyQuery.account + '%'
WHERE myAlias.item NOT IN ('CO.','CORP','CORP.','INC')
GO
create function [dbo].[fn_split](
@STR varchar(8000),
@spliter char(1)
)
returns @returnTable table (idx int primary key identity, item varchar(8000))
as
begin
declare @spliterIndex int
SELECT @STR = @spliter + @STR + @spliter
INSERT @returnTable
SELECT SUBSTRING(@str,N+1,CHARINDEX(@spliter,@str,N+1)-N-1)
FROM dbo.Tally
WHERE N < LEN(@str)
AND SUBSTRING(@str,N,1) = @spliter
ORDER BY N
return
end
Lowell
May 16, 2010 at 8:14 am
Thanks! I'll give both of those a try.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy