RE: pattern matching? – SQLServerCentral

SSC Guru

Points: 323504

December 21, 2012 at 5:54 am

wow, thats going to be tough;

the only thing i could think of was a combination of opc.three's example, and joining it against a list of common suffixes to find potential duplicates, but that of course is going an ongoing thing as you dig deeper into the data.

something like this is what i thought might be a starting point:

With MySampleData(CompanyName)

AS

(

SELECT 'Costco' UNION ALL

SELECT 'Costco LLC' UNION ALL

SELECT 'Costco Whls' UNION ALL

SELECT 'Home Interiors Malaga' UNION ALL

SELECT 'Home Plumbing' UNION ALL

SELECT 'Home Property Management' UNION ALL

SELECT 'Home Realty' UNION ALL

SELECT 'Home Svc'

),

CommonSuffixes (val)

AS

(

SELECT ' Inc' UNION ALL

SELECT ' LLC' UNION ALL

SELECT ' Company' UNION ALL

SELECT ' Co'

)

SELECT

ROW_NUMBER() OVER (PARTITION BY SOUNDEX(CompanyName) ORDER BY CompanyName) AS RW,

SOUNDEX(CompanyName) AS SoundX,

*

FROM MySampleData

LEFT OUTER JOIN CommonSuffixes

ON CHARINDEX(CommonSuffixes.val,MySampleData.CompanyName) > 0

--WHERE CommonSuffixes.val IS NOT NULL --(turns the LEFT OUTER into an inner join, i know)

ORDER BY CompanyName,RW

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!