|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:34 AM
Points: 494,
Visits: 2,158
|
|
Hi,
Consider the folllwing two records:
if object_id('test') > 0 drop table dbo.test
create table dbo.test ( master_id int, Title varchar(20), Forename varchar(50), Surname varchar(50), Address1 varchar(200), Address2 varchar(200), Address3 varchar(200), Address4 varchar(200), Town varchar(100), County varchar(100), Postcode varchar(20), Hierarchy int )
insert into dbo.test select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2
select * from dbo.test
As you can see, Address1 contains part of address 2 which is incorrect. I tried to use replace but this doesn't find the whole of address2 in adress1 so nothing changes. Any ideas how to remove the Midsomer Nort from address 1?
Thanks in advance.
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:34 AM
Points: 494,
Visits: 2,158
|
|
I've seperated the street number but if anyone can help me remove the redundant information from the address1 I'd be very grateful.
update a set a.streetno = F.StreetNo from dbo.test as a cross apply (select charindex(' ', Address1) AS CommaPos) F0 cross apply (select case when CommaPos = 0 then 'Bad Address' else SUBSTRING(Address1,1,CommaPos - 1) end as StreetNo) F
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
It's one of those things that sounds trivially easy until you start trying to do it.
You could set it to remove subsets, but then you end up with Address2 = "No 6", and you end up deleting "No" out of "1234 North Main St" in Address1, so you end up with "1234 rth Main St".
How important is this to the business you work for/with? In the US (that doesn't look like a US address, but I didn't check), you can buy software that will clean up addresses for you. It's called "CASSing", and the vendors have to be "PAVE Certified" by the USPS. Canada has a similar system available from various vendors. I believe so do most of the European countries. I haven't checked Latin American countries for that kind of thing (the address looks like it might be LATAM or Spain is why I mention those).
If address validation and clean-up matters, I'd look into getting something like that. You can Bing/Google the companies that provide it. If you go that route (assuming it's available for this address' local), it'll clean those kinds of things up for you, and help you identify the ones that it can't auto-clean. Very efficient.
Addresses, like human names, are a lot trickier to clean up than they look at first glance.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:34 AM
Points: 494,
Visits: 2,158
|
|
GSquared, everything you say is correct.
I'm in the UK and yes we do use software to clean addresses for some customer data but others don't actually want their addresses rebuilding so we're having to manually cleanse the address fields which is a nightmare to do.
My main project is the data merge but without correctly formatted address fields this process is proving very difficult. Another member Chris@Work/Chris@Home has given some great hints in another thread but I feel like I’m' been going round in circles this week. I can't even get my household merge to work!
The current scripts used, although not very efficient but they do the job. To get past this problem the previous developer basically did a comparison of the LEFT(Address1, 12).
I was hoping to be able to standardise my addresses somehow and use the fuzzy mating ratio suggested by Chris to work out my dupes.
It’s the end of my second week in this job and I’m already starting to panic!
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSC-Addicted
      
Group: General Forum Members
Last Login: Today @ 10:34 AM
Points: 494,
Visits: 2,158
|
|
Thinking about this more, I wonder if the following will work.
Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
----------------------------------- http://www.SQL4n00bs.com
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Abu Dina (8/3/2012) Thinking about this more, I wonder if the following will work.
Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.
You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSCarpal Tunnel
       
Group: General Forum Members
Last Login: Today @ 9:39 AM
Points: 4,247,
Visits: 9,500
|
|
GSquared (8/6/2012)
Abu Dina (8/3/2012) Thinking about this more, I wonder if the following will work.
Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters. You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.
You've been through this pain before, haven't you?
____________________________________________________________________________________________
Help us to help you. For better, quicker and more focused answers to your questions, consider following the advice in this link:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
If you are asking for help and your post does not contain a question, you should expect responses which do not contain any answers. Put a question mark in there somewhere - it's not rocket science.
|
|
|
|
|
SSCrazy
      
Group: General Forum Members
Last Login: Today @ 10:43 AM
Points: 2,945,
Visits: 10,517
|
|
How about "cleaning" all address into another table with a pointer back to the original "unclean" address.
You can than compare the clean addresses to identify duplicates.
|
|
|
|
|
SSCoach
         
Group: General Forum Members
Last Login: Tuesday, May 21, 2013 1:55 PM
Points: 15,442,
Visits: 9,571
|
|
Phil Parkin (8/6/2012)
GSquared (8/6/2012)
Abu Dina (8/3/2012) Thinking about this more, I wonder if the following will work.
Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.
Will need to test on a large address data set to see how it performs.
Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters. You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number. You've been through this pain before, haven't you? 
Was the DBA for a direct-mail marketing company for 7 years. If there's one type of data I'm overqualified to suffer from, it's name-and-address tables.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
|
|
|
|
|
SSC Rookie
      
Group: General Forum Members
Last Login: Wednesday, March 27, 2013 8:38 AM
Points: 32,
Visits: 29
|
|
Below works! If I understood correctly 
create table #test ( master_id int, Title varchar(20), Forename varchar(50), Surname varchar(50), Address1 varchar(200), Address2 varchar(200), Address3 varchar(200), Address4 varchar(200), Town varchar(100), County varchar(100), Postcode varchar(20), Hierarchy int )
insert into #test select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2
select * from #test
DECLARE @Index INT DECLARE @Add1 VARCHAR(MAX) DECLARE @Add2 VARCHAR(MAX) DECLARE Addres CURSOR FOR SELECT address1, address2 FROM #test OPEN addres FETCH NEXT FROM addres INTO @add1, @add2 WHILE @@FETCH_STATUS = 0 BEGIN SET @Index = 1 WHILE @Index <= LEN(@add2) BEGIN IF @Add2 = RIGHT(@add1, LEN(@Add2)) BEGIN UPDATE #test SET address1 = REPLACE(address1, @Add2, '') WHERE address1 = @add1 END SET @add2 = LEFT(@add2, LEN(@Add2)-1) SET @index = @Index +1 END FETCH NEXT FROM addres INTO @add1, @add2 END CLOSE addres DEALLOCATE addres SELECT * FROM #test DROP TABLE #test
|
|
|
|