SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Remove redundant data from address column


Remove redundant data from address column

Author
Message
Abu Dina
Abu Dina
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2837 Visits: 3325
Hi,

Consider the folllwing two records:



if object_id('test') > 0  drop table dbo.test

create table dbo.test (
master_id int,
Title varchar(20),
Forename varchar(50),
Surname varchar(50),
Address1 varchar(200),
Address2 varchar(200),
Address3 varchar(200),
Address4 varchar(200),
Town varchar(100),
County varchar(100),
Postcode varchar(20),
Hierarchy int
)

insert into dbo.test
select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all
select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2

select * from dbo.test



As you can see, Address1 contains part of address 2 which is incorrect. I tried to use replace but this doesn't find the whole of address2 in adress1 so nothing changes. Any ideas how to remove the Midsomer Nort from address 1?

Thanks in advance.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Abu Dina
Abu Dina
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2837 Visits: 3325
I've seperated the street number but if anyone can help me remove the redundant information from the address1 I'd be very grateful.


update a
set a.streetno = F.StreetNo
from dbo.test as a
cross apply (select charindex(' ', Address1) AS CommaPos) F0
cross apply (select case when CommaPos = 0 then 'Bad Address' else SUBSTRING(Address1,1,CommaPos - 1) end as StreetNo) F



---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
GSquared
GSquared
SSC Guru
SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)

Group: General Forum Members
Points: 55047 Visits: 9730
It's one of those things that sounds trivially easy until you start trying to do it.

You could set it to remove subsets, but then you end up with Address2 = "No 6", and you end up deleting "No" out of "1234 North Main St" in Address1, so you end up with "1234 rth Main St".

How important is this to the business you work for/with? In the US (that doesn't look like a US address, but I didn't check), you can buy software that will clean up addresses for you. It's called "CASSing", and the vendors have to be "PAVE Certified" by the USPS. Canada has a similar system available from various vendors. I believe so do most of the European countries. I haven't checked Latin American countries for that kind of thing (the address looks like it might be LATAM or Spain is why I mention those).

If address validation and clean-up matters, I'd look into getting something like that. You can Bing/Google the companies that provide it. If you go that route (assuming it's available for this address' local), it'll clean those kinds of things up for you, and help you identify the ones that it can't auto-clean. Very efficient.

Addresses, like human names, are a lot trickier to clean up than they look at first glance.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Abu Dina
Abu Dina
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2837 Visits: 3325
GSquared, everything you say is correct.

I'm in the UK and yes we do use software to clean addresses for some customer data but others don't actually want their addresses rebuilding so we're having to manually cleanse the address fields which is a nightmare to do.

My main project is the data merge but without correctly formatted address fields this process is proving very difficult. Another member Chris@Work/Chris@Home has given some great hints in another thread but I feel like I’m' been going round in circles this week. I can't even get my household merge to work!

The current scripts used, although not very efficient but they do the job. To get past this problem the previous developer basically did a comparison of the LEFT(Address1, 12).

I was hoping to be able to standardise my addresses somehow and use the fuzzy mating ratio suggested by Chris to work out my dupes.

It’s the end of my second week in this job and I’m already starting to panic! Unsure

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Abu Dina
Abu Dina
SSCrazy
SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)SSCrazy (2.8K reputation)

Group: General Forum Members
Points: 2837 Visits: 3325
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.

---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
GSquared
GSquared
SSC Guru
SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)

Group: General Forum Members
Points: 55047 Visits: 9730
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Phil Parkin
Phil Parkin
SSC-Forever
SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)SSC-Forever (49K reputation)

Group: General Forum Members
Points: 49095 Visits: 21133
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.


You've been through this pain before, haven't you? :-)


Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

If the answer to your question can be found with a brief Google search, please perform the search yourself, rather than expecting one of the SSC members to do it for you.

Please surround any code or links you post with the appropriate IFCode formatting tags. It helps readability a lot.
Michael Valentine Jones
Michael Valentine Jones
SSChampion
SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)SSChampion (13K reputation)

Group: General Forum Members
Points: 13823 Visits: 11848
How about "cleaning" all address into another table with a pointer back to the original "unclean" address.

You can than compare the clean addresses to identify duplicates.
GSquared
GSquared
SSC Guru
SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)SSC Guru (55K reputation)

Group: General Forum Members
Points: 55047 Visits: 9730
Phil Parkin (8/6/2012)
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.


You've been through this pain before, haven't you? :-)


Was the DBA for a direct-mail marketing company for 7 years. If there's one type of data I'm overqualified to suffer from, it's name-and-address tables.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
aqasim
aqasim
SSC Rookie
SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)

Group: General Forum Members
Points: 32 Visits: 36
Below works! If I understood correctly :-)

create table #test (
master_id int,
Title varchar(20),
Forename varchar(50),
Surname varchar(50),
Address1 varchar(200),
Address2 varchar(200),
Address3 varchar(200),
Address4 varchar(200),
Town varchar(100),
County varchar(100),
Postcode varchar(20),
Hierarchy int
)

insert into #test
select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all
select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2

select * from #test

DECLARE @Index INT
DECLARE @Add1 VARCHAR(MAX)
DECLARE @Add2 VARCHAR(MAX)
DECLARE Addres CURSOR FOR SELECT address1, address2 FROM #test
OPEN addres
FETCH NEXT FROM addres INTO @add1, @add2
WHILE @@FETCH_STATUS = 0
BEGIN
SET @Index = 1
WHILE @Index <= LEN(@add2)
BEGIN
IF @Add2 = RIGHT(@add1, LEN(@Add2))
BEGIN
UPDATE #test
SET address1 = REPLACE(address1, @Add2, '')
WHERE address1 = @add1
END
SET @add2 = LEFT(@add2, LEN(@Add2)-1)
SET @index = @Index +1
END
FETCH NEXT FROM addres INTO @add1, @add2
END
CLOSE addres
DEALLOCATE addres
SELECT * FROM #test
DROP TABLE #test
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search