Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12»»

Remove redundant data from address column Expand / Collapse
Author
Message
Posted Friday, August 03, 2012 8:09 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 2:57 AM
Points: 704, Visits: 3,270
Hi,

Consider the folllwing two records:



if object_id('test') > 0  drop table dbo.test

create table dbo.test (
master_id int,
Title varchar(20),
Forename varchar(50),
Surname varchar(50),
Address1 varchar(200),
Address2 varchar(200),
Address3 varchar(200),
Address4 varchar(200),
Town varchar(100),
County varchar(100),
Postcode varchar(20),
Hierarchy int
)

insert into dbo.test
select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all
select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2

select * from dbo.test

As you can see, Address1 contains part of address 2 which is incorrect. I tried to use replace but this doesn't find the whole of address2 in adress1 so nothing changes. Any ideas how to remove the Midsomer Nort from address 1?

Thanks in advance.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1339857
Posted Friday, August 03, 2012 8:15 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 2:57 AM
Points: 704, Visits: 3,270
I've seperated the street number but if anyone can help me remove the redundant information from the address1 I'd be very grateful.

update a
set a.streetno = F.StreetNo
from dbo.test as a
cross apply (select charindex(' ', Address1) AS CommaPos) F0
cross apply (select case when CommaPos = 0 then 'Bad Address' else SUBSTRING(Address1,1,CommaPos - 1) end as StreetNo) F



---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1339860
Posted Friday, August 03, 2012 8:17 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Yesterday @ 10:04 AM
Points: 15,442, Visits: 9,590
It's one of those things that sounds trivially easy until you start trying to do it.

You could set it to remove subsets, but then you end up with Address2 = "No 6", and you end up deleting "No" out of "1234 North Main St" in Address1, so you end up with "1234 rth Main St".

How important is this to the business you work for/with? In the US (that doesn't look like a US address, but I didn't check), you can buy software that will clean up addresses for you. It's called "CASSing", and the vendors have to be "PAVE Certified" by the USPS. Canada has a similar system available from various vendors. I believe so do most of the European countries. I haven't checked Latin American countries for that kind of thing (the address looks like it might be LATAM or Spain is why I mention those).

If address validation and clean-up matters, I'd look into getting something like that. You can Bing/Google the companies that provide it. If you go that route (assuming it's available for this address' local), it'll clean those kinds of things up for you, and help you identify the ones that it can't auto-clean. Very efficient.

Addresses, like human names, are a lot trickier to clean up than they look at first glance.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1339861
Posted Friday, August 03, 2012 8:26 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 2:57 AM
Points: 704, Visits: 3,270
GSquared, everything you say is correct.

I'm in the UK and yes we do use software to clean addresses for some customer data but others don't actually want their addresses rebuilding so we're having to manually cleanse the address fields which is a nightmare to do.

My main project is the data merge but without correctly formatted address fields this process is proving very difficult. Another member Chris@Work/Chris@Home has given some great hints in another thread but I feel like I’m' been going round in circles this week. I can't even get my household merge to work!

The current scripts used, although not very efficient but they do the job. To get past this problem the previous developer basically did a comparison of the LEFT(Address1, 12).

I was hoping to be able to standardise my addresses somehow and use the fuzzy mating ratio suggested by Chris to work out my dupes.

It’s the end of my second week in this job and I’m already starting to panic!


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1339873
Posted Friday, August 03, 2012 8:35 AM


Say Hey Kid

Say Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey KidSay Hey Kid

Group: General Forum Members
Last Login: Wednesday, April 09, 2014 2:57 AM
Points: 704, Visits: 3,270
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


---------------------------------------------------------


It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens

Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Post #1339880
Posted Monday, August 06, 2012 6:07 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Yesterday @ 10:04 AM
Points: 15,442, Visits: 9,590
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1340536
Posted Monday, August 06, 2012 6:41 AM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Today @ 10:56 AM
Points: 4,832, Visits: 11,200
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.


You've been through this pain before, haven't you?



Help us to help you. For better, quicker and more-focused answers to your questions, consider following the advice in this link.

When you ask a question (and please do ask a question: "My T-SQL does not work" just doesn't cut it), please provide enough information for us to understand its context.
Post #1340554
Posted Monday, August 06, 2012 7:40 AM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 4:16 PM
Points: 3,081, Visits: 11,235
How about "cleaning" all address into another table with a pointer back to the original "unclean" address.

You can than compare the clean addresses to identify duplicates.
Post #1340602
Posted Tuesday, August 07, 2012 6:22 AM


SSCoach

SSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoachSSCoach

Group: General Forum Members
Last Login: Yesterday @ 10:04 AM
Points: 15,442, Visits: 9,590
Phil Parkin (8/6/2012)
GSquared (8/6/2012)
Abu Dina (8/3/2012)
Thinking about this more, I wonder if the following will work.

Street names in the UK tend to end with words like Street, Avenue, Crescent etc, if I replace these with say an asterisk then I can remove anything right of the asterisk to get my street name.

Will need to test on a large address data set to see how it performs.


Be careful with that. Take into account that "Street" is often abbreviated "St", but so is "Saint". So, if you have "1234 St George Way", you'll end up losing part of the address that matters.

You might also want to make sure appartment/unit/suite numbers, and directionals don't get lost through something like "1234 1st Ave S, #6", where you'd lose the "South" directional, and the unit number.


You've been through this pain before, haven't you?


Was the DBA for a direct-mail marketing company for 7 years. If there's one type of data I'm overqualified to suffer from, it's name-and-address tables.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1341187
Posted Wednesday, August 08, 2012 3:07 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Friday, February 28, 2014 3:38 AM
Points: 32, Visits: 32
Below works! If I understood correctly

create table #test (
master_id int,
Title varchar(20),
Forename varchar(50),
Surname varchar(50),
Address1 varchar(200),
Address2 varchar(200),
Address3 varchar(200),
Address4 varchar(200),
Town varchar(100),
County varchar(100),
Postcode varchar(20),
Hierarchy int
)

insert into #test
select 75984, 'Mr', 'O', 'SQLady', '6 Vivien Avenue Midsomer Norto', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2 union all
select 76144, 'Mrs', 'N', 'Replacemento', '56 Vivien Avenue Midsomer Nort', 'Midsomer Norton', NULL, NULL, 'RADSTOCK', NULL, 'BA6 2VG' ,2

select * from #test

DECLARE @Index INT
DECLARE @Add1 VARCHAR(MAX)
DECLARE @Add2 VARCHAR(MAX)
DECLARE Addres CURSOR FOR SELECT address1, address2 FROM #test
OPEN addres
FETCH NEXT FROM addres INTO @add1, @add2
WHILE @@FETCH_STATUS = 0
BEGIN
SET @Index = 1
WHILE @Index <= LEN(@add2)
BEGIN
IF @Add2 = RIGHT(@add1, LEN(@Add2))
BEGIN
UPDATE #test
SET address1 = REPLACE(address1, @Add2, '')
WHERE address1 = @add1
END
SET @add2 = LEFT(@add2, LEN(@Add2)-1)
SET @index = @Index +1
END
FETCH NEXT FROM addres INTO @add1, @add2
END
CLOSE addres
DEALLOCATE addres
SELECT * FROM #test
DROP TABLE #test
Post #1341743
« Prev Topic | Next Topic »

Add to briefcase 12»»

Permissions Expand / Collapse