SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Strip SAON from the beginning of an address


Strip SAON from the beginning of an address

Author
Message
FunkyDexter
FunkyDexter
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2017 Visits: 1101

For those who don't know, a PAON is the part of an address that identifies a building. A SOAN is the part that identifies a part of a property. ie, in "Flat 24 Loddon House", "Flat 24" would be the SAON, "Loddon House" would be the PAON. in "56 Rice Road", there is no SAON and the PAON would be "56".

I have a data source in which the SAON part has frequently being included in the PAON and I need to clean them out. It's not going to be an exact science and I'm just looking for a "best I can get" result.

I have a cursor based function that's getting me close enough to be happy but the dataset is quite large so I'm hoping for a more efficient approach (this runs nightly so performance isn't really crucial but I'd still rather get as efficient as possible).

I've attached a script which shows the cursor approach as well as a script to create and populate a small table of SAON parts.

Can anyone suggest a non cursor based approach for this?


Attachments
tbl_Matrix_SAONParts.txt (19 views, 454 bytes)
fn_RemoveSAONPrefix.txt (21 views, 1.00 KB)
Sergiy
Sergiy
SSC Guru
SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)

Group: General Forum Members
Points: 63401 Visits: 13076
Do you save extracted SAON and POAN parts o addresses?
FunkyDexter
FunkyDexter
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2017 Visits: 1101
Neither is actually saved but both parts are used in a subsequent lookup to a table of UPRNs (a UPRN is a unique code given to every property in the UK)

I'm really concerned about getting the PAON though.
Sergiy
Sergiy
SSC Guru
SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)

Group: General Forum Members
Points: 63401 Visits: 13076
Do you really expect PAON part of an address to be changed over time?
Because I cannot see any other reason for not saving PAON and SAON parts once they are found.

Actually, I cannot see any reason for saving the whole address as a single string.
You can always get it by concatenating SAON and PAON parts.
Not so easy for the opposite transformation, as you already know.

Ideally, different parts of addresses must be separated on entry form, but if it's not done then you better split the address string into parts on saving it in the database.
Newly recorded addresses may be bound for manual verification, and correction if needed.
Saved and verified split would overwrite whatever fuzzy and probably not perfect logic you put in your function.

Having PAON parts saved could also help identifying the correct way of splitting of new addresses.
FunkyDexter
FunkyDexter
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2017 Visits: 1101
In principle I agree with you 100% but this isn't something I have control over unfortunately. The data is coming from a third party source.

Some context that may or not be useful: I work on a program that aims to identify the most "at risk" families in the country so that social services can intervene early. Basically we take a load of data feeds from a load of sources (police, education, social services, NHS... about 40 different agencies in total), and try and identify the individuals across those sources - so lots of fuzzy matching around names, aliases, dates of birth, known addresses, and known ID codes (NI number, UPN, NHS number etc.) and so on.

The particular problem I'm having here is on trying to match addresses against the post office database to get UPRNs. The post office's DB is in a nice consistent state but the data I'm trying to match against it is... well... I get what I'm given and have to make the best of itDoze
Sergiy
Sergiy
SSC Guru
SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)SSC Guru (63K reputation)

Group: General Forum Members
Points: 63401 Visits: 13076
For an address "2/56 Rice Road" what would be the interpretation?

And what if it's "Apt.22 2/56 Rice Road" ?
FunkyDexter
FunkyDexter
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2017 Visits: 1101
As I said in my original post, it's not an exact science and I'm really just looking for a "good enough" solution. Because we don't control the incoming data we could get anything and nothing I write is ever going to accommodate every situation. Neither of those patterns occurs frequently enough in the data to be worth considering.

I'm not really concerned about that side of things though. As I said, my current approach is getting me close enough to be happy with the result. I'm only really concerned with performance and replacing the cursor.
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85772 Visits: 18135
If all you want to do is remove the cursor from your function, this should work. It assumes that (apart from the number) there is no more than one "Part" to be removed from the address. If you have a lot of addresses to check (or even if you only have a few) you'll want to make this into an inline table-valued function for optimum performance.

DECLARE @ReturnString nvarchar(max) = 'Flat 24 Loddon House';
SELECT @ReturnString = STUFF(LTRIM(@ReturnString),1,LEN(Candidate),'') -- Remove the part
FROM tbl_Matrix_SAONParts
WHERE @ReturnString LIKE Candidate + '%';
SET @ReturnString = SUBSTRING(@ReturnString,PATINDEX('%[A-Z]%', @ReturnString), LEN(@ReturnString)); -- Remove the number
SELECT @ReturnString;


John
FunkyDexter
FunkyDexter
SSCrazy
SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)SSCrazy (2K reputation)

Group: General Forum Members
Points: 2017 Visits: 1101
Yes, I think I can make the assumption that there will only be a single part to remove. So that should do what I'm looking for. I hadn't considered using the Where clause like that but it seems quite obvious in hindsight.

I only want to remove the numeric part if a SAON part has been removed, though, or else I risk removing road numbers. (Ie "Flat 12 Blodwin House" should become "Blodwin House" but I wouldn't want "22 Acacia Avenue" to become "Acacia Avenue"). I should be able to handle that by checking the row count before doing that step though.

I'm going to have a play with it to test it but it looks like exactly what I'm after. Thanks. Smile
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85772 Visits: 18135
Or just put it in the SELECT:

DECLARE @ReturnString nvarchar(max) = 'Flat 24 Loddon House';
SELECT @ReturnString = SUBSTRING(
STUFF(LTRIM(@ReturnString),1,LEN(Candidate),'') -- part
, PATINDEX('%[A-Z]%',@ReturnString),LEN(@ReturnString) -- number
)
FROM tbl_Matrix_SAONParts
WHERE @ReturnString LIKE Candidate + '%';
SELECT @ReturnString;


John
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search