October 4, 2016 at 8:06 am
Hi Guys,
First post for me here, so i dont know if i am at the right place.
I have a street that i need to dissect a streetname in 3 columns
adres=molenweg 9a
streetname = molenweg
housenumber = 9
suffix = a
I have found the following on the internet
SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'
, SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'
FROM t
it works when i dont have a suffix.
Can someone point me in the right direction ?
October 4, 2016 at 8:23 am
maikel01 (10/4/2016)
Hi Guys,First post for me here, so i dont know if i am at the right place.
I have a street that i need to dissect a streetname in 3 columns
adres=molenweg 9a
streetname = molenweg
housenumber = 9
suffix = a
I have found the following on the internet
SELECT top 1 LEFT(Adres,PATINDEX('% [0-9]%',Adres)-1)'Street'
, SUBSTRING(Adres,PATINDEX('% [0-9]%',Adres)+1,PATINDEX('%[0-9],%',Adres+ ',')-PATINDEX('% [0-9]%',Adres))'House Number'
FROM t
it works when i dont have a suffix.
Can someone point me in the right direction ?
Can you provide some specific parsing rules, please? Eg,
* Streetname = the first word after the equals sign
* House number = the penultimate character in the string
* Suffix = the final character in the string
Note: I do not think that these rules are sophisticated enough, though they work for your example. I presented them in order to make you think about other cases.
Once we have the detailed rules, we can get you up and running.
October 4, 2016 at 8:52 am
The following partial solution is dependant on PatExclude8K available here ... http://www.sqlservercentral.com/scripts/T-SQL/117890/
If your data is "adres=molenweg 9a" then you will need to strip out the "adres=" first.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 ')
;WITH cteSplitOne AS (
SELECT ID, adres
-- Find the first number with a preceding space, and use that to extract the street name.
, StreetName = LTRIM(RTRIM(LEFT(adres,PATINDEX('% [0-9]%',adres)-1)))
-- The rest of the adress is the housenumber and suffix
, HouseAndSuffix = LTRIM(RTRIM(RIGHT(adres, LEN(adres) - PATINDEX('% [0-9]%',adres) +1)))
FROM @SourceTable
)
SELECT cte.ID, cte.adres
, cte.StreetName
, HouseNumber = num.NewString
, Suffix = suff.NewString
FROM cteSplitOne AS cte
CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^0-9]%') AS num
CROSS APPLY dbo.PatExclude8K(HouseAndSuffix, '%[^A-Za-z]%') AS suff
October 4, 2016 at 9:06 am
Here's an option that doesn't need additional functions.
This will fail without warning if the street name has a number.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 5423 X')
SELECT ID,
adres,
streetname,
LEFT( CompoNum, PATINDEX( '%[^0-9]%', CompoNum+'.')-1) AS housenumber,
STUFF( CompoNum, 1, PATINDEX( '%[^0-9]%', CompoNum+'.')-1, '') AS suffix
FROM @SourceTable
CROSS APPLY (SELECT LEFT( adres, PATINDEX( '%[0-9]%', adres)-1) AS streetname,
STUFF( adres, 1, PATINDEX( '%[0-9]%', adres)-1, '') CompoNum)x;
October 5, 2016 at 1:48 am
Hi Phil,
First of thanks for your reply i have read through the links u provided and it helped me understand what you mean.
The rules are the following:
* Streetname = the first word in the string there are a few rules it can have special chars in it e.g. '&', '.', ',','-','/','`'
* Housenumber = the first numeric character in the string it can look like '48' or '48-50'
* Suffix = the final character in the string
I would like to thank you already for giving me the pointer of giving more information.
October 5, 2016 at 1:51 am
Thanks for your reply DesNorton and Luis Cazares. It worked for a lot of addresses but i did not give not enough information how it could be build up that it does not work for all.
October 5, 2016 at 6:24 am
based on experience, you won't find a one-function-to-rule-them-all address cleanup. break it up into smaller pieces that complement each other.
there's too many exceptions to the patterns;
what I've done previously is create a separate table with "cleaned" addresses.
and changed the logic of the join criteria to use cleaned addresses if found, else original address.
with that in place, create a waterfall type of cleanup of addresses...clean records for like 70% with one method, then a second method works on what was left to clean up maybe 10%, with as many as ten methods to finally complete the "cleaned" addresses initiative until the diminishing returns made me spend time on other projects.
Lowell
October 5, 2016 at 7:47 am
Following the rules you posted, here's an alternative.
DECLARE @SourceTable TABLE (
ID INT IDENTITY(1,1)
, adres VARCHAR(255)
);
INSERT INTO @SourceTable (adres)
VALUES (' molenweg 9a '), (' molenweg 9 '), ('Two words 48-50')
SELECT ID,
adres,
LEFT( adres, numberstart - 1) AS streetname,
SUBSTRING( adres, numberstart, LEN(adres) - CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN 1 ELSE 0 END) AS housenumber,
CASE WHEN RTRIM(adres) LIKE '%[^0-9]' THEN RIGHT(RTRIM(adres), 1) ELSE '' END AS suffix
FROM @SourceTable
CROSS APPLY (SELECT PATINDEX( '%[0-9]%', adres) AS numberstart)x;
October 6, 2016 at 1:42 am
I think you are right Lowell wil need to do it that way. Thanks luis this is gonna help me a lot.
Viewing 9 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply