Looking for way to dynamically replace street suffix with abbreviation based on tables

  • The Street Suffix list is from USPS, The address list I'm working with, well I'd rather keep the source of that to myself 🙂 Very interested in all options, because I enjoy learning!

    Jeff Moden (10/21/2016)


    @Jeff.Born,

    The list you have looks a bit familiar. Are you, by any chance, using the tables from the USPS? I ask because I went through this very same thing just a couple of months ago and may have a nasty fast solution but first want to compare my solution with some of the others posted here. I just need to know if you're using the USPS tables for this type of stuff to do my tests with because those are what I used.

  • Lowell,

    I'd also be interested in seeing your solution. My original thought was reduce the calls to a CASS company by not submitting obvious duplicates because every CASS request will cost me. I promise not to scream, just learn.

    Lowell (10/21/2016)


    i just recently did the opposite of what the OP is looking to do: i expanded all abbreviations to full words;

    mine was an investigation to eliminate variations in abbreviations and address formatting in order to find duplicates (because the powers that be nixed the "run it through a CASS company")

    i found the order of the REPLACE/STUFF commands was really important.

    for example I was cleaning up " 55 N St. Andrews St. blvd" to be "55 N Saint Andrews St. blvd" first, to finally get to "55 North Saint Andrews Street Boulevard"

    I ran my long logic through the Sakila database from MySQL(imported into SQL server), the Ashley Madison database, Adventureworks2014 and a real database of addresses from my company;

    all in all hundreds of thousands of addresses.

    I thought it might make for a good article; at least to make people either say they liked it scream "no, don't do it that way!"

  • jeff.born (10/21/2016)


    Drew,

    I'm not sure how to write the ISNULL version.. This syntactically incorrect version I came up with is:

    SELECT a.Address_pk, a.[Address],

    STUFF(a.[Address], ss.search_idx, LEN(ss.search_val), ss.PostalServiceStandardSuffixAbbreviation) AS AddressAbbreviated

    FROM [Address] a

    OUTER APPLY (

    SELECT TOP (1) *, LEN(a.[Address]) - LEN(txt.search_val) - NULLIF(CHARINDEX(REVERSE(txt.search_val), REVERSE(a.[Address])), 0) + 2 AS search_idx

    FROM (

    SELECT (ISNULL((SELECT ss.PrimaryStreetSuffixName, ss.CommonlyUsedStreetSuffixOrAbbreviation, ss.PostalServiceStandardSuffixAbbreviation

    FROM StreetSuffix ss),

    SELECT a.[Address], a.[Address], a.[Address]))

    ) ss

    CROSS APPLY ( VALUES (ss.PrimaryStreetSuffixName), (ss.CommonlyUsedStreetSuffixOrAbbreviation) ) txt(search_val)

    ORDER BY search_idx DESC, LEN(txt.search_val) DESC

    ) ss

    ORDER BY a.Address_pk;

    doesn't compile.

    The Union ALL option scales well with 200,000+ addresses to parse, and with almost everything looking great a few addresses seem to cause troubles. I had to add some street suffixes to illistrate along with additional addresses. But the end result is Boulevard comes out BoulevaRD and Avenue East shows up as Avenue EaST (West has the same issue)

    The issue I haven't been able to reproduce has the output with DRIVe instead of DR, but all the examples I provided that didn't work in a bigger sample of suffixes and addresses work in the smaller sample.

    Before I drop in the next samples, I wanted to say thanks and I'll mark this the solution, since it will address 99% of the duplication issues and the rest we can deal with. It would be nice to see a solution that will work for everything thought, or 99.9%!

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (1, 'ANEX', 'ANX', 'ANX');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (2, 'COURT', 'COURT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (3, 'PLACE', 'PL', 'PL');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (4, 'AVENUE', 'AV', 'AVE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (5, 'AVENUE', 'AVE', 'AVE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (6, 'COURT', 'COURT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (7, 'COURT', 'CT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (8, 'STREET', 'STREET', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (9, 'STREET', 'STRT', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (10, 'TRACE', 'TRACE', 'TRCE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (11, 'ROAD', 'RD', 'RD');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (12, 'ROAD', 'ROAD', 'RD');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (13, 'ROADS', 'ROADS', 'RDS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (14, 'ROADS', 'RDS', 'RDS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (15, 'STREET', 'ST', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (16, 'SQUARE', 'SQ', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (17, 'SQUARE', 'SQR', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (18, 'SQUARE', 'SQRE', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (19, 'SQUARE', 'SQU', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (20, 'SQUARE', 'SQUARE', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (21, 'SQUARES', 'SQRS', 'SQS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (22, 'SQUARES', 'SQUARES', 'SQS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (23, 'DRIVE', 'DR', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (24, 'DRIVE', 'DRIV', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (25, 'DRIVE', 'DRIVE', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (26, 'DRIVE', 'DRV', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (27, 'DRIVES', 'DRIVES', 'DRS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (28, 'BOULEVARD', 'BLVD', 'BLVD');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1756 2nd Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('220 East 41st Street');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('777 6th Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1466 2nd Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('557 10th AV');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('745 7th Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India Street');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('125 3rd STRT');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1201 G Street NW');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1400 Court Place');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India ST');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Halls Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Halls Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Halls Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Halls Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1 Bowdoin Square');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('48 Martin Luther King');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('2 Penn Plaza East');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('17 FDR Drive');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('130 Co-op City Boulevard');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1025 Richard Arrington');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1127 West Broadway');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1125-1127 229th Drive North');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1 Gateway Center');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('100 Van Cortlandt Avenue West');

    Try the attached function (in the text file). It doesn't have the problems that you posted. Please read the comments though, particularly the one that says that the resulting addresses should only be used for comparison and deduplication purposes. And THAT'S the only intent here is to dedupe for your CASS certification runs. You need to send the original addresses (deduped) and NOT the result of this function.

    The function CAN be used in a computed column (I haven't tried it after the mods I just made but believe it will allow PERSISTED).

    Don't let the size of the content of the function scare you because it does contain the full monty of the USPS abbreviation table.

    It IS a bit slow. It takes about 1.25 seconds for every 10,000 addresses but it's also doing a hell of a lot of work to make deduping of addresses a whole lot more certain. Again, you should NOT send the output of this function for CASS certification because it has NO SPACES in it. It's just for deduping and simplifying lookups.

    {EDIT} Just looked at the posted runtimes for the small amount of data used so far. I might not have done so bad. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jeff.born (10/21/2016)


    Yes there are, they are:

    N, S, E, W, NE, NW, SE, SW

    Sergiy (10/21/2016)


    Is there a fixed set of possible directionals?

    In this case you can dynamically remove them from Address strings and then match only the last word of remaining strings to the Street Suffixes:

    CREATE TABLE dbo.AddressDirectional (

    ID TINYINT IDENTITY (1,1) PRIMARY KEY NOT NULL,

    Code VARCHAR(5) COLLATE DATABASE_DEFAULT NOT NULL,

    UNIQUE (Code)

    )

    INSERT INTO dbo.AddressDirectional (Code)

    SELECT 'N'

    UNION

    SELECT 'S'

    UNION

    SELECT 'E'

    UNION

    SELECT 'W'

    UNION

    SELECT 'NE'

    UNION

    SELECT 'NW'

    UNION

    SELECT 'SE'

    UNION

    SELECT 'SW'

    SELECT [Address_pk], [Address],

    REPLACE([Address],ss.[PrimaryStreetSuffixName],ss.[PostalServiceStandardSuffixAbbreviation]) AS [AddressAbbreviated]

    FROM [dbo].[Address] a

    LEFT JOIN dbo.AddressDirectional d ON a.Address LIKE '% ' + d.Code

    INNER JOIN [dbo].[StreetSuffix] ss on RTRIM(ISNULL(LEFT(a.Address, LEN(a.Address)-LEN(' ' + d.Code)), a.[Address])) LIKE '% ' + ss.PrimaryStreetSuffixName

    UNION

    SELECT [Address_pk], [Address],

    REPLACE([Address],ss.[CommonlyUsedStreetSuffixOrAbbreviation], ss.[PostalServiceStandardSuffixAbbreviation]) AS [AddressAbbreviated]

    FROM [dbo].[Address] a

    LEFT JOIN dbo.AddressDirectional d ON a.Address LIKE '% ' + d.Code

    INNER JOIN [dbo].[StreetSuffix] ss on RTRIM(ISNULL(LEFT(a.Address, LEN(a.Address)-LEN(' ' + d.Code)), a.[Address])) LIKE '% ' + ss.CommonlyUsedStreetSuffixOrAbbreviation and not (a.[Address] LIKE '% ' + ss.PrimaryStreetSuffixName )

    UNION

    SELECT [Address_pk], [Address], 'N/A'

    FROM [dbo].[Address] a

    LEFT JOIN dbo.AddressDirectional d ON a.Address LIKE '% ' + d.Code

    WHERE NOT EXISTS (

    SELECT *

    FROM [dbo].[StreetSuffix] ss WHERE a.[Address] LIKE '% ' + ss.CommonlyUsedStreetSuffixOrAbbreviation

    OR (RTRIM(ISNULL(LEFT(a.Address, LEN(a.Address)-LEN(' ' + d.Code)), a.[Address])) LIKE '% ' + ss.PrimaryStreetSuffixName )

    )

    ORDER BY [Address_pk]

    _____________
    Code for TallyGenerator

  • Or, if you cannot rely on directionals being the last piece of Address strings, you may wish to use my original join, bit with an additional check that the Suffix you found is the last one in the string.

    In SQL term - there are no other words matching Suffixes after the one you've found:

    SELECT [Address_pk], [Address], --SUBSTRING(a.Address, CHARINDEX( ss.PrimaryStreetSuffixName,a.Address)+LEN(ss.PrimaryStreetSuffixName), LEN(a.Address)),

    REPLACE([Address],ss.[PrimaryStreetSuffixName],ss.[PostalServiceStandardSuffixAbbreviation]) AS [AddressAbbreviated]

    FROM [dbo].[Address] a

    INNER JOIN (SELECT DISTINCT PrimaryStreetSuffixName, [PostalServiceStandardSuffixAbbreviation] FROM [dbo].[StreetSuffix]) ss

    ON a.[Address] + ' ' LIKE '% ' + ss.PrimaryStreetSuffixName + ' %'

    WHERE NOT EXISTS (SELECT *

    FROM [dbo].[StreetSuffix] ss1

    WHERE (

    SUBSTRING(a.Address, CHARINDEX( ss.PrimaryStreetSuffixName,a.Address)+LEN(ss.PrimaryStreetSuffixName), LEN(a.Address)) LIKE '% '+ss1.PrimaryStreetSuffixName+'%'

    OR

    SUBSTRING(a.Address, CHARINDEX( ss.PrimaryStreetSuffixName,a.Address)+LEN(ss.PrimaryStreetSuffixName), LEN(a.Address)) LIKE '% '+ss1.CommonlyUsedStreetSuffixOrAbbreviation+'%'

    )

    )

    UNION

    SELECT [Address_pk], [Address], --SUBSTRING(a.Address, CHARINDEX( ss.CommonlyUsedStreetSuffixOrAbbreviation,a.Address)+LEN(ss.CommonlyUsedStreetSuffixOrAbbreviation), LEN(a.Address)),

    REPLACE([Address],ss.[CommonlyUsedStreetSuffixOrAbbreviation], ss.[PostalServiceStandardSuffixAbbreviation]) AS [AddressAbbreviated]

    FROM [dbo].[Address] a

    INNER JOIN [dbo].[StreetSuffix] ss on a.[Address] + ' ' LIKE '% ' + ss.CommonlyUsedStreetSuffixOrAbbreviation + ' %' AND not (a.[Address] + ' ' LIKE '% ' + ss.PrimaryStreetSuffixName + ' %')

    WHERE NOT EXISTS (SELECT *

    FROM [dbo].[StreetSuffix] ss1

    WHERE (

    SUBSTRING(a.Address, CHARINDEX( ss.CommonlyUsedStreetSuffixOrAbbreviation,a.Address)+LEN(ss.CommonlyUsedStreetSuffixOrAbbreviation), LEN(a.Address)) LIKE '% '+ss1.PrimaryStreetSuffixName+'%'

    OR

    SUBSTRING(a.Address, CHARINDEX( ss.CommonlyUsedStreetSuffixOrAbbreviation,a.Address)+LEN(ss.CommonlyUsedStreetSuffixOrAbbreviation), LEN(a.Address)) LIKE '% '+ss1.CommonlyUsedStreetSuffixOrAbbreviation+'%'

    )

    )

    UNION

    SELECT [Address_pk], [Address], 'N/A'

    FROM [dbo].[Address] a

    WHERE NOT EXISTS (

    SELECT *

    FROM [dbo].[StreetSuffix] ss WHERE (

    a.[Address] + ' ' LIKE '% ' + ss.CommonlyUsedStreetSuffixOrAbbreviation + ' %'

    OR a.[Address]+ ' ' LIKE '% ' + ss.PrimaryStreetSuffixName + ' %')

    )

    ORDER BY [Address_pk]

    Try to compare performance of it with "APPLY" version on your machine.

    On my laptop, on the small dataset, the old good INNER JOIN is on average twice as fast comparing to APPLY version.

    _____________
    Code for TallyGenerator

  • I know you are trying to replace full values with abbreviations and probably have good reasons to do this. However, if this was my system I would be looking to replace abbreviations with full values, particularly if it ends up as public facing data.

    Using abbreviations can look like you are penny pinching and willing to take short cuts in other areas. For me, showing a name in full shows respect, and helps give the impression you will show similar respect in other areas. It is nothing to do with the data, or the storage space it takes, it is an emotional or cultural thing.

    If your address data covers multiple countries, then you will hopefully have already found that abbreviations that are acceptable in one country are not correct in a different country. For example, in the UK although Rd, St, Av or Ave are recognised abbreviations for Road, Street, Avenue, they are rarely used in addressed mail. All the commercial address de-duplication systems I know of for UK addresses share the same concept of a correct address as one that does not have abbreviations.

    In most of Germany, the abbreviation Str is recognised and often used in preference to the full word Straße but both are acceptable, while in Switzerland the full word Strasse is preferred over Str. Germans prefer the spelling Straße to Strasse, while the Swiss people I know very much prefer Strasse over Straße.

    Standardising and de-duplicating addresses is important, but how you do it will without doubt impact the way your customers view your company.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

  • Maybe this helps? It may not look pretty, but it it worked for me given your requirements of removing the final suffix if it is directional and standardizing the street portion only if at the end (or second from the end if a direction at the end):

    /* ------------------------------------------------ */

    declare @addresses table(AddressStr varchar(100));

    declare @directions table (directionShortName varchar(5)

    );

    declare @suffixes table(

    suffixId int,

    suffixName varchar(50),

    suffixBadShortName varchar(10),

    suffixShortName varchar(5)

    );

    ---------------------------

    insert into @addresses

    Values

    ('48 Martin Luther King SW'),

    ('48 Avelon Ave'),

    ('48 Avelon Ave NE'),

    ('1756 2nd Avenue'),

    ('220 East 41st Street'),

    (' ?? Some wierd address somewhere, I dont ST know AVE N')

    ;

    insert into @directions

    select 'N'

    union

    select 'S'

    union

    select 'E'

    union

    select 'W'

    union

    select 'NE'

    union

    select 'NW'

    union

    select 'SE'

    union

    select 'SW'

    ;

    Insert into @suffixes

    Values

    (1,'ANEX', 'ANX', 'ANX'),

    (2, 'COURT', 'COURT', 'CT'),

    (3, 'PLACE', 'PL', 'PL'),

    (4, 'AVENUE', 'AV', 'AVE'),

    (5, 'AVENUE', 'AVE', 'AVE'),

    (29, 'COURT', 'COURT', 'CT'),

    (7, 'COURT', 'CT', 'CT'),

    (8, 'STREET', 'STREET', 'ST'),

    (9, 'STREET', 'STRT', 'ST'),

    (10, 'TRACE', 'TRACE', 'TRCE'),

    (11, 'ROAD', 'RD', 'RD'),

    (12, 'ROAD', 'ROAD', 'RD'),

    (13, 'ROADS', 'ROADS', 'RDS'),

    (14, 'ROADS', 'RDS', 'RDS'),

    (15, 'STREET', 'ST', 'ST'),

    (16, 'SQUARE', 'SQ', 'SQ'),

    (17, 'SQUARE', 'SQR', 'SQ'),

    (18, 'SQUARE', 'SQRE', 'SQ'),

    (19, 'SQUARE', 'SQU', 'SQ'),

    (20, 'SQUARE', 'SQUARE', 'SQ'),

    (21, 'SQUARES', 'SQRS', 'SQS'),

    (22, 'SQUARES', 'SQUARES', 'SQS'),

    (23, 'DRIVE', 'DR', 'DR'),

    (24, 'DRIVE', 'DRIV', 'DR'),

    (25, 'DRIVE', 'DRIVE', 'DR'),

    (26, 'DRIVE', 'DRV', 'DR'),

    (27, 'DRIVES', 'DRIVES', 'DRS'),

    (28, 'BOULEVARD', 'BLVD', 'BLVD')

    ;

    ---------------------------------------

    With scrub1 as(/* removes any directional suffix */

    Selecta.addressStr as Address_original,

    case whend.directionShortName is not null

    thenreverse(right(reverse(a.addressStr), len(a.addressStr)-charindex(' ',reverse(a.addressStr))))

    elsea.addressStr

    end as [Address_scrub1]

    --,reverse(left(reverse(a.addressStr), charindex(' ',reverse(a.addressStr))))

    from@addresses as a

    left join @directions as d

    ON ltrim(reverse(left(reverse(a.addressStr), charindex(' ',reverse(a.addressStr)))))

    =d.directionShortName

    )

    ,

    scrub2 as (/* attempts to standardize any remaining street suffix*/

    Selecta.address_original,

    a.address_scrub1,

    case whens.suffixBadShortName is not null

    thenreverse(right(reverse(a.address_scrub1), len(a.address_scrub1)-charindex(' ',reverse(a.address_scrub1))))+' '+s.suffixName

    Elsea.address_scrub1

    end as [Address_scrub2_final]

    --,s.suffixBadShortName

    fromscrub1 as a

    left join @suffixes as s

    ON ltrim(reverse(left(reverse(a.address_scrub1), charindex(' ',reverse(a.address_scrub1)))))

    =s.suffixBadShortName

    )

    Select*

    fromscrub2

    --------------------------------------------------

    ----------------------------------------------------

  • As for ongoing maintenance, there is a fancy cliche out there ... something like "machine learning." It means that new codes that are not familiar would get captured by the system. Of course in your case if you see a new street suffix like "pkwy", the machine can only flag those new ones for another process for human intervention (where a person would need to enter "Parkway" into the street suffixes table).

    BTW there is also a process that I used once long ago called NCOA in the US. It is the national change of address service. I dont recall all of the specifics. I remember however, if you pull the address and pass it them they can return the standardized version of the address or mark it as invalid (if not existing though real looking) . It may or may not be what you need but thought I'd mention just in case.

    ----------------------------------------------------

  • Texas A&M University offers A bulk cleansing service you could try.

    you upload a txt or csv and get back results;

    you have to register/create an account, but I believe it's free.

    https://geoservices.tamu.edu/Services/AddressNormalization/

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • And for any outside service that does cleansing, just be mindful about any protected information in your agreements from which you get these addresses. I see no issue with standardizing a known address (since these are public) , just would not want to also include the people who live there when exporting data to the service 😀

    ----------------------------------------------------

  • Just a reminder that the OP doesn't need to build a "correct" address formatter that follows CASS certification rules. He needed something that would standardize typical spellings of address parts just to do a simple dupe check before he send data to his authorized CASS Certified SAS. He has to send it to someone that carries such a certification to get breaks on postage.

    So, no need to get real fancy here. Replace the parts, single up spaces, drop special characters, and do a dupe check.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff i dont see the attachment about the function you are talking about. it would be really helpful if you could send me that. Thanks

Viewing 12 posts - 16 through 26 (of 26 total)

You must be logged in to reply to this topic. Login to reply