Home Forums SQL Server 2008 SQL Server 2008 - General Looking for way to dynamically replace street suffix with abbreviation based on tables RE: Looking for way to dynamically replace street suffix with abbreviation based on tables

  • jeff.born (10/21/2016)


    Drew,

    I'm not sure how to write the ISNULL version.. This syntactically incorrect version I came up with is:

    SELECT a.Address_pk, a.[Address],

    STUFF(a.[Address], ss.search_idx, LEN(ss.search_val), ss.PostalServiceStandardSuffixAbbreviation) AS AddressAbbreviated

    FROM [Address] a

    OUTER APPLY (

    SELECT TOP (1) *, LEN(a.[Address]) - LEN(txt.search_val) - NULLIF(CHARINDEX(REVERSE(txt.search_val), REVERSE(a.[Address])), 0) + 2 AS search_idx

    FROM (

    SELECT (ISNULL((SELECT ss.PrimaryStreetSuffixName, ss.CommonlyUsedStreetSuffixOrAbbreviation, ss.PostalServiceStandardSuffixAbbreviation

    FROM StreetSuffix ss),

    SELECT a.[Address], a.[Address], a.[Address]))

    ) ss

    CROSS APPLY ( VALUES (ss.PrimaryStreetSuffixName), (ss.CommonlyUsedStreetSuffixOrAbbreviation) ) txt(search_val)

    ORDER BY search_idx DESC, LEN(txt.search_val) DESC

    ) ss

    ORDER BY a.Address_pk;

    doesn't compile.

    The Union ALL option scales well with 200,000+ addresses to parse, and with almost everything looking great a few addresses seem to cause troubles. I had to add some street suffixes to illistrate along with additional addresses. But the end result is Boulevard comes out BoulevaRD and Avenue East shows up as Avenue EaST (West has the same issue)

    The issue I haven't been able to reproduce has the output with DRIVe instead of DR, but all the examples I provided that didn't work in a bigger sample of suffixes and addresses work in the smaller sample.

    Before I drop in the next samples, I wanted to say thanks and I'll mark this the solution, since it will address 99% of the duplication issues and the rest we can deal with. It would be nice to see a solution that will work for everything thought, or 99.9%!

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (1, 'ANEX', 'ANX', 'ANX');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (2, 'COURT', 'COURT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (3, 'PLACE', 'PL', 'PL');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (4, 'AVENUE', 'AV', 'AVE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (5, 'AVENUE', 'AVE', 'AVE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (6, 'COURT', 'COURT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (7, 'COURT', 'CT', 'CT');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (8, 'STREET', 'STREET', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (9, 'STREET', 'STRT', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (10, 'TRACE', 'TRACE', 'TRCE');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (11, 'ROAD', 'RD', 'RD');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (12, 'ROAD', 'ROAD', 'RD');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (13, 'ROADS', 'ROADS', 'RDS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (14, 'ROADS', 'RDS', 'RDS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (15, 'STREET', 'ST', 'ST');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (16, 'SQUARE', 'SQ', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (17, 'SQUARE', 'SQR', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (18, 'SQUARE', 'SQRE', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (19, 'SQUARE', 'SQU', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (20, 'SQUARE', 'SQUARE', 'SQ');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (21, 'SQUARES', 'SQRS', 'SQS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (22, 'SQUARES', 'SQUARES', 'SQS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (23, 'DRIVE', 'DR', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (24, 'DRIVE', 'DRIV', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (25, 'DRIVE', 'DRIVE', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (26, 'DRIVE', 'DRV', 'DR');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (27, 'DRIVES', 'DRIVES', 'DRS');

    INSERT INTO [dbo].[StreetSuffix] (StreetSuffix_pk, PrimaryStreetSuffixName, CommonlyUsedStreetSuffixOrAbbreviation, PostalServiceStandardSuffixAbbreviation) VALUES (28, 'BOULEVARD', 'BLVD', 'BLVD');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1756 2nd Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('220 East 41st Street');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('777 6th Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1466 2nd Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('557 10th AV');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('745 7th Avenue');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India Street');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('125 3rd STRT');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1201 G Street NW');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1400 Court Place');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India ST');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1200 India Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Halls Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Halls Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8326-8336 Halls Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Ferry Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('8336 Halls Road');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1 Bowdoin Square');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('48 Martin Luther King');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('2 Penn Plaza East');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('17 FDR Drive');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('130 Co-op City Boulevard');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1025 Richard Arrington');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1127 West Broadway');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1125-1127 229th Drive North');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('1 Gateway Center');

    INSERT INTO [dbo].[Address] ([Address]) VALUES ('100 Van Cortlandt Avenue West');

    Try the attached function (in the text file). It doesn't have the problems that you posted. Please read the comments though, particularly the one that says that the resulting addresses should only be used for comparison and deduplication purposes. And THAT'S the only intent here is to dedupe for your CASS certification runs. You need to send the original addresses (deduped) and NOT the result of this function.

    The function CAN be used in a computed column (I haven't tried it after the mods I just made but believe it will allow PERSISTED).

    Don't let the size of the content of the function scare you because it does contain the full monty of the USPS abbreviation table.

    It IS a bit slow. It takes about 1.25 seconds for every 10,000 addresses but it's also doing a hell of a lot of work to make deduping of addresses a whole lot more certain. Again, you should NOT send the output of this function for CASS certification because it has NO SPACES in it. It's just for deduping and simplifying lookups.

    {EDIT} Just looked at the posted runtimes for the small amount of data used so far. I might not have done so bad. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)