Help Needed in set based Query

  • Hi,

    Version : sqlserver 2008 R2

    sample data to play with,

    declare @Address table(Id int primary key identity(1,1),Address varchar(100),city varchar(20),State varchar(20))

    insert into @Address

    select '123 sheril street, Newyork,NY','Newyork','NY' union all

    select '333 bamboo street, Manhattan,NY', null,null union all

    select '123 ketty street, Albany,NY','Albany','NY' union all

    select '123 sibul street, Buffalo,NY',null,null

    This is just sample data. i have around 100000 address for different city and states.Basically i need to remove the city and state from address

    Here is waht i need to achieve,

    1. Get the city,state check exists on address column if found then remove the city and state. if not leave it as it.

    2. If the city and state column is null and if that is present in address column then remove that from address clumn and upate it in city and state column

    respectively.

    Is this possible to achieve without cursor or looping? any sample query please .

  • /* remove City and State from the Address */

    SELECT Address

    , LEFT(Address,CHARINDEX(',',Address,1)-1) AS StreetAddr

    , RIGHT(Address,2) AS AddrState

    FROM Address1;

  • Simple solution using the DelimitedSplit8K [/url]function

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    declare @Address table(Id int primary key identity(1,1),Address varchar(100),city varchar(20),State varchar(20))

    insert into @Address

    select '123 sheril street, Newyork,NY','Newyork','NY' union all

    select '333 bamboo street, Manhattan,NY', null,null union all

    select '123 ketty street, Albany,NY','Albany','NY' union all

    select '123 sibul street, Buffalo,NY',null,null ;

    /* Split and Cross-tab */

    SELECT

    A.Id

    ,MAX(CASE WHEN X.ItemNumber = 1 THEN LTRIM(X.Item) END) AS Address

    ,MAX(CASE WHEN X.ItemNumber = 2 THEN LTRIM(X.Item)

    WHEN A.City IS NOT NULL THEN A.City END) AS City

    ,MAX(CASE WHEN X.ItemNumber = 3 THEN LTRIM(X.Item)

    WHEN A.State IS NOT NULL THEN A.State END) AS State

    FROM @Address A

    OUTER APPLY dbo.DelimitedSplit8K(A.Address,',') AS X

    GROUP BY A.Id;

    Results

    Id Address City State

    ---- ------------------- ------------ ------

    1 123 sheril street Newyork NY

    2 333 bamboo street Manhattan NY

    3 123 ketty street Albany NY

    4 123 sibul street Buffalo NY

  • Hey Guys,

    thanks for your reply.

    I Eirik,

    before i though of posting on this forum, i thought about using Jeff's delimit function what you have suggested. But i am sorry that i forgot to mention about some of the address messed up in my system like, it will not have proper comma separated address.

    Ex : larser&tubro street hudson TX

    Ex : peppy and tugo road pitsburgh PA

    Not sure how do i achieve this without looping it. I thought of creating temp table will have city and state as separate column and using that i need to match with my actual table to do the operation. is it possible to do without cursor? any suggestion or sample please

  • KGJ-Dev (5/1/2015)


    Hey Guys,

    thanks for your reply.

    I Eirik,

    before i though of posting on this forum, i thought about using Jeff's delimit function what you have suggested. But i am sorry that i forgot to mention about some of the address messed up in my system like, it will not have proper comma separated address.

    Ex : larser&tubro street hudson TX

    Ex : peppy and tugo road pitsburgh PA

    Not sure how do i achieve this without looping it. I thought of creating temp table will have city and state as separate column and using that i need to match with my actual table to do the operation. is it possible to do without cursor? any suggestion or sample please

    Thought it was too good to be true, normally this type of information wouldn't be that consistent;-)

    😎

    Do you have a list of Cities and States in a table/tables?

  • Hi Eirik,

    thanks for your time on this, please assume that i have city,state,zip in a separate table called tbl_city_state. please proceed with your suggestion/sample based on tbl_city_state table

    thanks

  • Here is a partial solution, it matches the city/state but does not remove them from the address.

    😎

    Sample data

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_ADDRESS_WORKTABLE') IS NOT NULL DROP TABLE dbo.TBL_ADDRESS_WORKTABLE;

    CREATE TABLE dbo.TBL_ADDRESS_WORKTABLE

    (

    Id INT IDENTITY(1,1) NOT NULL

    ,Address VARCHAR(100) NOT NULL

    ,city VARCHAR(20) NULL

    ,State VARCHAR(20) NULL

    );

    INSERT INTO dbo.TBL_ADDRESS_WORKTABLE

    SELECT '123 sheril street, Newyork,NY','New york','NY' UNION ALL

    SELECT '124 sheril street New York, NY','Newyork','NY' UNION ALL

    SELECT '333 bamboo street, Manhattan ,NY ', null,null UNION ALL

    SELECT '334 birch street Manhattan NY', null,null UNION ALL

    SELECT '123 ketty street Albany,NY','Albany','NY' UNION ALL

    SELECT '123 sibul street, Buffalo,NY',null,null ;

    IF OBJECT_ID(N'dbo.tbl_city_state') IS NOT NULL DROP TABLE dbo.tbl_city_state;

    CREATE TABLE dbo.tbl_city_state

    (

    city_state_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_CITY_STATE_CITY_STATE_ID PRIMARY KEY CLUSTERED

    ,State CHAR(2) NOT NULL

    ,City VARCHAR(50) NOT NULL

    ,County VARCHAR(100) NOT NULL

    );

    INSERT INTO dbo.tbl_city_state(State,City,County)

    VALUES

    ('NY','Albany','Albany')

    ,('NY','Manhattan','New York')

    ,('NY','Amsterdam','Montgomery')

    ,('NY','Auburn','Cayuga')

    ,('NY','Batavia','Genesee')

    ,('NY','Beacon','Dutchess')

    ,('NY','Binghamton','Broome')

    ,('NY','Buffalo','Erie')

    ,('NY','Canandaigua','Ontario')

    ,('NY','Cohoes','Albany')

    ,('NY','Corning','Steuben')

    ,('NY','Cortland','Cortland')

    ,('NY','Dunkirk','Chautauqua')

    ,('NY','Elmira','Chemung')

    ,('NY','Fulton','Oswego')

    ,('NY','Geneva','Ontario[A]')

    ,('NY','Glen Cove','Nassau')

    ,('NY','Glens Falls','Warren')

    ,('NY','Gloversville','Fulton')

    ,('NY','Hornell','Steuben')

    ,('NY','Hudson','Columbia')

    ,('NY','Ithaca','Tompkins')

    ,('NY','Jamestown','Chautauqua')

    ,('NY','Johnstown','Fulton')

    ,('NY','Kingston','Ulster')

    ,('NY','Lackawanna','Erie')

    ,('NY','Little Falls','Herkimer')

    ,('NY','Lockport','Niagara')

    ,('NY','Long Beach','Nassau')

    ,('NY','Mechanicville','Saratoga')

    ,('NY','Middletown','Orange')

    ,('NY','Mount Vernon','Westchester')

    ,('NY','New Rochelle','Westchester')

    ,('NY','New York','Bronx, Kings, New York,Queens, and Richmond')

    ,('NY','Newburgh','Orange')

    ,('NY','Niagara Falls','Niagara')

    ,('NY','North Tonawanda','Niagara')

    ,('NY','Norwich','Chenango')

    ,('NY','Ogdensburg','St. Lawrence')

    ,('NY','Olean','Cattaraugus')

    ,('NY','Oneida','Madison')

    ,('NY','Oneonta','Otsego')

    ,('NY','Oswego','Oswego')

    ,('NY','Peekskill','Westchester')

    ,('NY','Plattsburgh','Clinton')

    ,('NY','Port Jervis','Orange')

    ,('NY','Poughkeepsie','Dutchess')

    ,('NY','Rensselaer','Rensselaer')

    ,('NY','Rochester','Monroe')

    ,('NY','Rome','Oneida')

    ,('NY','Rye','Westchester')

    ,('NY','Salamanca','Cattaraugus')

    ,('NY','Saratoga Springs','Saratoga')

    ,('NY','Schenectady','Schenectady')

    ,('NY','Sherrill','Oneida')

    ,('NY','Syracuse','Onondaga')

    ,('NY','Tonawanda','Erie')

    ,('NY','Troy','Rensselaer')

    ,('NY','Utica','Oneida')

    ,('NY','Watertown','Jefferson')

    ,('NY','Watervliet','Albany')

    ,('NY','White Plains','Westchester')

    ,('NY','Yonkers','Westchester');

    The query

    USE tempdb;

    GO

    SET NOCOUNT ON;

    SELECT

    AW.Id

    ,CS.city_state_ID

    ,AW.Address

    ,CS.CITY

    ,CS.State

    FROM dbo.TBL_ADDRESS_WORKTABLE AW

    CROSS APPLY dbo.tbl_city_state CS

    WHERE

    (

    AW.city = CS.City

    AND

    AW.State = CS.State

    )

    OR

    (

    CHARINDEX(CS.CITY, AW.Address,1) > 0

    AND

    CHARINDEX(CS.State, AW.Address,1) > 0

    )

    OR

    (

    CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), AW.Address,1) > 0

    AND

    CHARINDEX(CS.State, AW.Address,1) > 0

    )

    ;

    Sample results

    Id city_state_ID Address CITY State

    --- ------------- ---------------------------------- ---------- -----

    1 34 123 sheril street, Newyork,NY New York NY

    2 34 124 sheril street New York, NY New York NY

    3 2 333 bamboo street, Manhattan ,NY Manhattan NY

    4 2 334 birch street Manhattan NY Manhattan NY

    5 1 123 ketty street Albany,NY Albany NY

    6 8 123 sibul street, Buffalo,NY Buffalo NY

  • Hi Eirik,

    appreciated your time and wonderful help. My ultimate goal is to clean up the address field. address field should only have the address. Any thoughts please.

  • Here is an addition to the previous code, nothing fancy but it should work most of the time

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    SELECT

    AW.Id

    ,CS.city_state_ID

    ,AW.Address

    ,CASE

    WHEN CHARINDEX(CS.City,AW.Address,1) = 0 THEN AW.Address

    WHEN CHARINDEX(CS.City,AW.Address,1) > 0 THEN

    CASE

    WHEN RIGHT(SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 2),1) IN (' ',',') THEN SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 3)

    ELSE SUBSTRING(AW.Address,1,CHARINDEX(CS.City,AW.Address,1) - 2)

    END

    ELSE ''

    END

    ,CS.CITY

    ,CS.State

    FROM dbo.TBL_ADDRESS_WORKTABLE AW

    CROSS APPLY dbo.tbl_city_state CS

    WHERE

    (

    AW.city = CS.City

    AND

    AW.State = CS.State

    )

    OR

    (

    CHARINDEX(CS.CITY, AW.Address,1) > 0

    AND

    CHARINDEX(CS.State, AW.Address,1) > 0

    )

    OR

    (

    CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), AW.Address,1) > 0

    AND

    CHARINDEX(CS.State, AW.Address,1) > 0

    )

    ;

    Quick thought, mis-spellings could be handled via alternative name/spelling table.

  • Wonderful. thank you so much. i will work with my real data and post back to you in case of any tricky situation. much appreciated.

  • Hi Eirik,

    I am into trouble now about some street names having names city name and the current logic removed the the street name as well

    ex:

    address = '555 Manhatten street, Manhattan ,NY '

    logically the the text has to be strip from second Manhattan but as per the current logic it strips @ first occurrence.

    i am only getting the out put as 555 where i supposed to get '555 Manhatten street' as output

    Any suggestion how to handle this please

  • born2achieve (5/14/2015)


    Hi Eirik,

    I am into trouble now about some street names having names city name and the current logic removed the the street name as well

    ex:

    address = '555 Manhatten street, Manhattan ,NY '

    logically the the text has to be strip from second Manhattan but as per the current logic it strips @ first occurrence.

    i am only getting the out put as 555 where i supposed to get '555 Manhatten street' as output

    Any suggestion how to handle this please

    Here is yet another solution which is based on the previous one with the two additions, firstly the logic of determine whether the Address has a State code. The second addition is to use the length of the City and State values to find last separator before those values and cut the Address string there if and only if it has a State code at the end.

    😎

    This is not bullet proof but it should work in most cases. You will probably have to handle the exceptions outside this code.

    Sample data

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_ADDRESS_WORKTABLE') IS NOT NULL DROP TABLE dbo.TBL_ADDRESS_WORKTABLE;

    CREATE TABLE dbo.TBL_ADDRESS_WORKTABLE

    (

    Id INT IDENTITY(1,1) NOT NULL

    ,Address VARCHAR(100) NOT NULL

    ,city VARCHAR(20) NULL

    ,State VARCHAR(20) NULL

    );

    INSERT INTO dbo.TBL_ADDRESS_WORKTABLE

    SELECT '123 sheril street, Newyork,NY','New york','NY' UNION ALL

    SELECT '124 sheril street New York, NY','Newyork','NY' UNION ALL

    SELECT '125 sheril street New York, NY',NULL,'NY' UNION ALL

    SELECT 'PBOX 12345 New York, NY','Newyork','NY' UNION ALL

    SELECT '333 bamboo street, Manhattan ,NY ', null,null UNION ALL

    SELECT '334 birch street Manhattan NY', null,null UNION ALL

    SELECT '335 Manhattan street','Manhattan','NY' UNION ALL

    SELECT '336 Manhattan street, Manhattan NY','Manhattan','NY' UNION ALL

    SELECT '337 Manhattan street, Manhattan NY',null,'NY' UNION ALL

    SELECT 'Manhattan street 338 Manhattan NY',null,NULL UNION ALL

    SELECT '123 ketty street Albany,NY','Albany','NY' UNION ALL

    SELECT '123 sibul street, Buffalo,NY',null,null ;

    IF OBJECT_ID(N'dbo.tbl_city_state') IS NOT NULL DROP TABLE dbo.tbl_city_state;

    CREATE TABLE dbo.tbl_city_state

    (

    city_state_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_CITY_STATE_CITY_STATE_ID PRIMARY KEY CLUSTERED

    ,State CHAR(2) NOT NULL

    ,City VARCHAR(50) NOT NULL

    ,County VARCHAR(100) NOT NULL

    );

    INSERT INTO dbo.tbl_city_state(State,City,County)

    VALUES

    ('NY','Albany','Albany')

    ,('NY','Manhattan','New York')

    ,('NY','Amsterdam','Montgomery')

    ,('NY','Auburn','Cayuga')

    ,('NY','Batavia','Genesee')

    ,('NY','Beacon','Dutchess')

    ,('NY','Binghamton','Broome')

    ,('NY','Buffalo','Erie')

    ,('NY','Canandaigua','Ontario')

    ,('NY','Cohoes','Albany')

    ,('NY','Corning','Steuben')

    ,('NY','Cortland','Cortland')

    ,('NY','Dunkirk','Chautauqua')

    ,('NY','Elmira','Chemung')

    ,('NY','Fulton','Oswego')

    ,('NY','Geneva','Ontario[A]')

    ,('NY','Glen Cove','Nassau')

    ,('NY','Glens Falls','Warren')

    ,('NY','Gloversville','Fulton')

    ,('NY','Hornell','Steuben')

    ,('NY','Hudson','Columbia')

    ,('NY','Ithaca','Tompkins')

    ,('NY','Jamestown','Chautauqua')

    ,('NY','Johnstown','Fulton')

    ,('NY','Kingston','Ulster')

    ,('NY','Lackawanna','Erie')

    ,('NY','Little Falls','Herkimer')

    ,('NY','Lockport','Niagara')

    ,('NY','Long Beach','Nassau')

    ,('NY','Mechanicville','Saratoga')

    ,('NY','Middletown','Orange')

    ,('NY','Mount Vernon','Westchester')

    ,('NY','New Rochelle','Westchester')

    ,('NY','New York','Bronx, Kings, New York,Queens, and Richmond')

    ,('NY','Newburgh','Orange')

    ,('NY','Niagara Falls','Niagara')

    ,('NY','North Tonawanda','Niagara')

    ,('NY','Norwich','Chenango')

    ,('NY','Ogdensburg','St. Lawrence')

    ,('NY','Olean','Cattaraugus')

    ,('NY','Oneida','Madison')

    ,('NY','Oneonta','Otsego')

    ,('NY','Oswego','Oswego')

    ,('NY','Peekskill','Westchester')

    ,('NY','Plattsburgh','Clinton')

    ,('NY','Port Jervis','Orange')

    ,('NY','Poughkeepsie','Dutchess')

    ,('NY','Rensselaer','Rensselaer')

    ,('NY','Rochester','Monroe')

    ,('NY','Rome','Oneida')

    ,('NY','Rye','Westchester')

    ,('NY','Salamanca','Cattaraugus')

    ,('NY','Saratoga Springs','Saratoga')

    ,('NY','Schenectady','Schenectady')

    ,('NY','Sherrill','Oneida')

    ,('NY','Syracuse','Onondaga')

    ,('NY','Tonawanda','Erie')

    ,('NY','Troy','Rensselaer')

    ,('NY','Utica','Oneida')

    ,('NY','Watertown','Jefferson')

    ,('NY','Watervliet','Albany')

    ,('NY','White Plains','Westchester')

    ,('NY','Yonkers','Westchester');

    The query

    USE tempdb;

    GO

    SET NOCOUNT ON;

    /**********************************************************

    Separate City and State from an Address

    1. Take the last three characters from the Address and

    remove space/comma.

    2. Check if it matches a State from tbl_city_state or

    the value in the State column.

    3. Cross apply the tbl_city_state to match with either

    the column values or pattern in the Address column.

    4. Find the first comma/space before the City part of

    the Address if the Address ends with a State code

    and return the position otherwise the length of the

    Address.

    5. Substring the Address from position 1 to the value

    from 4. as the "clean" Address.

    **********************************************************/

    ;WITH BASE_DATA AS

    (

    SELECT

    AW.Id

    ,AW.Address

    ,REPLACE(REPLACE(RIGHT(AW.Address,3),' ',''),',','') AS ADDR_END

    ,AW.city

    ,AW.State

    FROM dbo.TBL_ADDRESS_WORKTABLE AW

    )

    ,CHECK_STATE_IN_ADDRESS AS

    (

    SELECT

    BD.Id

    ,BD.Address

    ,BD.city

    ,BD.State

    ,BD.ADDR_END

    ,CASE

    WHEN BD.ADDR_END = BD.State AND BD.State IS NOT NULL THEN 1

    WHEN BD.State IS NULL AND EXISTS (SELECT * FROM dbo.tbl_city_state WHERE State = BD.ADDR_END) THEN 1

    ELSE 0

    END AS STATE_IN_ADDR

    FROM BASE_DATA BD

    )

    ,ADDRESS_WITH_CITY_STATE AS

    (

    SELECT

    CSIA.Id

    ,CSIA.Address

    ,CS.city

    ,CS.State

    ,CHARINDEX(CHAR(32),REVERSE(CSIA.Address),(1 + LEN(CS.city ) + LEN(CS.State))) AS POINT_A

    ,CHARINDEX(CHAR(44),REVERSE(CSIA.Address),(1 + LEN(CS.city ) + LEN(CS.State))) AS POINT_B

    ,CSIA.STATE_IN_ADDR

    FROM CHECK_STATE_IN_ADDRESS CSIA

    CROSS APPLY dbo.tbl_city_state CS

    WHERE

    (

    CSIA.city = CS.City

    AND

    CSIA.State = CS.State

    )

    OR

    (

    CHARINDEX(CS.CITY, CSIA.Address,1) > 0

    AND

    CHARINDEX(CS.State, CSIA.Address,1) > 0

    )

    OR

    (

    CHARINDEX(REPLACE(CS.CITY,CHAR(32),''), CSIA.Address,1) > 0

    AND

    CHARINDEX(CS.State, CSIA.Address,1) > 0

    )

    )

    SELECT

    AWCS.Id

    ,SUBSTRING(AWCS.Address,1,

    CASE

    WHEN AWCS.STATE_IN_ADDR = 1 AND AWCS.POINT_B = 0 THEN LEN(AWCS.Address) - AWCS.POINT_A

    WHEN AWCS.STATE_IN_ADDR = 1 AND AWCS.POINT_B > 0 THEN LEN(AWCS.Address) - AWCS.POINT_B

    ELSE LEN(AWCS.Address)

    END) AS Address

    ,AWCS.city

    ,AWCS.State

    FROM ADDRESS_WITH_CITY_STATE AWCS;

    Results

    Id Address city State

    --- ---------------------- ----------- -----

    1 123 sheril street New York NY

    2 124 sheril street New York NY

    3 125 sheril street New York NY

    4 PBOX 12345 New York NY

    5 333 bamboo stree Manhattan NY

    6 334 birch street Manhattan NY

    7 335 Manhattan street Manhattan NY

    8 336 Manhattan street Manhattan NY

    9 337 Manhattan street Manhattan NY

    10 Manhattan street 338 Manhattan NY

    11 123 ketty street Albany NY

    12 123 sibul street Buffalo NY

  • Another approach, not perfect but not bad either;-) Additional metrics can be used to enhance the logic.

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    ;WITH BASE_DATA AS

    (

    SELECT

    AW.Id

    ,AW.Address

    ,AW.city

    ,AW.State

    ,LEN(AW.Address) AS Address_LEN

    ,LEN(AW.city) AS city_LEN

    ,LEN(AW.State) AS State_LEN

    ,CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1) AS CITY_POS

    ,CHARINDEX(CHAR(32),REVERSE(AW.Address), CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1)) AS SPACE_POS

    ,CHARINDEX(CHAR(44),REVERSE(AW.Address), CHARINDEX(REVERSE(AW.city), REVERSE(AW.Address),1)) AS COMMA_POS

    ,CHARINDEX(REVERSE(AW.State), REVERSE(AW.Address),1) AS State_POS

    ,CS.city_state_ID

    FROM dbo.TBL_ADDRESS_WORKTABLE AW

    LEFT OUTER JOIN dbo.tbl_city_state CS

    ON AW.city = CS.City

    AND AW.State = CS.State

    )

    SELECT

    BD.Id

    ,BD.Address

    ,BD.city

    ,BD.State

    ,SUBSTRING(BD.Address,1, BD.Address_LEN - COALESCE(CASE

    WHEN BD.CITY_POS > 0 AND BD.COMMA_POS > BD.SPACE_POS THEN BD.COMMA_POS

    WHEN BD.CITY_POS > 0 AND BD.COMMA_POS = 0 THEN BD.SPACE_POS

    END,0)) AS Cleaned_Candidate_Address

    ,BD.Address_LEN

    ,BD.city_LEN

    ,BD.State_LEN

    ,BD.CITY_POS

    ,BD.SPACE_POS

    ,BD.COMMA_POS

    ,BD.State_POS

    ,BD.city_state_ID

    FROM BASE_DATA BD;[/code]

  • Hi Eirik,

    This is excellent approach and did clean almost. Thank you very much Gentle Man!!!.

  • born2achieve (5/16/2015)


    Hi Eirik,

    This is excellent approach and did clean almost. Thank you very much Gentle Man!!!.

    You are most welcome.

    😎

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply