How to find repetitive strings in a column and then remove the second string

  • I am having an issue with some bad address data. So I have an address field that is 32 characters in length. What is happening is that when the address is passed down through the web it is being repeated. The issue has been fixed on the web but I need to fix the ones that came in wrong.

    For example:

    Customer address

    1 2422 MARK CIR 2422 MARK CIRC

    2 208 PENNINGTON RD 208 PENNINGTON

    3 1000 HORSESHOE 22.3 DR 1000 HORS

    4 54 OAKHILL FARMS PKWY 54 OAKHILL

    So I need the data to look like this:

    Customer address

    1 2422 MARK CIR

    2 208 PENNINGTON RD

    3 1000 HORSESHOE 22.3 DR

    4 54 OAKHILL FARMS PKWY

    Has anyone had anything similiar to this? Any suggestions on how to identify and fix? I have roughly 600 address that came across like the examples above.

    Below is the sql statements to create and insert the data.

    create table address_fix (

    customer_number int,

    address varchar(32));

    insert address_fix values (1, '2422 MARK CIR 2422 MARK CIRC');

    insert address_fix values (2, '208 PENNINGTON RD 208 PENNINGTON');

    insert address_fix values (3, '1000 HORSESHOE 22.3 DR 1000 HORS');

    insert address_fix values (4, '54 OAKHILL FARMS PKWY 54 OAKHILL');

  • Here's an option, but it's not perfect as it won't get the first row because it's not completely duplicated.

    WITH

    E(n) AS(

    SELECT n FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0))E(n)

    ),

    E2(n) AS(

    SELECT a.n FROM E a, E b

    ),

    cteTally(n) AS(

    SELECT TOP 32 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) n

    FROM E2

    )

    SELECT *

    --UPDATE a SET address = New_Address

    FROM address_fix a

    CROSS APPLY (SELECT t.n, LEFT( a.address, t.n) New_Address

    FROM cteTally t

    WHERE a.address LIKE SUBSTRING( a.address, n, 32) + '%'

    AND t.n > 1

    AND t.n < LEN( a.address)) p;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Hey man - thanks for the reply! Wow - it works great and is definitely a start! Your query was pulling in the 1 number of the duplicated address - so I adjusted the query to subtract a position and it worked wonders! Below is the change I made.

    SELECT t.n, LEFT( a.address, t.n-1) New_Address

    FROM cteTally t

    WHERE a.address LIKE SUBSTRING( a.address, n, 32) + '%'

    AND t.n > 1

    AND t.n < LEN( a.address)

    But much appreciated!

  • Here's another option. It considers that the address can be identified by the first 8 characters. You can adjust that value.

    SELECT *, LEFT( a.address, CHARINDEX( LEFT( a.address, 8), a.address, 2) - 1)

    --UPDATE a SET address = New_Address

    FROM address_fix a;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply