charindex help

  • Hi guys, need some help with a mailing file problem. On some customer addresses, we have received data where the address appears twice within the same field - mainly on rural property names eg. Penrose Penrose or Kentucky Kentucky. I need to re-format it for mailing. I'm ok where it's a 1-word property that's repeated - I used this code below to update the field [addr_1] to just the first word -

    update ##tbl_xx

    set addr_1 = substring(addr_1, 1, (CHARINDEX(' ',addr_1)-1) )

    where substring(addr_1,1,charindex(' ',addr_1)) = substring(addr_1,(charindex(' ',addr_1)+1),99)

    But having problems where it's two-worded name that is repeated eg. wilson farm wilson farm

    I need a query that 1) finds these repeated addresses, and then 2) updates the field to just select the first two words. The closest I can get is the code below -

    select addr_1,

    rtrim(substring(addr_1,1,charindex(' ',addr_1)) +substring(addr_1,(charindex(' ',addr_1)+1), charindex(' ',addr_1))) as reformatted_addr

    from ##tbl_xx

    where addr_1 like '% % % %' and addr_1 not like '%[0-9]%'

    I tried using patindex but don't have much experience with it, so couldn't get it to work for me. Help!

  • The only difference between PATINDEX and CHARINDEX is that PATINDEX supports wildcard characters for the match whereas CHARINDEX does not.

    This code uses Charindex to fill in parameters of SUBSTRING. The first WHEN in the case statement uses it to find the first space in field "address" and then compares the substring from the start of address to the space with the substring from after the space to the end. The second when is similar, but it uses charindex in the "start position" parameter to find the next space. It then can compare the two halves of address, before and after that second space. In either case, if a match is found, the first half is used as the returned value.

    One other thought does occur here. In general, it can be dangerous to completely automate data clean-up efforts like this. My friends in a certain town in Washington state would probably wonder when they were moved to "Walla". You may want to consider using this sort of query to produce a candidates listing for manual review and action.

    Declare @address table (address char(50))

    insert into @address

    select 'Lincoln'

    union all

    select 'Kentucky Kentucky'

    union all

    select 'Walla Walla'

    union all

    select 'New York New York'

    union all

    select 'Ohio'

    union all

    select 'Iowa'

    select case when substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space

    = substring(address, charindex(' ', rtrim(address)) +1, len(address)) --Substring from 1st space to end

    then substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space

    when substring(address, 1, charindex(' ', rtrim(address), charindex(' ', rtrim(address)) +1) ) --Substring up to 2nd space

    = substring(address, charindex(' ',rtrim(address), charindex(' ',rtrim(address)) +1 ) +1, len(address)) --Substring from 2nd space to end

    then substring(address, 1, charindex(' ',rtrim(address),charindex(' ',rtrim(address)) +1) ) --Substring up to 2nd space

    else address

    end

    from @address

  • Borrowing on John's good data creation code and assuming the column is actually a VARCHAR instead of a char, here's a super simple way to take care of double, triple, or more words without having to know how many words there are...

    Declare @address table (address VARchar(50))

    insert into @address

    select 'Lincoln'

    union all

    select 'Kentucky Kentucky'

    union all

    select 'Walla Walla'

    union all

    select 'New York New York'

    union all

    select 'New York CITY New York CITY'

    union all

    select 'Ohio'

    union all

    select 'Iowa'

    SELECT CASE

    WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)

    THEN LEFT(Address,LEN(Address)/2)

    ELSE Address

    END

    FROM @Address

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, your response is valuable to the original poster in that it gives him a straight-forward, scalable solution, but may be more valuable to me and others on SSC in that it demonstrates the practice of cutting to the core requirement. I had started with the question as posed -- how to parse the field with charindex() -- but the real problem was how to find fields with the first half the same as the second.

    Somewhere on SSC a while ago, I saw the expression "You think you need an answer? No. You need a question." When stated "how to find first half equal to second half", the simpler and better solution almost writes itself. This is to say, the genius was in stating the problem accurately.

    Let's make it work with address defined as a char(50) by using RTRIM() so that the RIGHT() function works as intended. Note that the LEN() function does an implied RTRIM already.--Declare @address table (address VARchar(50))

    Declare @address table (address char(50))

    insert into @address

    select 'Lincoln'

    union all

    select 'Kentucky Kentucky'

    union all

    select 'Walla Walla'

    union all

    select 'New York New York'

    union all

    select 'New York CITY New York CITY'

    union all

    select 'Ohio'

    union all

    select 'Iowa'

    SELECT

    CASE

    WHEN LEFT(Address,LEN(Address)/2)=RIGHT(rtrim(Address),LEN(rtrim(Address))/2)

    --WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)

    THEN LEFT(Address,LEN(Address)/2)

    ELSE Address

    END

    FROM @Address

  • Thanks for the awesome feedback, John. I really appreciate it.

    Thanks for including an explanation, as well. I normally try to explain the code for the OP in the form of embedded comments, but it was getting late and I got a bit lazy. You covered it for me very nicely.

    As a sidebar, perhaps we can make one small optimization that probably won't matter unless there are millions of rows at hand... it'll also give me a chance to redeem myself for being so lazy last night...

    [font="Courier New"]--===== Create and populate a test table.

         -- This is NOT a part of the solution.

    DECLARE @Address TABLE (Address CHAR(50))

     INSERT INTO @Address

            (Address)

     SELECT 'Lincoln' UNION ALL

     SELECT 'Kentucky Kentucky' UNION ALL

     SELECT 'Walla Walla' UNION ALL

     SELECT 'New York New York' UNION ALL

     SELECT 'New York CITY New York CITY' UNION ALL

     SELECT 'New York CITYNew York CITY' UNION ALL

     SELECT 'New York CITYNew York CITY       ' UNION ALL

     SELECT 'Ohio' UNION ALL

     SELECT 'Iowa'

     

    --===== Return only single addresses.

         -- If you compare the left half of the string to the right half

         -- and both halves are the same, then use just the left half.

         -- If the halves are not the same, then the address has not been

         -- doubled up and we can use the address as it is.

    --

         -- Notice that each half can be either separated by a space or 

         -- jammed together (see test data above) when a dupe is present.  

         -- The divide by 2 in the code below works in either case because

         -- LEN() procudes an INT and the 2 is an INT which forces only 

         -- Integer math to take place.  For example, if your string is

         -- "AB AB" or "ABAB", the length of one half for the first is

         -- 5/2 = 2 and 4/2 = 2 for the second. Both equal 2 in integer math.

         -- The cool part about LEN() is that it won't count trailing spaces

         -- so no special calculation for CHAR(50) needs to be made.  We do,

         -- however, have to RTRIM() the RIGHT side of the address when we

         -- split the address in half so we don't pick up on any trailing

         -- spaces.

     SELECT CASE    

            WHEN LEFT(Address,LEN(Address)/2RIGHT(RTRIM(Address),LEN(Address)/2)    

            THEN LEFT(Address,LEN(Address)/2)        

            ELSE Address         

            END AS SingleAddress

       FROM @Address[/font]

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks so much for your help Jeff & John, really appreciate it! 🙂

  • hugh.mileshkin (6/9/2009)


    Thanks so much for your help Jeff & John, really appreciate it! 🙂

    Thanks, Hugh. Good feedback like that is the only pay we get for this type of thing. I appreciate it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.

    This is fixed by adding RTRIM and LTRIM to left and right half respectively:

    WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

    Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. 🙂



    Terri

    To speak algebraically, Mr. M. is execrable, but Mr. C. is
    (x+1)-ecrable.
    Edgar Allan Poe
    [Discussing fellow writers Cornelius Mathews and William Ellery Channing.]

  • Terri (6/10/2009)


    Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.

    This is fixed by adding RTRIM and LTRIM to left and right half respectively:

    WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

    Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. 🙂

    Yep... good enhancement. The best thing, of course, would be to pork chop the silly person who made the mistake of combining the data to begin with. 😉 Solutions for to correct these types of mistakes are bound to have a flaw here and there.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply