Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


charindex help


charindex help

Author
Message
hugh.mileshkin
hugh.mileshkin
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 5
Hi guys, need some help with a mailing file problem. On some customer addresses, we have received data where the address appears twice within the same field - mainly on rural property names eg. Penrose Penrose or Kentucky Kentucky. I need to re-format it for mailing. I'm ok where it's a 1-word property that's repeated - I used this code below to update the field [addr_1] to just the first word -

update ##tbl_xx
set addr_1 = substring(addr_1, 1, (CHARINDEX(' ',addr_1)-1) )
where substring(addr_1,1,charindex(' ',addr_1)) = substring(addr_1,(charindex(' ',addr_1)+1),99)

But having problems where it's two-worded name that is repeated eg. wilson farm wilson farm

I need a query that 1) finds these repeated addresses, and then 2) updates the field to just select the first two words. The closest I can get is the code below -

select addr_1,
rtrim(substring(addr_1,1,charindex(' ',addr_1)) +substring(addr_1,(charindex(' ',addr_1)+1), charindex(' ',addr_1))) as reformatted_addr
from ##tbl_xx
where addr_1 like '% % % %' and addr_1 not like '%[0-9]%'

I tried using patindex but don't have much experience with it, so couldn't get it to work for me. Help!
john.arnott
john.arnott
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1466 Visits: 3059
The only difference between PATINDEX and CHARINDEX is that PATINDEX supports wildcard characters for the match whereas CHARINDEX does not.

This code uses Charindex to fill in parameters of SUBSTRING. The first WHEN in the case statement uses it to find the first space in field "address" and then compares the substring from the start of address to the space with the substring from after the space to the end. The second when is similar, but it uses charindex in the "start position" parameter to find the next space. It then can compare the two halves of address, before and after that second space. In either case, if a match is found, the first half is used as the returned value.

One other thought does occur here. In general, it can be dangerous to completely automate data clean-up efforts like this. My friends in a certain town in Washington state would probably wonder when they were moved to "Walla". You may want to consider using this sort of query to produce a candidates listing for manual review and action.
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'Ohio'
union all
select 'Iowa'

select case when substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
= substring(address, charindex(' ', rtrim(address)) +1, len(address)) --Substring from 1st space to end
then substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
when substring(address, 1, charindex(' ', rtrim(address), charindex(' ', rtrim(address)) +1) ) --Substring up to 2nd space
= substring(address, charindex(' ',rtrim(address), charindex(' ',rtrim(address)) +1 ) +1, len(address)) --Substring from 2nd space to end
then substring(address, 1, charindex(' ',rtrim(address),charindex(' ',rtrim(address)) +1) ) --Substring up to 2nd space
else address
end
from @address


Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44940 Visits: 39859
Borrowing on John's good data creation code and assuming the column is actually a VARCHAR instead of a char, here's a super simple way to take care of double, triple, or more words without having to know how many words there are...

Declare @address table (address VARchar(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'

SELECT CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END
FROM @Address



--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
john.arnott
john.arnott
UDP Broadcaster
UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)UDP Broadcaster (1.5K reputation)

Group: General Forum Members
Points: 1466 Visits: 3059
Jeff, your response is valuable to the original poster in that it gives him a straight-forward, scalable solution, but may be more valuable to me and others on SSC in that it demonstrates the practice of cutting to the core requirement. I had started with the question as posed -- how to parse the field with charindex() -- but the real problem was how to find fields with the first half the same as the second.

Somewhere on SSC a while ago, I saw the expression "You think you need an answer? No. You need a question." When stated "how to find first half equal to second half", the simpler and better solution almost writes itself. This is to say, the genius was in stating the problem accurately.

Let's make it work with address defined as a char(50) by using RTRIM() so that the RIGHT() function works as intended. Note that the LEN() function does an implied RTRIM already.
--Declare @address table (address VARchar(50))
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'

SELECT
CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(rtrim(Address),LEN(rtrim(Address))/2)
--WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END

FROM @Address


Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44940 Visits: 39859
Thanks for the awesome feedback, John. I really appreciate it.

Thanks for including an explanation, as well. I normally try to explain the code for the OP in the form of embedded comments, but it was getting late and I got a bit lazy. You covered it for me very nicely.

As a sidebar, perhaps we can make one small optimization that probably won't matter unless there are millions of rows at hand... it'll also give me a chance to redeem myself for being so lazy last night...

--===== Create and populate a test table.
     -- This is NOT a part of the solution.
DECLARE @Address TABLE (Address CHAR(50))
 
INSERT INTO @Address
        
(Address)
 
SELECT 'Lincoln' UNION ALL
 
SELECT 'Kentucky Kentucky' UNION ALL
 
SELECT 'Walla Walla' UNION ALL
 
SELECT 'New York New York' UNION ALL
 
SELECT 'New York CITY New York CITY' UNION ALL
 
SELECT 'New York CITYNew York CITY' UNION ALL
 
SELECT 'New York CITYNew York CITY       ' UNION ALL
 
SELECT 'Ohio' UNION ALL
 
SELECT 'Iowa'
 
--===== Return only single addresses.
     -- If you compare the left half of the string to the right half
     -- and both halves are the same, then use just the left half.
     -- If the halves are not the same, then the address has not been
     -- doubled up and we can use the address as it is.
--
     -- Notice that each half can be either separated by a space or 
     -- jammed together (see test data above) when a dupe is present.  
     -- The divide by 2 in the code below works in either case because
     -- LEN() procudes an INT and the 2 is an INT which forces only 
     -- Integer math to take place.  For example, if your string is
     -- "AB AB" or "ABAB", the length of one half for the first is
     -- 5/2 = 2 and 4/2 = 2 for the second. Both equal 2 in integer math.
     -- The cool part about LEN() is that it won't count trailing spaces
     -- so no special calculation for CHAR(50) needs to be made.  We do,
     -- however, have to RTRIM() the RIGHT side of the address when we
     -- split the address in half so we don't pick up on any trailing
     -- spaces.
 
SELECT CASE    
        
WHEN LEFT(Address,LEN(Address)/2RIGHT(RTRIM(Address),LEN(Address)/2)    
        
THEN LEFT(Address,LEN(Address)/2)        
        
ELSE Address         
        
END AS SingleAddress
   
FROM @Address


--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
hugh.mileshkin
hugh.mileshkin
Forum Newbie
Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)Forum Newbie (2 reputation)

Group: General Forum Members
Points: 2 Visits: 5
Thanks so much for your help Jeff & John, really appreciate it! Smile
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44940 Visits: 39859
hugh.mileshkin (6/9/2009)
Thanks so much for your help Jeff & John, really appreciate it! Smile


Thanks, Hugh. Good feedback like that is the only pay we get for this type of thing. I appreciate it.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Terri-92562
Terri-92562
SSC Veteran
SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)SSC Veteran (248 reputation)

Group: General Forum Members
Points: 248 Visits: 1242
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.
This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. :-)



Terri

To speak algebraically, Mr. M. is execrable, but Mr. C. is
(x+1)-ecrable.
Edgar Allan Poe
[Discussing fellow writers Cornelius Mathews and William Ellery Channing.]
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44940 Visits: 39859
Terri (6/10/2009)
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.
This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

Of course, there are probably other scenarios that don't work, but the OP needs to test with his data. :-)


Yep... good enhancement. The best thing, of course, would be to pork chop the silly person who made the mistake of combining the data to begin with. ;-) Solutions for to correct these types of mistakes are bound to have a flaw here and there.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is usually not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search