Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

charindex help Expand / Collapse
Author
Message
Posted Thursday, June 4, 2009 7:50 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, June 14, 2009 4:58 PM
Points: 2, Visits: 5
Hi guys, need some help with a mailing file problem. On some customer addresses, we have received data where the address appears twice within the same field - mainly on rural property names eg. Penrose Penrose or Kentucky Kentucky. I need to re-format it for mailing. I'm ok where it's a 1-word property that's repeated - I used this code below to update the field [addr_1] to just the first word -

update ##tbl_xx
set addr_1 = substring(addr_1, 1, (CHARINDEX(' ',addr_1)-1) )
where substring(addr_1,1,charindex(' ',addr_1)) = substring(addr_1,(charindex(' ',addr_1)+1),99)

But having problems where it's two-worded name that is repeated eg. wilson farm wilson farm

I need a query that 1) finds these repeated addresses, and then 2) updates the field to just select the first two words. The closest I can get is the code below -

select addr_1,
rtrim(substring(addr_1,1,charindex(' ',addr_1)) +substring(addr_1,(charindex(' ',addr_1)+1), charindex(' ',addr_1))) as reformatted_addr
from ##tbl_xx
where addr_1 like '% % % %' and addr_1 not like '%[0-9]%'

I tried using patindex but don't have much experience with it, so couldn't get it to work for me. Help!
Post #729370
Posted Friday, June 5, 2009 5:25 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, August 28, 2014 8:53 PM
Points: 1,388, Visits: 3,039
The only difference between PATINDEX and CHARINDEX is that PATINDEX supports wildcard characters for the match whereas CHARINDEX does not.

This code uses Charindex to fill in parameters of SUBSTRING. The first WHEN in the case statement uses it to find the first space in field "address" and then compares the substring from the start of address to the space with the substring from after the space to the end. The second when is similar, but it uses charindex in the "start position" parameter to find the next space. It then can compare the two halves of address, before and after that second space. In either case, if a match is found, the first half is used as the returned value.

One other thought does occur here. In general, it can be dangerous to completely automate data clean-up efforts like this. My friends in a certain town in Washington state would probably wonder when they were moved to "Walla". You may want to consider using this sort of query to produce a candidates listing for manual review and action.
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'Ohio'
union all
select 'Iowa'

select case when substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
= substring(address, charindex(' ', rtrim(address)) +1, len(address)) --Substring from 1st space to end
then substring(address, 1, charindex(' ',rtrim(address))) --Substring up to 1st space
when substring(address, 1, charindex(' ', rtrim(address), charindex(' ', rtrim(address)) +1) ) --Substring up to 2nd space
= substring(address, charindex(' ',rtrim(address), charindex(' ',rtrim(address)) +1 ) +1, len(address)) --Substring from 2nd space to end
then substring(address, 1, charindex(' ',rtrim(address),charindex(' ',rtrim(address)) +1) ) --Substring up to 2nd space
else address
end
from @address

Post #730106
Posted Friday, June 5, 2009 8:29 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:47 PM
Points: 35,215, Visits: 31,667
Borrowing on John's good data creation code and assuming the column is actually a VARCHAR instead of a char, here's a super simple way to take care of double, triple, or more words without having to know how many words there are...

Declare @address table (address VARchar(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'

SELECT CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END
FROM @Address



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #730127
Posted Saturday, June 6, 2009 1:05 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Thursday, August 28, 2014 8:53 PM
Points: 1,388, Visits: 3,039
Jeff, your response is valuable to the original poster in that it gives him a straight-forward, scalable solution, but may be more valuable to me and others on SSC in that it demonstrates the practice of cutting to the core requirement. I had started with the question as posed -- how to parse the field with charindex() -- but the real problem was how to find fields with the first half the same as the second.

Somewhere on SSC a while ago, I saw the expression "You think you need an answer? No. You need a question." When stated "how to find first half equal to second half", the simpler and better solution almost writes itself. This is to say, the genius was in stating the problem accurately.

Let's make it work with address defined as a char(50) by using RTRIM() so that the RIGHT() function works as intended. Note that the LEN() function does an implied RTRIM already.
--Declare @address table (address VARchar(50))
Declare @address table (address char(50))
insert into @address
select 'Lincoln'
union all
select 'Kentucky Kentucky'
union all
select 'Walla Walla'
union all
select 'New York New York'
union all
select 'New York CITY New York CITY'
union all
select 'Ohio'
union all
select 'Iowa'

SELECT
CASE
WHEN LEFT(Address,LEN(Address)/2)=RIGHT(rtrim(Address),LEN(rtrim(Address))/2)
--WHEN LEFT(Address,LEN(Address)/2)=RIGHT(Address,LEN(Address)/2)
THEN LEFT(Address,LEN(Address)/2)
ELSE Address
END

FROM @Address

Post #730290
Posted Saturday, June 6, 2009 6:44 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:47 PM
Points: 35,215, Visits: 31,667
Thanks for the awesome feedback, John. I really appreciate it.

Thanks for including an explanation, as well. I normally try to explain the code for the OP in the form of embedded comments, but it was getting late and I got a bit lazy. You covered it for me very nicely.

As a sidebar, perhaps we can make one small optimization that probably won't matter unless there are millions of rows at hand... it'll also give me a chance to redeem myself for being so lazy last night...

--===== Create and populate a test table.
     -- This is NOT a part of the solution.
DECLARE @Address TABLE (Address CHAR(50))
 
INSERT INTO @Address
        
(Address)
 
SELECT 'Lincoln' UNION ALL
 
SELECT 'Kentucky Kentucky' UNION ALL
 
SELECT 'Walla Walla' UNION ALL
 
SELECT 'New York New York' UNION ALL
 
SELECT 'New York CITY New York CITY' UNION ALL
 
SELECT 'New York CITYNew York CITY' UNION ALL
 
SELECT 'New York CITYNew York CITY       ' UNION ALL
 
SELECT 'Ohio' UNION ALL
 
SELECT 'Iowa'
 
--===== Return only single addresses.
     -- If you compare the left half of the string to the right half
     -- and both halves are the same, then use just the left half.
     -- If the halves are not the same, then the address has not been
     -- doubled up and we can use the address as it is.
--
     -- Notice that each half can be either separated by a space or 
     -- jammed together (see test data above) when a dupe is present.  
     -- The divide by 2 in the code below works in either case because
     -- LEN() procudes an INT and the 2 is an INT which forces only 
     -- Integer math to take place.  For example, if your string is
     -- "AB AB" or "ABAB", the length of one half for the first is
     -- 5/2 = 2 and 4/2 = 2 for the second. Both equal 2 in integer math.
     -- The cool part about LEN() is that it won't count trailing spaces
     -- so no special calculation for CHAR(50) needs to be made.  We do,
     -- however, have to RTRIM() the RIGHT side of the address when we
     -- split the address in half so we don't pick up on any trailing
     -- spaces.
 
SELECT CASE    
        
WHEN LEFT(Address,LEN(Address)/2RIGHT(RTRIM(Address),LEN(Address)/2)    
        
THEN LEFT(Address,LEN(Address)/2)        
        
ELSE Address         
        
END AS SingleAddress
   
FROM @Address



--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #730307
Posted Tuesday, June 9, 2009 5:06 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Sunday, June 14, 2009 4:58 PM
Points: 2, Visits: 5
Thanks so much for your help Jeff & John, really appreciate it! :)
Post #731944
Posted Tuesday, June 9, 2009 7:16 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:47 PM
Points: 35,215, Visits: 31,667
hugh.mileshkin (6/9/2009)
Thanks so much for your help Jeff & John, really appreciate it! :)


Thanks, Hugh. Good feedback like that is the only pay we get for this type of thing. I appreciate it.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #731975
Posted Wednesday, June 10, 2009 9:14 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, October 3, 2012 8:35 AM
Points: 242, Visits: 1,242
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.
This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

Of course, there are probably other scenarios that don't work, but the OP needs to test with his data.




Terri

To speak algebraically, Mr. M. is execrable, but Mr. C. is
(x+1)-ecrable.
Edgar Allan Poe
[Discussing fellow writers Cornelius Mathews and William Ellery Channing.]
Post #732376
Posted Friday, June 12, 2009 7:36 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 1:47 PM
Points: 35,215, Visits: 31,667
Terri (6/10/2009)
Good explanation Jeff. A small point, your solution fails if there is more than 1 space between the 2 words. e.g 'Lincoln Lincoln' has length 16. So the left half is 'Lincoln ' and the right half is ' Lincoln'.
This is fixed by adding RTRIM and LTRIM to left and right half respectively:
WHEN RTRIM(LEFT(Address,LEN(Address)/2)) = LTRIM(RIGHT(RTRIM(Address),LEN(Address)/2))

Of course, there are probably other scenarios that don't work, but the OP needs to test with his data.


Yep... good enhancement. The best thing, of course, would be to pork chop the silly person who made the mistake of combining the data to begin with. Solutions for to correct these types of mistakes are bound to have a flaw here and there.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #734216
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse