SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Find a sub string in a column based on values in a reference table


Find a sub string in a column based on values in a reference table

Author
Message
fparker 20089
fparker 20089
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 58
Good morning,
I am not quite sure how to approach the problem. I have a table for customer and vendor with addresses in them. In many cases the city name is not in the right place. I have a table with a list of valid cities by country.

What I want to do is for each customer or vendor look in one column and see if there is a city name in there based on my reference table.

So if in Address it has '123 Ocean BL Long Beach' I want to be able to find Long Beach in there.

Thoughts on how I go after this?

TIA
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)SSC Guru (125K reputation)

Group: General Forum Members
Points: 125267 Visits: 18852
SELECT v.Address, c.City
FROM Vendors v
LEFT JOIN Cities c ON v.Address = '%' + c.City + '%'

Edit:- added c.City to SELECT list.

John
sgmunson
sgmunson
SSC Guru
SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)SSC Guru (79K reputation)

Group: General Forum Members
Points: 79627 Visits: 6523
John Mitchell-245523 - Thursday, December 21, 2017 8:41 AM
SELECT v.Address
FROM Vendors v
LEFT JOIN Cities c ON v.Address = '%' + c.City + '%'


John

Important that you understand that with a large number of rows, this will not perform well, because a normal index will not help. You'd need a FULL TEXT INDEX to be able to get that kind of search to perform better, which would take a rather sizable chunk of disk space and a lot of complexity. However, if this is a one-time search, then it takes as long as it takes.


Steve
‌(aka sgmunson)
Smile Smile Smile
Health & Nutrition
fparker 20089
fparker 20089
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 58
I figured performance would be sluggish.
This isn't a regular production run. We are moving data to a new system and would like to clean some stuff up. The legacy systems did not account for non USA based addresses so a lot of them are buggered up with city in the address line and country in the state or zip code line.
Performance is not an issue. It is being run on a non prod server and if it takes a week to run that is fine.
fparker 20089
fparker 20089
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 58
This is not working with the % but it does work without it.

SELECT ,[Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
left join [xRefCityNames] r ON c.Address2 = r.[city_name]
where c.city = '' and address2 = 'new delhi'

This works but when I use ON c.Address2 = '%' + r.[city_name] + '%'
I get NULL for city_name.

Thoughts?
drew.allen
drew.allen
SSC Guru
SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)

Group: General Forum Members
Points: 58299 Visits: 15287
fparker 20089 - Thursday, December 21, 2017 9:31 AM
This is not working with the % but it does work without it.

SELECT ,[Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
left join [xRefCityNames] r ON c.Address2 = r.[city_name]
where c.city = '' and address2 = 'new delhi'

This works but when I use ON c.Address2 = '%' + r.[city_name] + '%'
I get NULL for city_name.

Thoughts?

Yes, it should be LIKE not =
ON c.Address2 LIKE '%' + r.[city_name] + '%' 


Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
fparker 20089
fparker 20089
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 58
That does work except I will get 2 results for that one. I have both New Dehli and Dehli as cities which are both in my city list for India. Not bad for a cleanup effort though.
drew.allen
drew.allen
SSC Guru
SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)

Group: General Forum Members
Points: 58299 Visits: 15287
fparker 20089 - Thursday, December 21, 2017 9:52 AM
That does work except I will get 2 results for that one. I have both New Dehli and Dehli as cities which are both in my city list for India. Not bad for a cleanup effort though.

That is unavoidable when trying to clean up data. You will get false positives and may miss records. You could rewrite your query to only give one, but that might cause you to miss more records.

Here is one rewrite that will only ever give you one possible match.
SELECT [Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
CROSS APPLY
(
SELECT TOP(1) r.city_name
FROM [xRefCityNames] r
WHERE c.Address2 LIKE '%' + r.[city_name] + '%'
ORDER BY LEN(r.city_name) DESC
) r


Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
fparker 20089
fparker 20089
SSC-Enthusiastic
SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)SSC-Enthusiastic (130 reputation)

Group: General Forum Members
Points: 130 Visits: 58
Thanks a bunch. I think I am good, this gives me a better look than I had before. Some of these will need to be eyeballed anyway so having a few extras won't hurt.

I think taking that top on the longest one will work.
drew.allen
drew.allen
SSC Guru
SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)SSC Guru (58K reputation)

Group: General Forum Members
Points: 58299 Visits: 15287
fparker 20089 - Thursday, December 21, 2017 10:17 AM
Thanks a bunch. I think I am good, this gives me a better look than I had before. Some of these will need to be eyeballed anyway so having a few extras won't hurt.

I think taking that top on the longest one will work.

That's exactly what the CROSS APPLY is doing. You can also use a CTE with ROW_NUMBER().

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search