SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Find a sub string in a column based on values in a reference table


Find a sub string in a column based on values in a reference table

Author
Message
fparker 20089
fparker 20089
Grasshopper
Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)

Group: General Forum Members
Points: 21 Visits: 37
Good morning,
I am not quite sure how to approach the problem. I have a table for customer and vendor with addresses in them. In many cases the city name is not in the right place. I have a table with a list of valid cities by country.

What I want to do is for each customer or vendor look in one column and see if there is a city name in there based on my reference table.

So if in Address it has '123 Ocean BL Long Beach' I want to be able to find Long Beach in there.

Thoughts on how I go after this?

TIA
John Mitchell-245523
John Mitchell-245523
SSC Guru
SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)SSC Guru (81K reputation)

Group: General Forum Members
Points: 81084 Visits: 17956
SELECT v.Address, c.City
FROM Vendors v
LEFT JOIN Cities c ON v.Address = '%' + c.City + '%'

Edit:- added c.City to SELECT list.

John
sgmunson
sgmunson
SSC-Forever
SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)SSC-Forever (43K reputation)

Group: General Forum Members
Points: 43692 Visits: 5422
John Mitchell-245523 - Thursday, December 21, 2017 8:41 AM
SELECT v.Address
FROM Vendors v
LEFT JOIN Cities c ON v.Address = '%' + c.City + '%'


John

Important that you understand that with a large number of rows, this will not perform well, because a normal index will not help. You'd need a FULL TEXT INDEX to be able to get that kind of search to perform better, which would take a rather sizable chunk of disk space and a lot of complexity. However, if this is a one-time search, then it takes as long as it takes.


Steve
‌(aka sgmunson)
Smile Smile Smile
Health & Nutrition
fparker 20089
fparker 20089
Grasshopper
Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)

Group: General Forum Members
Points: 21 Visits: 37
I figured performance would be sluggish.
This isn't a regular production run. We are moving data to a new system and would like to clean some stuff up. The legacy systems did not account for non USA based addresses so a lot of them are buggered up with city in the address line and country in the state or zip code line.
Performance is not an issue. It is being run on a non prod server and if it takes a week to run that is fine.
fparker 20089
fparker 20089
Grasshopper
Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)

Group: General Forum Members
Points: 21 Visits: 37
This is not working with the % but it does work without it.

SELECT ,[Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
left join [xRefCityNames] r ON c.Address2 = r.[city_name]
where c.city = '' and address2 = 'new delhi'

This works but when I use ON c.Address2 = '%' + r.[city_name] + '%'
I get NULL for city_name.

Thoughts?
drew.allen
drew.allen
SSC-Dedicated
SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)

Group: General Forum Members
Points: 37090 Visits: 13737
fparker 20089 - Thursday, December 21, 2017 9:31 AM
This is not working with the % but it does work without it.

SELECT ,[Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
left join [xRefCityNames] r ON c.Address2 = r.[city_name]
where c.city = '' and address2 = 'new delhi'

This works but when I use ON c.Address2 = '%' + r.[city_name] + '%'
I get NULL for city_name.

Thoughts?

Yes, it should be LIKE not =
ON c.Address2 LIKE '%' + r.[city_name] + '%' 


Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
fparker 20089
fparker 20089
Grasshopper
Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)

Group: General Forum Members
Points: 21 Visits: 37
That does work except I will get 2 results for that one. I have both New Dehli and Dehli as cities which are both in my city list for India. Not bad for a cleanup effort though.
drew.allen
drew.allen
SSC-Dedicated
SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)

Group: General Forum Members
Points: 37090 Visits: 13737
fparker 20089 - Thursday, December 21, 2017 9:52 AM
That does work except I will get 2 results for that one. I have both New Dehli and Dehli as cities which are both in my city list for India. Not bad for a cleanup effort though.

That is unavoidable when trying to clean up data. You will get false positives and may miss records. You could rewrite your query to only give one, but that might cause you to miss more records.

Here is one rewrite that will only ever give you one possible match.
SELECT [Name] ,[Address] ,[Address2] ,c.[City] ,r.[city_name] ,[State] ,[ZipCode] ,[TerritoryCode] ,[CountryCode]
FROM [Customer] c
CROSS APPLY
(
SELECT TOP(1) r.city_name
FROM [xRefCityNames] r
WHERE c.Address2 LIKE '%' + r.[city_name] + '%'
ORDER BY LEN(r.city_name) DESC
) r


Drew


J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
fparker 20089
fparker 20089
Grasshopper
Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)Grasshopper (21 reputation)

Group: General Forum Members
Points: 21 Visits: 37
Thanks a bunch. I think I am good, this gives me a better look than I had before. Some of these will need to be eyeballed anyway so having a few extras won't hurt.

I think taking that top on the longest one will work.
drew.allen
drew.allen
SSC-Dedicated
SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)SSC-Dedicated (37K reputation)

Group: General Forum Members
Points: 37090 Visits: 13737
fparker 20089 - Thursday, December 21, 2017 10:17 AM
Thanks a bunch. I think I am good, this gives me a better look than I had before. Some of these will need to be eyeballed anyway so having a few extras won't hurt.

I think taking that top on the longest one will work.

That's exactly what the CROSS APPLY is doing. You can also use a CTE with ROW_NUMBER().

Drew

J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
How to post data/code on a forum to get the best help.
How to Post Performance Problems
Make sure that you include code in the appropriate IFCode tags, e.g. [code=sql]<your code here>[/code]. You can find the IFCode tags under the INSERT options when you are writing a post.
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum








































































































































































SQLServerCentral


Search