Searching for text

Question

Searching for text

mick burden

SSCarpal Tunnel

Points: 4432
More actions
May 30, 2017 at 5:59 pm

#402107

High everyone,
I've a simple table that holds UK postcodes and I want to do a search on the postcode. The thing is the same postcode could be stored in different formats due to who ever imputed the data, as an example the postcode 'SW183FD' could appear as 'SW18 3FD' or 'SW183FD' or 'SW 183 FD'. I know I could have set rules in the data entry coding to take out the spaces before the postcode is saved to the table, which would have made searching easier, unfortunately I've inherited the table from someone else and there was no such ruling in place.
I did try 'SELECT * FROM POSTCODES WHERE [POSTCODE} LIKE '%SW183FD%' but that obviously doesnt work. Can anyone suggest a way round this?

Viewing 13 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Ed Wagner SSC Guru Points: 287026 More actions · Answer 1

There was another topic similar to this one recently, but I can't find it.

If it were me, I'd create a persisted computed column (populated with the spaces removed) query it instead. I don't know much about UK postal codes other than what I've read online, so I picked a length of 12 for the length and inserted a bunch of different possibilities for the data.

IF OBJECT_ID('tempdb.dbo.Zips', 'u') IS NOT NULL DROP TABLE dbo.Zips; CREATE TABLE dbo.Zips ( ID Integer identity(1, 1), CONSTRAINT Zips_PK PRIMARY KEY (ID), PostalCode Varchar(12), CleanedCode AS REPLACE(PostalCode, ' ', '') PERSISTED );

INSERT INTO dbo.Zips(PostalCode) VALUES('410000'), ('100 000'), (' 222 333 '), ('N6F 8D1'), ('SW 18 3FD'), ('SW 18 3F D'), ('SW1 83 FD'), ('SW18 3FD'), ('SW1 8 3FD');

Then you can query against CleanedCode using search values without spaces.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

mick burden SSCarpal Tunnel Points: 4432 More actions · Answer 2

Thanks Ed for that snippet, I can see many uses for it, unfortunately it's not quite what I was looking for.

DesNorton SSC-Insane Points: 24638 More actions · Answer 3

Ed's idea of a new persisted column is your best bet. You can then even index the column for better performance when seraching.

But, it seems like you are looking for a non-performant query to do the work. So here goes
The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.
SELECT * FROM POSTCODES WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';

How to post data/code on a forum to get the best help.

Ed Wagner SSC Guru Points: 287026 More actions · Answer 4

DesNorton - Wednesday, May 31, 2017 12:28 AM
Ed's idea of a new persisted column is your best bet. You can then even index the column for better performance when seraching.
But, it seems like you are looking for a non-performant query to do the work. So here goes
The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.
SELECT * FROM POSTCODES WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';

Thanks, Des.

Mick, what do you have in mind?

Whatever solution you end up with, look at the reads and execution plans of the candidates. If you're searching for addresses, then it probably isn't a simple lookup table and you're going to have a lot of rows. Your solution should scale well so it doesn't have to be rewritten later.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

mick burden SSCarpal Tunnel Points: 4432 More actions · Answer 5

Many thanks gentlemen, both solutions look good to me. I'm still learning SQL and these snippets are a useful way of adding to my knowledge. I'm going to try both methods on my project to see which performs better, before settling on the solution. I'm grateful for the quick response to my question.

Regards

Mick

Ed Wagner SSC Guru Points: 287026 More actions · Answer 6

mick burden - Wednesday, May 31, 2017 9:24 AM
Many thanks gentlemen, both solutions look good to me. I'm still learning SQL and these snippets are a useful way of adding to my knowledge. I'm going to try both methods on my project to see which performs better, before settling on the solution. I'm grateful for the quick response to my question.
Regards
Mick

Don't forget to add a nonclustered index to the persisted computed column. I expect it'll fly.

Tally Tables - Performance Personified
String Splitting with True Performance
Best practices on how to ask questions

mick burden SSCarpal Tunnel Points: 4432 More actions · Answer 7

mick burden

SSCarpal Tunnel

Points: 4432

May 31, 2017 at 11:41 am

#1944787

Will do, thanks Ed

Eric M Russell SSC Guru Points: 125620 More actions · Answer 8

Ed Wagner - Wednesday, May 31, 2017 5:16 AM
DesNorton - Wednesday, May 31, 2017 12:28 AM
Ed's idea of a new persisted column is your best bet. You can then even index the column for better performance when seraching.
But, it seems like you are looking for a non-performant query to do the work. So here goes
The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.
SELECT * FROM POSTCODES WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';
Thanks, Des.
Mick, what do you have in mind?
Whatever solution you end up with, look at the reads and execution plans of the candidates. If you're searching for addresses, then it probably isn't a simple lookup table and you're going to have a lot of rows. Your solution should scale well so it doesn't have to be rewritten later.

I'd suggest the indexed computed column as well.
However, if we go the route searching on the original un-scrubbed column, and we're willing to assume that the users are not going to enter leading spaces, then leaving off the % at the front of the LIKE expression will at least perform a partial index scan.

SELECT * FROM POSTCODES WHERE [POSTCODE] LIKE 'SW%183%FD%';

"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

DesNorton SSC-Insane Points: 24638 More actions · Answer 9

mick burden - Tuesday, May 30, 2017 5:59 PM
High everyone,
I've a simple table that holds UK postcodes and I want to do a search on the postcode. The thing is the same postcode could be stored in different formats due to who ever imputed the data, as an example the postcode 'SW183FD' could appear as 'SW18 3FD' or 'SW183FD' or 'SW 183 FD'. I know I could have set rules in the data entry coding to take out the spaces before the postcode is saved to the table, which would have made searching easier, unfortunately I've inherited the table from someone else and there was no such ruling in place.
I did try 'SELECT * FROM POSTCODES WHERE [POSTCODE} LIKE '%SW183FD%' but that obviously doesnt work. Can anyone suggest a way round this?

You could still add rules to strip the spaces before saving the postcode.

Then fix the existing data
UPDATE POSTCODES SET [POSTCODE] = REPLACE([POSTCODE], ' ', '');

If you don't already have one, add an index on the [POSTCODE] column.

Finally, you can have pretty decent index seeks with something like this
SELECT * FROM POSTCODES WHERE [POSTCODE] = 'SW183FD';

OR
SELECT * FROM POSTCODES WHERE [POSTCODE] LIKE 'SW18%';

How to post data/code on a forum to get the best help.

mick burden SSCarpal Tunnel Points: 4432 More actions · Answer 10

Thank you again Gentlemen, all those suggestions are great and gives me more to play with. I hadn't realised I could have fixed the 'problem' data as easy as that. It's one of the great things about this forum, that not only do you members post possible solutions but you add why this would work.

Regards Mick

Jeff Moden SSC Guru Points: 1004683 More actions · Answer 11

mick burden - Tuesday, May 30, 2017 6:37 PM
Thanks Ed for that snippet, I can see many uses for it, unfortunately it's not quite what I was looking for.

Not sure what you're looking for then, Mick. Would you clarify, please?

As a bit of a sidebar, I use the same wonderful trick for many things. You just need a function to convert (delete the spaces, whatever) what the user inputs as search criteria and the search will be lightning fast.

{Edit} Guess I posted the nearly the same time as you just did. Are you all set for sure?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

mick burden SSCarpal Tunnel Points: 4432 More actions · Answer 12

mick burden

SSCarpal Tunnel

Points: 4432

June 1, 2017 at 8:40 am

#1944911

Yes Jeff, Thanks