Searching for text

  • High everyone,
    I've a simple table that holds UK postcodes and I want to do a search on the postcode. The thing is the same postcode could be stored in different formats due to who ever imputed the data, as an example the postcode 'SW183FD' could appear as 'SW18 3FD' or 'SW183FD' or 'SW 183 FD'. I know I could have set rules in the data entry coding to take out the spaces before the postcode is saved to the table, which would have made searching easier, unfortunately I've inherited the table from someone else and there was no such ruling in place.
    I did try 'SELECT * FROM POSTCODES WHERE [POSTCODE} LIKE '%SW183FD%' but that obviously doesnt work. Can anyone suggest a way round this?

  • There was another topic similar to this one recently, but I can't find it.

    If it were me, I'd create a persisted computed column (populated with the spaces removed) query it instead.  I don't know much about UK postal codes other than what I've read online, so I picked a length of 12 for the length and inserted a bunch of different possibilities for the data.

    IF OBJECT_ID('tempdb.dbo.Zips', 'u') IS NOT NULL DROP TABLE dbo.Zips;
    CREATE TABLE dbo.Zips (
    ID Integer identity(1, 1),
    CONSTRAINT Zips_PK PRIMARY KEY (ID),
    PostalCode Varchar(12),
    CleanedCode AS REPLACE(PostalCode, ' ', '') PERSISTED );

    INSERT INTO dbo.Zips(PostalCode)
    VALUES('410000'),
       ('100 000'),
       (' 222 333 '),
       ('N6F 8D1'),
       ('SW 18 3FD'),
       ('SW 18 3F D'),
       ('SW1 83 FD'),
       ('SW18 3FD'),
       ('SW1 8 3FD');

    Then you can query against CleanedCode using search values without spaces.

  • Thanks Ed for that snippet, I can see many uses for it, unfortunately it's not quite what I was looking for.

  • Ed's idea of a new persisted column is your best bet.  You can then even  index the column for better performance when seraching.

    But, it seems like you are looking for a non-performant query to do the work.  So here goes
    The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.

    SELECT *
    FROM POSTCODES
    WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';

  • DesNorton - Wednesday, May 31, 2017 12:28 AM

    Ed's idea of a new persisted column is your best bet.  You can then even  index the column for better performance when seraching.

    But, it seems like you are looking for a non-performant query to do the work.  So here goes
    The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.

    SELECT *
    FROM POSTCODES
    WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';

    Thanks, Des.

    Mick, what do you have in mind?

    Whatever solution you end up with, look at the reads and execution plans of the candidates.  If you're searching for addresses, then it probably isn't a simple lookup table and you're going to have a lot of rows.  Your solution should scale well so it doesn't have to be rewritten later.

  • Many thanks gentlemen, both solutions look good to me. I'm still learning SQL and these snippets are a useful way of adding to my knowledge. I'm going to try both methods on my project to see which performs better, before settling on the solution. I'm grateful for the quick response to my question. 

    Regards

    Mick

  • mick burden - Wednesday, May 31, 2017 9:24 AM

    Many thanks gentlemen, both solutions look good to me. I'm still learning SQL and these snippets are a useful way of adding to my knowledge. I'm going to try both methods on my project to see which performs better, before settling on the solution. I'm grateful for the quick response to my question. 

    Regards

    Mick

    Don't forget to add a nonclustered index to the persisted computed column.  I expect it'll fly.

  • Will do, thanks Ed

  • Ed Wagner - Wednesday, May 31, 2017 5:16 AM

    DesNorton - Wednesday, May 31, 2017 12:28 AM

    Ed's idea of a new persisted column is your best bet.  You can then even  index the column for better performance when seraching.

    But, it seems like you are looking for a non-performant query to do the work.  So here goes
    The fact that you are doing a wildcard search at the start of your string means that SQL has to do a full table/index scan to find your data anyway.

    SELECT *
    FROM POSTCODES
    WHERE REPLACE([POSTCODE], ' ', '') LIKE '%SW183FD%';

    Thanks, Des.

    Mick, what do you have in mind?

    Whatever solution you end up with, look at the reads and execution plans of the candidates.  If you're searching for addresses, then it probably isn't a simple lookup table and you're going to have a lot of rows.  Your solution should scale well so it doesn't have to be rewritten later.

    I'd suggest the indexed computed column as well.
    However, if we go the route searching on the original un-scrubbed column, and we're willing to assume that the users are not going to enter leading spaces, then leaving off the % at the front of the LIKE expression will at least perform a partial index scan.


    SELECT *
    FROM POSTCODES
    WHERE [POSTCODE] LIKE 'SW%183%FD%';

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • mick burden - Tuesday, May 30, 2017 5:59 PM

    High everyone,
    I've a simple table that holds UK postcodes and I want to do a search on the postcode. The thing is the same postcode could be stored in different formats due to who ever imputed the data, as an example the postcode 'SW183FD' could appear as 'SW18 3FD' or 'SW183FD' or 'SW 183 FD'. I know I could have set rules in the data entry coding to take out the spaces before the postcode is saved to the table, which would have made searching easier, unfortunately I've inherited the table from someone else and there was no such ruling in place.
    I did try 'SELECT * FROM POSTCODES WHERE [POSTCODE} LIKE '%SW183FD%' but that obviously doesnt work. Can anyone suggest a way round this?

    You could still add rules to strip the spaces before saving the postcode.

    Then fix the existing data

    UPDATE POSTCODES
    SET [POSTCODE] = REPLACE([POSTCODE], ' ', '');

    If you don't already have one, add an index on the [POSTCODE] column.

    Finally, you can have pretty decent index seeks with something like this

    SELECT *
    FROM POSTCODES
    WHERE [POSTCODE] = 'SW183FD';

    OR

    SELECT *
    FROM POSTCODES
    WHERE [POSTCODE] LIKE 'SW18%';

  • Thank you again Gentlemen, all those suggestions are great and gives me more to play with. I hadn't realised I could have fixed the 'problem' data as easy as that. It's one of the great things about this forum, that not only do you members post possible solutions but you add why this would work.

    Regards Mick

  • mick burden - Tuesday, May 30, 2017 6:37 PM

    Thanks Ed for that snippet, I can see many uses for it, unfortunately it's not quite what I was looking for.

    Not sure what you're looking for then, Mick.  Would you clarify, please?

    As a bit of a sidebar,  I use the same wonderful trick for many things.  You just need a function to convert (delete the spaces, whatever) what the user inputs as search criteria and the search will be lightning fast.

    {Edit} Guess I posted the nearly the same time as you just did.  Are you all set for sure?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes Jeff, Thanks

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply