Ignore reading signs

  • I am in need of a way to ignore reading signs, for example:

    - I search on (like?) 921996 it will find the record with 92.1996 or 1219!96

    Is there a way to achieve this?

  • not efficiently, because you would have to manipulate each column value.

    what are we doing? stripping out just two specific values? stripping all non numeric characters? what about alpha characters?

    with a function on a column name(rated one of the top 3 bad practices),

    it will require a table scan every time you query, can be costly time wise and not very efficient...it's a table scan, pulling all data into memory so it can be manipulated to strip out characters.

    on a trivial table like my example below, it's quick enough to not be relevant, but do this to a million row table,and your performance drops dramatically into tens of minutes, which is why I mention the alternatives below

    IF OBJECT_ID('tempdb.[dbo].[#Temp]') IS NOT NULL

    DROP TABLE [dbo].[#Temp]

    GO

    CREATE TABLE [dbo].[#Temp] (

    [ID] INT NOT NULL,

    [Animal] VARCHAR(30) NOT NULL,

    [SignToScan] VARCHAR(30) NOT NULL)

    INSERT INTO [#Temp]([ID],[Animal],[SignToScan])

    SELECT '34','parrot','1219!96' UNION ALL

    SELECT '52','ground hog','92.1996' UNION ALL

    SELECT '47','marmoset','' UNION ALL

    SELECT '85','chameleon','9219!96' UNION ALL

    SELECT '84','whale','14.22!54' UNION ALL

    SELECT '182','tapir','9m219!96'

    --Two different examples below

    --replace only two specific ones

    SELECT *,REPLACE(REPLACE([SignToScan],'.',''),'!','')

    FROM [#Temp]

    WHERE REPLACE(REPLACE([SignToScan],'.',''),'!','') LIKE '921996%'

    --replace anything non-numeric, so it strips out the 'm' in the tapir example

    SELECT *

    FROM [#Temp]

    CROSS APPLY master.dbo.[StripNonNumeric_ITVF]([SignToScan]) AS [snni]

    WHERE [snni].[CleanedText] LIKE '921996%'

    so to be efficient, you want another column with the "Cleaned" text.

    can you either clean the actual column, or add a new column?

    a permanent one with the clean data is best, then you can easily index it.

    An alternative is you could try to add a persisted, calculated column which strips the characters out, and then try to index tat.

    yet another alternative is to create a view with the data all cleaned up, and then add an index on the view.

    which ones work, and the way to tackle it, depends on your real data and requirements

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • The following is an example of Lowell's suggestion about implementing a computed column (ie: NonSignedProductCode). You can then create an index on that column and use it for searching.

    Whether computed column is persisted is optional and debatable. I'm thinking that if the column is indexed, it can be non-persisted without a performance impact, if it's not included in the resultset.

    CREATE TABLE Product

    (

    ProductCode VARCHAR(20) NOT NULL,

    NonSignedProductCode

    AS CAST(REPLACE(REPLACE(ProductCode,'.',''),'!','') AS VARCHAR(20))

    );

    CREATE INDEX IX_NonSignedProductCode

    ON dbo.Product ( NonSignedProductCode );

    SELECT ProductCode FROM Product WHERE NonSignedProductCode = '997223';

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • peter 67432 (12/28/2016)


    I am in need of a way to ignore reading signs, for example:

    - I search on (like?) 921996 it will find the record with 92.1996 or 1219!96

    Is there a way to achieve this?

    It it's uber important, create a persisted computed column on the table that strips all non digit characters from the column and use that to search on.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks for the help, I will add a computed column.

  • peter 67432 (1/2/2017)


    Thanks for the help, I will add a computed column.

    Like Jeff said, consider making it a persisted one.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply