Non-alphanumeric Search Frustration

  • I am trying to search a column in a table for non-alphanumeric characters (.,% etc) so I can report these issues to our end users. But I'm running into a problem. The usual "NOT LIKE %[^a-z0-9A-Z]%" bit is not working quite like I need it to.

    The column in question can have any alphanumeric character in it, or just be comprised of numbers, or just be comprised of letters. Spaces are only allowed at the end (because the column is CHAR, not VARCHAR) and a hyphen is only allowed after the first character. Other than that, all other non-alphanumeric characters are forbidden.

    Here's what I've got (good values are noted and should be ignored):

    IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL

    DROP TABLE #MyTemp;

    CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));

    INSERT INTO #MyTemp (QuestionableColumn)

    VALUES ('ABC1GKUCEEF7AR169582'), --Good value

    ('ABC2G1WG5E37D1157775'), --Good value

    ('BCD3GTU2TEC9FG119962'), --Good value

    ('- '),

    ('--- '),

    ('-7Z559677 '),

    ('...ZMY1708617 '),

    (';DW238831 '),

    ('? '),

    ('[AA007462 '),

    ('`FFF-B4010103 '),

    ('{2} '),

    ('000000 '), --Good value

    ('123459678 '), --Good value

    ('ABCDEFGHIJ '), --Good value

    ('ABC-1245Z'); --Good value

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE QuestionableColumn NOT LIKE '%[^a-z0-9A-Z]%';

    --NOT LIKE With the carrot character

    --Produces results with proper characters instead of results with incorrect characters

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE QuestionableColumn NOT LIKE '%[a-z0-9A-Z]%';

    --NOT LIKE Without the carrot character

    --Produces only results with no alphanumeric characters at all, missing values with both

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[^a-z0-9A-Z]%';

    --LIKE With the carrot character

    --Produces results with incorrect characters, but includes results that are alpha or numeric only

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[a-z0-9A-Z]%';

    --LIKE Without the carrot character

    --Correctly produces results with correct characters, but includes results with incorrect characters

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z]%' ;

    --Mostly works, but still reports hyphen in the middle value.

    I'm so close, but I can't quite figure out how to evaluate for a hyphen as the last or first character, but to ignore it when it's in any other position. I tried using CHARINDEX, but since I don't use that function much, I think I did it wrong because it's still reporting that final value when it shouldn't.

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z]%'

    OR CHARINDEX('-',QuestionableColumn,1) = 1

    OR CHARINDEX('-',REVERSE(RTRIM(QuestionableColumn)),1) = 1;

    Thoughts?

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A (1/4/2017)


    I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?

    I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Brandie Tarvin (1/4/2017)


    Thom A (1/4/2017)


    I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?

    I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).

    Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂

    Does this therefore work?

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'

    OR LEFT(QuestionableColumn,1) = '-'

    OR RIGHT(RTRIM(QuestionableColumn),1) = '-';

    Edit, sorry, cannot be the last either

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Does this works?

    IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL

    DROP TABLE #MyTemp;

    CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));

    INSERT INTO #MyTemp (QuestionableColumn)

    VALUES ('ABC1GKUCEEF7AR169582'), --Good value

    ('ABC2G1WG5E37D1157775'), --Good value

    ('BCD3GTU2TEC9FG119962'), --Good value

    ('- '),

    ('--- '),

    ('-7Z559677 '),

    ('...ZMY1708617 '),

    (';DW238831 '),

    ('? '),

    ('[AA007462 '),

    ('`FFF-B4010103 '),

    ('{2} '),

    ('000000 '), --Good value

    ('123459678 '), --Good value

    ('ABCDEFGHIJ '), --Good value

    ('ABC-1245Z'); --Good value

    SELECT *

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'

    OR QuestionableColumn LIKE '-%'

    Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (1/4/2017)


    As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.

    That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:

    USE DevTestDB;

    GO

    CREATE TABLE SampleData (QuestionableColumn CHAR(20)

    CONSTRAINT CK_QuestionColumn

    CHECK (QuestionableColumn NOT LIKE '%[^a-zA-Z0-9 -]%'

    AND LEFT(QuestionableColumn,1) != '-'

    AND RIGHT(RTRIM(QuestionableColumn),1) != '-'));

    GO

    INSERT INTO SampleData

    SELECT 'ABC1GKUCEEF7AR169582';

    GO

    INSERT INTO SampleData

    SELECT 'ABC2G1WG5E37D1157775';

    GO

    INSERT INTO SampleData

    SELECT 'BCD3GTU2TEC9FG119962';

    GO

    INSERT INTO SampleData

    SELECT '- ';

    GO

    INSERT INTO SampleData

    SELECT '--- ';

    GO

    INSERT INTO SampleData

    SELECT '-7Z559677 ';

    GO

    INSERT INTO SampleData

    SELECT '...ZMY1708617 ';

    GO

    INSERT INTO SampleData

    SELECT ';DW238831 ';

    GO

    INSERT INTO SampleData

    SELECT '? ';

    GO

    INSERT INTO SampleData

    SELECT '[AA007462 ';

    GO

    INSERT INTO SampleData

    SELECT '`FFF-B4010103 ';

    GO

    INSERT INTO SampleData

    SELECT '{2} ';

    GO

    INSERT INTO SampleData

    SELECT '000000 ';

    GO

    INSERT INTO SampleData

    SELECT '123459678 ';

    GO

    INSERT INTO SampleData

    SELECT 'ABCDEFGHIJ ';

    GO

    INSERT INTO SampleData

    SELECT 'ABC-1245Z';

    GO

    SELECT *

    FROM SampleData;

    GO

    DROP TABLE SampleData

    GO

    This returns the 7 good value rows only in the SELECT statement:

    QuestionableColumn

    --------------------

    ABC1GKUCEEF7AR169582

    ABC2G1WG5E37D1157775

    BCD3GTU2TEC9FG119962

    000000

    123459678

    ABCDEFGHIJ

    ABC-1245Z

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A (1/4/2017)


    Eric M Russell (1/4/2017)


    As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.

    That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:

    A check constraint would be a better choice than a PK constraint. However, I think that the data needs to be cleansed before implementing constraints.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (1/4/2017)


    Thom A (1/4/2017)


    Eric M Russell (1/4/2017)


    As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.

    That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:

    A check constraint would be a better choice than a PK constraint. However, I think that the data needs to be cleansed before implementing constraints.

    True, I did forget to mention this. Will hopefully stop the issue in the future though once the OP has resolved existing dirty data.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A (1/4/2017)


    Brandie Tarvin (1/4/2017)


    Thom A (1/4/2017)


    I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?

    I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).

    Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂

    Does this therefore work?

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'

    OR LEFT(QuestionableColumn,1) = '-'

    OR RIGHT(RTRIM(QuestionableColumn),1) = '-';

    Edit, sorry, cannot be the last either

    You forgot to exclude internal spaces, too. I came up with a slightly different approach (using the fact that LIKE ignores trailing spaces).

    SELECT *

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'

    OR QuestionableColumn LIKE '-%'

    OR QuestionableColumn LIKE '%-'

    OR QuestionableColumn LIKE '% [^ ]%'

    Drew

    PS: As a note to the OP, you only include one data point with a trailing hyphen, but it also contains a leading hyphen, so people might miss that they haven't coded for a trailing hyphen. And none of your data has an internal space.

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thom A (1/4/2017)


    Brandie Tarvin (1/4/2017)


    Thom A (1/4/2017)


    I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?

    I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).

    Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂

    Does this therefore work?

    SELECT QuestionableColumn

    FROM #MyTemp

    WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'

    OR LEFT(QuestionableColumn,1) = '-'

    OR RIGHT(RTRIM(QuestionableColumn),1) = '-';

    Edit, sorry, cannot be the last either

    Nope. The OR guarantees the good values will be included with the bad.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Luis Cazares (1/4/2017)


    Does this works?

    IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL

    DROP TABLE #MyTemp;

    CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));

    INSERT INTO #MyTemp (QuestionableColumn)

    VALUES ('ABC1GKUCEEF7AR169582'), --Good value

    ('ABC2G1WG5E37D1157775'), --Good value

    ('BCD3GTU2TEC9FG119962'), --Good value

    ('- '),

    ('--- '),

    ('-7Z559677 '),

    ('...ZMY1708617 '),

    (';DW238831 '),

    ('? '),

    ('[AA007462 '),

    ('`FFF-B4010103 '),

    ('{2} '),

    ('000000 '), --Good value

    ('123459678 '), --Good value

    ('ABCDEFGHIJ '), --Good value

    ('ABC-1245Z'); --Good value

    SELECT *

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'

    OR QuestionableColumn LIKE '-%'

    Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.

    NICE, Luis. And I can use a REVERSE(RTRIM(QuestionableColumn)) to check the last character. I didn't think about that. Thanks.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Eric M Russell (1/4/2017)


    As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.

    Unfortunately, this particular column is sort of like an order number. It's supposed to be unique and will never have a way of pre-identifying the data.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Just to clarify. The data is coming from a vendor-created database. I cannot make changes to it. I can, however, send emails to the users saying "Fix this item, it has bad data in it." Which is what I'm working on.

    So no constraints. No PKs. No schema changes allowed.

    The note about internal spaces is well taken, but I believe the initial search takes care of that. ('%[^a-z0-9A-Z]%' translates to any character not in the A-Z or 0-9 range, which includes spaces. I do an RTRIM on the other side of the operator to truncate trailing spaces and I'm good.)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply