Non-alphanumeric Search Frustration

  • Eric M Russell (1/4/2017)


    If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?

    I guess is like phone numbers. The numbers might not be relationally tied, but you wouldn't expect that @$%^ would be a valid number.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Brandie Tarvin (1/4/2017)


    Luis Cazares (1/4/2017)


    Does this works?

    Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.

    NICE, Luis. And I can use a REVERSE(RTRIM(QuestionableColumn)) to check the last character. I didn't think about that. Thanks.

    No need to use functions on the column (reverse is expensive). The following works even if there are trailing spaces.

    SELECT *

    FROM #MyTemp

    WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'

    OR QuestionableColumn LIKE '-%'

    OR QuestionableColumn LIKE '%-'

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Eric M Russell (1/4/2017)


    If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?

    As I said, these are not really codes. They are more like order numbers, but not just numbers. The table they are getting entered in is a core table of the system and these items are then pulled out of this table to feed into financial systems, customer systems, other product systems. Most of these systems are very picky about what is allowed in their fields and reject non-standard characters. Or have their own constraints.

    Because this is a vended system, used by other companies, we cannot change their schema or dictate what this column allows. Other companies may very well allow special characters. All we can do is compensate for it so downstream processes stop freaking out. Hence my creation of a data quality notification for our internal operations people so they can fix the problem.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I think Luis' solution is the way to go. To answer your question about the trailing hyphen - you could go with:

    AND QuestionableColumn NOT LIKE '%-'

    As Drew mentioned, this will treat the hyphen like the last character even if there's trailing spaces (Unlike RIGHT(<column>,1) which is why Thom had to use RTRIM).

    I'm going to change the sample data to use a perm table (you'll see why in a moment). Let's use:

    USE tempdb

    GO

    IF (SELECT OBJECT_ID('tempdb.dbo.MyTemp')) IS NOT NULL DROP TABLE dbo.MyTemp;

    CREATE TABLE dbo.MyTemp

    (

    SomeID int identity NOT NULL,

    QuestionableColumn char(20) NOT NULL DEFAULT('')

    );

    INSERT INTO dbo.MyTemp (QuestionableColumn) VALUES

    ('ABC1GKUCEEF7AR169582'), --Good value

    ('ABC2G1WG5E37D1157775'), --Good value

    ('BCD3GTU2TEC9FG119962'), --Good value

    ('- '),

    (' - '),

    ('000000 '), --Good value

    ('--- '),

    ('-7Z559677 '),

    ('...ZMY1708617 '),

    (';DW238831 '),

    ('? '),

    ('[AA007462 '),

    ('`FFF-B4010103 '),

    ('{2} '),

    ('000000 '), --Good value

    ('000000- '), --Alan.B added for testing...

    ('123459678 '), --Good value

    ('ABCDEFGHIJ '), --Good value

    ('ABC-1245Z'); --Good value

    The way I'd get valid data would be:

    SELECT SomeID, QuestionableColumn

    FROM dbo.MyTemp

    WHERE NOT

    (QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR

    QuestionableColumn LIKE '-%' OR

    QuestionableColumn LIKE '%-');

    Switching gears a little to performance...

    This is obviously not SARGable. If I had a lot of rows, needed to retain the bad data, but frequently need to grab the good rows (the story of my life as an ETL guy) I would want a filtered index. Since we can't use LIKE in filtered indexes I would create an indexed view...

    CREATE VIEW dbo.vMyTemp WITH SCHEMABINDING AS

    SELECT SomeID, QuestionableColumn

    FROM dbo.MyTemp

    WHERE NOT

    (QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR

    QuestionableColumn LIKE '-%' OR

    QuestionableColumn LIKE '%-');

    GO

    CREATE UNIQUE CLUSTERED INDEX uci_vMyTemp ON dbo.vMyTemp(QuestionableColumn, someID);

    GO

    Now I have an filtered index available for the optimizer use if it chooses (by filtered index I mean a filtered subset of only valid data). My indexed view enabled me to make this query SARGable:

    SELECT SomeID, QuestionableColumn

    FROM dbo.MyTemp

    WHERE NOT

    (QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR

    QuestionableColumn LIKE '-%' OR

    QuestionableColumn LIKE '%-')

    AND QuestionableColumn LIKE '[A-Za-z0-9]%';

    Note that this get's me an index seek:

    SELECT QuestionableColumn

    FROM dbo.vMyTemp WITH (NOEXPAND) -- I included for testing

    WHERE QuestionableColumn LIKE '[A-Za-z0-9]%';

    Sorry for the off-topic rant; just some food for thought.

    Update/Edit: Note that I started this reply before the Brandi's reply that begins at pg 2 of this thread.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001

  • Alan.B (1/4/2017)


    Sorry for the off-topic rant; just some food for thought.

    Not at all off topic. You made some very good points. As did Luis about using the functions.

    Thank you both for continuing to post.

    EDIT: As much as I would love to do a VIEW, I have an issue with not being able to create new things on a vended database solution. WITH SCHEMA_BINDING would definitely screw up our vendor and cause all sorts of political problems in the office. So I'm stuck with the non-SARGable option.

    Still, this was a great learning experience and many good points were made. Thanks, everyone.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Viewing 5 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply