January 4, 2017 at 7:41 am
I am trying to search a column in a table for non-alphanumeric characters (.,% etc) so I can report these issues to our end users. But I'm running into a problem. The usual "NOT LIKE %[^a-z0-9A-Z]%" bit is not working quite like I need it to.
The column in question can have any alphanumeric character in it, or just be comprised of numbers, or just be comprised of letters. Spaces are only allowed at the end (because the column is CHAR, not VARCHAR) and a hyphen is only allowed after the first character. Other than that, all other non-alphanumeric characters are forbidden.
Here's what I've got (good values are noted and should be ignored):
IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL
DROP TABLE #MyTemp;
CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));
INSERT INTO #MyTemp (QuestionableColumn)
VALUES ('ABC1GKUCEEF7AR169582'), --Good value
('ABC2G1WG5E37D1157775'), --Good value
('BCD3GTU2TEC9FG119962'), --Good value
('- '),
('--- '),
('-7Z559677 '),
('...ZMY1708617 '),
(';DW238831 '),
('? '),
('[AA007462 '),
('`FFF-B4010103 '),
('{2} '),
('000000 '), --Good value
('123459678 '), --Good value
('ABCDEFGHIJ '), --Good value
('ABC-1245Z'); --Good value
SELECT QuestionableColumn
FROM #MyTemp
WHERE QuestionableColumn NOT LIKE '%[^a-z0-9A-Z]%';
--NOT LIKE With the carrot character
--Produces results with proper characters instead of results with incorrect characters
SELECT QuestionableColumn
FROM #MyTemp
WHERE QuestionableColumn NOT LIKE '%[a-z0-9A-Z]%';
--NOT LIKE Without the carrot character
--Produces only results with no alphanumeric characters at all, missing values with both
SELECT QuestionableColumn
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[^a-z0-9A-Z]%';
--LIKE With the carrot character
--Produces results with incorrect characters, but includes results that are alpha or numeric only
SELECT QuestionableColumn
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[a-z0-9A-Z]%';
--LIKE Without the carrot character
--Correctly produces results with correct characters, but includes results with incorrect characters
SELECT QuestionableColumn
FROM #MyTemp
WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z]%' ;
--Mostly works, but still reports hyphen in the middle value.
I'm so close, but I can't quite figure out how to evaluate for a hyphen as the last or first character, but to ignore it when it's in any other position. I tried using CHARINDEX, but since I don't use that function much, I think I did it wrong because it's still reporting that final value when it shouldn't.
SELECT QuestionableColumn
FROM #MyTemp
WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z]%'
OR CHARINDEX('-',QuestionableColumn,1) = 1
OR CHARINDEX('-',REVERSE(RTRIM(QuestionableColumn)),1) = 1;
Thoughts?
January 4, 2017 at 7:50 am
I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 4, 2017 at 7:54 am
Thom A (1/4/2017)
I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?
I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).
January 4, 2017 at 8:00 am
Brandie Tarvin (1/4/2017)
Thom A (1/4/2017)
I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).
Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂
Does this therefore work?
SELECT QuestionableColumn
FROM #MyTemp
WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'
OR LEFT(QuestionableColumn,1) = '-'
OR RIGHT(RTRIM(QuestionableColumn),1) = '-';
Edit, sorry, cannot be the last either
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 4, 2017 at 8:09 am
Does this works?
IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL
DROP TABLE #MyTemp;
CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));
INSERT INTO #MyTemp (QuestionableColumn)
VALUES ('ABC1GKUCEEF7AR169582'), --Good value
('ABC2G1WG5E37D1157775'), --Good value
('BCD3GTU2TEC9FG119962'), --Good value
('- '),
('--- '),
('-7Z559677 '),
('...ZMY1708617 '),
(';DW238831 '),
('? '),
('[AA007462 '),
('`FFF-B4010103 '),
('{2} '),
('000000 '), --Good value
('123459678 '), --Good value
('ABCDEFGHIJ '), --Good value
('ABC-1245Z'); --Good value
SELECT *
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'
OR QuestionableColumn LIKE '-%'
Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.
January 4, 2017 at 9:35 am
As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
January 4, 2017 at 9:52 am
Eric M Russell (1/4/2017)
As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.
That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:
USE DevTestDB;
GO
CREATE TABLE SampleData (QuestionableColumn CHAR(20)
CONSTRAINT CK_QuestionColumn
CHECK (QuestionableColumn NOT LIKE '%[^a-zA-Z0-9 -]%'
AND LEFT(QuestionableColumn,1) != '-'
AND RIGHT(RTRIM(QuestionableColumn),1) != '-'));
GO
INSERT INTO SampleData
SELECT 'ABC1GKUCEEF7AR169582';
GO
INSERT INTO SampleData
SELECT 'ABC2G1WG5E37D1157775';
GO
INSERT INTO SampleData
SELECT 'BCD3GTU2TEC9FG119962';
GO
INSERT INTO SampleData
SELECT '- ';
GO
INSERT INTO SampleData
SELECT '--- ';
GO
INSERT INTO SampleData
SELECT '-7Z559677 ';
GO
INSERT INTO SampleData
SELECT '...ZMY1708617 ';
GO
INSERT INTO SampleData
SELECT ';DW238831 ';
GO
INSERT INTO SampleData
SELECT '? ';
GO
INSERT INTO SampleData
SELECT '[AA007462 ';
GO
INSERT INTO SampleData
SELECT '`FFF-B4010103 ';
GO
INSERT INTO SampleData
SELECT '{2} ';
GO
INSERT INTO SampleData
SELECT '000000 ';
GO
INSERT INTO SampleData
SELECT '123459678 ';
GO
INSERT INTO SampleData
SELECT 'ABCDEFGHIJ ';
GO
INSERT INTO SampleData
SELECT 'ABC-1245Z';
GO
SELECT *
FROM SampleData;
GO
DROP TABLE SampleData
GO
This returns the 7 good value rows only in the SELECT statement:
QuestionableColumn
--------------------
ABC1GKUCEEF7AR169582
ABC2G1WG5E37D1157775
BCD3GTU2TEC9FG119962
000000
123459678
ABCDEFGHIJ
ABC-1245Z
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 4, 2017 at 10:37 am
Thom A (1/4/2017)
Eric M Russell (1/4/2017)
As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:
A check constraint would be a better choice than a PK constraint. However, I think that the data needs to be cleansed before implementing constraints.
January 4, 2017 at 10:40 am
Luis Cazares (1/4/2017)
Thom A (1/4/2017)
Eric M Russell (1/4/2017)
As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.That did make me think, the OP did say that this seems to be user error, so you could add a check constraint such as:
A check constraint would be a better choice than a PK constraint. However, I think that the data needs to be cleansed before implementing constraints.
True, I did forget to mention this. Will hopefully stop the issue in the future though once the OP has resolved existing dirty data.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
January 4, 2017 at 11:02 am
Thom A (1/4/2017)
Brandie Tarvin (1/4/2017)
Thom A (1/4/2017)
I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).
Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂
Does this therefore work?
SELECT QuestionableColumn
FROM #MyTemp
WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'
OR LEFT(QuestionableColumn,1) = '-'
OR RIGHT(RTRIM(QuestionableColumn),1) = '-';
Edit, sorry, cannot be the last either
You forgot to exclude internal spaces, too. I came up with a slightly different approach (using the fact that LIKE ignores trailing spaces).
SELECT *
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'
OR QuestionableColumn LIKE '-%'
OR QuestionableColumn LIKE '%-'
OR QuestionableColumn LIKE '% [^ ]%'
Drew
PS: As a note to the OP, you only include one data point with a trailing hyphen, but it also contains a leading hyphen, so people might miss that they haven't coded for a trailing hyphen. And none of your data has an internal space.
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 4, 2017 at 11:17 am
Thom A (1/4/2017)
Brandie Tarvin (1/4/2017)
Thom A (1/4/2017)
I'm a little confused,on one part. You stated that "a hyphen is only allowed after the first character", however that the value "ABC-1245Z" is acceptable. The hyphen is after the 3rd character. Why is this acceptable?I'm speaking in general terms. The hyphen cannot be the first or last character. It can be any other character (allowed after the first character).
Ahh I see, I read that as that it must be after the 1st character (therefore the second), rather than it can be any character other than the first 🙂
Does this therefore work?
SELECT QuestionableColumn
FROM #MyTemp
WHERE RTRIM(QuestionableColumn) LIKE '%[^a-z0-9A-Z-]%'
OR LEFT(QuestionableColumn,1) = '-'
OR RIGHT(RTRIM(QuestionableColumn),1) = '-';
Edit, sorry, cannot be the last either
Nope. The OR guarantees the good values will be included with the bad.
January 4, 2017 at 11:20 am
Luis Cazares (1/4/2017)
Does this works?
IF (SELECT OBJECT_ID('tempdb..#MyTemp')) IS NOT NULL
DROP TABLE #MyTemp;
CREATE TABLE #MyTemp (QuestionableColumn CHAR(20));
INSERT INTO #MyTemp (QuestionableColumn)
VALUES ('ABC1GKUCEEF7AR169582'), --Good value
('ABC2G1WG5E37D1157775'), --Good value
('BCD3GTU2TEC9FG119962'), --Good value
('- '),
('--- '),
('-7Z559677 '),
('...ZMY1708617 '),
(';DW238831 '),
('? '),
('[AA007462 '),
('`FFF-B4010103 '),
('{2} '),
('000000 '), --Good value
('123459678 '), --Good value
('ABCDEFGHIJ '), --Good value
('ABC-1245Z'); --Good value
SELECT *
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'
OR QuestionableColumn LIKE '-%'
Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.
NICE, Luis. And I can use a REVERSE(RTRIM(QuestionableColumn)) to check the last character. I didn't think about that. Thanks.
January 4, 2017 at 11:21 am
Eric M Russell (1/4/2017)
As a side note, it would be great if all valid codes could be pre-loaded into a primary keyed master table, then data validation could be enforced with a foreign key constraint.
Unfortunately, this particular column is sort of like an order number. It's supposed to be unique and will never have a way of pre-identifying the data.
January 4, 2017 at 11:26 am
Just to clarify. The data is coming from a vendor-created database. I cannot make changes to it. I can, however, send emails to the users saying "Fix this item, it has bad data in it." Which is what I'm working on.
So no constraints. No PKs. No schema changes allowed.
The note about internal spaces is well taken, but I believe the initial search takes care of that. ('%[^a-z0-9A-Z]%' translates to any character not in the A-Z or 0-9 range, which includes spaces. I do an RTRIM on the other side of the operator to truncate trailing spaces and I'm good.)
January 4, 2017 at 11:36 am
If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply