January 4, 2017 at 12:23 pm
Eric M Russell (1/4/2017)
If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?
I guess is like phone numbers. The numbers might not be relationally tied, but you wouldn't expect that @$%^ would be a valid number.
January 4, 2017 at 12:26 pm
Brandie Tarvin (1/4/2017)
Luis Cazares (1/4/2017)
Does this works?
Tip: don't try to do everything in a single pattern. Sometimes is easier to use several patterns.
NICE, Luis. And I can use a REVERSE(RTRIM(QuestionableColumn)) to check the last character. I didn't think about that. Thanks.
No need to use functions on the column (reverse is expensive). The following works even if there are trailing spaces.
SELECT *
FROM #MyTemp
WHERE QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%'
OR QuestionableColumn LIKE '-%'
OR QuestionableColumn LIKE '%-'
January 4, 2017 at 12:50 pm
Eric M Russell (1/4/2017)
If the user entered codes are not relationally tied to any other table, then what are the ramifications of them being entered "wrong" ?
As I said, these are not really codes. They are more like order numbers, but not just numbers. The table they are getting entered in is a core table of the system and these items are then pulled out of this table to feed into financial systems, customer systems, other product systems. Most of these systems are very picky about what is allowed in their fields and reject non-standard characters. Or have their own constraints.
Because this is a vended system, used by other companies, we cannot change their schema or dictate what this column allows. Other companies may very well allow special characters. All we can do is compensate for it so downstream processes stop freaking out. Hence my creation of a data quality notification for our internal operations people so they can fix the problem.
January 4, 2017 at 12:52 pm
I think Luis' solution is the way to go. To answer your question about the trailing hyphen - you could go with:
AND QuestionableColumn NOT LIKE '%-'
As Drew mentioned, this will treat the hyphen like the last character even if there's trailing spaces (Unlike RIGHT(<column>,1) which is why Thom had to use RTRIM).
I'm going to change the sample data to use a perm table (you'll see why in a moment). Let's use:
USE tempdb
GO
IF (SELECT OBJECT_ID('tempdb.dbo.MyTemp')) IS NOT NULL DROP TABLE dbo.MyTemp;
CREATE TABLE dbo.MyTemp
(
SomeID int identity NOT NULL,
QuestionableColumn char(20) NOT NULL DEFAULT('')
);
INSERT INTO dbo.MyTemp (QuestionableColumn) VALUES
('ABC1GKUCEEF7AR169582'), --Good value
('ABC2G1WG5E37D1157775'), --Good value
('BCD3GTU2TEC9FG119962'), --Good value
('- '),
(' - '),
('000000 '), --Good value
('--- '),
('-7Z559677 '),
('...ZMY1708617 '),
(';DW238831 '),
('? '),
('[AA007462 '),
('`FFF-B4010103 '),
('{2} '),
('000000 '), --Good value
('000000- '), --Alan.B added for testing...
('123459678 '), --Good value
('ABCDEFGHIJ '), --Good value
('ABC-1245Z'); --Good value
The way I'd get valid data would be:
SELECT SomeID, QuestionableColumn
FROM dbo.MyTemp
WHERE NOT
(QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR
QuestionableColumn LIKE '-%' OR
QuestionableColumn LIKE '%-');
Switching gears a little to performance...
This is obviously not SARGable. If I had a lot of rows, needed to retain the bad data, but frequently need to grab the good rows (the story of my life as an ETL guy) I would want a filtered index. Since we can't use LIKE in filtered indexes I would create an indexed view...
CREATE VIEW dbo.vMyTemp WITH SCHEMABINDING AS
SELECT SomeID, QuestionableColumn
FROM dbo.MyTemp
WHERE NOT
(QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR
QuestionableColumn LIKE '-%' OR
QuestionableColumn LIKE '%-');
GO
CREATE UNIQUE CLUSTERED INDEX uci_vMyTemp ON dbo.vMyTemp(QuestionableColumn, someID);
GO
Now I have an filtered index available for the optimizer use if it chooses (by filtered index I mean a filtered subset of only valid data). My indexed view enabled me to make this query SARGable:
SELECT SomeID, QuestionableColumn
FROM dbo.MyTemp
WHERE NOT
(QuestionableColumn LIKE '%[^a-zA-Z0-9 -]%' OR
QuestionableColumn LIKE '-%' OR
QuestionableColumn LIKE '%-')
AND QuestionableColumn LIKE '[A-Za-z0-9]%';
Note that this get's me an index seek:
SELECT QuestionableColumn
FROM dbo.vMyTemp WITH (NOEXPAND) -- I included for testing
WHERE QuestionableColumn LIKE '[A-Za-z0-9]%';
Sorry for the off-topic rant; just some food for thought.
Update/Edit: Note that I started this reply before the Brandi's reply that begins at pg 2 of this thread.
-- Itzik Ben-Gan 2001
January 5, 2017 at 4:29 am
Alan.B (1/4/2017)
Sorry for the off-topic rant; just some food for thought.
Not at all off topic. You made some very good points. As did Luis about using the functions.
Thank you both for continuing to post.
EDIT: As much as I would love to do a VIEW, I have an issue with not being able to create new things on a vended database solution. WITH SCHEMA_BINDING would definitely screw up our vendor and cause all sorts of political problems in the office. So I'm stuck with the non-SARGable option.
Still, this was a great learning experience and many good points were made. Thanks, everyone.
Viewing 5 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply