July 6, 2012 at 5:15 am
Hello there,
I have this table having a varchar column which contains
numeric data + all kind of junk data (non-numeric chars) - not something in our environment control.
For some data processing I want to convert these values to numeric whenever possible -
so need to filter records which contain numeric values only.
As numeric value rules, the value should *only* have
- one or more digits [0-9]*
- zero or more comma before decimal point [,]+
- zero or one decimal point [.]?
- an optional(zero or one) minus/plus sign *in the beginning* [\+\-]?
I tried to use PATINDEX() as
WHERE PATINDEX('[\+\-]?[0-9,]*[.]?[0-9]*', varcharValue) > 0
But this doesn't seem to work
It seems PATINDEX() doesn't allow this kind of pattern matching and I need to go via CLR route.
Can anyone please confirm.
July 6, 2012 at 5:37 am
The following will determine which rows have valid numerics. If you need to convert the string to a number you may need another routine such as my BigInt converter.
declare @t table (strval varchar(10))
insert into @t values ('+123.45');
insert into @t values ('-12,345.67');
insert into @t values ('-1-2');
select isnumeric(strval) from @t;
/*
-- Tests pass isnumeric AND fail IsBigInt AND fail cast(vc as bigint)
-- range
SELECT IsNumeric('-9223372036854775809'), dbo.IsBigInt('-9223372036854775809')
SELECT IsNumeric('9223372036854775808'), dbo.IsBigInt('9223372036854775808')
-- invalid chars
SELECT IsNumeric('-5d2'), dbo.IsBigInt('-5d2')
SELECT IsNumeric('-5e2'), dbo.IsBigInt('-5e2')
SELECT IsNumeric('+3,4'), dbo.IsBigInt('+3,4')
SELECT IsNumeric('+3.4'), dbo.IsBigInt('+3.4')
-- pass this strange case
SELECT IsNumeric('00000000000000000000000000001'), dbo.IsBigInt('00000000000000000000000000001')
*/
IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'dbo.IsBigInt') AND type IN (N'FN', N'IF', N'TF', N'FS', N'FT'))
DROP FUNCTION dbo.IsBigInt
GO
CREATE FUNCTION dbo.IsBigInt (@a varchar(30))
returns bit
AS
BEGIN
-- Submitted to SqlServerCentral by William Talada
DECLARE
@s varchar(30),
@i int,
@IsNeg bit,
@valid int
-- assume the best
SET @valid = 1
SET @IsNeg=0
SET @s = ltrim(rtrim(@a))
-- strip OFF negative sign
IF len(@s) > 0
AND LEFT(@s, 1) = '-'
BEGIN
SET @IsNeg=1
SET @s = RIGHT(@s, len(@s) - 1)
END
-- strip OFF positive sign
IF len(@s) > 0
AND LEFT(@s, 1) = '+'
BEGIN
SET @s = RIGHT(@a, len(@a) - 1)
END
-- strip leading zeros
while len(@s) > 1 and left(@s,1) = '0'
set @s = right(@s, len(@s) - 1)
-- 19 digits max
IF len(@s) > 19 SET @valid = 0
-- the rest must be numbers only
SET @i = len(@s)
WHILE @i >= 1
BEGIN
IF charindex(substring(@s, @i, 1), '0123456789') = 0 SET @valid = 0
SET @i = @i - 1
END
-- check range
IF @valid = 1
AND len(@s) = 19
BEGIN
IF @isNeg = 1 AND @s > '9223372036854775808' SET @valid = 0
IF @IsNeg = 0 AND @s > '9223372036854775807' SET @valid = 0
END
RETURN @valid
END
go
July 6, 2012 at 9:23 am
Yes, that's what we are currentlly using, but it is giving unexpected result sometimes -
for instance -
- IsNumeric(x) returns true - when varchar value has both comma and a tab character (Char(9)),
- IsNumeric(x) returns false - when varchar value has only tab character and no comma
July 9, 2012 at 5:18 am
SELECT *
FROM (SELECT ID = 1, somecolumn = '399,243.01' UNION ALL -- YES
SELECT ID = 2, '£399,243.01' UNION ALL -- NO
SELECT ID = 3, '39924301' UNION ALL -- YES
SELECT ID = 4, '399243.01' UNION ALL -- YES
SELECT ID = 5, '399,24301' UNION ALL -- YES
SELECT ID = 6, '399.243,01' UNION ALL -- YES
SELECT ID = 7, '399 243,01' UNION ALL -- NO
SELECT ID = 8, '-399.243,01' UNION ALL -- YES
SELECT ID = 9, '+399.243,01' UNION ALL -- YES
SELECT ID = 10, '_399.243,01' -- NO
) a
WHERE somecolumn NOT LIKE '%[^,.0-9+-]%'
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 9, 2012 at 2:56 pm
If you go via a CLR regex function, you can try this regex to see if it satisfies your spec:
(?xm)^[+-]?\d{1,3}(,\d{3})+(\.\d+)?$
|
^[+-]?\d{1,3}(\.\d+)?$
tested in RegexBuddy vs this input:
-4.012
6berry
+8
-234,345,234.012
7
pbo.PoBox34
got mathes:
-4.012
+8
-234,345,234.012
7
without seeing your real input strings it's hard to write a regex that would fit it all though.
July 9, 2012 at 4:21 pm
I think this gets you pretty close, w/o having to resort to CLR:
WHERE
--one +/- optional, but must be first if used
PATINDEX('%[+-]%', varcharValue) IN (0, 1) AND
varcharValue NOT LIKE '%[+-]%[+-]%' AND
--only one decimal allowed
varcharValue NOT LIKE '%.%.%' AND
--only chars shown allowed; '-' must be LAST, or it will be taken as a range indicator, not a char
varcharValue NOT LIKE '%[^0-9,.+-]%'
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 6 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy