Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 

SQL Validation Functions - 4 Cool UDF's to have handy

Everyone knows that in SQL 2000, yes 2000, Microsoft jumped aboard the bandwagon of other platforms, and gave the developers the abilitiy to create user defined functions.  This was sorely missing from the TSQL toolset, and the ability to CREATE FUNCTIONs, has reduced the development time and simplified coding.  Finally this powerful capability opened up new ways to integrate the code logic into TSQL via these functions without hundreds of lines of complicated code and use of temp tables to store results. Now the functionality has gotten even better.   You can review the msdn library reference for Creating User Defined Functions.

The book, which refers to the latest release of SQL Server, 'Microsoft SQL Server 2008 R2 Unleashed', has an entire chapter dedicated to Creating and Managing User-Defined Functions, including discussions on what's new in 2008 R2, why use User Defined Functions, the types of UDFs, creating and managing UDFs, rewriting stored procedures as UDFs, and creating and using CLR functions.  It is quite in-depth about the topic, and that's only one chapter!  Definitely consider having this latest reference on your bookshelf or in your digital library!

UDFs allow you to add custom solutions for unique application-specific problems. Tasks can be as simple as calculating a value or as complex as defining and implementing table constraints. SQL Server UDFs are subroutines of encapsulated Transact-SQL (T-SQL) logic that you can call into action via other T-SQL code to return a scalar (single) value or a table results.

Some of the main advantages of UDF's (over using stored procedures) are:

  • UDFs return table variables.
  • You can call a UDF from a SELECT statement.
  • You can join to a UDF.

So, since I was recently assigned a task to come up with some validation functions, those that would take an input parameter, and either validate it against the code for things like format, syntax, invalid characters, length, is it numeric, etc. 

More specifically, to explain it in practical terms, have you ever had to fill in a form on-line?  Well, unless you've been in a cave for the last 25 or so years, and never ordered anything on the Internet, then the answer is obviously, "YES!".  What are the typical things a web-form will ask you for?  Besides your name, most forms will require your email address, phone number, and maybe your web-site address (URL).  Some credit cards, banks and government agencies may also require your Social Security Number for security purposes as well.  Of course we assume that you know the site is legitimate, and when you enter this info, it is on a highly secured site using SSL.  Right?  Well, that's a security/phishing topic for a whole other blog, but sometimes you will need to enter your SSN to continue through the site.

Once you enter this information, often times the site highlights your entries in red, or outputs a message that for some reason your email address is invalid.  Maybe, you forgot or misplace the '.' in .com, or the '@' symbol, for example. For your phone number or SSN, perhaps it is not properly formatted - ie your phone number has invalid characters or not numeric.  Heck, most often as long as you enter 10 valid digits for your telephone (we're talking local not international), the site will format it for you!  It may even recognize a bum web-site, and tell you its not a valid URL.   Therefore, the question I seek to answer is, "how are these sites so smart?" and "how do they know your information is invalid".  Almost for certain, the code behind the application is using validation functions

Using SQL Server User Defined Functions, I will demonstrate this here.  First, you must open up your imagination, and be able to take these basic functions, and modify the heck outta them, to customize to your needs.  The following functions don't delete any data, but they are "as-is" with no warranty.  Feel free to use these in your own application/development.

These functions can help you scrub, validate and format data before it is committed to your database(s).  Remember, good data in, good data out.  Conversely, bad data in, badder data out. :-)

Rather than re-invent the wheel, I scoured the internet and Googled, and found some neat and handy validation functions that I am sharing with all of you right here on the PearlKnows blog.  Although I made some of my own modifications, they are generally available on the Internet, and I am in no way taking credit for the original works.

Most of these return a bit as output - in this case 1=Valid, and 0=Invalid - but you can modify them to return a string/message.  Of course, you can use the bit flag in your app code as well to return a more sophisticated output, even a specific message for why its invalid. You will also see in the code, comments on what each line of code is checking for. But, I will leave you to your own devices.  So, take these as they are, and GO NUTS!
 


 1. Email Validation:  This function below basically checks for Nulls, invalid or duplicate characters, and looks for '@' and '.' in the formatting.  Once created, you will run the following t-sql:

   select dbo.udf_ValidateEmail('myemailaddress@somedomain.com')

Create FUNCTION udf_ValidateEmail (@email varChar(255))

RETURNS bit
AS
begin
return
(
select 
	Case 
		When 	@Email is null then 0	                	--NULL Email is invalid
		When	charindex(' ', @email) 	<> 0 or		--Check for invalid character
				charindex('/', @email) 	<> 0 or --Check for invalid character
				charindex(':', @email) 	<> 0 or --Check for invalid character
				charindex(';', @email) 	<> 0 then 0 --Check for invalid character
		When len(@Email)-1 <= charindex('.', @Email) then 0--check for '%._' at end of string
		When 	@Email like '%@%@%'or 
				@Email Not Like '%@%.%'  then 0--Check for duplicate @ or invalid format
		Else 1
	END
)
end


2. Social Security Number - SSN Validation:  We all know that our SSNs are 9-digits long, and follows the standard format of xxx-xx-xxxx.  Therefore, we would want to validate that it is in fact 9-digits, the characters are numeric [0-9], and ensure that the format is indeed the right length for each section - ie: 3-2-4.  After you create this nice function, you can test it by running various offshoots (try to enter invalid data), but basically looks like:

select dbo.udf_ValidateSSN('123-45-6789') 

 

 

CREATE FUNCTION udf_ValidateSSN(@SSN varchar(50))
RETURNS BIT
AS
BEGIN 

DECLARE @ValidFlag BIT


SET @ValidFlag = 0


IF @SSN LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
BEGIN

IF LEFT(@SSN,3) != '000' AND SUBSTRING(@SSN,5,2) != '00'
AND RIGHT(@SSN,4) != '0000'
BEGIN
SET @ValidFlag = 1
RETURN ( @ValidFlag)
END
END

IF @SSN LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
BEGIN

IF LEFT(@SSN,3) != '000' AND SUBSTRING(@SSN,4,2) != '00'
AND RIGHT(@SSN,4) != '0000'
BEGIN
SET @ValidFlag = 1
RETURN ( @ValidFlag)
END
END

RETURN ( @ValidFlag)
END
GO


3. Validate/Format Phone Number:  There's always confusion on how to enter a phone number, unless there's a note next to the field that says: "xxx-xxx-xxxx" or "(xxx) xxx-xxxx" or even xxxxxxxxxx.  A smart function will take all inputs and simply format it the way the end-user wants it stored in the database.  For this we can validate the input, and standardize the format of the entered telephone number.  This cool function is a bit more sophisticated, in that it does two things, validate the data; and format it to a standard form.  It first checks if the data is numeric, is 10-digits in length, does not have a '1' at the beginning, etc.  What it wants the user to enter is, 10-digits, no dashes, and it will format '3124457689' to '312-445-7689'.  I'm sure you can modify it to accept the dashes.  You can test it out as:

 select [dbo].[udf_ValFormatPhone] ('3124445678')

CREATE FUNCTION [dbo].[udf_ValFormatPhone] (@phone NVARCHAR(255))
RETURNS NVARCHAR(255)
AS
BEGIN--function
	declare @i int, @repCount int
	declare @current_char char(1)
	declare @phone_new varchar(50)
	set @phone_new = rtrim(ltrim(@phone))

	if left(@phone_new, 1) = '1'
		set @phone_new = right(@phone_new, len(@phone_new) -1)
	

	set @i = 1
	while @i <= len(@phone)
	begin
		set @repCount = 0
		if @i > len(@phone_new)
			break

		set @current_char = substring(@phone_new, @i, 1)

		if isnumeric(@current_char) <> 1
		begin
			set @repCount = len(@phone_new) - len(replace(@phone_new, @current_char, ''))
			set @phone_new = replace(@phone_new, @current_char, '')
		end

		set @i = @i + 1 - @repCount
	end

	if isnumeric(@phone_new) = 1 and len(@phone_new) = 10 
		set @phone_new =
			substring(@phone_new, 1,3) + '-' + 
			substring(@phone_new, 4,3) + '-' + 
			substring(@phone_new, 7,4)
	else
		set @phone_new = 'invalid entry'

	return @phone_new
END --function

4. Validate URL - This one is an interesting and neat function, a bit unique, and one that I can source back to and credit to Adam Machanic. (<click on the link for original source)  It creates a UDF, and uses Object Automation to validate the site from SQL Server. And let me emphasize, it actually validates the URL itself, not just the string!  So, it will attempt to connect to the internet, and check if the URL actually resolves.  It tries to address the timeout connection, and you will need to enable the 'OLE Automation' configuration option for this to work. 

sp_configure 'Show Advanced Options',1
go
reconfigure
go
sp_configure 'Ole Automation Procedures',1
go
reconfigure
go

And, then, once you create the function, you can run:

SELECT dbo.udf_ValidateURL('http://www.microsoft.com/sql')
---
1 <--if valid will return 1, otherwise 0
 

CREATE FUNCTION dbo.ValidateURL(@URL VARCHAR(300))
RETURNS BIT
AS BEGIN	
DECLARE @Object INT	
DECLARE @Return TINYINT	
DECLARE @Valid BIT SET @Valid = 0 --default to false		
--create the XMLHTTP object	
EXEC @Return = sp_oacreate 'MSXML2.ServerXMLHTTP.3.0', @Object OUTPUT	
IF @Return = 0	
BEGIN		
DECLARE @Method VARCHAR(350)		--define setTimeouts method --Resolve, Connect, Send, Receive		
SET @Method = 'setTimeouts(45000, 45000, 45000, 45000)'	--set the timeouts
EXEC @Return = sp_oamethod @Object, @Method
IF @Return = 0		
BEGIN			--define open method			
SET @Method = 'open("GET", "' + @URL + '", false)'--Open the connection			
EXEC @Return = sp_oamethod @Object, @Method		
END			IF @Return = 0		
BEGIN			--SEND the request			
EXEC @Return = sp_oamethod @Object, 'send()'		
END			
IF @Return = 0		
BEGIN			
DECLARE @Output INT			
EXEC @Return = sp_oamethod @Object, 'status', @Output OUTPUT
				IF @Output = 200		
	BEGIN
				SET @Valid = 1		
	END		
END
	END		--destroy the object	EXEC sp_oadestroy @Object	
RETURN (@Valid)
END

So there you have it! Four handy user validation functions that you can use in your code to validate, format, and scrub data as needed.  I hope this information has been helpful!  Happy coding!

 

Comments

Posted by Jason Brimhall on 5 April 2011

Highly useful scripts - thanks

Posted by Steve Jones on 5 April 2011

Very cool set of scripts. I'll have to save these off somewhere for future use.

Posted by srikant maurya on 7 April 2011

I have saved these fuction for future use.

Posted by cjs on 7 April 2011

Very nice, thanks.  Makes me think of some I've used often, for example this one used in ORDER BY to sort IP addresses:

CREATE      FUNCTION AlignAddress(@aIP as varchar(15))  

RETURNS char(15)

-- Pad parts of an IP address with leading zeros so they sort correctly

-- CJS 11/28/03

-- cjs 4/20/04 rtrim address

AS

BEGIN

  IF LEN(@aIP) = 15  RETURN @aIP  -- many need no change

  SET @aIP = RTRIM(@aIP)

  RETURN

     RIGHT('00' + PARSENAME(@aIP, 4), 3) + '.' +

     RIGHT('00' + PARSENAME(@aIP, 3), 3) + '.' +

     RIGHT('00' + PARSENAME(@aIP, 2), 3) + '.' +

     RIGHT('00' + PARSENAME(@aIP, 1), 3)

END

There must be zillions of these out there.

Posted by BalaMurugan on 7 April 2011

The function return valid for this email ID ----@---.--

select dbo.udf_ValidateEmail('----@---.--') . There is still the function need to be validated

Posted by qubit on 7 April 2011

rather use regular expressions, add a clr that uses .net System.Text.RegularExpressions objects!

Posted by sql_jr on 7 April 2011

Excellent article and collection of scripts!  I was looking for something like this recently.  Thanks!

Posted by Mad Myche on 7 April 2011

"END        --destroy the object    EXEC sp_oadestroy @Object"

Wouldn't the destroy @object not fire as it is commented out?

I appreciate articles that actually work on 2000, as 2 of my servers are still using it

Posted by Robert Pearl on 7 April 2011

Mike, though I said the scripts are "as-is", good catch! - I think it was due to a missing line-break. (I could fix it, but not sure if republishing will upset something here :-)

Speaking of "as-is", Balu, you're welcome to modify it (use another case statement) to exclude certain reoccurrences of characters.

Thank you for all for your comments!

Posted by Chad Carter on 7 April 2011

I may just be paranoid but isn't having your SQL server actually go out and connect to a URL being provided by an end user a security risk?  I am not that familiar with the Ole steps but it would seem (as well as the comments state) that you are actually opening the URL to validate it.

Posted by vishal.gamji on 7 April 2011

Thanks. These surely will help.

Posted by Matt Penner on 7 April 2011

I agree with Warren.  I use regex all the time and there are some decent ways to add this to SQL Server (though I wish Microsoft baked it in).  There are some extremely powerful regex expressions to validate the same items above, usually with simpler and easier to understand code (if you know regex that is).

Caution should be used with the phone validation.  Users will need to consider whether they need to validate international numbers and handle formatting.

Posted by wayne.talbot on 7 April 2011

I tried the [udf_ValFormatPhone] and it did not work for me. Trying to debug it for my system I found that my SQL Server 2008 had a problem with the '-' character so if I sent it 1-800-000-1111 it would error. I found in the help documentation that ISNUMERIC considers '$', '+', ',', '-', '.', and '\' as numeric. SO I changed the code to test for them. Also, the function required that the first test was for the leading '1' so I moved it to after the deletion of all non-numeric characters.

This is the resulting code:

USE [DCSReporting]

GO

/****** Object:  UserDefinedFunction [dbo].[udf_ValFormatPhone]    Script Date: 04/07/2011 13:04:25 ******/

SET ANSI_NULLS ON

GO

SET QUOTED_IDENTIFIER ON

GO

ALTER FUNCTION [dbo].[udf_ValFormatPhone] (@phone NVARCHAR(255))

RETURNS NVARCHAR(255)

AS

BEGIN --function

DECLARE @i INT, @repCount INT

DECLARE @current_char CHAR(1)

DECLARE @phone_new VARCHAR(255)

SET @phone_new = RTRIM(LTRIM(@phone))

-- Parse the phone number to clean out all non-numerics

SET @i = LEN(@phone)

SET @repCount = 1

WHILE @i > 0

BEGIN

SET @current_char = SUBSTRING(@phone_new, @repcount, 1)

IF (ISNUMERIC(@current_char) <> 1

OR @current_char = '$'

OR @current_char = '+'

OR @current_char = ','

OR @current_char = '-'

OR @current_char = '.'

OR @current_char = '\')

BEGIN

SET @phone_new = REPLACE(@phone_new, @current_char, '')

SET @repCount = @repCount -1

END

SET @i = @i -1

SET @repCount = @repCount +1

END

-- Remove the leading '1' if it exists

IF LEFT(@phone_new, 1) = '1'

SET @phone_new = RIGHT(@phone_new, LEN(@phone_new) -1)

-- When the @phone_new is parsed, then insert the formatting

IF ISNUMERIC(@phone_new) = 1 AND LEN(@phone_new) = 10

SET @phone_new =

SUBSTRING(@phone_new, 1,3) + '-' +

SUBSTRING(@phone_new, 4,3) + '-' +

SUBSTRING(@phone_new, 7,4)

ELSE

SET @phone_new = 'invalid entry'

RETURN @phone_new

END --function

To test it, I wrote a quick routine that passed it a long string to parse out the phone number. It should handle any string that contains the correct number of numbers.

Test code:

DECLARE @vlll nvarchar(255)

SET @vlll = 'Embedded in this sentence is the number 1-800-000-1111 and this function can find it.'

SET @vlll = dbo.udf_ValFormatPhone(@vlll)

print @vlll

I hope this helps anyone else who hit the same problem I did.

Posted by Knut Boehnert on 8 April 2011

Nice collection - the one item to improve on the email check is that each letter adheres to RFC3522 specification - unless there is a filter programmed beforehand that no Unicode letters can be even thought of.

I use the following code snippet to do this:

-- 3) check letters are good

-- set based operation breaking up string into letters and finding these letters in list of approved characters

-- special characters are . (dot), @ (at) for full string and ; (semi colon) for concatenated email addresses

SELECT @ValidLetter = IsNull( Sum(Unicode(SubString(@InputEmail, N.Number, 1))), 0)

FROM dbo.Numbers N

WHERE N.Number <= Len(@InputEmail)

AND CharIndex( SubString(@InputEmail, N.Number, 1), '''abcdefghijklmnopqrstuvwxyz01234567890_-.@;' ) = 0

-- reduce to 1 (correct) or 0 (invalid)

-- if invalid characters have been found, @ValidLetter equals the sum of their unicode values

-- if no invalid characters have been found, @ValidLetter equals zero

-- so inverse operation to get to 1 being correct and 0 being invalid

SELECT @ValidLetter = 1 - Sign(@ValidLetter)

And sure enough there is a Numbers table involved to make the check a set based operation instead of RBAR (tm = Jeff Moden in a way).

Posted by JMasciantoni 71211 on 9 April 2011

Not only are the scripts useful and appreciated but so are all the great comment suggestions and corrections. Thanks for sharing.

Posted by Alexander Kuznetsov on 10 April 2011

I am not with you on "SQL 2000, yes 2000, Microsoft jumped aboard the bandwagon of other platforms, and gave the developers the abilitiy to create user defined functions". Oralce 8 was released in '99, and it did have UDFs, and packages and more. More to the point, Oracle's implementation was lightning fast, and it allowed TRY..CATCH in the body - scalar UDFs provided with SQL 2008 R2 have a long way to catch up with the equivalent ones provided with Oracle 8, which was released more than a decade ago.

Posted by ffoerster on 11 April 2011

Hi, interesting scripts. However, I would recommend wrapping such format specific validation into SQL CLR functions using Regular Expressions. As for 4) ... you just want to validate the format of a URL or do you actually want to see if the URL has content behind? The latter one could be quite a burden if called on a bigger set of rows. For URL validation you can also revert to URI classes from the .Net Framework. Regards, Frank

Posted by GGSqler on 11 April 2011

Email Validation is not a correct name, the function doesn't check for a valid email address. This function might be a high level start, but it needs to be heavily modified in order for it to be useful. I'm sure there are plenty more examples, but here are a few:

Declare @Email1 varchar(255)

Declare @Email2 varchar(255)

Declare @Email3 varchar(255)

--This is a valid email, function says it's not.

Set @Email1 = 'sql\@server"@www.com'

--Periods before the @ sign are not permitted, function returns true

Set @Email2 = 'test.@sql.kz'

--Two periods are not allowed, function returns true

Set @Email3 = 'test..k@sql.kz'

Leave a Comment

Please register or log in to leave a comment.