DigitsOnlyEE and AlphaNumericOnly

Michael Colling, 2019-03-15 (first published: 2016-05-17)

Updated 20190307 with correctly applied ORDER  BY statements for each function

Intro

In June, 2014 someone asked if anyone had a StripNonNumeric ITVF function. This request would lead to a few RBAR-free, set-based functions designed to clean strings. For the next few months members of the SQL Server community put their heads together and, after a bunch of testing came up with DigitsOnlyEE, the fastest T-SQL function for removing non-numeric characters from a string available today. Designing DigitsOnlyEE as an inline table valued function makes it possible to speed it up even more by forcing a parallel execution plan with Adam Machanic's make_parallel. A couple tweaks later we also had an AlphaNumericOnly ITVF. In April, 2017 I added AlphaOnly which removes all non-alphabetical characters. 

What about PatExclude8K or PatReplace8K?

PatExclude8K and PatReplace8K came out of the aforementioned effort and could be used for removing non-numeric or non-alphanumeric characters as shown below.

  --===== PatExclude8K:
  SELECT NewString FROM dbo.PatExclude8K('???ABC-123!!!','[^0-9]'); -- remove non-numeric characters
  SELECT NewString FROM dbo.PatExclude8K('???ABC-123!!!','[^0-9A-Za-z]'); -- remove non-alphanumeric 
  --===== PatReplace8K:
  SELECT NewString FROM dbo.PatReplace8K('???ABC-123!!!','[^0-9]',''); -- remove non-numeric characters
  SELECT NewString FROM dbo.PatReplace8K('???ABC-123!!!','[^0-9A-Za-z]',''); -- remove non-alphanumeric

DigitsOnlyEE and AlphaNumericOnly are much faster and better suited for this task, let me explain why: PatExclude8K and PatReplace8K both use a tally table to split the string into unigrams, then use the FOR XML PATH('') trick to put the string back together excluding characters that match the exclusion pattern (@pattern). Here's the code from PatExlclude8K:

...
SELECT NewString =
(
  SELECT SUBSTRING(@String,N,1)
  FROM iTally
  WHERE 0 = PATINDEX(@Pattern,SUBSTRING(@String COLLATE Latin1_General_BIN,N,1))
  FOR XML PATH(''),TYPE
).value('.[1]','varchar(8000)');
Note the TYPE and .value('.[1]', 'varchar(8000)') code. Removing this code would make the function notably faster. Unfortunately, PatExclude8K and PatReplace8K require this code in case the preserved text includes special XML characters. Wayne Sheffield discusses this topic in his article: Creating a comma-separated list (SQL Spackle):
The TYPE clause specifies to return the data as an XML type. The .value('.','varchar(max)') takes each value, and converts it into a varchar(max) data type. The combination of the TYPE and .value means that values are created at XML tags (such as the ampersand (&), and the greater than (>) and less than (<) signs), will not be tokenized into their XML representations and will remain as is.

We don't need to include the TYPE and .value('.[1]', 'varchar(8000)') code for DigitsOnlyEE or AlphaNumericOnly because the preserved characters are numbers and letters only. There are other performance enhancements which are documented in the comment section of each function. 

Happy string manipulating!

DigitsOnlyEE

IF OBJECT_ID('dbo.digitsOnlyEE', 'IF') IS NOT NULL DROP FUNCTION dbo.digitsOnlyEE;
GO
CREATE FUNCTION dbo.digitsOnlyEE
(
  @pString VARCHAR(8000)
)
/****************************************************************************************
[Purpose]:
 Given a VARCHAR(8000) or less string, return only the numeric digits from the string.
[Author]:
  Alan Burstein, EE, Jeff Moden
[Compatibility]:
 SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse & Parallel Data Warehouse
[Parameters]:
 @pString = VARCHAR(8000); Input string to be cleaned
[Returns]:
 digitsOnly = NVARCHAR(MAX) 

[Syntax]:
--===== Autonomous
 SELECT ca.digitsOnly
 FROM   dbo.digitsOnly(@pString) AS ca;
--===== CROSS APPLY example
 SELECT      ca.digitsOnly
 FROM        dbo.SomeTable
 CROSS APPLY dbo.digitsOnly(SomeVarcharCol) AS ca;
[Dependencies]:
 N/A
[Developer Notes]:
 1. This is an iTVF (Inline Table Valued Function) that performs the same task as a 
    scalar user defined function (UDF) accept that it requires the APPLY table operator. 
    Note the usage examples below and See this article for more details: 
    http://www.sqlservercentral.com/articles/T-SQL/91724/ 
    The function will be slightly more complicated to use than a scalar UDF but will yeild
    much better performance. For example - unlike a scalar UDF, this function does not 
    restrict the query optimizer's ability generate a parallel query plan. Initial testing
    showed that the function generally gets a parallel execution plan
 2. Runs 2-4 times faster with a parallel query plan. For optimal performance use Adam
    Machanic's make_parallel() function (provided that you are on a machine with two or 
    more logical CPUs). make_parallel can be found here: 
 sqlblog.com/blogs/adam_machanic/archive/2013/07/11/next-level-parallel-plan-porcing.aspx
 3. This is an iTVF (Inline Table Valued Function) that will be used as an iSF 
    (Inline Scalar Function) in that it returns a single value in the returned 
    table and should normally be used in the FROM clause as with any other iTVF.
 4. CHECKSUM returns an INT and will return the exact number given if given an 
    INT to begin with. It's also faster than a CAST or CONVERT and is used as a 
    performance enhancer by changing the bigint of ROW_NUMBER() to a more 
    appropriately sized INT.
 5. Another performance enhancement is using a WHERE clause calculation to prevent 
    the relatively expensive XML PATH concatentation of empty strings normally 
    determined by a CASE statement in the XML "loop".
 6. Another performance enhancement is not making this function a generic function 
    that could handle a pattern. That allows us to use all integer math to do the 
    comparison using the high speed ASCII function convert characters to their numeric
    equivalent.  ASCII characters 48 through 57 are the digit characters of 0 through 9.
 7. Keep in mind that digitsOnlyEE returns an nvarchar(max) value. If you are returning
    small numbers consider casting or converting yout values to a numeric data type if 
    you are inserting the return value into a new table or using it for joins or comparison
    purposes. 
 8. digitsOnlyEE is deterministic; for more about deterministic and nondeterministic 
    functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
Kudos:
 1. Hats off to Eirikur Eiriksson for the ASCII conversion idea and for the reminders 
    that dedicated functions will always be faster than generic functions and that 
    integer math beats the tar out of character comparisons that use LIKE or PATINDEX.
 2. Hats off to all of the good people that submitted and tested their code on the 
    following thread. It's this type of participation and interest that makes code 
    better.  You've just gotta love this commmunity.
    http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
[Examples]:
--===== 1. Basic use against a literal
 SELECT f.digitsOnly 
 FROM   dbo.digitsOnlyEE('xxx123abc999!!!') AS f;
--===== 2. Against a table 
 DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
 INSERT @sampleTxt(txt) VALUES ('abc123'),('!!!555!!!'),('000.999'),(NULL);
 SELECT      st.txtID, OldTxt = st.txt, f.digitsOnly
 FROM        @sampleTxt                AS st
 CROSS APPLY dbo.digitsOnlyEE(st.txt) AS f;
---------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20141029 - Initial Creation - Jeff Moden 
 Rev 01 - 20141210 - TOP clause changed to handle NULL inputs - Eirikur Eiriksson
 Rev 02 - 20160512 - Substantial updates to the comments & examples - Alan Burstein
 Rev 03 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
                     the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
RETURNS TABLE WITH SCHEMABINDING AS RETURN
WITH 
E1(N) AS 
(
 SELECT N 
 FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)
),
iTally(N) AS 
(
  SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) 
    (ROW_NUMBER() OVER (ORDER BY (SELECT NULL)))
  FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d
)
SELECT digitsOnly = 
(
  SELECT   SUBSTRING(@pString,CHECKSUM(N),1)
  FROM     iTally AS i
  WHERE    ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10
  ORDER BY i.N
  FOR XML PATH('')
);
GO

AlphaNumericOnly


IF OBJECT_ID('dbo.alphaNumericOnly8K', 'IF') IS NOT NULL
  DROP FUNCTION dbo.alphaNumericOnly8K;
GO
CREATE FUNCTION dbo.alphaNumericOnly8K
(
  @pString varchar(8000)
) 
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/****************************************************************************************
[Purpose]:
 Given a varchar(8000) string or smaller, this function strips all but the alphanumeric 
 characters that exist in @pString.
[Author]:
 Alan Burstein, EE, Jeff Moden
[Compatibility]:
 SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse
[Parameters]:
 @pString = VARCHAR(8000); Input string to be cleaned
[Returns]:
 AlphaNumericOnly - VARCHAR(8000) 
[Syntax]:
--===== Autonomous
 SELECT ca.AlphaNumericOnly
 FROM   dbo.AlphaNumericOnly(@pString) AS ca;
--===== CROSS APPLY example
 SELECT      ca.AlphaNumericOnly
 FROM        dbo.SomeTable                           AS st
 CROSS APPLY dbo.AlphaNumericOnly(st.SomeVarcharCol) AS ca;
[Dependencies]:
 N/A
[Developer Notes]:
 1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
   http://www.sqlservercentral.com/Forums/Topic1585850-391-2.aspx#bm1629360
 2. This is an iTVF (Inline Table Valued Function) that performs the same task as a 
    scalar user defined function (UDF) accept that it requires the APPLY table operator. 
    Note the usage examples below and see this article for more details: 
    http://www.sqlservercentral.com/articles/T-SQL/91724/ 
    The function will be slightly more complicated to use than a scalar UDF but will yeild
    much better performance. For example - unlike a scalar UDF, this function does not 
    restrict the query optimizer's ability generate a parallel query plan. Initial testing
    showed that the function generally gets a parallel execution plan. 
 3. AlphaNumericOnly runs 2-4 times faster when using make_parallel() (provided that you 
    have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
 4. This is an iTVF (Inline Table Valued Function) that will be used as an iSF (Inline 
    Scalar Function) in that it returns a single value in the returned table and should 
    normally be used in the FROM clause as with any other iTVF.
 5. CHECKSUM returns an INT and will return the exact number given if given an INT to 
    begin with. It's also faster than a CAST or CONVERT and is used as a performance 
    enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
 6. Another performance enhancement is using a WHERE clause calculation to prevent 
    the relatively expensive XML PATH concatentation of empty strings normally 
    determined by a CASE statement in the XML "loop".
 7. Note that AlphaNumericOnly returns an nvarchar(max) value. If you are returning small 
    numbers consider casting or converting yout values to a numeric data type if you are 
    inserting the return value into a new table or using it for joins or comparison purposes.
 8. AlphaNumericOnly is deterministic; for more about deterministic and nondeterministi
    functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
--===== 1. Basic use against a literal
 SELECT ao.AlphaNumericOnly 
 FROM   dbo.alphaNumericOnly8K('xxx123abc999!!!') AS ao;
--===== 2. Against a table 
 DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
 INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
 SELECT      st.txtID, OldTxt = st.txt, f.AlphaNumericOnly
 FROM        @sampleTxt                     AS st
 CROSS APPLY dbo.alphaNumericOnly8K(st.txt) AS f;
---------------------------------------------------------------------------------------
 Rev 00 - 20150526 - Inital Creation - Alan Burstein
 Rev 01 - 20150526 - 3rd line in WHERE clause to correct something that was missed
                   - Eirikur Eiriksson
 Rev 02 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
                     the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/ 
WITH 
E1(N) AS 
(
  SELECT N 
  FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS x(N)
), 
iTally(N) AS 
( 
  SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
  FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d 
)
SELECT AlphaNumericOnly = 
( 
  SELECT SUBSTRING(@pString,CHECKSUM(N),1) 
  FROM   iTally AS i
  WHERE 
     ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 48) & 0x7FFF) < 10 
  OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 65) & 0x7FFF) < 26
  OR ((ASCII(SUBSTRING(@pString,CHECKSUM(N),1)) - 97) & 0x7FFF) < 26 
ORDER BY i.N
  FOR XML PATH(''));
GO

AlphaOnly

IF OBJECT_ID('samd.alphaOnly8K') IS NOT NULL DROP FUNCTION samd.alphaOnly8K;
GO
CREATE FUNCTION samd.alphaOnly8K
(
  @pString VARCHAR(8000)
)
RETURNS TABLE WITH SCHEMABINDING AS RETURN
/*****************************************************************************************
[Purpose]:
 Given a VARCHAR(8000) or less string, returns only the alphabetical digits from the 
 input string (@pString).
[Author]:
  Alan Burstein, EE, Jeff Moden
[Compatibility]: 
 SQL Server 2008+, Azure SQL Database, Azure SQL Data Warehouse
[Parameters]:
 @pString = varchar(8000); Input string to be cleaned
[Returns]:
 AlphaOnly = varchar(8000) 
[Syntax]:
--===== Autonomous
 SELECT ca.AlphaOnly
 FROM   dbo.AlphaOnly(@pString) AS ca;

--===== CROSS APPLY example
 SELECT ca.AlphaOnly
 FROM dbo.SomeTable st
 CROSS APPLY dbo.AlphaOnly(st.SomeVarcharCol) AS ca;
[Dependencies]:
 N/A
[Developer Notes]:
 1. Based on Jeff Moden/Eirikur Eiriksson's DigitsOnlyEE function. For more details see:
    https://goo.gl/Qo8rpy
 2. This is an iTVF (Inline Table Valued Function) that performs the same task as a 
    scalar user defined function (UDF) accept that it requires the APPLY table operator. 
    Note the usage examples below and see this article for more details: 
    https://goo.gl/3utbgx 
    The function will be slightly more complicated to use than a scalar UDF but will yeild
    much better performance. For example - unlike a scalar UDF, this function does not 
    restrict the query optimizer's ability generate a parallel query plan. Initial testing
    showed that the function generally gets a 
 3. AlphaOnly runs 2-4 times faster when using make_parallel() (provided that you
    have two or more logical CPU's and MAXDOP is not set to 1 on your SQL Instance).
 4. CHECKSUM returns an INT and will return the exact number given if given an INT to 
    begin with. It's also faster than a CAST or CONVERT and is used as a performance 
    enhancer by changing the bigint of ROW_NUMBER() to a more appropriately sized INT.
 5. Another performance enhancement is using a WHERE clause calculation to prevent 
    the relatively expensive XML PATH concatentation of empty strings normally 
    determined by a CASE statement in the XML "loop".
 6. Note that AlphaOnly returns an nvarchar(max) value. If you are returning small 
    numbers consider casting or converting yout values to a numeric data type if you are 
    inserting the return value into a new table or using it for joins or comparison 
    purposes.
 8. AlphaOnly is deterministic; for more about deterministic and nondeterministic
    functions see https://msdn.microsoft.com/en-us/library/ms178091.aspx
[Examples]:
--===== 1. Basic use against a literal
 SELECT ao.AlphaOnly 
 FROM samd.alphaOnly8K('xxx123abc999!!!') AS ao;
--===== 2. Against a table 
 DECLARE @sampleTxt TABLE (txtID int identity, txt varchar(100));
 INSERT @sampleTxt(txt) VALUES ('!!!A555A!!!'),(NULL),('AAA.999');
 SELECT st.txtID, OldTxt = st.txt, AlphaOnly
 FROM @sampleTxt AS st
 CROSS APPLY samd.alphaOnly8K(st.txt) AS f;
---------------------------------------------------------------------------------------
[Revision History]:
 Rev 00 - 20170411 - Inital Creation - Alan Burstein
 Rev 01 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
                     the final cte (digitsonly) so that ORDER BY N does not get sorted
 Rev 03 - 20180624 - ADDED ORDER BY N; now performing CHECKSUM conversion to INT inside
                     the final cte (digitsonly) so that ORDER BY N does not get sorted.
****************************************************************************************/
WITH 
E1        AS (SELECT N FROM (VALUES ($),($),($),($),($),($),($),($),($),($)) AS x(N)),
iTally(N) AS 
( 
  SELECT TOP (LEN(ISNULL(@pString,CHAR(32)))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) 
  FROM E1 AS a CROSS JOIN E1 AS b CROSS JOIN E1 AS c CROSS JOIN E1 AS d 
) 
SELECT AlphaOnly = 
( 
  SELECT    SUBSTRING(@pString,CHECKSUM(i.N),1) 
  FROM      iTally AS i
  WHERE    ((ASCII(SUBSTRING(@pString,CHECKSUM(i.N),1)) - 65) & 0x7FFF) < 26 
  OR       ((ASCII(SUBSTRING(@pString,CHECKSUM(i.N),1)) - 97) & 0x7FFF) < 26 
  ORDER BY  i.N
  FOR XML PATH('')
);
GO

    

Rate

4.43 (7)

Share

Share

Rate

4.43 (7)