# do you have a StripNonNumeric ITVF function?

• Lowell

SSC Guru

Points: 323444

we all know Inline Table Value Functions perform at least a couple of order of magnitude faster than a scalar function, but there are some functions I have in my collection that are scalar, that , so far, I don't have an ITVF equivilent for.

I've got a slow running process, that i found is using a scalar function to strip out non numeric characters; my knee jerk reaction was "I'll just replace that and change to a cross apply /outer apply to do the same thing!Q, and then realized i wasn't making any headway on modifying an existing example to become an ITVF.

before i reinvent the wheel, does anyone have one?

here's an example of a scalar function, that is featuring a Tally Table to build the "right" string. run that on a million rows, and it bogs things down a wee bit. :w00t:

``` IF OBJECT_ID('[dbo].[StripNonNumeric]') IS NOT NULL DROP FUNCTION [dbo].[StripNonNumeric] GO CREATE FUNCTION StripNonNumeric(@OriginalText VARCHAR(8000)) RETURNS VARCHAR(8000) BEGIN DECLARE @CleanedText VARCHAR(8000) ;WITH tally (N) as (SELECT TOP 10000 row_number() OVER (ORDER BY sc1.id) FROM Master.dbo.SysColumns sc1 CROSS JOIN Master.dbo.SysColumns sc2) SELECT @CleanedText = ISNULL(@CleanedText,'') + CASE --ascii numbers are 48(for '0') thru 57 (for '9') WHEN ASCII(SUBSTRING(@OriginalText,Tally.N,1)) BETWEEN 48 AND 57 THEN SUBSTRING(@OriginalText,Tally.N,1) ELSE '' END FROM tally WHERE Tally.N <= LEN(@OriginalText) RETURN @CleanedText END ```

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

• Sean Lange

SSC Guru

Points: 286529

Taking your fine code as a starting point I came up with this. I didn't test against a million rows but I think maybe I will do that now.

``` CREATE FUNCTION StripNonNumeric_itvf(@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) select STUFF( ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 FOR XML PATH('') ), 1 ,0 , '') as CleanedText ```

_______________________________________________________________

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

• Lowell

SSC Guru

Points: 323444

I didn't think of using FOR XML to handle the concatenation; that, sir, is beautiful and inspired.

You'd think i'd know better, so double kudos to you.

i'll test it on my process, but I'm pretty sure it'll be quicker than what is there now.

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

• Sean Lange

SSC Guru

Points: 286529

Lowell (6/25/2014)

I didn't think of using FOR XML to handle the concatenation; that, sir, is beautiful and inspired.

You'd think i'd know better, so double kudos to you.

i'll test it on my process, but I'm pretty sure it'll be quicker than what is there now.

Awesome. Glad that will work. Sometime it is easier to see a solution when aren't familiar with the current process. I have added that to my own toolbox as that could definitely come in handy at some point.

_______________________________________________________________

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

• Alan Burstein

SSC Guru

Points: 61076

Gents

First: Dang it Sean - you beet me to it! I had a very similar solution but you posted yours first. That said, I did do a performance test and ran into something I cannot figure out.

on my laptop (Windows 8, 2 cores/4 logical cpu/16GB run SQL Server 2014 Ent) I ran your solution (with results on) and it takes 31 seconds complete (my test harness data below). I tried to improve the time in your query but had no success.

I have an ngrams function that I use for this kind of thing. Below is the function with my notes on how to use it:

``` IF OBJECT_ID('tempdb.dbo.nGrams8K') IS NOT NULL DROP FUNCTION dbo.nGrams8K GO CREATE FUNCTION dbo.nGrams8K (@string varchar(8000), @k int) /******************************************************************** Created by: Alan Burstein Created on: 3/10/2014 Last Updated on: 5/22/2015 n-gram defined: In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. For more information see: http://en.wikipedia.org/wiki/N-gram Use: Outputs a stream of tokens based on an input string. Similar to mdq.nGrams: http://msdn.microsoft.com/en-us/library/ff487027(v=sql.105).aspx. Except it only returns characters as long as K. nGrams8K also includes the position of the "Gram" in the string. Examples of how to use included as comments after the code. ********************************************************************/ RETURNS TABLE WITH SCHEMABINDING AS RETURN WITH E1(N) AS (SELECT 1 FROM (VALUES (null),(null),(null),(null),(null)) x(n)), E3(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b CROSS JOIN E1 c), iTally(N) AS ( SELECT TOP (LEN(@string)-(@k-1)) CHECKSUM(ROW_NUMBER() OVER (ORDER BY (SELECT NULL))) FROM E3 a CROSS JOIN E3 b ) SELECT position = N, token = SUBSTRING(@string,N,@k) FROM iTally; GO /******************************************************************** (1) Basic Use ----------------------------------------------------------- -- (A) With @k as 1,2 & 3: SELECT position, token FROM dbo.nGrams8K('abcdefg',1); SELECT position, token FROM dbo.nGrams8K('abcdefg',2); SELECT position, token FROM dbo.nGrams8K('abcdefg',3); -- (B) Using variables DECLARE @string varchar(20) = 'abcdefg12345', @tokenLen tinyint = 3; SELECT position, token FROM dbo.nGrams8K(@string,@tokenLen) GO (2) Character Count (including 0 Counts) ----------------------------------------------------------- -- (A) Basic character count SELECT token = ISNULL(token,'Total:'), count(*) FROM dbo.nGrams8K(newid(),1) GROUP BY token WITH ROLLUP; -- Added rollup for display -- (B) Character Account including gaps DECLARE @alphabet VARCHAR(26)='ABCDEFGHIJKLMNOPQRSTUVWXYZ', @string VARCHAR(100)='The quick green fox jumps over the lazy dog and the lazy dogs.'; SELECT a.token, COUNT(b.token) ttl FROM dbo.nGrams8K(@alphabet,1) a LEFT JOIN dbo.nGrams8K(@string,1) b ON a.token=b.token GROUP BY a.token ORDER BY a.token; -- not required, for display only GO -- (C) Let's try French ordered by most frequent DECLARE @alphabet VARCHAR(36)='abcdefghijklmnopqrstuvwxyzéèçëòôöùàâ', @string VARCHAR(100)='Le renard vert rapide saute par dessus le chien paresseux et le chien paresseux juste posé là.'; WITH charcount AS ( SELECT a.token, COUNT(b.token) ttl FROM dbo.nGrams8K(@alphabet,1) a LEFT JOIN dbo.nGrams8K(@string,1) b ON a.token=b.token GROUP BY a.token ) SELECT * FROM charcount ORDER BY ttl DESC; GO (3) a *SIMPLE* word count ----------------------------------------------------------- DECLARE @string varchar(100)='THE QUICK Green FOX JUMPED OVER THE LAZY DOGS BACK'; SELECT @string AS string, count(*)+1 AS words FROM dbo.nGrams8K(@string,1) WHERE [token]=' ' GO (4) search for occurances and location of a substring ----------------------------------------------------------- DECLARE @string VARCHAR(100)='The green fox jumps over the dog and the dog just laid there.', @searchString VARCHAR(100)='The'; SELECT position, token AS searchString FROM dbo.nGrams8K(@string,LEN(@searchstring)) b WHERE token=@searchString; GO (5) Strip non-numeric characters from a string ----------------------------------------------------------- DECLARE @string varchar(100) = 'abc123xyz555xxx999!'; WITH stripNonNumeric(cleanstring) AS ( SELECT token+'' FROM dbo.nGrams8K(@string,1) WHERE token LIKE '[0-9]' FOR XML PATH('') ) SELECT cleanstring FROM stripNonNumeric; (6) Find all occurances of a pattern in a string ------------------------------------------------------------ DECLARE @string varchar(100) = 'zz12x345xxx555abc1234zz5xxx', @pat varchar(100) = '[a-z][a-z][0-9]', @len int = 3; SELECT start_pos = position, token FROM dbo.nGrams8K(@string,@len) WHERE token LIKE @pat SELECT * FROM dbo.findpat8k(@string,@pat,3); GO (7) Find the longest common substring between 2 strings ------------------------------------------------------------ -- (A) The Function IF EXISTS ( SELECT * FROM INFORMATION_SCHEMA.ROUTINES WHERE ROUTINE_SCHEMA='dbo' AND ROUTINE_NAME = 'lcss' ) DROP FUNCTION dbo.lcss; GO CREATE FUNCTION dbo.lcss(@string1 varchar(100), @string2 varchar(100)) RETURNS TABLE WITH SCHEMABINDING AS RETURN WITH iTally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM dbo.nGrams8K ( CASE WHEN LEN(@string1)<=LEN(@string2) THEN @string1 ELSE @string2 END,1 ) ) SELECT TOP (1) with ties token FROM iTally CROSS APPLY dbo.nGrams8K ( CASE WHEN LEN(@string1)<=LEN(@string2) THEN @string1 ELSE @string2 END, n ) WHERE N <= CASE WHEN LEN(@string1)<=LEN(@string2) THEN LEN(@string1) ELSE LEN(@string2) END AND charindex ( token, CASE WHEN LEN(@string1)<=LEN(@string2) THEN @string2 ELSE @string1 END ) > 0 ORDER BY len(token) DESC; GO -- (B) example of how to use lcss DECLARE @string1 varchar(100) = '999xxyyyzaa99xxyyyzz', @string2 varchar(100) = '000xxxyyyzzz'; SELECT string1 = @string1, string2 = @string2, token FROM dbo.lcss(@string1, @string2); ********************************************************************/ GO ```

I ran the following 10K Row Test:

`SET NOCOUNT ON;`

``` DBCC FREEPROCCACHE WITH NO_INFOMSGS DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS SET STATISTICS IO ON; SET STATISTICS TIME ON; PRINT 'Sean'+char(13); SELECT CleanedText FROM #strings CROSS APPLY dbo.StripNonNumeric_itvf(string); PRINT 'Alan'+char(13); SELECT CleanedText FROM #strings CROSS APPLY dbo.StripNonNumeric_itvf_ajb(string); SET STATISTICS IO OFF; ```

`SET STATISTICS TIME OFF;`

... and here's the results:

``` Sean SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table '#strings____________________________________________________________________________________________________________00000000004C'. Scan count 1, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 31172 ms, elapsed time = 31213 ms. Alan SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Table '#strings____________________________________________________________________________________________________________00000000004C'. Scan count 1, logical reads 63, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 328 ms, elapsed time = 384 ms. SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. ```

I have no idea why my NGrams solution is so much faster considering that we are essentially doing the same thing. I included the query plans for both.

Lowell: I hope this helps.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

• Sean Lange

SSC Guru

Points: 286529

Alan.B (6/25/2014)

Gents

First: Dang it Sean - you beet me to it! I had a very similar solution but you posted yours first. That said, I did do a performance test and ran into something I cannot figure out.

Nice. I didn't see your test data and haven't had a chance to roll my own yet. I am guessing that the big difference is that my code uses STUFF and yours does not.

If you still have your test harness handy try with this.

``` ALTER FUNCTION [dbo].[StripNonNumeric_itvf](@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , clean_text(CleanedText) as ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 FOR XML PATH('') ) select CleanedText from clean_text ```

_______________________________________________________________

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

• Alan Burstein

SSC Guru

Points: 61076

Sean Lange (6/25/2014)

Alan.B (6/25/2014)

Gents

First: Dang it Sean - you beet me to it! I had a very similar solution but you posted yours first. That said, I did do a performance test and ran into something I cannot figure out.

Nice. I didn't see your test data and haven't had a chance to roll my own yet. I am guessing that the big difference is that my code uses STUFF and yours does not.

If you still have your test harness handy try with this.

``` ALTER FUNCTION [dbo].[StripNonNumeric_itvf](@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , clean_text(CleanedText) as ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 FOR XML PATH('') ) select CleanedText from clean_text ```

What you posted was getting the same results (31 seconds)... I added this to the where clause:

AND n <= len(@OriginalText)[/code]

Final Version

``` ALTER FUNCTION [dbo].[StripNonNumeric_itvf_sean2](@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , clean_text(CleanedText) as ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 AND n <= len(@OriginalText) -- added by ajb FOR XML PATH('') ) select CleanedText from clean_text GO ```

It now runs in Milliseconds, same speed as mine. ~30 seconds for a million rows on my system using the test harness I posted earlier.

Edit: Some of my code got turned into HTML tags. Underlined my change.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

• Sean Lange

SSC Guru

Points: 286529

Alan.B (6/25/2014)

Sean Lange (6/25/2014)

Alan.B (6/25/2014)

Gents

First: Dang it Sean - you beet me to it! I had a very similar solution but you posted yours first. That said, I did do a performance test and ran into something I cannot figure out.

Nice. I didn't see your test data and haven't had a chance to roll my own yet. I am guessing that the big difference is that my code uses STUFF and yours does not.

If you still have your test harness handy try with this.

``` ALTER FUNCTION [dbo].[StripNonNumeric_itvf](@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , clean_text(CleanedText) as ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 FOR XML PATH('') ) select CleanedText from clean_text ```

What you posted was getting the same results (31 seconds)... I added this to the where clause:

`AND n <= len(@OriginalText)`

Final Version

``` ALTER FUNCTION [dbo].[StripNonNumeric_itvf_sean2](@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) , clean_text(CleanedText) as ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 AND n <= len(@OriginalText) -- added by ajb FOR XML PATH('') ) select CleanedText from clean_text GO ```

It now runs in Milliseconds, same speed as mine. ~30 seconds for a million rows on my system using the test harness I posted earlier.

Edit: Some of my code got turned into HTML tags.

That makes sense. No point evaluating all the 10,000 rows when they won't meet the condition anyway. I suspect if we changed my original (with the STUFF) it would perform much the same with the inclusion of the additional predicate in the where clause. Awesome work Alan.

_______________________________________________________________

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

• Alan Burstein

SSC Guru

Points: 61076

That makes sense. No point evaluating all the 10,000 rows when they won't meet the condition anyway. I suspect if we changed my original (with the STUFF) it would perform much the same with the inclusion of the additional predicate in the where clause. Awesome work Alan.

Thanks Sean!

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

• mickyT

SSChampion

Points: 10360

Hi

Interestingly enough I tried a similar test on 2012 for the originals and got elapsed times that so close as to be the same. It may have been the strings I was using though.

Bring on a PATREPLACE function 😀

ChrisM's pattern splitter from Dwain's article [/url]may also be able to be used here

`select cleanedstring`

``` FROM #strings s CROSS APPLY ( SELECT Item + '' FROM PatternSplitCM(string,'[^0-9]') WHERE MATCHED = 0 FOR XML PATH ('') ```

` ) x (cleanedstring)`

• Alan Burstein

SSC Guru

Points: 61076

mickyT (6/25/2014)

Hi

Interestingly enough I tried a similar test on 2012 for the originals and got elapsed times that so close as to be the same. It may have been the strings I was using though.

Bring on a PATREPLACE function 😀

ChrisM's pattern splitter from Dwain's article [/url]may also be able to be used here

`select cleanedstring`

``` FROM #strings s CROSS APPLY ( SELECT Item + '' FROM PatternSplitCM(string,'[^0-9]') WHERE MATCHED = 0 FOR XML PATH ('') ```

` ) x (cleanedstring)`

I recently came up with an iTVF PatExclude function that removes characters based on a pattern and then I remembered this post. Check out the examples in the code...

`CREATE FUNCTION dbo.PatExclude8K`

``` ( @string varchar(8000), @exclude_pattern varchar(50) ) RETURNS TABLE AS /* -- remove letters SELECT * FROM dbo.PatExclude8K('abc123!', '[a-z]'); -- remove numbers SELECT * FROM dbo.PatExclude8K('abc123!', '[0-9]'); -- only include letters and numbers SELECT * FROM dbo.PatExclude8K('###abc123!!!', '[^0-9a-z]'); -- Remove spaces SELECT * FROM dbo.PatExclude8K('XXX 123 ZZZ', '['+char(32)+']'); -- only include letters and "!, ? or ." SELECT * FROM dbo.PatExclude8K('123# What?!... ', '[^a-z!?.]') */ RETURN WITH E1(n) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows E4(n) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b), --10E+4 or 10,000 rows max iTally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), stringToTable AS ( SELECT TOP (len(@string)) n, s = substring(@string,n,1) FROM iTally WHERE n <= len(@string) ), newstring(newstring) AS ( SELECT s+'' FROM stringToTable WHERE s NOT LIKE @exclude_pattern ORDER BY n FOR XML PATH('') ) SELECT newstring FROM newstring; ```

`GO`

Next I tested it against PatternSplitCM for removing non-numeric....

100K row test harness:

``` IF OBJECT_ID('tempdb..#val') IS NOT NULL DROP TABLE #val; CREATE TABLE #val (txt varchar(36) primary key); INSERT INTO #val SELECT TOP (100000) NEWID() FROM sys.all_columns a CROSS JOIN sys.all_columns b; GO dbcc freeproccache; dbcc dropcleanbuffers; DECLARE @string varchar(50); SET STATISTICS TIME ON; PRINT 'PatExcludeAB:'; SELECT @string = newstring FROM #val CROSS APPLY dbo.PatExclude8K(txt, '[^0-9]'); PRINT 'PatternSplitCM:'; select @string = cleanedstring FROM #val s CROSS APPLY ( SELECT Item + '' FROM dbo.PatternSplitCM(txt,'[^0-9]') WHERE MATCHED = 0 FOR XML PATH ('') ) x (cleanedstring); SET STATISTICS TIME OFF; GO ```

Results:

PatExcludeAB:

SQL Server Execution Times:

CPU time = 2250 ms, elapsed time = 2271 ms.

PatternSplitCM:

SQL Server Execution Times:

CPU time = 7109 ms, elapsed time = 7274 ms[/code]

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

• Jeff Moden

SSC Guru

Points: 996119

Out of all of these submittals, which post has the fastest code? I'd check for myself but I'm on the run right now but want to check the "best" against an old script that I have for this.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Alan Burstein

SSC Guru

Points: 61076

Jeff Moden (10/27/2014)

Out of all of these submittals, which post has the fastest code? I'd check for myself but I'm on the run right now but want to check the "best" against an old script that I have for this.

Basically we have four:

The first solution Sean posted, my solution using nGrams, Micky's using paternSplitCM and the one I posted today. I wrapped them all in functions:

``` ----------------------------------------------------------------------- -- (1) Sean's origninal solution ----------------------------------------------------------------------- CREATE FUNCTION dbo.StripNonNumeric_itvf(@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max Tally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ) select STUFF( ( SELECT SUBSTRING(@OriginalText, t.N, 1) FROM tally t WHERE ASCII(SUBSTRING(@OriginalText, t.N, 1)) BETWEEN 48 AND 57 FOR XML PATH('') ), 1 ,0 , '') as CleanedText; ----------------------------------------------------------------------- -- (2) My Solution using nGrams8K ----------------------------------------------------------------------- ;-- nGrams CREATE FUNCTION [dbo].[nGrams8K] ( @string VARCHAR(8000), @n TINYINT, @pad BIT=0 ) RETURNS TABLE WITH SCHEMABINDING AS RETURN WITH E1(n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) t(n)), E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), E4(n) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b), iTally(n) AS ( SELECT TOP (len(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), NewString(NewString) AS ( SELECT REPLICATE(CASE @pad WHEN 0 THEN '' ELSE ' ' END,@n-1)+@string+ REPLICATE(CASE @pad WHEN 0 THEN '' ELSE ' ' END,@n-1) ) SELECT n AS sequence, SUBSTRING(NewString,n,@n) AS token FROM iTally CROSS APPLY NewString WHERE n < ((@n)+LEN(@string)); GO -- stripNonNumeric_itvf_ajb CREATE FUNCTION dbo.StripNonNumeric_itvf_ajb(@OriginalText VARCHAR(8000)) RETURNS TABLE WITH SCHEMABINDING AS return WITH ngrams AS ( SELECT n = [sequence], c = token FROM dbo.nGrams8K(@OriginalText,1,1) ), clean_txt(CleanedText) AS ( SELECT c+'' FROM ngrams WHERE ascii(substring(@OriginalText,n,1)) BETWEEN 48 AND 57 FOR XML PATH('') ) SELECT CleanedText FROM clean_txt; GO ----------------------------------------------------------------------- -- (3) Solution Using PatExclude8K ----------------------------------------------------------------------- CREATE FUNCTION dbo.PatExclude8K ( @string varchar(8000), @exclude_pattern varchar(50) ) RETURNS TABLE AS RETURN WITH E1(n) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)), E2(n) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b), --10E+2 or 100 rows E4(n) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b), --10E+4 or 10,000 rows max iTally(N) AS ( SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 ), stringToTable AS ( SELECT TOP (len(@string)) n, s = substring(@string,n,1) FROM iTally WHERE n <= len(@string) ), newstring(newstring) AS ( SELECT s+'' FROM stringToTable WHERE s NOT LIKE @exclude_pattern ORDER BY n FOR XML PATH('') ) SELECT newstring FROM newstring; GO ----------------------------------------------------------------------- -- (4) Using PatternSplitCM ----------------------------------------------------------------------- CREATE FUNCTION dbo.StripNonNumeric_itvf_pscm(@OriginalText VARCHAR(8000)) RETURNS TABLE --WITH SCHEMABINDING AS return WITH newstring(newstring) AS ( SELECT item+'' FROM PatternSplitCM(@OriginalText, '[^0-9]') WHERE [matched] = 0 FOR XML PATH('') ) SELECT newstring FROM newstring; GO ----------------------------------------------------------------------- -- Make sure they work ----------------------------------------------------------------------- SELECT * FROM dbo.StripNonNumeric_itvf('abc123!'); SELECT * FROM dbo.StripNonNumeric_itvf_pscm('abc123!'); SELECT * FROM dbo.StripNonNumeric_itvf_ajb('abc123!'); SELECT * FROM dbo.PatExclude8K('abc123!', '[^0-9]'); ```

then created this test harness...

``` ----------------------------------------------------------------------- -- 100K ROW TEST HARNESS ----------------------------------------------------------------------- IF OBJECT_ID('tempdb..#val') IS NOT NULL DROP TABLE #val; CREATE TABLE #val (txt varchar(36) primary key); INSERT INTO #val SELECT TOP (100000) NEWID() FROM sys.all_columns a CROSS JOIN sys.all_columns b; GO set nocount on; dbcc freeproccache; dbcc dropcleanbuffers; DECLARE @string varchar(50); SET STATISTICS TIME ON; PRINT 'Original (skipped; runs 30+ seconds for 10K rows):'; --SELECT CleanedText --FROM #val --CROSS APPLY dbo.StripNonNumeric_itvf(txt); PRINT 'Using PatternSplitCM:'; SELECT @string = newstring FROM #val CROSS APPLY dbo.StripNonNumeric_itvf_pscm(txt); PRINT 'Using nGrams:'; SELECT @string = CleanedText FROM #val CROSS APPLY dbo.StripNonNumeric_itvf_ajb(txt); PRINT 'Using PatExclude8K:'; SELECT @string = newstring FROM #val CROSS APPLY dbo.PatExclude8K(txt, '[^0-9]'); SET STATISTICS TIME OFF; GO ```

the results...

``` Original (skipped; runs 30+ seconds for 10K rows): Using PatternSplitCM: SQL Server Execution Times: CPU time = 7187 ms, elapsed time = 7239 ms. Using nGrams: SQL Server Execution Times: CPU time = 2594 ms, elapsed time = 2591 ms. PatExclude8K: SQL Server Execution Times: CPU time = 2266 ms, elapsed time = 2276 ms. ```

The results were pretty much the same when tested against hot or cold cache.

Note: the first 2 functions are the original code from a couple months back.

Edit: fixed print statement, added note

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

• Jeff Moden

SSC Guru

Points: 996119

I took the two contenders from your good post, Alan, and added on of my own. Here's the code for the function I use for such things. Yeah... you'll be shocked. It's not only a scalar function but it also has a WHILE loop in it.

``` SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE FUNCTION [dbo].[CleanString] /******************************************************************************* Purpose: Given a string and a pattern of characters to remove, remove the patterned characters from the string. Usage: --===== Basic Syntax Example SELECT CleanedString = dbo.CleanString(@pSomeString,@pPattern) ; --===== Remove all but Alpha characters SELECT CleanedString = dbo.CleanString(st.SomeString,'%[^A-Za-z]%'); FROM dbo.SomeTable st ; --===== Remove all but Numeric digits SELECT CleanedString = dbo.CleanString(st.SomeString,'%[^0-9]%'); FROM dbo.SomeTable st ; Programmer Notes: 1. @pPattern is case sensitive. 2. The pattern set of characters must be for just one character. Revision History: Rev 00 - Circa 2007 - George Mastros? - Initial find on the web Rev 01 - 29 Mar 2007 - Jeff Moden - Optimize to remove one instance of PATINDEX from the loop. - Add code to use the pattern as a parameter. Rev 02 - 26 May 2013 - Jeff Moden - Add case sensitivity *******************************************************************************/ (@pString VARCHAR(8000),@pPattern VARCHAR(100)) RETURNS VARCHAR(8000) AS BEGIN DECLARE @Pos SMALLINT; SELECT @Pos = PATINDEX(@pPattern,@pString COLLATE Latin1_General_BIN); WHILE @Pos > 0 SELECT @pString = STUFF(@pString,@Pos,1,''), @Pos = PATINDEX(@pPattern,@pString COLLATE Latin1_General_BIN); RETURN @pString; END ; ```

Here's the test harness that I used. It runs each function through the 100K row table 3 times.

``` --===== Create the 100K row test table IF OBJECT_ID('tempdb..#val') IS NOT NULL DROP TABLE #val ; SELECT TOP 100000 txt = ISNULL(CONVERT(VARCHAR(36),NEWID()),'') INTO #val FROM sys.all_columns ac1 CROSS JOIN sys.all_columns ac2 ; ALTER TABLE #Val ADD PRIMARY KEY CLUSTERED (txt) ; --===== Do the tests. Had to use duration because one -- of the tests is on the new scalar function and -- SET STATISTICS doesn't report on those correctly. GO PRINT '========== Using nGrams =========='; DECLARE @String VARCHAR(36) ,@StartTime DATETIME ; SELECT @StartTime = GETDATE() ; SELECT @string = CleanedText FROM #val CROSS APPLY dbo.StripNonNumeric_itvf_ajb(txt) ; PRINT DATEDIFF(ms,@StartTime,GETDATE()) ; GO 3 PRINT '========== Using PatExclude8K =========='; DECLARE @String VARCHAR(36) ,@StartTime DATETIME ; SELECT @StartTime = GETDATE() ; SELECT @string = newstring FROM #val CROSS APPLY dbo.PatExclude8K(txt, '[^0-9]') ; PRINT DATEDIFF(ms,@StartTime,GETDATE()) ; GO 3 PRINT '========== Jeff''s Old Scalar Loop Function =========='; DECLARE @String VARCHAR(36) ,@StartTime DATETIME ; SELECT @StartTime = GETDATE() ; SELECT @String = dbo.CleanString(txt,'%[^0-9]%') FROM #val ; PRINT DATEDIFF(ms,@StartTime,GETDATE()) ; GO 3 ```

Here are the run results. Again, you'll be shocked. This is the one place where I've not been able to make a Tally Table solution able to beat it. Lord knows I and other good folks have tried.

``` (100000 row(s) affected) Beginning execution loop ========== Using nGrams ========== 2916 ========== Using nGrams ========== 2893 ========== Using nGrams ========== 2890 Batch execution completed 3 times. Beginning execution loop ========== Using PatExclude8K ========== 2703 ========== Using PatExclude8K ========== 2640 ========== Using PatExclude8K ========== 2653 Batch execution completed 3 times. Beginning execution loop ========== Jeff's Old Scalar Loop Function ========== 2413 ========== Jeff's Old Scalar Loop Function ========== 2500 ========== Jeff's Old Scalar Loop Function ========== 2466 Batch execution completed 3 times. ```

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
"If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
"Change is inevitable... change for the better is not."

How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

• Eirikur Eiriksson

SSC Guru

Points: 182425

Here is my attempt from the train journey to work this morning, looks like it will give the while loop a real run for the money.

😎

`/********************************************************************`

``` -- Stripping out any non-numerical characters -- EE 2014-10-28 ********************************************************************/ CREATE FUNCTION dbo.STRIP_NUM_EE ( @INSTR VARCHAR(8000) ) RETURNS TABLE WITH SCHEMABINDING AS RETURN WITH T(N) AS (SELECT N FROM ( VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N)) ,NUMS(N) AS (SELECT TOP(CONVERT(BIGINT,LEN(@INSTR),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4) SELECT ( SELECT CASE WHEN CONVERT(CHAR(1),(ASCII(SUBSTRING(@INSTR COLLATE Latin1_General_BIN,NM.N,1)) - 48),0) = SUBSTRING(@INSTR COLLATE Latin1_General_BIN,NM.N,1) THEN SUBSTRING(@INSTR,NM.N,1) END FROM NUMS NM FOR XML PATH('') ) AS OUT_STR ```

`;`

Test results

`Beginning execution loop`

``` ========== Using nGrams ========== 1640 ========== Using nGrams ========== 1640 ========== Using nGrams ========== 1640 Batch execution completed 3 times. Beginning execution loop ========== Using PatExclude8K ========== 2686 ========== Using PatExclude8K ========== 2730 ========== Using PatExclude8K ========== 2700 Batch execution completed 3 times. Beginning execution loop ========== Jeff's Old Scalar Loop Function ========== 1556 ========== Jeff's Old Scalar Loop Function ========== 1563 ========== Jeff's Old Scalar Loop Function ========== 1546 Batch execution completed 3 times. Beginning execution loop ========== dbo.STRIP_NUM_EE Function ========== 1490 ========== dbo.STRIP_NUM_EE Function ========== 1513 ========== dbo.STRIP_NUM_EE Function ========== 1473 ```

`Batch execution completed 3 times.`

Viewing 15 posts - 1 through 15 (of 46 total)