RE: REPLACE Multiple Spaces with One

SSC Guru

Points: 61136

May 28, 2015 at 8:27 pm

Gary Harding (5/28/2015)
Alan.B (5/27/2015)
Michael Meierruth (5/27/2015)
Alan B.
How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?
Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:
-- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)
USE tempdb
GO
-- CREATE THE FUNCTIONS
-----------------------------------------------------------------
IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL
DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV
GO
CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))
RETURNS VARCHAR(max) WITH SCHEMABINDING AS
BEGIN
RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
END;
GO
IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL
DROP FUNCTION dbo.NormalizeWhiteSpace8K
GO
CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))
RETURNS TABLE WITH SCHEMABINDING AS
RETURN
SELECT
NewString =
replace(replace(replace(replace(replace(replace(replace(
ltrim(rtrim(@String)),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ');
GO
-- MAKE SURE BOTH STILL WORK (yep)
-----------------------------------------------------------------
DECLARE @string varchar(8000);
SET @string = '*'+REPLICATE(' ',7998)+'*';
SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);
SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);
--THE TESTS
-----------------------------------------------------------------
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;
CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);
INSERT #vals (val)
SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'
FROM sys.all_columns a, sys.all_columns b;
dbcc freeproccache with no_infomsgs
dbcc dropcleanbuffers with no_infomsgs
PRINT '========== itvf version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = NewString
FROM #vals
CROSS APPLY dbo.NormalizeWhiteSpace8K(val);
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
PRINT '========== scalar version ==========';
DECLARE
@result varchar(8000),
@StartTime DATETIME = GETDATE();
SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)
FROM #vals;
PRINT CAST(@@rowcount AS varchar(10))+' rows.';
PRINT DATEDIFF(ms,@StartTime,GETDATE());
GO 3
/*
--10,000 Records
Beginning execution loop
========== itvf version ==========
10000 rows.
1450
========== itvf version ==========
10000 rows.
1380
========== itvf version ==========
10000 rows.
1413
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
10000 rows.
2300
========== scalar version ==========
10000 rows.
2290
========== scalar version ==========
10000 rows.
2276
Batch execution completed 3 times.
*/
/*
-- 50,000 rows
Beginning execution loop
========== itvf version ==========
50000 rows.
7166
========== itvf version ==========
50000 rows.
7116
========== itvf version ==========
50000 rows.
7070
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
50000 rows.
11503
========== scalar version ==========
50000 rows.
11530
========== scalar version ==========
50000 rows.
11493
Batch execution completed 3 times.
*/
/*
-- 100,000 rows
Beginning execution loop
========== itvf version ==========
100000 rows.
14426
========== itvf version ==========
100000 rows.
14226
========== itvf version ==========
100000 rows.
14226
Batch execution completed 3 times.
Beginning execution loop
========== scalar version ==========
100000 rows.
22900
========== scalar version ==========
100000 rows.
23076
========== scalar version ==========
100000 rows.
22943
Batch execution completed 3 times.
*/
Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?
For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.
Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

I have to try Latin1_General_BIN2. I tested with Latin1_General_BIN and saw no improvement.

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001