• Gary Harding (5/28/2015)


    Alan.B (5/27/2015)


    Michael Meierruth (5/27/2015)


    Alan B.

    How does your function stand up to a varchar(8000) column with an * at the beginning and end and all blanks in between?

    Okay, first - both functions are much slower with a * at the beginning/end. My first test was at 500K rows and I cancelled before the first test finished. I did 10K, 50K and 100K rows. Here's DDL, Test and results:

    -- Ran on my Laptop -- X64 (8 Logical CPU, 2.8ghz, 32GB) Microsoft SQL Server 2014 - 12.0.2254.0 (X64)

    USE tempdb

    GO

    -- CREATE THE FUNCTIONS

    -----------------------------------------------------------------

    IF OBJECT_ID('dbo.fn_CleanUp_MichaelMeierruth_MKIV') IS NOT NULL

    DROP FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV

    GO

    CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(max))

    RETURNS VARCHAR(max) WITH SCHEMABINDING AS

    BEGIN

    RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' ')

    END;

    GO

    IF OBJECT_ID('dbo.NormalizeWhiteSpace8K') IS NOT NULL

    DROP FUNCTION dbo.NormalizeWhiteSpace8K

    GO

    CREATE FUNCTION dbo.NormalizeWhiteSpace8K(@String varchar(max))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    SELECT

    NewString =

    replace(replace(replace(replace(replace(replace(replace(

    ltrim(rtrim(@String)),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' ');

    GO

    -- MAKE SURE BOTH STILL WORK (yep)

    -----------------------------------------------------------------

    DECLARE @string varchar(8000);

    SET @string = '*'+REPLICATE(' ',7998)+'*';

    SELECT * FROM dbo.NormalizeWhiteSpace8K(@string);

    SELECT dbo.fn_CleanUp_MichaelMeierruth_MKIV(@string);

    --THE TESTS

    -----------------------------------------------------------------

    SET NOCOUNT ON

    IF OBJECT_ID('tempdb..#vals') IS NOT NULL DROP TABLE #vals;

    CREATE TABLE #vals (id int identity primary key, val varchar(8000) not null);

    INSERT #vals (val)

    SELECT TOP 100000 '*'+REPLICATE(' ',7998)+'*'

    FROM sys.all_columns a, sys.all_columns b;

    dbcc freeproccache with no_infomsgs

    dbcc dropcleanbuffers with no_infomsgs

    PRINT '========== itvf version ==========';

    DECLARE

    @result varchar(8000),

    @StartTime DATETIME = GETDATE();

    SELECT @result = NewString

    FROM #vals

    CROSS APPLY dbo.NormalizeWhiteSpace8K(val);

    PRINT CAST(@@rowcount AS varchar(10))+' rows.';

    PRINT DATEDIFF(ms,@StartTime,GETDATE());

    GO 3

    PRINT '========== scalar version ==========';

    DECLARE

    @result varchar(8000),

    @StartTime DATETIME = GETDATE();

    SELECT @result = dbo.fn_CleanUp_MichaelMeierruth_MKIV(val)

    FROM #vals;

    PRINT CAST(@@rowcount AS varchar(10))+' rows.';

    PRINT DATEDIFF(ms,@StartTime,GETDATE());

    GO 3

    /*

    --10,000 Records

    Beginning execution loop

    ========== itvf version ==========

    10000 rows.

    1450

    ========== itvf version ==========

    10000 rows.

    1380

    ========== itvf version ==========

    10000 rows.

    1413

    Batch execution completed 3 times.

    Beginning execution loop

    ========== scalar version ==========

    10000 rows.

    2300

    ========== scalar version ==========

    10000 rows.

    2290

    ========== scalar version ==========

    10000 rows.

    2276

    Batch execution completed 3 times.

    */

    /*

    -- 50,000 rows

    Beginning execution loop

    ========== itvf version ==========

    50000 rows.

    7166

    ========== itvf version ==========

    50000 rows.

    7116

    ========== itvf version ==========

    50000 rows.

    7070

    Batch execution completed 3 times.

    Beginning execution loop

    ========== scalar version ==========

    50000 rows.

    11503

    ========== scalar version ==========

    50000 rows.

    11530

    ========== scalar version ==========

    50000 rows.

    11493

    Batch execution completed 3 times.

    */

    /*

    -- 100,000 rows

    Beginning execution loop

    ========== itvf version ==========

    100000 rows.

    14426

    ========== itvf version ==========

    100000 rows.

    14226

    ========== itvf version ==========

    100000 rows.

    14226

    Batch execution completed 3 times.

    Beginning execution loop

    ========== scalar version ==========

    100000 rows.

    22900

    ========== scalar version ==========

    100000 rows.

    23076

    ========== scalar version ==========

    100000 rows.

    22943

    Batch execution completed 3 times.

    */

    Surely both of these methods would benefit from using a binary collation, allowing the REPLACE function to execute faster?

    For example, adding COLLATE Latin1_General_BIN2 to the function's inner @string parameter reference gives a significant speed improvement in my tests.

    Maybe there's a downside I haven't thought of, but I'd have thought a space is a space is a space in any collation.

    I have to try Latin1_General_BIN2. I tested with Latin1_General_BIN and saw no improvement.

    "I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

    -- Itzik Ben-Gan 2001