• My apologies, Ramesh. I should have qualified my statement. Instead of saying...

    "I believe you'll find it's a wee bit more difficult to do than just using REPLACE."

    ... I should have said...

    "I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE."

    Let's see what I mean. First, we need two test tables... "#Dirty" will contain some characters that we need to remove and "#Clean" will have no characters to be removed.

    --==============================================================================

    -- Setup the test tables

    --==============================================================================

    --===== Conditionally remove the test tables so we can easily retest

    IF OBJECT_ID('TempDB..#Dirty','U') IS NOT NULL

    DROP TABLE #Dirty;

    IF OBJECT_ID('TempDB..#Clean','U') IS NOT NULL

    DROP TABLE #Clean;

    GO

    --===== Build the "dirty" table where each row has

    -- at least 4 characters to remove

    SELECT TOP 100000

    CAST(NEWID() AS VARCHAR(100)) AS SomeText

    INTO #Dirty

    FROM Master.sys.All_Columns ac1,

    Master.sys.All_Columns ac2;

    --===== Build the "clean" table where each row has

    -- NO characters to remove

    SELECT TOP 100000

    REPLICATE('B',26) AS SomeText

    INTO #Clean

    FROM Master.sys.All_Columns ac1,

    Master.sys.All_Columns ac2;

    GO

    Now, let's put your code into a function and some slightly different code. Please... no one faint and no one write to their Congressman... I'm going to use a WHILE loop for this one because my tests show that this is one of the few places where a WHILE loop will beat a Tally table. 😛

    --==============================================================================

    -- Setup the functions to test

    --==============================================================================

    --===== Do these tests in a nice safe place that everyone has.

    USE TempDB;

    GO

    --===== Build a function using "Replace"

    CREATE FUNCTION dbo.CleanWithReplace

    (@SomeText VARCHAR(100))

    RETURNS VARCHAR(100)

    AS

    BEGIN

    DECLARE @Characters TABLE(SomeChar CHAR(1) NOT NULL)

    INSERT @Characters( SomeChar )

    SELECT 'A' UNION ALL

    SELECT 'E' UNION ALL

    SELECT '-'

    SELECT @SomeText = REPLACE( @SomeText, SomeChar, '' )

    FROM @Characters

    RETURN @SomeText

    END;

    GO

    --===== Build a function using "Stuff"

    CREATE FUNCTION dbo.RemoveNonPrintable

    -- Modified by Jeff Moden

    (@String VARCHAR(8000))

    RETURNS VARCHAR(8000) AS

    BEGIN

    DECLARE @IncorrectCharLoc SMALLINT,

    @Pattern CHAR(7)

    SELECT @Pattern = '%[-AE]%',

    @IncorrectCharLoc = PATINDEX(@Pattern, @String)

    WHILE @IncorrectCharLoc > 0

    SELECT @string = STUFF(@String, @IncorrectCharLoc, 1, ''),

    @IncorrectCharLoc = PATINDEX(@Pattern, @String)

    RETURN @string

    END

    GO

    Now, we'll run some tests. Each function will be executed against the "dirty" table and the "clean" table while measuring performance using Profiler. Please, read the comments in the code... they explain how the test was done.

    --==============================================================================

    -- Now we'll do the tests. I have Profiler measuring SQL:Batch Completed

    -- against the SPID that I'm running these tests from. Also notice that

    -- I've taken the time to take the display the results out of the picture

    -- by returning the result to a "throw away" variable (@BitBucket).

    --==============================================================================

    GO

    --===== CleanWithStuff against "dirty"

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.CleanWithStuff(SomeText)

    FROM #Dirty;

    GO

    --===== CleanWithReplace against "dirty"

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.CleanWithReplace(SomeText)

    FROM #Dirty;

    GO

    --===== CleanWithStuff against "clean"

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.CleanWithStuff(SomeText)

    FROM #Clean;

    GO

    --===== CleanWithReplace against "clean"

    DECLARE @Bitbucket VARCHAR(8000);

    SELECT @Bitbucket = dbo.CleanWithReplace(SomeText)

    FROM #Clean;

    GO

    Here's the output from the Profiler trace...

    As you can see, the STUFF method is a whole lot faster even on just a paltry 100,000 rows. As another benefit, the REPLACE method takes about the same amount of time to wade through the rows whether there's anything to clean out or not. The STUFF method runs a lot faster because it will short-circuit out of the function if there's nothing to do.

    And, that's just to check for 3 characters that we want to remove. There are 33 non-printable characters just in the basic ASCII character set. Those characters are CHAR(0) through CHAR(31) and CHAR(127).

    Heh... I know what's coming next. A lot of people would justify using the REPLACE code by saying it will only be used on one variable at a time from a GUI and the performance is probably good enough for even 10,000 simultaneous "hits". Those people would be absolutely correct.

    What those people don't understand is that it's a function, i.e. easily "Reusable Code", and there's nothing in the code to control who uses it or why. So, some developer given an assignment to create an ETL package that will handle hundreds of million row files a day runs across the "Reusable Code" and uses it. Considering that the REPLACE function takes 30 seconds (on my humble desktop) to handle only 100,000 rows, it'll take a good 5 minutes to handle just one million row file and it's hammering the CPU the whole time. I'm thinking that would cause a major problem in the not-so-unusual scenario I just gave.

    That's what I originally meant by it not being so easy to do with REPLACE.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)