REPLACE Multiple Spaces with One

  • Paul White (11/18/2009)


    - no doubt by then someone will have found an improvement which runs in negative time :laugh:Paul

    Um, tell the vendor to clean it up on their end? 😀

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • vliet (11/18/2009)


    Hello Jeff,

    Thank you so much for posting this solution! We will apply your method in a SSIS Derived Column Transformation on data coming from raw files. This data needs to be cleansed before it even reaches the SQL Server staging tables, so we can not use CLR functions or SQL functions.

    I will let you know how it performs, I'll try to use the horizontal tab (ASCII code 9) as the special character because this character will be replaced by a single space in this same transformation step. I'm curious about the performance in this context, but I must admit that your solution is much more elegant than my initial version with the 65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction. Your method can be applied on very wide columns without a limit to the number of contiguous spaces, and all it needs is one special character.

    We can use the same expression for every string column that needs to be cleansed and we only need to replace the column name once. This is an aspect of your method that comes in handy in real life situations where you need to cleans more than a few columns of a few tables.

    Keep up the good work!

    Dony.

    Hi Dony... thanks for the great feedback. BUT... as so often happens (it's almost designed to happen because so many good people get involved), the method I posted has been soundly beaten by, you guessed it, a nested replace very similar to what you said. Please see the following post on this thread....

    http://www.sqlservercentral.com/Forums/Topic819042-203-3.aspx#BM820760

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Paul White (11/18/2009)


    Hey everyone, just a quick note before I go to work this morning. I love the way this thread it going - huge thanks especially to C# Screw (lol!) and Flo for tidying my (still slightly amateur) C# as usual! I have a fairly busy morning ahead but will look in this afternoon - no doubt by then someone will have found an improvement which runs in negative time :laugh:

    I hope Jeff pops by soon because I'm really interested in the huge performance difference (10x) I found working with the different collations with his cool method.

    Paul

    I'm at work right now so I can only provide quick "verbal" answers... can't take the time to test any code at work. I'll give the collation code a test tonight... very interesting find you made, Paul. Heh... more proof that SQL is NOT SQL.... well, unless it's Tuesday, it's raining, and you have precisely 3 pork chops on the thaw. 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I am wondering if most usefull/usable Community result should include strip multiple Tab characters as well?

    Thanks

    Screw

    10k rows

    Each execution completed 10 times.

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2635

    SQL function :Brigzy (C#Screw) Looping

    2517

    SQL function: Jeff Original : single bell char

    1884

    CLR: Brigzy (C#Screw) Looping

    332

    SQL function: Michael Meierruth Original

    332

    CLR: Not looping using ||*9*9|| technique in C#

    312

    SQL function: Michael Meierruth : with hard coded spaces

    304

    CLR: using Jeff's single char technique in C#

    243

    CLR: C#Screw II - Spaces and TABS

    181

    CLR :using JCBnew space replacer C#

    160

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    143

    CLR: using Flo s space replacer C#

    121

    CLR: using Paul White space replacer C#

    105

    [SqlFunction()]

    public static string fn_TidySpace_SharpScrewII_CLR(string s)

    {

    StringBuilder sb = new StringBuilder();

    string[] parts = s.Split(new char[] { ' ','\t' }, StringSplitOptions.RemoveEmptyEntries);

    int size = parts.Length;

    for (int i = 0; i < size; i++)

    sb.AppendFormat("{0} ", parts);

    return sb.ToString();

    }

  • Hi C# Screw

    Good point, maybe try this instead of the StringBuilder:

    string[] parts = s.Split(new char[] { ' ', '\t' });

    return string.Join(" ", parts);

    But I think the char-loop based solutions will still be much faster, since they do less memory copies.

    Greets

    Flo

  • Flo - thank you so much for the Tui billboard :laugh:

    I should have known you'd get your revenge! Man, that made me snigger out loud at my desk....!

  • Paul White (11/18/2009)


    Flo - thank you so much for the Tui billboard :laugh:

    I should have known you'd get your revenge! Man, that made me snigger out loud at my desk....!

    Always welcome! 😀

    Remember, what you give is what you get.

    (... Damn! Yesterday I just called you crazy :pinch: )

  • Florian Reischl (11/18/2009)


    Hi C# Screw

    Good point, maybe try this instead of the StringBuilder:

    string[] parts = s.Split(new char[] { ' ', '\t' });

    return string.Join(" ", parts);

    But I think the char-loop based solutions will still be much faster, since they do less memory copies.

    Greets

    Flo

    Hi Flo

    I tried the above but it is returning the original string :unsure:

  • C# Screw (11/18/2009)


    Florian Reischl (11/18/2009)


    Hi C# Screw

    Good point, maybe try this instead of the StringBuilder:

    string[] parts = s.Split(new char[] { ' ', '\t' });

    return string.Join(" ", parts);

    But I think the char-loop based solutions will still be much faster, since they do less memory copies.

    Greets

    Flo

    Hi Flo

    I tried the above but it is returning the original string :unsure:

    Oups, forgot the StringSplitOptions... Try this:

    string[] parts = s.Split(new char[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);

    return string.Join(" ", parts);

    Greets

    Flo

    Edit:Removed the leading spaces

  • Yep thats it Flo:

    10k rows

    10x execution AVG

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2648

    SQL function :Brigzy (C#Screw) Looping

    2538

    SQL function: Jeff Original : single bell char

    1891

    CLR: Brigzy (C#Screw) Looping

    348

    SQL function: Michael Meierruth : with hard coded spaces

    342

    SQL function: Michael Meierruth Original

    317

    CLR: Not looping using ||*9*9|| technique in C#

    296

    CLR: using Jeff's single char technique in C#

    255

    CLR: C#Screw II - Spaces and TABS

    197

    CLR: Flo - Spaces and TABS - v.compact

    169

    CLR :using JCBnew space replacer C#

    162

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    152

    CLR: using Paul White space replacer C#

    109

    CLR: using Flo s space replacer C#

    108

  • Oh look ! I have become a ValuedMember :blush:

    Screw

  • C# Screw (11/18/2009)


    Oh look ! I have become a ValuedMember :blush:

    Congrats Screw!

    And thanks for all your time to do those tests 🙂

    Greets

    Flo

  • I noticed that 'vliet' in a post earlier today seems to know about my binary approach calling it the "65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction".

    I did notice one bug in the code I published. It appears to not work for strings of length K*2^N+1 for K>1 and 2^N+1 corresponding to the largest string substitution you are doing (33). Strings of this length would end up in a string of length 2 (e.g. 65, 97, 129, etc). Thus one additional replace(@s," "," ") is needed to handle these relatively rare cases. Here is the correct version for C#Screw because this might slow things down a nanosecond or so and put me in a different place in the leaderboard.

    CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKII(@S VARCHAR(8000))

    RETURNS VARCHAR(8000)

    BEGIN

    RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' ')

    END

    I have also done some quick simulations to see maximum lengths of strings that can be handled with each value of N. These are the results so far:

    N=1 10

    N=2 38

    N=3 286

    N=4 4622

    N=5 151534

    Thus 2^N+1 = 33 for N=5 covers SQL Server's max length of 8000 quite nicely. No need for vliet's 65 for which I haven't figured out yet the maximum length string it could handle. From the above sequence it looks like something very big. Can anyone see the formula for this sequence?

    OK. Thought about it a bit.

    For N=6 it appears to be 9845678.

    The formula seems to be 2^(N+1)+Product for N=0 to N of 2^N+1

    E.g. for N=5 we get 64 + 33*17*9*5*3*2 = 151534

  • Michael,

    here you are now (included change to c# version):

    Slick T-SQL version to strip Spaces and TABS anyone?

    Time to sleep here shortly in UK ...:doze:

    Thanks

    Screw

    10k rows

    Each execution completed 10 times.

    SQL function: Replace Technique : but using replacement chars ||*9*9||

    2638

    SQL function :Brigzy (C#Screw) Looping

    2511

    SQL function: Jeff Original : single bell char

    1866

    SQL function: Michael Meierruth Original

    321

    CLR: Brigzy (C#Screw) Looping

    316

    SQL function: Michael Meierruth III - extra replace

    306

    SQL function: Michael Meierruth : with hard coded spaces

    305

    CLR: Not looping using ||*9*9|| technique in C#

    287

    CLR: using Jeff's single char technique in C#

    248

    CLR: C#Screw II - Spaces and TABS

    181

    CLR: Flo - Spaces and TABS - v.compact

    167

    CLR :using JCBnew space replacer C#

    160

    CLR: Michael Meierruth - C#Screw conversion: with extra Replace

    149

    CLR: Michael Meierruth - C#Screw conversion: hard coded spaces

    145

    CLR: using Flo s space replacer C#

    109

    CLR: using Paul White space replacer C#

    104

  • Mark-101232 (11/18/2009)


    Just out of interest I converted your code to use pointers which improved performance by 5-10%. Not that I would recommend using UNSAFE code though...

    Hey Mark,

    Wouldn't mind seeing that code if you have time to post it - I considered using pointers but ran out of talent! Same goes for a C++ implementation. The pointer thing should be faster, probably more than 5-10%, but IIRC unmanaged code means the thread has to go pre-emptive rather than co-operative, so that probably slows it down a bit overall...? Anyway, would be nice to see the code 🙂

    Paul

Viewing 15 posts - 136 through 150 (of 425 total)

You must be logged in to reply to this topic. Login to reply