REPLACE Multiple Spaces with One

  • Has anybody tried these solutions and measured the performance with VarChar(max) and indeed NVarChars of any length as my databases are all Unicode?

  • Hi,

    I've been coding for a few years now but still not with the level of technical expertise as shown in some of these posts. So can someone please comment on the deficiency of using the following code to remove spaces: THE DREADED LOOP

    while charIndex(' ',@newName)>0

    Set @newName = Replace(@newName,' ',' ')

    I didn't run any performance checks on this. But it would seem that in most cases it would be quite rare to have the type of excessive spacing that is being presented in the examples.

    Thanks,

    Kirklin Anglin

  • anglincis (3/18/2011)


    Hi,

    I've been coding for a few years now but still not with the level of technical expertise as shown in some of these posts. So can someone please comment on the deficiency of using the following code to remove spaces: THE DREADED LOOP

    while charIndex(' ',@newName)>0

    Set @newName = Replace(@newName,' ',' ')

    I didn't run any performance checks on this. But it would seem that in most cases it would be quite rare to have the type of excessive spacing that is being presented in the examples.

    Thanks,

    Kirklin Anglin

    ever tried processing an html file written by a good programmer?? lots a spaces!!!!!!!!!!!!!!!!!!!!

  • Ninja's_RGR'us (3/18/2011)


    anglincis (3/18/2011)


    Hi,

    I've been coding for a few years now but still not with the level of technical expertise as shown in some of these posts. So can someone please comment on the deficiency of using the following code to remove spaces: THE DREADED LOOP

    while charIndex(' ',@newName)>0

    Set @newName = Replace(@newName,' ',' ')

    I didn't run any performance checks on this. But it would seem that in most cases it would be quite rare to have the type of excessive spacing that is being presented in the examples.

    Thanks,

    Kirklin Anglin

    ever tried processing an html file written by a good programmer?? lots a spaces!!!!!!!!!!!!!!!!!!!!

    Can't resist.... and as Kirklin mentioned, wouldn't be that quite rare? :-P:-D

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

  • Maybe so from you pov, however even bad programmers still need multiple spaces to actually see what's going on!

  • I will remember this method for future usage, could have used it a few months ago. Most of the data I work with is from various text editors and I have found that multiple spacing is not always what it seems. A lot of people do not realize that a "Hard Space" (Shift+Space) is not the same as a space.

    Now that I am armed with the knowledge from this article, my "x" character will be the hard-space, so I can kill multiple birds with one stone

    Director of Transmogrification Services
  • Mad Myche (3/18/2011)


    I will remember this method for future usage, could have used it a few months ago. Most of the data I work with is from various text editors and I have found that multiple spacing is not always what it seems. A lot of people do not realize that a "Hard Space" (Shift+Space) is not the same as a space.

    Now that I am armed with the knowledge from this article, my "x" character will be the hard-space, so I can kill multiple birds with one stone

    I appreciate the feedback but you shouldn't have an "x" character because you shouldn't be using the method in the article. Go back and look at the "Prolog" where I state that a better way has been found and I provide a link to that better way. The better way is about 6 times faster than my method. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • anglincis (3/18/2011)


    Hi,

    I've been coding for a few years now but still not with the level of technical expertise as shown in some of these posts. So can someone please comment on the deficiency of using the following code to remove spaces: THE DREADED LOOP

    while charIndex(' ',@newName)>0

    Set @newName = Replace(@newName,' ',' ')

    I didn't run any performance checks on this. But it would seem that in most cases it would be quite rare to have the type of excessive spacing that is being presented in the examples.

    Thanks,

    Kirklin Anglin

    If you go back and read the (now) hundreds of posts on this thread, you'll finde some very serious testing was done across a large number of methods including various forms of WHILE loops. IIRC, there was an example such as yours.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • jcrawf02 (3/18/2011)


    Daniel Bowlin (3/18/2011)


    I have a database filled with extra spaces and I have been using this nested replace method for sometime now. However I have always been concerned about the control character I am using. This is a great discussion on practical characters to use, and even a bit of the history of some of these odd characters. Thanks! 🙂

    I don't remember seeing anything about Jeff's childhood, I'll have to go re-read...

    Heh... can you smell the pork chops coming your way? 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice and useful script.

    The same approach is used to find number-of-string-occurrences-with-in-a-string

    look at here.

  • a.rajmane (3/19/2011)


    Nice and useful script.

    The same approach is used to find number-of-string-occurrences-with-in-a-string

    look at here.

    Thanks for the feedback and the link.

    In reference to that link for finding the number of occurances of a string within a string; it's likely better to change the search string to an empty string rather than appending a space to it because it takes longer to expand the content than it does to contract it. Concatenation is an expensive function compared to elimination by substitution. Elimination also allows you to stay within the confines of the same datatype where concatenation may cause an implicit conversion to the next wider datatype which would also be expensive.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hey All,

    Please check out this tip:

    http://www.mssqltips.com/tip.asp?tip=2297

    Sample run:

    select dbo.ReplaceMatch(

    '1 space 2 spaces 3 spaces 4 spaces no space(s)', ' {2,}', ' ')

    returns:

    -----------------------------------------------------

    1 space 2 spaces 3 spaces 4 spaces no space(s)

    It works for any number of spaces!:w00t:

    And I believe it's the fastest! :smooooth:

    With a few changes to the sample run,

    you could also use any character instead of space...;-)

    Let regular expressions do the job!:-D

    NOTE

    If you run this sample in SSMS, please make sure to have:

    1 space after '1 space'

    2 spaces after '2 spaces'

    3 spaces after '3 spaces'

    4 spaces after '4 spaces'

    0 spaces after 'no spaces(s)'

  • Michael Meierruth (11/18/2009)


    I noticed that 'vliet' in a post earlier today seems to know about my binary approach calling it the "65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction".

    [snipped]

    I have also done some quick simulations to see maximum lengths of strings that can be handled with each value of N. These are the results so far:

    N=1 10

    N=2 38

    N=3 286

    N=4 4622

    N=5 151534

    Fascinating thread! I may have missed this suggestion if someone already made it, but why not use the more efficient (and more intuitive) power of two series, i.e., 64, 32, ... , 2, 2? As with the original suggestion, the requirement of processing the final "2" twice remains.

    I say P-O-2 is more efficient because the number of maximum string size handled by each level of REPLACE is usually greater than in the 65, 33... series:

    N=1 (REPLACE x 2) 4

    N=2 (REPLACE x 3) 10

    N=3 (REPLACE x 4) 38

    N=4 (REPLACE x 5) 398

    N=5 (REPLACE x 6) 11806

    N=6 (REPLACE x 7) 754702

    Now before you say, "hey, those maxima aren't greater than the original!" understand that the number of replacements per N-level is different. In the 65, 33... series, you do 3 replacements at N=1 (3, 2, and 2 again). At N=5 you do 7 replacements (33,17,9,5,3,2,2), and that covers a max of 151534 characters, thus covering the original requirement of handling 8000 characters.

    In the P-O-2 series you do 3 replacements at N=*2* (4, 2, and 2 again). At N=5 you do 6 replacements (32,16,8,4,2,2), covering up to 11806 characters.

    Here then would be the revised code, which handles up to 754702 characters (vs. 151534 in the original):

    CREATE FUNCTION dbo.fn_CleanUp_InTheStyleOf_MichaelMeierruth_MKII(@S VARCHAR(8000))

    RETURNS VARCHAR(8000)

    BEGIN

    RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' '),

    ' ',' ')

    END

    What I have not figured out is the relationship between N and max characters.

    Edit/Addenda:

    From my numeric model, two things are clear about the relationship between N and the maximum number of characters that can be handled i.e., MAX(N):

    MAX(N+1) = MAX(N) x 2^(N+1) + ADJ(N+1); where ADJ(N+1) decreases rapidly in significance with greater N, and

    MAX(N) -> (PRODUCT for X = 1 to N (2^X))/C; where C ~ 2.77984

  • What is strange is that my first suggestion regarding the binary recursive approach corresponds to your sequence derived from 2^N (N starting at 1). But I liked Jeff's idea so much that I didn't bother to actually try it out.

    But then curiosity killed the cat and I dug into the problem and for some reason I decided to use 2^N+1 (N starting at 0). But I can't remember why I decided that. What is pretty interesting is that both require an extra replace of 2 spaces to handle those strange cases.

    Now I will have a closer look at your variant...

  • What is pretty interesting is that both require an extra replace of 2 spaces to handle those strange cases.

    In the 2^N series the extra replace is used quite commonly -- roughly every third case on average.

    Also, the max value I stated for N=7 is not correct (close, but not quite). It still beats the 2^N+1 series in terms of characters handled per replacement level though, by far. I have a lot of interesting data about the 2^N algo but have not figured out a general expression for max characters handled. Still working on that...

Viewing 15 posts - 346 through 360 (of 425 total)

You must be logged in to reply to this topic. Login to reply