REPLACE Multiple Spaces with One

  • Andrew..Peterson

    SSCertifiable

    Points: 6607

    Simply brilliant!

    The more you are prepared, the less you need it.

  • Matt Miller (4)

    SSC Guru

    Points: 124136

    Sorry to ask a most likely silly question - has anyone tried out the STRING_SPLIT and STRING_AGG functions to do this?  I've not had a chance to put my hands on a SQL 2016 version, but I've read good things about their performance.

    Something along the lines of 

    Select string_agg(x.value,' '), rawstring from mytable cross apply string_split(rawstring,' ') s(x)
    where x.value<>''

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Jeff Moden

    SSC Guru

    Points: 993628

    STRING_AGG (Transact-SQL)

    <!-- -->THIS TOPIC APPLIES TO: yesSQL Server (starting with 2017)

    It's not available in SQL Server 2016 so I'm in the same boat as you even though I have 2016.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
    "If you think its expensive to hire a professional to do the job, wait until you hire an amateur."--Red Adair
    "Change is inevitable... change for the better is not."
    When you put the right degree of spin on it, the number 3|8 is also a glyph that describes the nature of a DBAs job. 😉

    Helpful Links:
    How to post code problems

  • Matt Miller (4)

    SSC Guru

    Points: 124136

    Jeff Moden - Monday, August 6, 2018 8:40 PM

    STRING_AGG (Transact-SQL)

    <!-- -->THIS TOPIC APPLIES TO: yesSQL Server (starting with 2017)

    It's not available in SQL Server 2016 so I'm in the same boat as you even though I have 2016.

    Hmm.  thanks - I stopped at the "available on 2016" from string_split.  Will see if I can refresh my dev edition in a few days if this death march of a financial system migration doesn't kill me first :crazy:

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Matt Miller (4)

    SSC Guru

    Points: 124136

    Update - did get a chance to do a quick test.  Looks like it's "middle of the pack", i.e. about 15% slower than Michael's, but still - scales well enough.  I did try a larger set on this and no choking on it.  I tried it against Michael's function with the Fibonacci fixed set of spaces, scaled the sample size to get Michael's return set around the original results (40K rows, tested 10x each), and ended up with:

    Michael's Fixed spaces (fibo)    351ms      (40K sample)          17,557ms (1M)
    string_agg/string_split                403ms     (40K sample)          19,910ms (1M)

    Note:  the original pseudocode doesn't seem to want to work.  I was NOT able to get these functions to work with CROSS APPLY at all, so it looks more like a correlated sub instead.


    UPDATE #temp3
    SET COL3 =  (select string_agg(a.value,' ') from
                                (
                                    select value
                                               from String_Split(COL2,' ')
                                              where value<>''
                                )a
                           )
    FROM #temp3;

    My prod env is years away from having SQL 2017, so no doubt some of you will get to have better exposure to these new ones 😛

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • avery_larry

    Newbie

    Points: 7

    Michael Meierruth - Wednesday, November 18, 2009 2:42 PM

    I noticed that 'vliet' in a post earlier today seems to know about my binary approach calling it the "65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction".I did notice one bug in the code I published. It appears to not work for strings of length K*2^N+1 for K>1 and 2^N+1 corresponding to the largest string substitution you are doing (33). Strings of this length would end up in a string of length 2 (e.g. 65, 97, 129, etc). Thus one additional replace(@s," "," ") is needed to handle these relatively rare cases. Here is the correct version for C#Screw because this might slow things down a nanosecond or so and put me in a different place in the leaderboard.CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKII(@S VARCHAR(8000))RETURNS VARCHAR(8000)BEGINRETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' '), ' ',' ')ENDI have also done some quick simulations to see maximum lengths of strings that can be handled with each value of N. These are the results so far:N=1 10N=2 38N=3 286N=4 4622N=5 151534Thus 2^N+1 = 33 for N=5 covers SQL Server's max length of 8000 quite nicely. No need for vliet's 65 for which I haven't figured out yet the maximum length string it could handle. From the above sequence it looks like something very big. Can anyone see the formula for this sequence?OK. Thought about it a bit.For N=6 it appears to be 9845678.The formula seems to be 2^(N+1)+Product for N=0 to N of 2^N+1E.g. for N=5 we get 64 + 33*17*9*5*3*2 = 151534

    An ancient thread, but so much fun.  And I confess to not reading every post so I apologize if this has been brought up.

    From inner most replace to outermost replace, I think 2,3,4,7,22 gets up to 460 spaces with 5 replaces, compared to 2,2,3,5,9 getting up to 286 spaces with 5 replaces.  I believe the 6th replacement for maximum spaces would be 233 which would replace up to 51,052 spaces.
    I can't for the life of me figure out the math, though.

    But I'm fairly sure that 18 is sufficient for char(8000) as it will replace 8,008 spaces.

    Thus 18,22,7,4,3,2 theoretically could be faster than 33,17,9,5,3,2,2 because it uses 1 less replace.  I wonder if the number of spaces being replaced affects the performance?  For example; comparing 233,22,7,4,3,2 with 18,22,7,4,3,2 -- is there a performance difference?

Viewing 6 posts - 421 through 426 (of 426 total)

You must be logged in to reply to this topic. Login to reply