REPLACE Multiple Spaces with One

  • Heh... it's not a fault... it's a feature! Makes it real easy to look good by doing a "performance improvement". 😛

    Actually, I'm tickled at the response, Paul. I've had and held back several ideas just because I didn't believe Connections worked that well.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (12/5/2009)


    Heh... it's not a fault... it's a feature! Makes it real easy to look good by doing a "performance improvement". 😛

    Certainly one trick that has moved up in my toolkit! At some stage, I might go back and see what impact the collation thing might have on the string-splitting routines...might be interesting?

    Jeff Moden (12/5/2009)


    Actually, I'm tickled at the response, Paul. I've had and held back several ideas just because I didn't believe Connections worked that well.

    I'm very heartened to see the response to Connect items improving. I think it helps having the idea 'peer-reviewed' on a site like this before submission - and it definitely seems to help if a good repro is included too.

  • I absolutely agree. It helps a lot if you can demo an "easily repeatable problem" as well. We all hate intermitent problems and the good folks at MS are no exception.

    Hats off to you for the submittal. Guess I have to get off my hiney and start submitting.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • C# Screw (11/18/2009)


    Hi Old Hand

    I hope you don't mind I wrapped you up in function

    >code cut>

    And here you are :

    Beginning execution loop

    Batch execution completed 10000 times.

    SQL: using ||*9*9||

    6570

    SQL: Jeffs single char

    3053

    SQL:Looping

    2696

    SQL:Old Hand

    1720

    CLR: looping

    453

    CLR: Not looping using ||*9*9|| technique in C#

    293

    CLR: Not looping using Jeff's single char technique in C#

    253

    CLR: using JCBnew space replacer C#

    163

    CLR: using Paul White space replacer C#

    110

    CLR: using Flo s space replacer C#

    110

    It would be really good to have GSquared's code in this comparison too. I'm so far behind on this discussion that maybe someone has already suggested this, and you've alreasy done it. The difficuly part of that of course is working out what value 1+2**N in Michael Merrieuth's code is the sensible match for the value 10 in GSquared's code. I think that for reasonable vales of both will give GSquared a small advantage, but I don't have an SQL 2000 system to measure it in (and that was the original target).

    Tom

  • Michael Meierruth (11/18/2009)


    There is a recursive effect going on which I'm still trying to understand from a methematical point of view, i.e. for a given N+1 spaces what is the maximum length of blanks that can be reduce to 1 blank using the above approach.

    I think there is an obvious lower bound to the maximum number of successive blanks it will handle (assuming it works and doesn't sometimes leave 2 adjacent blanks - I haven't made it fail but although I think I've checked the logic and I think it works this is easy to get wrong; I guess the "+1" is there to eliminate leaving two adjacent blanks?). Anyway, I am sure it will work with anything up to 2**(X*(X+1)/2) successive blanks if (2**X)+1 is the highest length in the replace commands. I feel it will work a bit larger (somewhere around than (1+2**-X) times that) but don't trust my combinatorial math instincts any more (it's more than 4 decades since I thought of myself as a mathematician rather than a Computer Scientist). The G-squared code (using all numbers from 10 down to 2) should work up to about N! successive blanks where N is the length of the string replaced in the innermost replace, so about 3.25 million with the original 10 limit (so I don't understand Geoff's comment about going up to 13 to get to 8000 at all). Stopping at 65 will get you about 2/3 of the way there. I also think using 65,33,17,9,5,3,2 may well be slower than using 10,9,8,7,6,5,4,3,2 - at least when the average blanks stretch is pretty short (below 64, say)

    Tom

  • Paul White (12/5/2009)


    TheSQLGuru (12/5/2009)


    Oh dude, wouldn't that be SWEET! Now watch them say "yep, it's a bug, but we aren't gonna fix it because ..." 🙁

    True, true, true. But I'm staying positive on this one: even if it turns into a "won't fix" or "maybe in some future version", I'm happy that they've taken it seriously. If nothing else, I'll have learnt some new stuff about SQL internals, and that's always good!

    I was fully ready for a fob-off answer on this one, so I have to give huge credit for the work they've done so far. A lot of stuff gets posted on connect, much of it of very variable quality, so pretty happy about getting a well-researched answer. That's three times in a month or so now...a welcome trend.

    I noticed the status was "Closed as By Design". The response is good in that it contains some genuine analysis of what is going on (much better than most responses) but I suspect that that status means it's going nowhere. Or am I being unduly pessimistic?

    On a rally pessimistic note, perhaps the sentence you quoted contained a simple typo: omitted "not"?

    Tom

  • Tom.Thomson (1/22/2010)


    I noticed the status was "Closed as By Design". The response is good in that it contains some genuine analysis of what is going on (much better than most responses) but I suspect that that status means it's going nowhere. Or am I being unduly pessimistic?

    On a rally pessimistic note, perhaps the sentence you quoted contained a simple typo: omitted "not"?

    There are only a limited number of ways in which a report can be closed - Won't Fix, Fixed, By Design and so on. My hope is that although marked as 'By Design', someone will take note of the unexpectedly high cost of the current algorithms, and give them some attention at some stage.

    For the moment, I am happy with the explanation provided, and will certainly give collation a close look whenever I code something that requires many string comparisons, especially if the default collation is a Windows one.

  • I have written a procedure a while back which replaces 2 space with 1 while in loop of max 30 occurances. Not rbar, but still not the best solution that I knew forsure.

    But now looking at your solution, it is so easy! Many thanks, and I'm changing my code right after this post! :-D:-D:-D

  • Thanks, Jeff. Also some interesting stuff about control characters, something I've never really considered before.

    Tom

    Life: it twists and turns like a twisty turny thing

  • Thanks for the update, Jeff.

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • chris.stuart (3/18/2011)


    I have written a procedure a while back which replaces 2 space with 1 while in loop of max 30 occurances. Not rbar, but still not the best solution that I knew forsure.

    But now looking at your solution, it is so easy! Many thanks, and I'm changing my code right after this post! :-D:-D:-D

    Oh, be careful, please. My stuff is super simple but it's not the best. If you look at the first post on this thread, there's a link to a different post on this thread for the fastest. I know it's the fastest because I also tested it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • hodgy (3/18/2011)


    Thanks, Jeff. Also some interesting stuff about control characters, something I've never really considered before.

    Tom

    Thanks, Tom. If you'd like to know a bit more about ASCII control characters, take a look at the following link.

    http://en.wikipedia.org/wiki/ASCII

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Koen Verbeeck (3/18/2011)


    Thanks for the update, Jeff.

    You bet. Thanks for stopping by, Koen.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I have a database filled with extra spaces and I have been using this nested replace method for sometime now. However I have always been concerned about the control character I am using. This is a great discussion on practical characters to use, and even a bit of the history of some of these odd characters. Thanks! 🙂

  • Daniel Bowlin (3/18/2011)


    I have a database filled with extra spaces and I have been using this nested replace method for sometime now. However I have always been concerned about the control character I am using. This is a great discussion on practical characters to use, and even a bit of the history of some of these odd characters. Thanks! 🙂

    I don't remember seeing anything about Jeff's childhood, I'll have to go re-read...

    ---------------------------------------------------------
    How best to post your question[/url]
    How to post performance problems[/url]
    Tally Table:What it is and how it replaces a loop[/url]

    "stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."

Viewing 15 posts - 331 through 345 (of 425 total)

You must be logged in to reply to this topic. Login to reply