Blog Post

Cleaning up excess contiguous spaces in strings. A mystery.

,

/* Have you ever come across the problem where you have to remove spaces from a string that has too many? You'll know that if you do the obvious ...*/
SELECT REPLACE ('this         has            too                          many                               spaces' ,'  ', ' ')
-- ... you end up with your problem only slightly diminished.
--------------------------------------------------------------
--this     has      too             many                spaces
--------------------------------------------------------------

/* you could, of course, just repeat this until all the extra spaces are gone, but this isn't going to help if you need to do this rapidly as an inline process as un update to a huge table of strings. Yet, this sort of data-cleaning is done a lot.
I must admit that I've always used a scalar function for this, which hid an iterative process. */
DECLARE @trimmed VARCHAR(255) ,
  
@LastTrimmed INT

SELECT  @trimmed = 'this    has           too  many        spaces' ,
        
@LastTrimmed = 0

WHILE LEN(@Trimmed) <> @LastTrimmed
  
SELECT  @LastTrimmed = LEN(@Trimmed) ,
          
@trimmed = REPLACE(@trimmed, '  ', ' ')
SELECT  @Trimmed

/* but when I thought about it, (I was re-writing Robyn Page's String Manipulation Workbench for her, to bring it up to date) it seemed inefficient. This is a lot better as long as you don't have more than 208 continuous spaces. (no, you're not going to find that in an address) */

SELECT
  
REPLACE
  
(REPLACE
      
(REPLACE
        
(REPLACE
            
(REPLACE
              
(
              
'this           has              too                            many                                 spaces' ,
              
REPLICATE(CHAR(32), 6),CHAR(32)),
            
REPLICATE(CHAR(32), 5), CHAR(32)),
      
REPLICATE(CHAR(32), 4), CHAR(32)),
    
REPLICATE(CHAR(32), 3), CHAR(32)),
  
REPLICATE(CHAR(32), 2), CHAR(32))

/* If you test this out, you'll find that it will only clean out 38 contiguous spaces if you miss out the replacement for six consecutive spaces, and 10 if you miss out both the five and six consecutive spaces.
If, instead of 2 3 4 5 6, you chose 2 3 4 7 21 you get upto 461 contiguous spaces cleaned up rather then 208.
What is the maths behind this then?Is this the optimum sequence, or is there something better? (2 3 4 6 21 seems to do as well)

Help! Does anyone know of a helpful mathematician? I'd love an explanation
*/

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating