Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1213141516»»»

REPLACE Multiple Spaces with One Expand / Collapse
Author
Message
Posted Wednesday, November 18, 2009 12:39 PM


SSCommitted

SSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommittedSSCommitted

Group: General Forum Members
Last Login: Sunday, November 4, 2012 12:23 PM
Points: 1,893, Visits: 3,932
Good morning Paul!

Paul White (11/18/2009)
... my (still slightly amateur) C# as usual!

Well, now I have to say this to you:
I don't know how much you are working with C#, but you are definitely not an amateur in .NET. Your solution really fine appears to be one of the two fastest solutions

Remember

(To all the others: This is an insider to joke to Paul)

Greets
Flo



The more I learn, the more I know what I do not know
Blog: Things about Software Architecture, .NET development and T-SQL

How to Post Data/Code to get the best Help How to Post Performance Problems


  Post Attachments 
PaulNewbie.png (366 views, 35.79 KB)
Post #821112
Posted Wednesday, November 18, 2009 12:44 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 1:17 PM
Points: 4,474, Visits: 6,407
jcrawf02 (11/18/2009)
Thought I'd tell you, you guys are just fun to watch.


And provide significant community benefit too!!


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #821117
Posted Wednesday, November 18, 2009 12:45 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 4:23 PM
Points: 35,821, Visits: 32,494
C# Screw (11/18/2009)
Jeff,
as promised : all tests combined SQL & CLR

Beginning execution loop
Batch execution completed 10000 times.
SQL:Looping
2536
SQL: using ||*9*9||
6310
SQL: Jeffs single char
2916
CLR: looping
453
CLR: Not looping using ||*9*9|| technique in C#
296
CLR: Not looping using Jeff's single char technique in C#
416 <-- consistently c# performs slower single space bell character, surprising?
Correction : Noticed one space missing in c#
CLR: Not looping using Jeff's single char technique in C#
246

I am really sorry the c# input has caused some upset, I thought it was really useful/interesting.. so 10/10 article & productive discussion.



Oh no... don't be sorry at all about that. I just wanted to be sure what we were testing against. Thank you very much for taking that time. And thanks for the 10-by marks.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #821121
Posted Wednesday, November 18, 2009 12:47 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 1:17 PM
Points: 4,474, Visits: 6,407
GSquared (11/18/2009)
Nadrek (11/18/2009)
Jeff Moden (11/17/2009)
You and another poster said about the same thing... I'm going to have to give it a try. Although it's not likely to have 8000 spaces (almost silly to even think that), I'll have to expand your good code example to 13 replaces so they're functionally identical so far as VARCHAR(8000) goes and see how that performs, as well.


Not likely, perhaps... but very common to have about that many if someone started out with a CHAR(8000), even worse if they did concatenation on a CHAR(8000)+ another VARCHAR or CHAR + LEFT() or RIGHT() to get back to 8000. Some forms of data import also lead to excessive space padding. Are they good SQL? No. Do they happen? I believe they do.

Also, yes, GSquared and I do apparently use some similar methods.


I actually discovered this method almost 20 years ago, when I was building WordPerfect macros to clean up documents that had been typed up in an ASCII text editor. Used it to clean up spaces, underlines, even page-lines created by entering arbitrary numbers of hyphens. Came in very, very handy.


I'll be damned - someone else who did macro-based document management using WordPerfect macros! How kewl is that!? I had 2.5M of macros that did a full-fledged work package management system that included change bar and version tracking, etc. Tied into a dbase IV database system too!


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #821126
Posted Wednesday, November 18, 2009 12:55 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 4:23 PM
Points: 35,821, Visits: 32,494
Michael Meierruth (11/18/2009)
Earlier on in this post I mentioned the use of a binary approach. I finally found some time to try this out.

declare @s varchar(8000)
set @s = '*' + replicate(' ',7998) + '*'
select len(@s)
set @s = replace(replace(replace(replace(replace(replace(ltrim(rtrim(@s)),
replicate(' ',32+1),' '),replicate(' ',16+1),' '),replicate(' ',8+1),' '),replicate(' ',4+1),' '),replicate(' ',2+1),' '),replicate(' ',1+1),' ')
select len(@s)
select @s

Needless to say I have tried this with Jeff's data as well as with some random data and it works just fine.
As for performance, I generated 100000 records and Jeff's code did in 110 seconds whereas my code did in 16 seconds.
I shaved off a second by replacing the calls to replicate (which I used for clarity) with hardcoded constants or variables.
Oh yes, Jeff's code ran at 100% CPU whereas my code never reached 100%.

There is a recursive effect going on which I'm still trying to understand from a methematical point of view, i.e. for a given N+1 spaces what is the maximum length of blanks that can be reduce to 1 blank using the above approach.

Of course, if someone can find a string with blanks where this doesn't work I will quietly shut up.


Whoa! Very cool... I'm going to have to try that one for sure. Thanks, Michael!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #821133
Posted Wednesday, November 18, 2009 1:12 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, December 8, 2014 6:44 AM
Points: 1,332, Visits: 19,320
TheSQLGuru (11/18/2009)
GSquared (11/18/2009)
Nadrek (11/18/2009)
Jeff Moden (11/17/2009)
You and another poster said about the same thing... I'm going to have to give it a try. Although it's not likely to have 8000 spaces (almost silly to even think that), I'll have to expand your good code example to 13 replaces so they're functionally identical so far as VARCHAR(8000) goes and see how that performs, as well.


Not likely, perhaps... but very common to have about that many if someone started out with a CHAR(8000), even worse if they did concatenation on a CHAR(8000)+ another VARCHAR or CHAR + LEFT() or RIGHT() to get back to 8000. Some forms of data import also lead to excessive space padding. Are they good SQL? No. Do they happen? I believe they do.

Also, yes, GSquared and I do apparently use some similar methods.


I actually discovered this method almost 20 years ago, when I was building WordPerfect macros to clean up documents that had been typed up in an ASCII text editor. Used it to clean up spaces, underlines, even page-lines created by entering arbitrary numbers of hyphens. Came in very, very handy.


I'll be damned - someone else who did macro-based document management using WordPerfect macros! How kewl is that!? I had 2.5M of macros that did a full-fledged work package management system that included change bar and version tracking, etc. Tied into a dbase IV database system too!
Pipe down, us younguns can't hear the discussion over the creaking of your old bones...


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #821144
Posted Wednesday, November 18, 2009 1:13 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, December 8, 2014 6:44 AM
Points: 1,332, Visits: 19,320
Paul White (11/18/2009)
- no doubt by then someone will have found an improvement which runs in negative time Paul


Um, tell the vendor to clean it up on their end?


---------------------------------------------------------
How best to post your question
How to post performance problems
Tally Table:What it is and how it replaces a loop

"stewsterl 80804 (10/16/2009)I guess when you stop and try to understand the solution provided you not only learn, but save yourself some headaches when you need to make any slight changes."
Post #821146
Posted Wednesday, November 18, 2009 1:15 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 4:23 PM
Points: 35,821, Visits: 32,494
vliet (11/18/2009)
Hello Jeff,

Thank you so much for posting this solution! We will apply your method in a SSIS Derived Column Transformation on data coming from raw files. This data needs to be cleansed before it even reaches the SQL Server staging tables, so we can not use CLR functions or SQL functions.

I will let you know how it performs, I'll try to use the horizontal tab (ASCII code 9) as the special character because this character will be replaced by a single space in this same transformation step. I'm curious about the performance in this context, but I must admit that your solution is much more elegant than my initial version with the 65, 33, 17, 9, 5, 3 an 2 spaces nested REPLACE construction. Your method can be applied on very wide columns without a limit to the number of contiguous spaces, and all it needs is one special character.

We can use the same expression for every string column that needs to be cleansed and we only need to replace the column name once. This is an aspect of your method that comes in handy in real life situations where you need to cleans more than a few columns of a few tables.

Keep up the good work!

Dony.


Hi Dony... thanks for the great feedback. BUT... as so often happens (it's almost designed to happen because so many good people get involved), the method I posted has been soundly beaten by, you guessed it, a nested replace very similar to what you said. Please see the following post on this thread....

http://www.sqlservercentral.com/Forums/Topic819042-203-3.aspx#BM820760


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #821149
Posted Wednesday, November 18, 2009 1:20 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 4:23 PM
Points: 35,821, Visits: 32,494
Paul White (11/18/2009)
Hey everyone, just a quick note before I go to work this morning. I love the way this thread it going - huge thanks especially to C# Screw (lol!) and Flo for tidying my (still slightly amateur) C# as usual! I have a fairly busy morning ahead but will look in this afternoon - no doubt by then someone will have found an improvement which runs in negative time

I hope Jeff pops by soon because I'm really interested in the huge performance difference (10x) I found working with the different collations with his cool method.

Paul


I'm at work right now so I can only provide quick "verbal" answers... can't take the time to test any code at work. I'll give the collation code a test tonight... very interesting find you made, Paul. Heh... more proof that SQL is NOT SQL.... well, unless it's Tuesday, it's raining, and you have precisely 3 pork chops on the thaw.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #821154
Posted Wednesday, November 18, 2009 1:41 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Thursday, December 18, 2014 4:56 AM
Points: 225, Visits: 646
I am wondering if most usefull/usable Community result should include strip multiple Tab characters as well?

Thanks
Screw

10k rows
Each execution completed 10 times.

SQL function: Replace Technique : but using replacement chars ||*9*9||
2635
SQL function :Brigzy (C#Screw) Looping
2517
SQL function: Jeff Original : single bell char
1884
CLR: Brigzy (C#Screw) Looping
332
SQL function: Michael Meierruth Original
332
CLR: Not looping using ||*9*9|| technique in C#
312
SQL function: Michael Meierruth : with hard coded spaces
304
CLR: using Jeff's single char technique in C#
243
CLR: C#Screw II - Spaces and TABS
181

CLR :using JCBnew space replacer C#
160
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
143
CLR: using Flo s space replacer C#
121
CLR: using Paul White space replacer C#
105

   [SqlFunction()]
public static string fn_TidySpace_SharpScrewII_CLR(string s)
{
StringBuilder sb = new StringBuilder();

string[] parts = s.Split(new char[] { ' ','\t' }, StringSplitOptions.RemoveEmptyEntries);

int size = parts.Length;

for (int i = 0; i < size; i++)

sb.AppendFormat("{0} ", parts[i]);

return sb.ToString();

}



C# Gnu
____________________________________________________


Multi user data edit / data transfer Excel Task based solution
Version 2.7 released June 2013
Post #821163
« Prev Topic | Next Topic »

Add to briefcase «««1213141516»»»

Permissions Expand / Collapse