Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1617181920»»»

REPLACE Multiple Spaces with One Expand / Collapse
Author
Message
Posted Thursday, November 19, 2009 6:58 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:33 AM
Points: 10,989, Visits: 10,529
A quick stand-alone repro for the collation thing:

USE tempdb;
GO
CREATE TABLE dbo.SpaceTest
(
Data VARCHAR(4000) NOT NULL
);
GO
INSERT dbo.SpaceTest
SELECT TOP (10000)
SPACE(4000)
FROM master.sys.allocation_units A1,
master.sys.allocation_units A2,
master.sys.allocation_units A3,
master.sys.allocation_units A4;
GO
DECLARE
@BitBucket VARCHAR(4000);

SET STATISTICS TIME ON

-- SLOW
SELECT
@BitBucket = --dbo.fn_CleanUp_MichaelMeierruth_MKII(Data)
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Data COLLATE LATIN1_GENERAL_CI_AS,
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
FROM
dbo.SpaceTest;

-- FAST
SELECT
@BitBucket = --dbo.fn_CleanUp_MichaelMeierruth_MKII(Data)
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Data COLLATE LATIN1_GENERAL_BIN,
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
FROM
dbo.SpaceTest;

SET STATISTICS TIME OFF;

DROP TABLE dbo.SpaceTest;

The other odd thing is that if I change the data type in the table from VARCHAR(4000) to CHAR(4000), the problem goes away. I'm convincing myself that this is either an optimizer bug, or some effect of the complex collation precedence rules (coercible-default and all that).

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #821565
Posted Thursday, November 19, 2009 7:20 AM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 7:38 AM
Points: 3,573, Visits: 5,109
Paul, I seem to recall other topics from the years whereby BIN collation was the way to go for efficiency reasons if that was your most important criteria.

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #821591
Posted Thursday, November 19, 2009 7:44 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 3:35 AM
Points: 224, Visits: 629
Hi Paul
Would the script pick up collation sequence from master.sys.allocation_units ?
Just wondering if "sp_help SpaceTest" shows a bit collation..?

Why don't we standardise and all use the same collation sequence, life would be easier

Screw


C# Gnu
____________________________________________________


Excel Database Tasks V2 ..
Multi user data edit Task based solution - Excel with SQL Server / Access / MySQL ..


Version 2.5 released - See Blog
Post #821620
Posted Thursday, November 19, 2009 8:42 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 3:35 AM
Points: 224, Visits: 629
Hi
I have added tab support to Michael's code by initially replacing tab with a space, which might not be best aproach.

Anyhow here is the code and the leader board:

CREATE FUNCTION dbo.fn_CleanUp_MichaelMeierruth_MKIV(@S VARCHAR(8000))
RETURNS VARCHAR(8000)
BEGIN
RETURN replace(replace(replace(replace(replace(replace(replace(ltrim(rtrim(replace(@s,CHAR(9),' '))),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' '),
' ',' ')
END

It has however reduced performance, maybe there is a better way to achieve this in SQL.

SQL function: Replace Technique : but using replacement chars ||*9*9||
2785
SQL function :Brigzy (C#Screw) Looping
2430
SQL function: Jeff Original : single bell char
1958
SQL function: Michael Meierruth : with hard coded spaces and TAB support
484

CLR: Brigzy (C#Screw) Looping
393
CLR: Not looping using ||*9*9|| technique in C#
383
SQL function: Michael Meierruth : with hard coded spaces
324
SQL function: Michael Meierruth Original
314
SQL function: Michael Meierruth III - extra replace
312

CLR: using Jeff's single char technique in C#
282
CLR: C#Screw II - Spaces and TABS
229
CLR: Flo - Spaces and TABS - v.compact
175
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
171
CLR: Michael Meierruth - C#Screw conversion: with extra Replace
171
CLR :using JCBnew space replacer C#
167
CLR: using Flo s space replacer C#
140
CLR: using Paul White space replacer C#
114
CLR: C#Screw : Slightly reduced Paul White's function and Added Support TABS
112

CLR: Paul White's function, reduced by C#Screw with Pointers ideas from Mark-101232
112
CLR: Paul White's function with Pointers by Mark-101232
104


C# Gnu
____________________________________________________


Excel Database Tasks V2 ..
Multi user data edit Task based solution - Excel with SQL Server / Access / MySQL ..


Version 2.5 released - See Blog
Post #821676
Posted Thursday, November 19, 2009 9:56 AM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 3:35 AM
Points: 224, Visits: 629
Hang on a minute !!!

I realised that doing all replacements / loops whatever in the real world could well be a waste of time.

This is because it is surely more lilely that the data doesn't include any double spaces! which might be true of say 95% of your data?

So we only want to go to each level of replace if we really neeed to ... so recursion seems good:

CREATE FUNCTION dbo.fn_CleanUp_Recursion_Brigzy(@Data varchar(max))
RETURNS VarChar(Max)
AS
BEGIN
IF CHARINDEX(' ',@Data) > 0
BEGIN
SET @Data = REPLACE(@Data,' ',' ')
IF CHARINDEX(' ',@Data) > 0
SELECT @Data = dbo.fn_CleanUp_Recursion_Brigzy(@Data)
END
RETURN @Data
END

Unfortuanately our test data is FULL of double spaces so this doesn't appear high in the leader
board:


SQL function: Jeff Original : single bell char
5279
SQL function: C#Screw : Recursive
2636
SQL function: Replace Technique : but using replacement chars ||*9*9||
2611
SQL function :Brigzy (C#Screw) Looping
2514
SQL function: Michael Meierruth : with hard coded spaces and TAB support
391
SQL function: Michael Meierruth Original
318
CLR: Brigzy (C#Screw) Looping
313
SQL function: Michael Meierruth III - extra replace
306
SQL function: Michael Meierruth : with hard coded spaces
305
CLR: Not looping using ||*9*9|| technique in C#
280
CLR: using Jeff's single char technique in C#
236
CLR: C#Screw II - Spaces and TABS
189
CLR: Flo - Spaces and TABS - v.compact
171
CLR :using JCBnew space replacer C#
167
CLR: Michael Meierruth - C#Screw conversion: with extra Replace
156
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
150
CLR: C#Screw : Slightly reduced Paul White's function and Added Support TABS
116
CLR: Paul White's function, reduced by C#Screw with Pointers ideas from Mark-101232
112
CLR: using Flo s space replacer C#
108
CLR: using Paul White space replacer C#
106
CLR: Paul White's function with Pointers by Mark-101232
102

However if test data contained allot less double spaces this could look very different?



What do you think?

Edit: Anyone got some real data to try it on?


C# Gnu
____________________________________________________


Excel Database Tasks V2 ..
Multi user data edit Task based solution - Excel with SQL Server / Access / MySQL ..


Version 2.5 released - See Blog
Post #821759
Posted Thursday, November 19, 2009 12:31 PM


SSC Veteran

SSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC VeteranSSC Veteran

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 3:35 AM
Points: 224, Visits: 629
Hi folks,
Well this is the 'last post' from me but this is what things look like if there are no double spaces in the data at all:

SQL function: Michael Meierruth : with hard coded spaces and TAB support
863
SQL function: Replace Technique : but using replacement chars ||*9*9||
858
SQL function: C#Screw : Recursive
837
SQL function: Michael Meierruth III - extra replace
728
SQL function: Michael Meierruth Original
716
SQL function: Michael Meierruth : with hard coded spaces
660
SQL function :Brigzy (C#Screw) Looping
503

SQL function: Jeff Original : single bell char
444

CLR: Michael Meierruth - C#Screw conversion: with extra Replace
317
CLR: Michael Meierruth - C#Screw conversion: hard coded spaces
307
CLR: using Jeff's single char technique in C#
298
CLR: Not looping using ||*9*9|| technique in C#
268
CLR: C#Screw II - Spaces and TABS
256
CLR: Brigzy (C#Screw) Looping
246
CLR: Flo - Spaces and TABS - v.compact
232
CLR :using JCBnew space replacer C#
230
CLR: using Paul White space replacer C#
224
CLR: Paul White's function, reduced by C#Screw with Pointers ideas from Mark-101232
221
CLR: Paul White's function with Pointers by Mark-101232
214
CLR: using Flo s space replacer C#
210
CLR: C#Screw : Slightly reduced Paul White's function and Added Support TABS
205

My own conclusions
-----------------------------------
I wouldn't like to explain some of the results above entirely - but it is nice to see Jeff has the fastest SQL, and interesting to see SQL looping just behind.

The SQL looping does very well when there are no double spaces as it drops out on the first line of code. How Jeff's SQL can be faster than that I cannot explain.

CLR should be at most least 2x faster; so only but considerably faster if there really is a hefty stack of looping/string work to do.

So there we are folks, almost full circle


C# Gnu
____________________________________________________


Excel Database Tasks V2 ..
Multi user data edit Task based solution - Excel with SQL Server / Access / MySQL ..


Version 2.5 released - See Blog
Post #821879
Posted Thursday, November 19, 2009 1:31 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:33 AM
Points: 10,989, Visits: 10,529
Great stuff 'C# screw' - awesome work on this thread



Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #821924
Posted Thursday, November 19, 2009 1:40 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:33 AM
Points: 10,989, Visits: 10,529
TheSQLGuru (11/19/2009)
Paul, I seem to recall other topics from the years whereby BIN collation was the way to go for efficiency reasons if that was your most important criteria.

Oh absolutely - though I also seem to recall that the SQL collations are generally more efficient than the Windows collations on non-Unicode data - I think that's right.

BIN surely makes a difference; in fact when using CHAR (to avoid the problem in the first place) the BIN collation is 5-15% faster than my default collation. I must admit I am less than clear how sort order affects nested REPLACEs, but at least REPLACE is documented as being collation-aware so there is some sense to it.

The real issue though, is what the engine is doing when using VARCHAR with a non-BIN collation. Recall that with a BIN collation, Michael's method took 2 seconds, but with LATIN1_GENERAL_CI_AS the execution time blew out to 40 seconds! There is only a very small difference in the XML show plan (an extra non-implicit convert to VARCHAR(4000) directly on the column) but it is hard to see why that should make so much difference.

It's almost as if there are optimizations which allow in-place string replacement which are defeated when using VARCHAR with collation conversions. I can only imagine that the strings are being copied in memory for each row (maybe once per REPLACE) rather than being modified in place. The sheer number of allocations and copies might go some way to explain the difference.

I will take another quick look today, but unless something obvious turns up, or someone else out there has a brainwave, this is heading for Connect for sure.

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #821927
Posted Thursday, November 19, 2009 1:46 PM


Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Today @ 7:38 AM
Points: 3,573, Visits: 5,109
I am not sure that blowing out a varchar(4000) conversion for lord knows what is a 'minor plan difference'!! :)

I bet the memory crap is happening for every iteration of the REPLACE (perhaps with the vc4K thing thrown in for good measure). That really could be some significant CPU/memory hits.

Speaking of which, can the main tester for this give a report back on CPU utilization for the test runs? And if you feel frisky, check wait stats deltas for each run. I am wondering if we might not see something show up there indicating memory and/or cpu pressure.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #821932
Posted Thursday, November 19, 2009 1:54 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Wednesday, May 08, 2013 10:33 AM
Points: 10,989, Visits: 10,529
TheSQLGuru (11/19/2009)
I am not sure that blowing out a varchar(4000) conversion for lord knows what is a 'minor plan difference'!! :)

Ha, yeah. No, I just meant that the XML show plan is visually very, very similar (the graphical plan appears identitical!) there's just one small extra node there...

Just so everyone knows, my timing results have always been CPU-only. I will admit to being lazy and using SET STATISTICS TIME instead of total_worker_time / execution_count from sys.dm_exec_query_stats or Profiler which I normally prefer.

Thanks for the other thoughts too!

Paul




Paul White
SQL Server MVP
SQLblog.com
@SQL_Kiwi
Post #821942
« Prev Topic | Next Topic »

Add to briefcase «««1617181920»»»

Permissions Expand / Collapse