Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Remove non printable characters


Remove non printable characters

Author
Message
Grinja
Grinja
SSC-Enthusiastic
SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)

Group: General Forum Members
Points: 133 Visits: 618
Hi,
Is there an equivilent function to Excel's 'CLEAN' in SQL to remove non printable characters?.

Thanks,
G
Ramesh Saive
Ramesh Saive
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2564 Visits: 2643
I don't think so there is any equivalent function in SQL Server. But you can create you own function using the existing REPLACE function.

--Ramesh


Grinja
Grinja
SSC-Enthusiastic
SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)SSC-Enthusiastic (133 reputation)

Group: General Forum Members
Points: 133 Visits: 618
That could work ... thanks for the suggestion!
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44990 Visits: 39875
Ramesh Saive (2/5/2010)
I don't think so there is any equivalent function in SQL Server. But you can create you own function using the existing REPLACE function.


Heh.. Let's see some code. ;-) I believe you'll find it's a wee bit more difficult to do than just using REPLACE.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44990 Visits: 39875
Grinja (2/5/2010)
That could work ... thanks for the suggestion!


Which unprintable characters? Just any character less than CHAR(32)???

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Ramesh Saive
Ramesh Saive
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2564 Visits: 2643
Jeff Moden (2/5/2010)
Ramesh Saive (2/5/2010)
I don't think so there is any equivalent function in SQL Server. But you can create you own function using the existing REPLACE function.


Heh.. Let's see some code. ;-) I believe you'll find it's a wee bit more difficult to do than just using REPLACE.


Ah..., Here is the code


DECLARE @Characters TABLE( SomeChar CHAR(1) NOT NULL )
DECLARE @SomeText VARCHAR(100)

SELECT   @SomeText = 'Ticking away the moments that make up a dull day, Fritter and waste the hours in an offhand way.'

INSERT   @Characters( SomeChar )
SELECT   't' UNION ALL
SELECT   'i' UNION ALL
SELECT   'c'

SELECT   @SomeText = REPLACE( @SomeText, SomeChar, '' )
FROM   @Characters

PRINT @SomeText



--Ramesh


Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44990 Visits: 39875
My apologies, Ramesh. I should have qualified my statement. Instead of saying...

"I believe you'll find it's a wee bit more difficult to do than just using REPLACE."

... I should have said...

"I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE."

Let's see what I mean. First, we need two test tables... "#Dirty" will contain some characters that we need to remove and "#Clean" will have no characters to be removed.

--==============================================================================
-- Setup the test tables
--==============================================================================
--===== Conditionally remove the test tables so we can easily retest
IF OBJECT_ID('TempDB..#Dirty','U') IS NOT NULL
DROP TABLE #Dirty;

IF OBJECT_ID('TempDB..#Clean','U') IS NOT NULL
DROP TABLE #Clean;
GO
--===== Build the "dirty" table where each row has
-- at least 4 characters to remove
SELECT TOP 100000
CAST(NEWID() AS VARCHAR(100)) AS SomeText
INTO #Dirty
FROM Master.sys.All_Columns ac1,
Master.sys.All_Columns ac2;

--===== Build the "clean" table where each row has
-- NO characters to remove
SELECT TOP 100000
REPLICATE('B',26) AS SomeText
INTO #Clean
FROM Master.sys.All_Columns ac1,
Master.sys.All_Columns ac2;
GO



Now, let's put your code into a function and some slightly different code. Please... no one faint and no one write to their Congressman... I'm going to use a WHILE loop for this one because my tests show that this is one of the few places where a WHILE loop will beat a Tally table. :-P


--==============================================================================
-- Setup the functions to test
--==============================================================================
--===== Do these tests in a nice safe place that everyone has.
USE TempDB;
GO
--===== Build a function using "Replace"
CREATE FUNCTION dbo.CleanWithReplace
(@SomeText VARCHAR(100))
RETURNS VARCHAR(100)
AS
BEGIN
DECLARE @Characters TABLE(SomeChar CHAR(1) NOT NULL)
INSERT @Characters( SomeChar )
SELECT 'A' UNION ALL
SELECT 'E' UNION ALL
SELECT '-'

SELECT @SomeText = REPLACE( @SomeText, SomeChar, '' )
FROM @Characters

RETURN @SomeText
END;
GO
--===== Build a function using "Stuff"
CREATE FUNCTION dbo.RemoveNonPrintable
-- Modified by Jeff Moden
(@String VARCHAR(8000))
RETURNS VARCHAR(8000) AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT,
@Pattern CHAR(7)

SELECT @Pattern = '%[-AE]%',
@IncorrectCharLoc = PATINDEX(@Pattern, @String)

WHILE @IncorrectCharLoc > 0
SELECT @string = STUFF(@String, @IncorrectCharLoc, 1, ''),
@IncorrectCharLoc = PATINDEX(@Pattern, @String)
RETURN @string
END
GO




Now, we'll run some tests. Each function will be executed against the "dirty" table and the "clean" table while measuring performance using Profiler. Please, read the comments in the code... they explain how the test was done.


--==============================================================================
-- Now we'll do the tests. I have Profiler measuring SQL:Batch Completed
-- against the SPID that I'm running these tests from. Also notice that
-- I've taken the time to take the display the results out of the picture
-- by returning the result to a "throw away" variable (@BitBucket).
--==============================================================================
GO
--===== CleanWithStuff against "dirty"
DECLARE @BitBucket VARCHAR(8000);
SELECT @BitBucket = dbo.CleanWithStuff(SomeText)
FROM #Dirty;
GO
--===== CleanWithReplace against "dirty"
DECLARE @BitBucket VARCHAR(8000);
SELECT @BitBucket = dbo.CleanWithReplace(SomeText)
FROM #Dirty;
GO
--===== CleanWithStuff against "clean"
DECLARE @BitBucket VARCHAR(8000);
SELECT @BitBucket = dbo.CleanWithStuff(SomeText)
FROM #Clean;
GO
--===== CleanWithReplace against "clean"
DECLARE @BitBucket VARCHAR(8000);
SELECT @BitBucket = dbo.CleanWithReplace(SomeText)
FROM #Clean;
GO




Here's the output from the Profiler trace...


As you can see, the STUFF method is a whole lot faster even on just a paltry 100,000 rows. As another benefit, the REPLACE method takes about the same amount of time to wade through the rows whether there's anything to clean out or not. The STUFF method runs a lot faster because it will short-circuit out of the function if there's nothing to do.

And, that's just to check for 3 characters that we want to remove. There are 33 non-printable characters just in the basic ASCII character set. Those characters are CHAR(0) through CHAR(31) and CHAR(127).

Heh... I know what's coming next. A lot of people would justify using the REPLACE code by saying it will only be used on one variable at a time from a GUI and the performance is probably good enough for even 10,000 simultaneous "hits". Those people would be absolutely correct.

What those people don't understand is that it's a function, i.e. easily "Reusable Code", and there's nothing in the code to control who uses it or why. So, some developer given an assignment to create an ETL package that will handle hundreds of million row files a day runs across the "Reusable Code" and uses it. Considering that the REPLACE function takes 30 seconds (on my humble desktop) to handle only 100,000 rows, it'll take a good 5 minutes to handle just one million row file and it's hammering the CPU the whole time. I'm thinking that would cause a major problem in the not-so-unusual scenario I just gave.

That's what I originally meant by it not being so easy to do with REPLACE.

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Attachments
TraceOutputFromTest.gif (3.5K views, 5.00 KB)
Jeff Moden
Jeff Moden
SSC-Forever
SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)SSC-Forever (44K reputation)

Group: General Forum Members
Points: 44990 Visits: 39875
Grinja (2/5/2010)
Hi,
Is there an equivilent function to Excel's 'CLEAN' in SQL to remove non printable characters?.

Thanks,
G


I believe this will do it for you...

 CREATE FUNCTION dbo.RemoveNonDisplayChars
/********************************************************************
Purpose:
Remove the non-displayable control characters from CHAR(0) to
CHAR(31) and the DELETE character CHAR(127).

Revision History:
Rev 00 - Jeff Moden - 06 Feb 2010 - Initial Release and Unit Test
********************************************************************/
--===== Declare the I/O parameters
(@pString VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN

--===== Declare Local variables
DECLARE @IncorrectCharLoc SMALLINT, --Position of bad character
@Pattern CHAR(37) --Bad characters to look for

SELECT @Pattern = '%['
+ CHAR(0)+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)
+ CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)
+ CHAR(10)+CHAR(11)+CHAR(12)+CHAR(13)+CHAR(14)
+ CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)
+ CHAR(20)+CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)
+ CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)
+ CHAR(30)+CHAR(31)+CHAR(127)
+ ']%',
@IncorrectCharLoc = PATINDEX(@Pattern, @pString)

WHILE @IncorrectCharLoc > 0
SELECT @pString = STUFF(@pString, @IncorrectCharLoc, 1, ''),
@IncorrectCharLoc = PATINDEX(@Pattern, @pString)
RETURN @pString
END
GO




--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
     Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
Ramesh Saive
Ramesh Saive
SSCrazy
SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)SSCrazy (2.6K reputation)

Group: General Forum Members
Points: 2564 Visits: 2643
Jeff Moden (2/6/2010)
...... I should have qualified my statement. Instead of saying...

"I believe you'll find it's a wee bit more difficult to do than just using REPLACE."

... I should have said...

"I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE.".......


I knew it was coming, Jeff.:-) I know you would always come up with more optimized solution than this.

And I agree with you on the reusable optimized code; there is really no harm in using optimized code though it is for a one time or one row execution.

Thank you Jeff, for taking time and letting me know few of the cases where a while loop beats a tally table.

I wonder how many more cases I would see where a while loop beats a tally table.:-)

--Ramesh


Paul White
Paul White
SSChampion
SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)SSChampion (10K reputation)

Group: General Forum Members
Points: 10338 Visits: 11350
Hey Jeff,

Scalar T-SQL functions?! WHILE loops? I am genuinely shocked! w00t

The REPLACE solution can be made to run an order of magnitude faster if we use an in-line table-valued function instead of an evil scalar function:

Create the in-line table-valued function

CREATE FUNCTION dbo.IF_CleanWithReplace
(
@SomeText VARCHAR(100)
)
RETURNS TABLE
WITH SCHEMABINDING
AS RETURN
SELECT cleaned =
REPLACE(
REPLACE(
REPLACE(
@SomeText COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS,
'A', SPACE(0)),
'E', SPACE(0)),
'-', SPACE(0));



Run the test

DECLARE @BitBucket VARCHAR(8000);
SELECT @BitBucket = iTVF.cleaned
FROM #Dirty D
CROSS
APPLY dbo.IF_CleanWithReplace(D.SomeText) iTVF;



Execution times (scalar function times in parentheses):

#Dirty: 501ms (13,234ms)

Paul

edit: code updated in-place to reflect Carl's excellent observation that the previous implementation was fast, but failed to return the correct results Hehe



Paul White
SQLPerformance.com
SQLblog.com
@SQL_Kiwi
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search