﻿<?xml version='1.0' encoding='UTF-8'?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/"><channel><title>SQLServerCentral / SQL Server 2005 / T-SQL (SS2K5)  / Remove non printable characters / Latest Posts</title><generator>InstantForum.NET v2.9.0</generator><description>SQLServerCentral</description><link>http://www.sqlservercentral.com/Forums/</link><webMaster>notifications@sqlservercentral.com</webMaster><lastBuildDate>Tue, 21 May 2013 05:48:39 GMT</lastBuildDate><ttl>20</ttl><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>This is what I have been looking for and forgive me if this is a simple question (I am trying to learn some of these concepts, like in-line table-valued functions), but how do you easily apply this to an entire table?  How do you use this, even?  It returns a table (not permanent), so I selected the result set into a new table, but I don't know if that defeats the point of using the TVF in the first place.This:select * into dbo.testtable from(SELECT t1.id as id, t1.title as title, clean.cleaned as cleaned from t1cross apply dbo.if_cleanwithreplace(t1.title) clean) afixes one column</description><pubDate>Sat, 18 May 2013 10:01:27 GMT</pubDate><dc:creator>ryan.blosser</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>This is exactly what i was looking for.  My data had ASCII(160) appended to it.  I just added that to the list and it worked very well.-MK</description><pubDate>Wed, 25 Jul 2012 11:17:55 GMT</pubDate><dc:creator>mkucera</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]telastruct (12/30/2010)[/b][hr]Thanks, guys. I apologize if my "actually" seemed snitty- I see how it could be taken that way after re-reading it. I guess I should have followed it up with the question: If the replace method "can clean the 13 million rows in about 25 minutes" (about 2 minutes for each million rows), how is that faster than 2 minutes for about 4 million rows? I'm not a DB expert, and am sure there's probably some other factor I'm missing...?Even if it is not as efficient, I do like the way the other solution is packaged- it seems like the replace method would be pretty easy to incorporate.Thank you again for your thoughts.[/quote]Understood and thanks for the feedback.The real key is the two different rowcounts on 2 different tables on 2 different machines by two different people using two different methods that haven't been fully posted.  For example, are they doing an UPDATE, piping to a new table, or displaying on the screen?  What's the status of fragmentation for both tables?  How many indexes are being updated?  Are the systems using the same version?  Are the systems using the same capacity pipe and hard drives?  What is the wait-state of the two systems?  Are they both "dead quiet" or is one supporting a world wide heavy hit Web site with lots of reporting going on?There's something else that folks need to understand... each system has a "tipping point" where UPDATEs are concerned.  For example... on one system it may take only 10 seconds to update a million rows and, likewise, only 20 seconds to update 2 million rows and only 30 seconds to update 3 million rows.  But, on that same system with the same table and index structures using the exact same query, it may suddenly take 4 hours to update only 4 million rows.  Everything else being the same, another system may be able to handle 6 million rows without reaching the "tipping point".If you want to do a comparison, it has to be on the same table and the same data or at least data with the same level of randomization.  You just can't tell what's what with all the other variables currenlty in play.I'll also admit that I've seen certain memory only scalar functions beat other methods even when they have a WHILE loop in it.  But, we won't actually know until at least one person actually tests both methods on the same data on the same machine.  I was hoping you'd be that person so I didn't actually have to be "the one" yet again.  :-P</description><pubDate>Fri, 31 Dec 2010 13:39:21 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Thanks, guys. I apologize if my "actually" seemed snitty- I see how it could be taken that way after re-reading it. I guess I should have followed it up with the question: If the replace method "can clean the 13 million rows in about 25 minutes" (about 2 minutes for each million rows), how is that faster than 2 minutes for about 4 million rows? I'm not a DB expert, and am sure there's probably some other factor I'm missing...?Even if it is not as efficient, I do like the way the other solution is packaged- it seems like the replace method would be pretty easy to incorporate.Thank you again for your thoughts.</description><pubDate>Thu, 30 Dec 2010 20:30:29 GMT</pubDate><dc:creator>telastruct</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]telastruct (12/30/2010)[/b][hr]Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.[/quote]As Paul said, test the nested-REPLACE method... you'll find that 4 million rows are process in just several seconds... ;-)  That's provided that you're not outputing the 4 million rows to the display which is the "great equalizer".  Lot's of people think their code runs well when they compared it to other code only by how long it takes to render on the screen.  Bad code and good code will frequently take the same time on-screen simply because they have the same number of rows to display.</description><pubDate>Thu, 30 Dec 2010 13:21:44 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]telastruct (12/30/2010)[/b][hr]Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.[/quote]Now try the 'nested REPLACE' code Jeff referred to.You should find it's much faster (and more efficient).Actually.</description><pubDate>Thu, 30 Dec 2010 12:44:09 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Actually, I did test it and it works great. I tested it on a table with almost 4 million rows and it only took a minute or two.</description><pubDate>Thu, 30 Dec 2010 09:16:22 GMT</pubDate><dc:creator>telastruct</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]telastruct (12/6/2010)[/b][hr]Here's another approach:[url]http://iso30-sql.blogspot.com/2010/10/remove-non-printable-unicode-characters.html[/url][/quote]Thanks for the link and welcome aboard!  Shifting gears, please be sure to test that code from that link for performance before you think of using it because it's likely a whole lot slower than the code already posted on this thread because of the scalar function with a While loop in it.  The "nested REPLACE" code posted on this thread will absolutely fly.</description><pubDate>Mon, 06 Dec 2010 15:41:23 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Here's another approach:[url]http://iso30-sql.blogspot.com/2010/10/remove-non-printable-unicode-characters.html[/url]</description><pubDate>Mon, 06 Dec 2010 10:52:07 GMT</pubDate><dc:creator>telastruct</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]carl.anderson-1037280 (4/6/2010)[/b][hr]I guess I have to read up on collations now![/quote]Worth doing, but the reason it is important here is that the sorting and comparison rules are very complex for Windows collations, much less so for SQL collations, and pretty simplistic for binary collations.  See [url]https://connect.microsoft.com/SQLServer/feedback/details/512459/replace-function-extremely-slow-with-non-binary-windows-collation[/url][quote]PS&amp;gt; I replaced the SPACE(0) with '' - it didn't seem to change performance.  What's the idea behind SPACE(0)??[/quote]I just find SPACE(0) more explicit and easier to read than ''.The difference between '' and ' ' is not always easy to spot in scripts, whereas SPACE(0) is clearly not the same as SPACE(1) :-)</description><pubDate>Tue, 06 Apr 2010 15:41:44 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Thank you, Carl.As far as the actual requirement is concerned, concatenating RTF fragements (I'm not even going to ask why they are like that to start with) might be best done outside the database.  RTF processing solutions I have seen before (admittedly converting from RTF to another format) used SSIS to bulk-read the data from SQL Server, and a C# or VB.NET component called from a Script Task to do the hard work.  There is some kind of RTF support in the standard library IIRC, and plenty of code samples out there.One notable advantage of this approach is that an SSIS package could take advantage of parallelism, and involve more than one machine in the effort.Just some ideas.Paul</description><pubDate>Tue, 06 Apr 2010 15:35:27 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Excellent suggestions Paul!I changed the argument varchars from MAX to 1200 and that improved performance marginally ~5%.  I switched the collation type and selected the top 60,000 again but found it still took between 30 and 35 seconds.  However, when I selected INTO a temp table I saw that performance had jumped by a factor of 4!  I went from 26 seconds to just 6!This means I can clean the 13 million rows in about 25 minutes now.  Ideally I would like to cut this in half, but it's still a big win.  I guess I have to read up on collations now!CarlHere's the updated code for anyone interested:[code="sql"]CREATE  FUNCTION dbo.IF_CleanWithReplace        (        @SomeText VARCHAR(1200)        )RETURNS TABLEWITH    SCHEMABINDINGAS      RETURNSELECT    cleaned =     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(        @SomeText COLLATE LATIN1_GENERAL_BIN,    char(0),''), char(1),''), char(2),''), char(3),''), char(4),''),    char(5),''), char(6),''), char(7),''), char(8),''), char(9),''),    char(10),''), char(11),''), char(12),''), char(13),''), char(14),''),    char(15),''), char(16),''), char(17),''), char(18),''), char(19),''),    char(20),''), char(21),''), char(22),''), char(23),''), char(24),''),    char(25),''), char(26),''), char(27),''), char(28),''), char(29),''),    char(128),'');GO[/code]PS&amp;gt; I replaced the SPACE(0) with '' - it didn't seem to change performance.  What's the idea behind SPACE(0)??</description><pubDate>Tue, 06 Apr 2010 15:24:01 GMT</pubDate><dc:creator>carl.anderson-1037280</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Carl,Two suggestions: 1.  Define @SomeText AS VARCHAR(8000) instead of VARCHAR(MAX)2.  Use the LATIN1_GENERAL_BIN collation - and only on @SomeText, not the replace chars.VARCHAR(MAX) variables require a tempdb worktable, whereas VARCHAR(8000) does not.A binary collation is even faster than a SQL collation, and safe to use with your replacement characters, since you are using code point values.Paul</description><pubDate>Tue, 06 Apr 2010 14:29:57 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Hi Paul,thanks for your replies and corrections.  I actually put together a version of your example that nests 31 REPLACE functions and then tested it against Jeff's while loop.  As it turns out, they are closer performance-wise than you think.I'm testing on real-life medical data, so I can't share it.  But I can tell you that it's varchar(1200) columns containing RTF with very few actual unprintable characters per row.  I'm stripping unprintables so I can use a FOR XML query to concatenate RTF fragments into whole documents; unprintable characters are apparently not allowed.I'm selecting the top 60,000 from a table of 13,000,000 rows and it takes about 30 seconds for the nested REPLACE function to complete, whereas it takes about 40 seconds for Jeff's WHILE loop function.  Interestingly enough, when I select INTO a temp table, either solution completes in almost exactly the same amount of time, or about 26 seconds each.By selecting into a temp table I can clean the RTF for the full table in about an hour.  Is this the best we can do???  Maybe I'm missing a query option or something???Thanks again,CarlHere's my code for your reference:[code="sql"]CREATE  FUNCTION dbo.IF_CleanWithReplace        (        @SomeText VARCHAR(MAX)        )RETURNS TABLEWITH    SCHEMABINDINGAS      RETURNSELECT    cleaned =     REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(        @SomeText COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS,    char(0) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(1) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(2) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(3) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(4) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(5) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(6) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(7) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(8) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(9) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(10) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(11) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(12) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(13) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(14) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(15) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(16) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(17) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(18) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(19) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(20) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(21) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(22) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(23) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(24) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(25) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(26) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(27) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(28) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(29) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0)),    char(128) COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS, SPACE(0));GOselect top 60000 clean.* from dev.Y_RTF_NOTE_TEXT cross apply dbo.IF_CleanWithReplace(NOTE_RICH_TEXT) as clean[/code]</description><pubDate>Tue, 06 Apr 2010 14:11:39 GMT</pubDate><dc:creator>carl.anderson-1037280</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]karthikeyan-444867 (4/5/2010)[/b][hr][quote]The REPLACE solution can be made to run an [i]order of magnitude [/i]faster if we use an in-line table-valued function instead of an evil scalar function:[/quote]I am not getting the exact meaning of this statement.  what do you mean by [i]order of magnitude[/i]?[/quote]Order of magnitude = 10 times faster.[quote]you mean to say even though if we use REPLACE function in an in-line table valued function the performance will be good. Am i correct?[/quote]Yes.  Test it, and examine the query plan to understand it fully.</description><pubDate>Mon, 05 Apr 2010 04:32:41 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote]The REPLACE solution can be made to run an [i]order of magnitude [/i]faster if we use an in-line table-valued function instead of an evil scalar function:[/quote]I am not getting the exact meaning of this statement.what do you mean by [i]order of magnitude[/i] ?you mean to say even though if we use REPLACE function in an in-line table valued function the performance will be good. Am i correct?</description><pubDate>Mon, 05 Apr 2010 03:58:43 GMT</pubDate><dc:creator>karthik M</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Paul White NZ (4/2/2010)[/b][hr]Believe me (and Jeff will attest to this) when I say I am very keen on appropriate use of SQLCLR integration.  Unfortunately, an in-line function will out-perform a CLR scalar function every time in this sort of task.  The difference can be made small (and CLR is much faster than a T-SQL scalar function) but never faster than the fully in-lined solution.Paul[/quote]Heh... attest and certify.  The cool part is that you're not one of those folks that thinks SQLCLR is a panacea and you use it wisely.</description><pubDate>Fri, 02 Apr 2010 18:34:34 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]carl.anderson-1037280 (4/2/2010)[/b][hr]I see.  So if you want to remove the full set of 37 unprintable characters, you would have to create 37 levels of nesting with REPLACE.  I guess the computer scientist in me is concerned about 37 full string scans per value when Jeff's code only does one.[/quote]REPLACE is pretty fast, especially when a binary or SQL collation is used.[quote]However, Jeff also uses the PATINDEX function and double wildcards, so I guess it says something about the performance of PATINDEX if the nested REPLACE functions are still faster.Seems like there should be a better way!  CLR maybe?  Maybe a better question is, why is a deeply nested REPLACE so much faster than Jeff's solution[/quote]The answer is that optimizing the replacement operation is much less important than optimizing the overall query plan.  Let me explain that...The nested REPLACE uses in in-line table-valued function, which is logically just a parameterized view.  The definition of the 'function' is expanded in-line into the query plan of the calling statement before SQL Server optimization occurs.  This is the reason that the whole operation can be performed with a single scan and a Compute Scalar.Scalar T-SQL functions cannot be in-lined in this way, and are invoked once per row using a relatively slow interface.  There overheads are such as to completely dominate the overall cost.A CLR scalar function uses a much faster invocation path, but the overhead of passing the values row-by-row to and from the hosted environment again dominate the overall cost.Believe me (and Jeff will attest to this) when I say I am very keen on appropriate use of SQLCLR integration.  Unfortunately, an in-line function will out-perform a CLR scalar function every time in this sort of task.  The difference can be made small (and CLR is much faster than a T-SQL scalar function) but never faster than the fully in-lined solution.Paul</description><pubDate>Fri, 02 Apr 2010 09:05:51 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>I see.  So if you want to remove the full set of 37 unprintable characters, you would have to create 37 levels of nesting with REPLACE.  I guess the computer scientist in me is concerned about 37 full string scans per value when Jeff's code only does one.  However, Jeff also uses the PATINDEX function and double wildcards, so I guess it says something about the performance of PATINDEX if the nested REPLACE functions are still faster.Seems like there should be a better way!  CLR maybe?  Maybe a better question is, why is a deeply nested REPLACE so much faster than Jeff's solution?</description><pubDate>Fri, 02 Apr 2010 08:32:39 GMT</pubDate><dc:creator>carl.anderson-1037280</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Ok, I have fixed the code in the original post to avoid anyone else using it by mistake.The new version uses a nested REPLACE:[code="sql"]GOCREATE  FUNCTION dbo.IF_CleanWithReplace        (        @SomeText VARCHAR(100)        )RETURNS TABLEWITH    SCHEMABINDINGAS      RETURN        SELECT  cleaned =                     REPLACE(                        REPLACE(                            REPLACE(                                @SomeText COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS,                                'A', SPACE(0)),                            'E', SPACE(0)),                        '-', SPACE(0));GO[/code]Thanks again, Carl.</description><pubDate>Fri, 02 Apr 2010 04:34:24 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]carl.anderson-1037280 (4/1/2010)[/b][hr]when I run your example code, I get a separate resulting row for every character I'm trying to exclude.  I don't see the same behavior in Jeff's code.  Are you sure these are equivalent solutions, or am I doing something wrong?  It looks like each individual row returned by your procedure has removed one of the different excluded characters.[/quote]Hey Carl,Thanks so much for posting here - it sent me an email notification!  I have been trying to track this thread down for a few weeks now, since I first realised that the in-line TVF is not equivalent after all - the trick with the multiple-variable assignment is the thing.I am busy with something else just at the moment, but will correct my posts and submit a revised version in the next few days.Thanks again!Paul</description><pubDate>Fri, 02 Apr 2010 00:53:40 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Paul,when I run your example code, I get a separate resulting row for every character I'm trying to exclude.  I don't see the same behavior in Jeff's code.  Are you sure these are equivalent solutions, or am I doing something wrong?It looks like each individual row returned by your procedure has removed one of the different excluded characters.Thanks,Carl</description><pubDate>Thu, 01 Apr 2010 11:53:03 GMT</pubDate><dc:creator>carl.anderson-1037280</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Grinja (3/15/2010)[/b][hr]Hi all,I've been away on leave and have only recently had time to catch up.Thanks for all the code samples you were extremely thorough :w00t:[/quote]Thank you for the interesting question.</description><pubDate>Mon, 15 Mar 2010 07:23:45 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Hi all,I've been away on leave and have only recently had time to catch up.Thanks for all the code samples you were extremely thorough :w00t:</description><pubDate>Mon, 15 Mar 2010 07:03:01 GMT</pubDate><dc:creator>Grinja</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Jeff Moden (2/26/2010)[/b][hr]Me too!  I guess I've cut way too far back on the coffee and pork chops. Cool code, Paul.  Thanks.[/quote]Not quite so cool - it doesn't work!  (Thanks Carl)</description><pubDate>Fri, 26 Feb 2010 22:55:30 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Ramesh Saive (2/8/2010)[/b][hr][quote][b]Jeff Moden (2/6/2010)[/b][hr]......  I should have qualified my statement.  Instead of saying...[i]"I believe you'll find it's a wee bit more difficult to do than just using REPLACE."[/i] ... I should have said...[i]"I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE."[/i].......[/quote]I knew it was coming, Jeff.:-)  I know you would always come up with more optimized solution than this.  And I agree with you on the reusable optimized code; there is really no harm in using optimized code though it is for a one time or one row execution.Thank you Jeff, for taking time and letting me know few of the cases where a while loop beats a tally table.I wonder how many more cases I would see where a while loop beats a tally table.:-)[/quote]Heh... guess I'm shootin' blanks, lately.  Paul White's code above is much faster.  Like he said, "an order of magnitude faster".  Thanks for the compliment anyway, Ramesh.</description><pubDate>Fri, 26 Feb 2010 22:50:43 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Paul White (2/26/2010)[/b][hr]Hey Jeff,Scalar T-SQL functions?!  WHILE loops?  I am genuinely shocked! :w00t:[/quote]Me too!  I guess I've cut way too far back on the coffee and pork chops. Cool code, Paul.  Thanks.</description><pubDate>Fri, 26 Feb 2010 22:46:22 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>--- Duff code removed ---</description><pubDate>Fri, 26 Feb 2010 21:48:02 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Hey Jeff,Scalar T-SQL functions?!  WHILE loops?  I am genuinely shocked! :w00t:The REPLACE solution can be made to run [i]an order of magnitude faster[/i] if we use an in-line table-valued function instead of an evil scalar function:[i]Create the in-line table-valued function[/i][code]CREATE  FUNCTION dbo.IF_CleanWithReplace        (        @SomeText VARCHAR(100)        )RETURNS TABLEWITH    SCHEMABINDINGAS      RETURN        SELECT  cleaned =                     REPLACE(                        REPLACE(                            REPLACE(                                @SomeText COLLATE SQL_LATIN1_GENERAL_CP1_CI_AS,                                'A', SPACE(0)),                            'E', SPACE(0)),                        '-', SPACE(0));[/code][i]Run the test[/i][code]DECLARE @BitBucket VARCHAR(8000);SELECT  @BitBucket = iTVF.cleanedFROM    #Dirty DCROSSAPPLY   dbo.IF_CleanWithReplace(D.SomeText) iTVF;[/code]Execution times (scalar function times in parentheses):#Dirty: 501ms (13,234ms)Pauledit:  code updated in-place to reflect Carl's excellent observation that the previous implementation was fast, but failed to return the correct results :hehe:</description><pubDate>Fri, 26 Feb 2010 21:40:26 GMT</pubDate><dc:creator>Paul White</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Jeff Moden (2/6/2010)[/b][hr]......  I should have qualified my statement.  Instead of saying...[i]"I believe you'll find it's a wee bit more difficult to do than just using REPLACE."[/i] ... I should have said...[i]"I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE."[/i].......[/quote]I knew it was coming, Jeff.:-)  I know you would always come up with more optimized solution than this.  And I agree with you on the reusable optimized code; there is really no harm in using optimized code though it is for a one time or one row execution.Thank you Jeff, for taking time and letting me know few of the cases where a while loop beats a tally table.I wonder how many more cases I would see where a while loop beats a tally table.:-)</description><pubDate>Mon, 08 Feb 2010 01:22:48 GMT</pubDate><dc:creator>Ramesh Saive</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Grinja (2/5/2010)[/b][hr]Hi,Is there an equivilent function to Excel's 'CLEAN' in SQL to remove non printable characters?.Thanks,G[/quote]I believe this will do it for you...[code="sql"] CREATE FUNCTION dbo.RemoveNonDisplayChars/******************************************************************** Purpose: Remove the non-displayable control characters from CHAR(0) to  CHAR(31) and the DELETE character CHAR(127). Revision History: Rev 00 - Jeff Moden - 06 Feb 2010 - Initial Release and Unit Test********************************************************************/--===== Declare the I/O parameters        (@pString VARCHAR(8000))RETURNS VARCHAR(8000)      AS  BEGIN--===== Declare Local variablesDECLARE @IncorrectCharLoc SMALLINT, --Position of bad character        @Pattern          CHAR(37)  --Bad characters to look for SELECT @Pattern          = '%['                          + CHAR(0)+CHAR(1)+CHAR(2)+CHAR(3)+CHAR(4)                          + CHAR(5)+CHAR(6)+CHAR(7)+CHAR(8)+CHAR(9)                          + CHAR(10)+CHAR(11)+CHAR(12)+CHAR(13)+CHAR(14)                          + CHAR(15)+CHAR(16)+CHAR(17)+CHAR(18)+CHAR(19)                          + CHAR(20)+CHAR(21)+CHAR(22)+CHAR(23)+CHAR(24)                          + CHAR(25)+CHAR(26)+CHAR(27)+CHAR(28)+CHAR(29)                          + CHAR(30)+CHAR(31)+CHAR(127)                          + ']%',        @IncorrectCharLoc = PATINDEX(@Pattern, @pString)  WHILE @IncorrectCharLoc &amp;gt; 0 SELECT @pString          = STUFF(@pString, @IncorrectCharLoc, 1, ''),        @IncorrectCharLoc = PATINDEX(@Pattern, @pString) RETURN @pString    ENDGO[/code]</description><pubDate>Sat, 06 Feb 2010 12:15:48 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>My apologies, Ramesh.  I should have qualified my statement.  Instead of saying...[i]"I believe you'll find it's a wee bit more difficult to do than just using REPLACE."[/i] ... I should have said...[i]"I believe you'll find it's a wee bit more difficult to do it in a high performance manner using just REPLACE."[/i]Let's see what I mean.  First, we need two test tables... "#Dirty" will contain some characters that we need to remove and "#Clean" will have no characters to be removed.[code="sql"]--==============================================================================--      Setup the test tables--==============================================================================--===== Conditionally remove the test tables so we can easily retest     IF OBJECT_ID('TempDB..#Dirty','U') IS NOT NULL        DROP TABLE #Dirty;     IF OBJECT_ID('TempDB..#Clean','U') IS NOT NULL        DROP TABLE #Clean;GO--===== Build the "dirty" table where each row has     -- at least 4 characters to remove SELECT TOP 100000        CAST(NEWID() AS VARCHAR(100)) AS SomeText   INTO #Dirty   FROM Master.sys.All_Columns ac1,        Master.sys.All_Columns ac2;--===== Build the "clean" table where each row has     -- NO characters to remove SELECT TOP 100000        REPLICATE('B',26) AS SomeText   INTO #Clean   FROM Master.sys.All_Columns ac1,        Master.sys.All_Columns ac2;GO[/code]Now, let's put your code into a function and some slightly different code.  Please... no one faint and no one write to their Congressman... I'm going to use a WHILE loop for this one because my tests show that this is one of the few places where a WHILE loop will beat a Tally table. :-P  [code="sql"]--==============================================================================--      Setup the functions to test--==============================================================================--===== Do these tests in a nice safe place that everyone has.    USE TempDB;GO--===== Build a function using "Replace" CREATE FUNCTION dbo.CleanWithReplace        (@SomeText VARCHAR(100))RETURNS VARCHAR(100)     AS  BEGIN        DECLARE @Characters TABLE(SomeChar CHAR(1) NOT NULL)         INSERT  @Characters( SomeChar )         SELECT  'A' UNION ALL         SELECT  'E' UNION ALL         SELECT  '-'         SELECT @SomeText = REPLACE( @SomeText, SomeChar, '' )           FROM @Characters RETURN @SomeText    END;GO--===== Build a function using "Stuff" CREATE FUNCTION dbo.RemoveNonPrintable     -- Modified by Jeff Moden        (@String VARCHAR(8000))RETURNS VARCHAR(8000) AS  BEGINDECLARE @IncorrectCharLoc SMALLINT,        @Pattern          CHAR(7) SELECT @Pattern = '%[-AE]%',        @IncorrectCharLoc = PATINDEX(@Pattern, @String)  WHILE @IncorrectCharLoc &amp;gt; 0 SELECT @string = STUFF(@String, @IncorrectCharLoc, 1, ''),        @IncorrectCharLoc = PATINDEX(@Pattern, @String) RETURN @string    ENDGO[/code]Now, we'll run some tests.  Each function will be executed against the "dirty" table and the "clean" table while measuring performance using Profiler.  Please, read the comments in the code... they explain how the test was done.[code="sql"]--==============================================================================--      Now we'll do the tests.  I have Profiler measuring SQL:Batch Completed--      against the SPID that I'm running these tests from.  Also notice that--      I've taken the time to take the display the results out of the picture --      by returning the result to a "throw away" variable (@BitBucket).--==============================================================================GO--===== CleanWithStuff against "dirty"DECLARE @BitBucket VARCHAR(8000); SELECT @BitBucket = dbo.CleanWithStuff(SomeText)   FROM #Dirty;GO--===== CleanWithReplace against "dirty"DECLARE @BitBucket VARCHAR(8000); SELECT @BitBucket = dbo.CleanWithReplace(SomeText)   FROM #Dirty;GO--===== CleanWithStuff against "clean"DECLARE @BitBucket VARCHAR(8000); SELECT @BitBucket = dbo.CleanWithStuff(SomeText)   FROM #Clean;GO--===== CleanWithReplace against "clean"DECLARE @BitBucket VARCHAR(8000); SELECT @BitBucket = dbo.CleanWithReplace(SomeText)   FROM #Clean;GO[/code]Here's the output from the Profiler trace...[img]http://www.sqlservercentral.com/Forums/Attachment5144.aspx[/img]As you can see, the STUFF method is a whole lot faster even on just a paltry 100,000 rows.  As another benefit, the REPLACE method takes about the same amount of time to wade through the rows whether there's anything to clean out or not.  The STUFF method runs a lot faster because it will short-circuit out of the function if there's nothing to do.And, that's just to check for 3 characters that we want to remove.  There are 33 non-printable characters just in the basic ASCII character set.  Those characters are CHAR(0) through CHAR(31) and CHAR(127).Heh... I know what's coming next.  A lot of people would justify using the REPLACE code by saying it will only be used on one variable at a time from a GUI and the performance is probably good enough for even 10,000 simultaneous "hits".  Those people would be absolutely correct.What those people don't understand is that it's a function, i.e. easily "Reusable Code", and there's nothing in the code to control who uses it or why.  So, some developer given an assignment to create an ETL package that will handle hundreds of million row files a day runs across the "Reusable Code" and uses it.  Considering that the REPLACE function takes 30 seconds (on my humble desktop) to handle only 100,000 rows, it'll take a good 5 minutes to handle just one million row file and it's hammering the CPU the whole time.  I'm thinking that would cause a major problem in the not-so-unusual scenario I just gave.That's what I originally meant by it not being so easy to do with REPLACE.</description><pubDate>Sat, 06 Feb 2010 11:47:59 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Jeff Moden (2/5/2010)[/b][hr][quote][b]Ramesh Saive (2/5/2010)[/b][hr]I don't think so there is any equivalent function in SQL Server.  But you can create you own function using the existing REPLACE function.[/quote]Heh.. Let's see some code.  ;-)  I believe you'll find it's a wee bit more difficult to do than just using REPLACE.[/quote]Ah..., Here is the code[code]DECLARE @Characters TABLE( SomeChar CHAR(1) NOT NULL )DECLARE @SomeText VARCHAR(100)SELECT	@SomeText = 'Ticking away the moments that make up a dull day, Fritter and waste the hours in an offhand way.'INSERT	@Characters( SomeChar )SELECT	't' UNION ALLSELECT	'i' UNION ALLSELECT	'c'SELECT	@SomeText = REPLACE( @SomeText, SomeChar, '' )FROM	@CharactersPRINT @SomeText[/code]</description><pubDate>Sat, 06 Feb 2010 01:20:21 GMT</pubDate><dc:creator>Ramesh Saive</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Grinja (2/5/2010)[/b][hr]That could work ... thanks for the suggestion![/quote]Which unprintable characters?  Just any character less than CHAR(32)???</description><pubDate>Fri, 05 Feb 2010 21:32:41 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>[quote][b]Ramesh Saive (2/5/2010)[/b][hr]I don't think so there is any equivalent function in SQL Server.  But you can create you own function using the existing REPLACE function.[/quote]Heh.. Let's see some code.  ;-)  I believe you'll find it's a wee bit more difficult to do than just using REPLACE.</description><pubDate>Fri, 05 Feb 2010 21:31:12 GMT</pubDate><dc:creator>Jeff Moden</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>That could work ... thanks for the suggestion!</description><pubDate>Fri, 05 Feb 2010 04:54:19 GMT</pubDate><dc:creator>Grinja</dc:creator></item><item><title>RE: Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>I don't think so there is any equivalent function in SQL Server.  But you can create you own function using the existing REPLACE function.</description><pubDate>Fri, 05 Feb 2010 04:46:29 GMT</pubDate><dc:creator>Ramesh Saive</dc:creator></item><item><title>Remove non printable characters</title><link>http://www.sqlservercentral.com/Forums/Topic860321-338-1.aspx</link><description>Hi,Is there an equivilent function to Excel's 'CLEAN' in SQL to remove non printable characters?.Thanks,G</description><pubDate>Fri, 05 Feb 2010 04:22:04 GMT</pubDate><dc:creator>Grinja</dc:creator></item></channel></rss>