Request code review for UDFs: ContainsControlChar and CleanString

  • Hi,

    I have a lot of dirty data in my upstream data sources.  I've written two UDFs to assist in my work:

    • ContainsControlChar:  Used in a where clause to return rows containing control characters
    • CleanString:  Used to remove various POSIX character classes from dirty (or non-dirty) data

    They seem to be working well, but I'm wondering if I'm missing any edge cases (collation? nulls?), if they are as performant as possible (I've found PATINDEX to perform better than a character by character loop), etc.

    For example, is one better than the other wrt performance? 

    RETURN IIF(PATINDEX(CONCAT('%[',CHAR(0),'-',CHAR(31),CHAR(127),']%'),ISNULL(@pString,'') COLLATE Latin1_General_BIN2) = 0,0,1);

    vs.

    RETURN IIF(PATINDEX(CONCAT('%[',CHAR(0),'-',CHAR(31),CHAR(127),']%'),@pString COLLATE Latin1_General_BIN2) = 0 OR @pString IS NULL,0,1);
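
    For the nulls edge case specifically, this is the smoke test I'd run against whichever variant gets wrapped up (dbo.fn_ContainsControlChar below is just a placeholder for the actual function name):

    -- expected results noted per column; both variants should agree
    SELECT dbo.fn_ContainsControlChar(NULL)            AS null_input     -- expect 0
          ,dbo.fn_ContainsControlChar('')              AS empty_input    -- expect 0
          ,dbo.fn_ContainsControlChar('abc' + CHAR(9)) AS embedded_tab   -- expect 1
          ,dbo.fn_ContainsControlChar(CHAR(127))       AS del_char       -- expect 1
          ,dbo.fn_ContainsControlChar('clean text')    AS clean_input;   -- expect 0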

    Any input welcome, even as simple as suggested name changes.

    If you find them useful, obviously feel free to use/modify them.

  • I'll take a closer look over the weekend (as I'm sure some of the heavy hitters on this site also will) but given that these are scalar functions and the code within is RBAR, I suspect that we can improve performance quite a bit.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Yes, converting either of them to an iTVF will almost certainly be vastly better than using a scalar function. As to which is better, I'd suspect the second but testing them should show if there is any real difference.
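
    As a rough sketch of how I'd time them head to head (table, column and function names here are just placeholders for whatever you actually create):

    DECLARE @t0 DATETIME2(7) = SYSDATETIME(), @dummy INT;

    -- variant 1 (the ISNULL version); dbo.SomeTable / SomeColumn are stand-ins for your own objects
    SELECT @dummy = COUNT(*)
    FROM   dbo.SomeTable
    WHERE  dbo.fn_ContainsControlChar_v1(SomeColumn) = 1;

    SELECT DATEDIFF(MILLISECOND, @t0, SYSDATETIME()) AS [variant 1 ms];

    SET @t0 = SYSDATETIME();

    -- variant 2 (the OR ... IS NULL version)
    SELECT @dummy = COUNT(*)
    FROM   dbo.SomeTable
    WHERE  dbo.fn_ContainsControlChar_v2(SomeColumn) = 1;

    SELECT DATEDIFF(MILLISECOND, @t0, SYSDATETIME()) AS [variant 2 ms];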

  • Jeff Moden - Friday, November 30, 2018 8:15 AM

    I'll take a closer look over the weekend (as I'm sure some of the heavy hitters on this site also will) but given that these are scalar functions and the code within is RBAR, I suspect that we can improve performance quite a bit.

    I smell blood!
    😎

  • Eirikur Eiriksson - Saturday, December 1, 2018 12:34 PM

    Jeff Moden - Friday, November 30, 2018 8:15 AM

    I'll take a closer look over the weekend (as I'm sure some of the heavy hitters on this site also will) but given that these are scalar functions and the code within is RBAR, I suspect that we can improve performance quite a bit.

    I smell blood!
    😎

    Nah... some strong coffee, maybe. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Scott - Looking at the CleanString function... You currently have a "DELETE" option that would allow you to output the "bad" characters... How important is it that you keep that functionality?
    The reason I ask: There are a few ways to strip out characters in this manner, and one of the fastest involves breaking the string down with a tally table, filtering out the "bad" rows, and then reassembling the remaining character rows using FOR XML PATH(''). The problem is that the normal method that uses "FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(4000)')" doesn't like the first 30 or so ASCII characters.
    BASE64 works but that's a whole other can-o-worms.
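
    A minimal repro of that serialization problem, in case you want to see it for yourself (behavior assumed per current SQL Server versions):

    -- fails with a "FOR XML could not serialize the data ... contains a character (0x001F) which is not allowed in XML" style error
    SELECT (SELECT '' + CHAR(31) FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(max)');

    -- tab, CR & LF (CHAR(9), CHAR(13), CHAR(10)) are legal XML characters, so they come through fine
    SELECT (SELECT '' + CHAR(9) FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(max)');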

    If you DO need to be able to output those characters... Hopefully someone like Eirikur knows an alternate XML syntax... If not, the options fall back to using looping. Ideally you'd want the function to be "inline", and the only inline-able loop is a recursive CTE. Given the performance of rCTEs, that's not an ideal option.

    In any case, I just wanted to see which requirements are "absolute must haves" and which ones (if any) are just "nice to haves". 

  • Here is one suggestion, not performance tested yet.
    😎


    CREATE OR ALTER FUNCTION dbo.ITVFN_CONTAINS_CTRL_CHAR
    (
      @INPUTSTR VARCHAR(8000)
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
      SELECT CTRLFLG = SIGN(PATINDEX(0x255B002D1F7F5D25,@INPUTSTR COLLATE Latin1_General_BIN2));

  • Hi All,

    Thanks for the replies so far.  TBH, I kind of expected "looks pretty good" or "it sucks, here's why".  Which I guess is Jeff's reply, except much more polite 😉

    Google "sql server remove null character from string", or something similar.  You'll get hundreds of hits (ok, 3.8M+ hits, although I have not read them all) with suggestions similar to my approach (either loop character by character or use PATINDEX).

    So, is this topic worth an article (as well as a reply to my post)?  This seems to be a common issue, and it would be great if Google returned an answer near the top of the hit list illustrating best practice. 

    @jeff, I'm familiar with your Tally Table/8K String Splitter article, and @Eirikur, IIRC you suggested some enhancements?  So, perhaps an article similar to those?

    @jason, we are a combined SAS + SQL Server shop.  Our data is stored in SQL Server, and we use SAS for a lot of our ETL and all of our analytics.  For various reasons, I am a proponent of doing as much as possible in the database, mainly for performance and code reusability.  Pulling the data from SQL Server to SAS, having SAS do the transformation, and pushing the data back to SQL Server, works, but does not perform well.  SAS has a function COMPRESS which will "compress" (remove) characters from a string.  I patterned my CleanString function after SAS' compress function as much as possible, although in many cases it's overkill for what I would normally need.  The functionality I would need most of the time is to delete control characters, leaving the remaining characters in the string.

    I've spent a fair bit of time trying to wrap my head around the repercussions of having null (0x00) characters in a string.  I've attached my test script, see embedded comments.  Hopefully I've covered all the use cases?

    Null characters are particularly problematic when I use SSIS, since they cause data truncation.  I also have to cope with the scenario where I have to copy bad data from Server A to Server B via SSIS, where I don't have any write access to Server A (so I can't create an ITVF, view, SP, etc.).  In that case, the only thing I can think of is:

    SELECT REPLACE(foo,CHAR(0),'') as foo, num1, REPLACE(bar,CHAR(0),'') as bar, num2 FROM SomeTable

    to stage the data onto "my" server, where I can then use my function, ITVF, etc. against the staged data to clean any remaining control characters (often tabs).
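
    When I need to knock out more than just the NUL characters at that stage, the same idea extends to a short list of nested REPLACEs (foo / SomeTable are placeholders, and the character list below is just the usual suspects, not an exhaustive one):

    -- strip NUL, tab, LF and CR without needing any objects on the source server
    SELECT REPLACE(REPLACE(REPLACE(REPLACE(foo,CHAR(0),''),CHAR(9),''),CHAR(10),''),CHAR(13),'') AS foo
    FROM SomeTable;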

    I can use a code generator to help build the SELECT statement:

    DECLARE @Cursor CURSOR;
    DECLARE @Table_Schema SYSNAME = 'dbo'
       ,@Table_Name SYSNAME = 'EPISODE' -- <<<<<
       ,@Column_Name SYSNAME
       ,@Column_Type SYSNAME
       ,@sql NVARCHAR(MAX) = ''
       ,@line NVARCHAR(256);

    SET @Cursor = CURSOR FOR SELECT COLUMN_NAME
               ,DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = @Table_Schema
       AND TABLE_NAME = @Table_Name
       AND COLUMN_NAME NOT IN ('PK')
    ORDER BY ORDINAL_POSITION;

    OPEN @Cursor;
    FETCH NEXT FROM @Cursor INTO @Column_Name,@Column_Type;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @line = REPLICATE(' ',256);
        IF @Column_Type LIKE '%char'
        BEGIN
            SET @line = STUFF(@line,7,50,CONCAT(',dbo.fn_CleanString(',QUOTENAME(@Column_Name,'['),',:CNTRL:,''DELETE STRIP'')'));
            SET @line = STUFF(@line,93,50,CONCAT('AS ',QUOTENAME(@Column_Name,'[')));
        END
        ELSE
        BEGIN
            SET @line = STUFF(@line,7,50,CONCAT(',',QUOTENAME(@Column_Name,'[')));
            SET @line = STUFF(@line,93,50,CONCAT('AS ',QUOTENAME(@Column_Name,'[')));
        END
        SET @sql += @line + CHAR(13);
        FETCH NEXT FROM @Cursor INTO @Column_Name,@Column_Type;
    END
    CLOSE @Cursor;
    DEALLOCATE @Cursor;

    SET @sql = STUFF(@sql,1,7,'SELECT ');
    SET @sql =
    'CREATE OR ALTER VIEW [cln].vw{TgtTable}
    AS
    ' + @sql +
    'FROM {SrcTable};
    GO';

    SET @sql = REPLACE(@sql,'{SrcTable}',CONCAT(@Table_Schema,'.',@Table_Name));
    SET @sql = REPLACE(@sql,'{TgtTable}',@Table_Name);

    SELECT CAST('<![CDATA[' + @sql + ']]>' AS XML)

    What would be ideal is an approach where I could:

    • Clean control characters from character columns of a given table (since I don't know a priori which columns have bad data, I have to clean them all, regardless of the performance hit)

    • Return that data using the original column names

    • And be able to do that on a server where I don't have rights to create views, functions, etc.

    • I'd be happy with either an SSIS (C# script?  BIML?) solution or a T-SQL solution.  My guess is T-SQL would perform better, and it would be my preferred approach, as it would have applicability outside of SSIS.

    • I could do this with dynamic SQL, but that may not be the best approach?

    • But again, this is ideal.  Any improvements to what I've posted are most welcome.

    And yeah, this raises the question: why doesn't my upstream data provider (a different department in my government agency) clean their data before loading it?  I'll bite my tongue and simply say sometimes you have to work with what you've got, and try to do the best job possible with the data you're given.

    Thanks again for the replies so far.

  • Eirikur Eiriksson - Sunday, December 2, 2018 2:35 AM

    Here is one suggestion, not performance tested yet.
    😎


    CREATE OR ALTER FUNCTION dbo.ITVFN_CONTAINS_CTRL_CHAR
    (
      @INPUTSTR VARCHAR(8000)
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
      SELECT CTRLFLG = SIGN(PATINDEX(0x255B002D1F7F5D25,@INPUTSTR COLLATE Latin1_General_BIN2));

    I used this to derive my PATINDEX:

    DECLARE @s VARCHAR(100) = ''
       ,@i INT = 0
    WHILE @i < 31
    BEGIN
        SET @s += CHAR(@i)
        SET @i += 1
    END
    SET @s += CHAR(127)

    SELECT @s, CAST(@s AS VARBINARY(100))

    But that didn't work :-/.  But yours does 🙂  Can you explain how you derived your patindex?

    And to invoke???

    SELECT b.*
    FROM tmp.baddata b
    CROSS APPLY dbo.ITVFN_CONTAINS_CTRL_CHAR(c)
    WHERE CTRLFLG = 1

    While (IMO) not as syntactically clear as the scalar function, if this performs better then I'll certainly use it, with lots of instructions in the function header in case colleagues want to use this (they aren't as familiar with CROSS APPLY).

    Thanks...

  • Scott In Sydney - Sunday, December 2, 2018 5:08 PM

    I used this to derive my PATINDEX:

    DECLARE @s VARCHAR(100) = ''
       ,@i INT = 0
    WHILE @i < 31
    BEGIN
        SET @s += CHAR(@i)
        SET @i += 1
    END
    SET @s += CHAR(127)

    SELECT @s, CAST(@s AS VARBINARY(100))

    But that didn't work :-/.  But yours does 🙂  Can you explain how you derived your patindex?

    Never mind...I left off the wildcards.  Either of the below work:

    DECLARE @s VARCHAR(100) = ''
       ,@i INT = 0
    WHILE @i <= 31
    BEGIN
        SET @s += CHAR(@i)
        SET @i += 1
    END
    SET @s += CHAR(127)
    SET @s = CONCAT('%[',@s,']%')

    SELECT @s, CAST(@s AS VARBINARY(100))

    SELECT CAST(CONCAT('%[',CHAR(0),'-',CHAR(31),CHAR(127),']%') AS VARBINARY(100))

  • Scott In Sydney - Sunday, December 2, 2018 4:17 PM

    ...  The functionality I would need most of the time is to delete control characters, leaving the remaining characters in the string. ...

    Let's start with this and work our way out... If the primary objective of the CleanString function is simply to expunge a series of bad characters, that can be done fairly efficiently.
    Also keep in mind that, as a general rule, the more specialized a function is, the more efficient it will be. So, if you need this base functionality 95% of the time, just create a separate (less efficient) function for that 5% fringe, but don't cripple the 95% to accommodate the odd-ball 5%.

    In any case, the following should be a good start. At the very least it'll serve as a target for others to shoot at.
    CREATE FUNCTION dbo.tfn_CleanString
    /* ===================================================================
    12/02/2018 JL, Created: Comments    
    =================================================================== */
    --===== Define I/O parameters
    (
        @pString NVARCHAR(4000)
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN
        WITH
            cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)),
            cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
            cte_Tally (n) AS (
                SELECT TOP (LEN(@pString))
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                FROM
                    cte_n2 a CROSS JOIN cte_n2 b
                )
        SELECT
            clean_string = CONVERT(VARCHAR(4000),((
                SELECT
                    '' + SUBSTRING(@pString, t.n,1)
                FROM
                    cte_Tally t
                WHERE
                    --ASCII(SUBSTRING(@pString, t.n,1)) BETWEEN 32 AND 126
                    CONVERT(BINARY(1), SUBSTRING(@pString, t.n,1)) BETWEEN 0x20 AND 0x7E    -- the binary(1) comparison seems a wee bit faster than the ascii comparison... but... not enough testing to say for sure.
                ORDER BY
                    t.n
                FOR XML PATH(''), TYPE).value('./text()[1]', 'nvarchar(max)')
                ));
    GO

    Take it for a test run...
    DECLARE
        @pString NVARCHAR(4000) =
            CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(9) + CHAR(10) + CHAR(11) + CHAR(12) + CHAR(13) + CHAR(14) + CHAR(15) + CHAR(16)
            + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(32)
            + CHAR(33) + CHAR(34) + CHAR(35) + CHAR(36) + CHAR(37) + CHAR(38) + CHAR(39) + CHAR(40) + CHAR(41) + CHAR(42) + CHAR(43) + CHAR(44) + CHAR(45) + CHAR(46) + CHAR(47) + CHAR(48)
            + CHAR(49) + CHAR(50) + CHAR(51) + CHAR(52) + CHAR(53) + CHAR(54) + CHAR(55) + CHAR(56) + CHAR(57) + CHAR(58) + CHAR(59) + CHAR(60) + CHAR(61) + CHAR(62) + CHAR(63) + CHAR(64)
            + CHAR(65) + CHAR(66) + CHAR(67) + CHAR(68) + CHAR(69) + CHAR(70) + CHAR(71) + CHAR(72) + CHAR(73) + CHAR(74) + CHAR(75) + CHAR(76) + CHAR(77) + CHAR(78) + CHAR(79) + CHAR(80)
            + CHAR(81) + CHAR(82) + CHAR(83) + CHAR(84) + CHAR(85) + CHAR(86) + CHAR(87) + CHAR(88) + CHAR(89) + CHAR(90) + CHAR(91) + CHAR(92) + CHAR(93) + CHAR(94) + CHAR(95) + CHAR(96)
            + CHAR(97) + CHAR(98) + CHAR(99) + CHAR(100) + CHAR(101) + CHAR(102) + CHAR(103) + CHAR(104) + CHAR(105) + CHAR(106) + CHAR(107) + CHAR(108) + CHAR(109) + CHAR(110) + CHAR(111)
            + CHAR(112) + CHAR(113) + CHAR(114) + CHAR(115) + CHAR(116) + CHAR(117) + CHAR(118) + CHAR(119) + CHAR(120) + CHAR(121) + CHAR(122) + CHAR(123) + CHAR(124) + CHAR(125) + CHAR(126)
            + CHAR(127);

    SELECT
        cs.clean_string
    FROM
        dbo.tfn_CleanString(@pString) cs;

    Results...
    clean_string
    -------------------------------------------------------------------------------------------------
    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

  • Scott In Sydney - Sunday, December 2, 2018 5:08 PM

    Eirikur Eiriksson - Sunday, December 2, 2018 2:35 AM

    Here is one suggestion, not performance tested yet.
    😎


    CREATE OR ALTER FUNCTION dbo.ITVFN_CONTAINS_CTRL_CHAR
    (
      @INPUTSTR VARCHAR(8000)
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
      SELECT CTRLFLG = SIGN(PATINDEX(0x255B002D1F7F5D25,@INPUTSTR COLLATE Latin1_General_BIN2));

    But that didn't work :-/.  But yours does 🙂  Can you explain how you derived your patindex?

    What Eirikur did was to take the value you were making with the concatenation, and then convert it to a hex value.
    If you look at the value, 0x255B002D1F7F5D25, and break it down into bytes, it's easier to see... 25 5B 00 2D 1F 7F 5D 25

    Okay... it's easier to see with a bit of translation...
    SELECT
        0x25,
        0x5B,
        0x00,
        0x2D,
        0x1F,
        0x7F,
        0x5D,
        0x25;
    SELECT
        ASCII(0x25),
        ASCII(0x5B),
        ASCII(0x00),
        ASCII(0x2D),
        ASCII(0x1F),
        ASCII(0x7F),
        ASCII(0x5D),
        ASCII(0x25);
    SELECT
        CHAR(0x25),
        CHAR(0x5B),
        CHAR(0x00),
        CHAR(0x2D),
        CHAR(0x1F),
        CHAR(0x7F),
        CHAR(0x5D),
        CHAR(0x25);

    ---- ---- ---- ---- ---- ---- ---- ----
    0x25 0x5B 0x00 0x2D 0x1F 0x7F 0x5D 0x25

    ---- ---- ---- ---- ---- ---- ---- ----
    37   91   0    45   31   127  93   37

    ---- ---- ---- ---- ---- ---- ---- ----
    %    [         -              ]    %
    (the blank positions are the non-printing characters 0x00, 0x1F and 0x7F)

  • Scott In Sydney - Sunday, December 2, 2018 4:17 PM

    Hi All,

    Thanks for the replies so far. ... The functionality I would need most of the time is to delete control characters, leaving the remaining characters in the string. ...

    Thanks again for the replies so far.

    The function is a great concept and it's well documented and fairly easy to read.  Well done in all of those areas and if the only purpose of code was to provide functionality, then I'd say, yes, go for it as an article.  I'm not making that recommendation, though.

    As many others do, I get "twitchy" when I see such good ideas rendered as a scalar function with RBAR inside it.  If you were to publish the function, you'd get high praise from functional-only users and likely be slammed to the wall by folks who are also concerned with performance (although I've not yet personally tested it for performance... it's that "twitchy" thing I speak of).

    I've been horribly busy at work and haven't had the time to give your code the testing it deserves, nor to provide a possible alternative to the contained RBAR.  There's a discussion on this site having to do with things of the nature contained in your code and I've not yet looked well enough to provide you with the link, but the performance improvements were incredible and, yes, one of the methods that took it in the britches was the very same short loop you used in your code.

    It looks like you have some of the heavy hitters involved in this discussion and that makes it all a bit less urgent for me but I will test it and make some recommendations myself.  I just can't do it as quickly as some of the others.  For example, I worked for about 20 hours over this past weekend.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • @jeff,

    Just to be clear:  I meant for you (or Eirikur, or Jason, or whoever) to create the article.  I certainly wasn't suggesting I do it, or to propose my current code.

    I just thought an article might have a greater hit rate on Google (depending on its contents I guess), or at least be easier to find here in SSC.  Again, when I Googled this subject, I didn't get a single hit (or at least in the first 3-4 pages I viewed) suggesting an ITVF approach.  They all suggested a variation of what I posted.

    If this (http://www.sqlservercentral.com/articles/Tally+Table/72993/) is worthy of an article I would think cleaning bad characters out of a column to also be worth considering.  But that may be just my opinion.

    There's a discussion on this site having to do with things of the nature contained in your code and I've not yet looked well enough to provide you with the link

    If you or someone else can give me some search query hints re: this discussion I'll search and try to do my own research.

    I appreciate the time you took posting even when you're so busy.

  • You could use Alan's script here or a variant thereof to remove characters you don't want.
