Need advice on this string manipulate scalar UDF

  • Dear all,

    Can you please comment on this UDF, is there any better way to do it?

    For example CLF scalar UDF or inline table UDF?

    There is around 50 Million rows in TableABC and it will be around 1GB in size if it is exported as plain text .

    we want to use this function in this way:

    SELECT 'D' AS RECORDTYPECODE,

    a.[Id]

    ,a.[Col1]

    ,a.[Col2]

    ,a.[Col3]

    ,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col4])

    ,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col5])

    ,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col6])

    ,a.[genID]

    FROM %DBNAME%.TableABC a

    The function is shown below:

    USE [BPSUAT_BICD]

    GO

    /****** Object: UserDefinedFunction [dbo].[ReplaceSplVarcharsBICD] Script Date: 04/10/2013 16:08:15 ******/

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    CREATE function [dbo].[ReplaceSplVarcharsBICD](@value varchar(8000))RETURNS varchar(8000)

    as

    begin

    declare @result varchar(8000);

    -- This function is used to Replace some Special characters with space for varchar and char datatypes.

    -- Replacing for Enter, newline

    select @result = REPLACE(REPLACE(REPLACE(@value, CHAR(10) + CHAR(13), ' '),CHAR(10),' '), CHAR(13),' ');

    --Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F'

    select @result = REPLACE(@result, CHAR(0),' ');

    select @result = REPLACE(@result, CHAR(1),' ');

    select @result = REPLACE(@result, CHAR(2),' ');

    select @result = REPLACE(@result, CHAR(3),' ');

    select @result = REPLACE(@result, CHAR(4),' ');

    select @result = REPLACE(@result, CHAR(5),' ');

    select @result = REPLACE(@result, CHAR(6),' ');

    select @result = REPLACE(@result, CHAR(7),' ');

    select @result = REPLACE(@result, CHAR(8),' ');

    select @result = REPLACE(@result, CHAR(9),' ');

    --select @result = REPLACE(@result, CHAR(10),' ');

    select @result = REPLACE(@result, CHAR(11),' ');

    select @result = REPLACE(@result, CHAR(12),' ');

    --select @result = REPLACE(@result, CHAR(13),' ');

    select @result = REPLACE(@result, CHAR(14),' ');

    select @result = REPLACE(@result, CHAR(15),' ');

    select @result = REPLACE(@result, CHAR(16),' ');

    select @result = REPLACE(@result, CHAR(17),' ');

    select @result = REPLACE(@result, CHAR(18),' ');

    select @result = REPLACE(@result, CHAR(19),' ');

    select @result = REPLACE(@result, CHAR(20),' ');

    select @result = REPLACE(@result, CHAR(21),' ');

    select @result = REPLACE(@result, CHAR(22),' ');

    select @result = REPLACE(@result, CHAR(23),' ');

    select @result = REPLACE(@result, CHAR(24),' ');

    select @result = REPLACE(@result, CHAR(25),' ');

    select @result = REPLACE(@result, CHAR(26),' ');

    select @result = REPLACE(@result, CHAR(27),' ');

    select @result = REPLACE(@result, CHAR(28),' ');

    select @result = REPLACE(@result, CHAR(29),' ');

    select @result = REPLACE(@result, CHAR(30),' ');

    select @result = REPLACE(@result, CHAR(31),' ');

    select @result = REPLACE(@result, CHAR(127),' ');

    --Replacing the pipe symbol

    select @result = REPLACE(@result, CHAR(124),' ');

    --Repalcing the NULs

    select @result = Nullif(@result,'');

    --Removing the Right Spaces

    select @result = RTRIM(@result);

    --Removing the Left Spaces

    select @result = LTRIM(@result);

    return @result;

    end

    or should we use CLF scalar UDF like this

    using System;

    using System.Data.Sql;

    using System.Data.SqlTypes;

    using System.Collections.Generic;

    using Microsoft.SqlServer.Server;

    using System.Text;

    using System.Text.RegularExpressions;

    public partial class UserDefinedFunctions

    {

    [Microsoft.SqlServer.Server.SqlFunction]

    public static SqlString ReplaceSplVarcharsBICD(SqlString s){

    if (s.IsNull) return String.Empty;

    //Removing the Right Spaces and Left Spaces

    string s1 = s.ToString().Trim();

    if (s1.Length == 0) return String.Empty;

    StringBuilder tmpS = new StringBuilder(s1.Length);

    //striping out the "control characters"

    //Control characters are non-printing and formatting characters, such as ACK, BEL, CR, FF, LF, and VT. The Unicode standard assigns the following code points to control //characters: from \U0000 to \U001F, \U007F, and from \U0080 to \U009F. According to the Unicode standard, these values are to be interpreted as control characters unless //their use is otherwise defined by an application. Valid control characters are members of the UnicodeCategory.Control category.

    //Replacing for Enter, newline 0D0A or \x000D (13) or \x000A (10)

    //Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F' subset of control characters

    //Replacing the pipe symbol \x007C (124)

    //Repalcing the NULs 00

    for (int i = 0; i <= s1.Length; i++)

    {

    if (!Char.IsControl(s1) & !s1 =='\x000D' & !s1 =='\x000A' & !s1 =='\x007C')

    {

    tmpS.Append(s1);

    }

    else

    {

    tmpS.Append(' ');

    }

    }

    string result = tmpS.ToString().Trim();

    return result;

    }

    }

  • It looks you have built both versions. Have you tested them? I would recommend nesting all of your replaces instead of doing them one at a time over and over. You can convert the entire function to a single statement. This will lend itself to becoming an iTVF pretty decently. It will take some effort to nest all those replaces but it will end up being faster. The CLR version may be faster. You will have to test that yourself in your system against your data. πŸ˜€

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • I have written another UDF to use while loop shown as below:

    create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)

    with schemabinding

    begin

    if @s-2 is null

    return null

    declare @s2 varchar(8000)

    set @s2 = ''

    declare @l int

    set @l = len(@s)

    declare @p int

    set @p = 1

    while @p <= @l begin

    declare @C int

    set @C = ascii(substring(@s, @p, 1))

    --any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

    if @C between 32 and 123 or @C between 125 and 126

    set @s2 = @s2 + char(@c)

    --any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

    if @C between 0 and 31 or @C = 124 or @C= 127

    set @s2 = @s2 + char(32)

    set @p = @p + 1

    end

    if len(@s2) = 0

    return null

    return ltrim(rtrim(@s2))

    end

    go

    Testing results:

    1. If we don't call any udf, it took 30 minutes to dump all the data into a file

    2. If we use the first udf, it took 2.5 hours to dump all the data into a file

    3. If we use the second udf, it took 35 minutes to dump all the data into a file.

    Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply

    My current sql is shown as below:

    SELECT 'D' AS RECORDTYPECODE

    , a.[Id]

    ,a.[FundsMovementItemTypeId]

    ,a.[FundsMovementSetId]

    ,a.[FeeTypeId]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])

    ,a.[IsSundryParties]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])

    ,a.[IsSundryPartiesTraceACC]

    ,a.[IsAttractGST]

    ,a.[IsCustomerAccount]

    ,a.[Amount]

    ,a.[OnDateTime]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])

    ,a.[U1]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])

    ,a.[U3]

    ,a.[U5]

    ,a.[genID]

    FROM %DBNAME%.FundsMovementItem a

  • liujchen 26197 (4/16/2013)


    I have written another UDF to use while loop shown as below:

    create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)

    with schemabinding

    begin

    if @s-2 is null

    return null

    declare @s2 varchar(8000)

    set @s2 = ''

    declare @l int

    set @l = len(@s)

    declare @p int

    set @p = 1

    while @p <= @l begin

    declare @C int

    set @C = ascii(substring(@s, @p, 1))

    --any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

    if @C between 32 and 123 or @C between 125 and 126

    set @s2 = @s2 + char(@c)

    --any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

    if @C between 0 and 31 or @C = 124 or @C= 127

    set @s2 = @s2 + char(32)

    set @p = @p + 1

    end

    if len(@s2) = 0

    return null

    return ltrim(rtrim(@s2))

    end

    go

    Testing results:

    1. If we don't call any udf, it took 30 minutes to dump all the data into a file

    2. If we use the first udf, it took 2.5 hours to dump all the data into a file

    3. If we use the second udf, it took 35 minutes to dump all the data into a file.

    Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply

    My current sql is shown as below:

    SELECT 'D' AS RECORDTYPECODE

    , a.[Id]

    ,a.[FundsMovementItemTypeId]

    ,a.[FundsMovementSetId]

    ,a.[FeeTypeId]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])

    ,a.[IsSundryParties]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])

    ,a.[IsSundryPartiesTraceACC]

    ,a.[IsAttractGST]

    ,a.[IsCustomerAccount]

    ,a.[Amount]

    ,a.[OnDateTime]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])

    ,a.[U1]

    ,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])

    ,a.[U3]

    ,a.[U5]

    ,a.[genID]

    FROM %DBNAME%.FundsMovementItem a

    I would RUN. don't walk, away from that looping version as fast as you can go. If you move away fast it will be the ONLY time that function will see anything go fast. The nested replace option in your original post is going to be your best choice here. It will be a little tedious to code it but if you make it an iTVF it will be super fast. To gain the benefit of an iTVF you need to keep it to one statement. That is why I suggest using nested replaces.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Thanks a lot for your reply!

    Can you please guide me how to call a iTVF function

    in the Select SQL I posted? Same syntax or I need

    to use Cross Apply?

    I don't mind to try the nested replace, I

    have created one, but I can't execute the Select

    SQL in the original way.

    Thanks and Regards,

    Jingchen

  • liujchen 26197 (4/16/2013)


    Thanks a lot for your reply!

    Can you please guide me how to call a iTVF function

    in the Select SQL I posted? Same syntax or I need

    to use Cross Apply?

    I don't mind to try the nested replace, I

    have created one, but I can't execute the Select

    SQL in the original way.

    Thanks and Regards,

    Jingchen

    Well an iTVF returns a table. So you can use CROSS APPLY or you can join to it. I would think in your case that a CROSS APPLY is how I would handle it.

    You should check out Paul White's article on using apply. A great two article series that will help understand how it works.

    http://www.sqlservercentral.com/articles/APPLY/69953/[/url]

    http://www.sqlservercentral.com/articles/APPLY/69954/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • Here's an iTVF which might do what you want. You will need to check that the xml concatenation can deal with the characters you wish to include:

    ALTER FUNCTION [dbo].[ifn_RemoveUnwantedChars]

    (@strTarget VARCHAR(1000))

    RETURNS TABLE

    AS

    RETURN

    (

    WITH E1(N) AS ( -- 10 rows

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ),

    iTally(n) AS (

    SELECT TOP (DATALENGTH(@strTarget))

    n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM E1 a, E1 b, E1 c) -- 1000 rows

    SELECT

    OutputString = (

    SELECT '' + Eeuatkerrikta

    FROM iTally t

    CROSS APPLY (SELECT kerrikta = SUBSTRING(@strTarget,n,1)) d

    CROSS APPLY (SELECT Eeuatkerrikta = CASE

    WHEN ASCII(kerrikta) BETWEEN 0 AND 31 OR ASCII(kerrikta) IN (124,127) THEN ' '

    ELSE kerrikta END) e

    ORDER BY t.n

    FOR XML PATH(''), TYPE).value('.', 'varchar(1000)'))

    β€œWrite the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • I have read the cross apply articles

    I made the following code to see if I understand it correctly.

    SELECT 'D' AS RECORDTYPECODE

    , a.[Id]

    ,a.[FundsMovementItemTypeId]

    ,a.[FundsMovementSetId]

    ,a.[FeeTypeId]

    ,AccountName = iTVF.result

    ,GLAccountName = iTVF1.result

    ,LodgementReference = iTVF2.result

    ,AccountNumber = iTVF3.result

    ,a.[IsSundryParties]

    , TraceBSB = iTVF4.result

    ,TraceAccountName = iTVF5.result

    ,TraceAccountNumber = iTVF6.result

    ,a.[IsSundryPartiesTraceACC]

    ,a.[IsAttractGST]

    ,a.[IsCustomerAccount]

    ,a.[Amount]

    ,a.[OnDateTime]

    ,AccountBankCode = iTVF7.result

    ,a.[U1]

    ,U2 = iTVF8.result

    ,a.[U3]

    ,a.[U5]

    ,a.[genID]

    FROM %DBNAME%.FundsMovementItem a

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountName) iTVF

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.GLAccountName) iTVF1

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.LodgementReference) iTVF2

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountNumber) iTVF3

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceBSB) iTVF4

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountName) iTVF5

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountNumber) iTVF6

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountBankCode) iTVF7

    CROSS

    APPLY dbo.ReplaceSplVarcharsBICD_test4(a.U2) iTVF8

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply