Need advice on this string manipulate scalar UDF

Question

Need advice on this string manipulate scalar UDF

liujchen 26197

Valued Member

Points: 50
More actions
April 10, 2013 at 6:44 am

#270949

Dear all,
Can you please comment on this UDF, is there any better way to do it?
For example CLF scalar UDF or inline table UDF?
There is around 50 Million rows in TableABC and it will be around 1GB in size if it is exported as plain text .
we want to use this function in this way:
SELECT 'D' AS RECORDTYPECODE,
a.[Id]
,a.[Col1]
,a.[Col2]
,a.[Col3]
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col4])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col5])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col6])
,a.[genID]
FROM %DBNAME%.TableABC a
The function is shown below:
USE [BPSUAT_BICD]
GO
/****** Object: UserDefinedFunction [dbo].[ReplaceSplVarcharsBICD] Script Date: 04/10/2013 16:08:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE function [dbo].[ReplaceSplVarcharsBICD](@value varchar(8000))RETURNS varchar(8000)
as
begin
declare @result varchar(8000);
-- This function is used to Replace some Special characters with space for varchar and char datatypes.
-- Replacing for Enter, newline
select @result = REPLACE(REPLACE(REPLACE(@value, CHAR(10) + CHAR(13), ' '),CHAR(10),' '), CHAR(13),' ');
--Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F'
select @result = REPLACE(@result, CHAR(0),' ');
select @result = REPLACE(@result, CHAR(1),' ');
select @result = REPLACE(@result, CHAR(2),' ');
select @result = REPLACE(@result, CHAR(3),' ');
select @result = REPLACE(@result, CHAR(4),' ');
select @result = REPLACE(@result, CHAR(5),' ');
select @result = REPLACE(@result, CHAR(6),' ');
select @result = REPLACE(@result, CHAR(7),' ');
select @result = REPLACE(@result, CHAR(8),' ');
select @result = REPLACE(@result, CHAR(9),' ');
--select @result = REPLACE(@result, CHAR(10),' ');
select @result = REPLACE(@result, CHAR(11),' ');
select @result = REPLACE(@result, CHAR(12),' ');
--select @result = REPLACE(@result, CHAR(13),' ');
select @result = REPLACE(@result, CHAR(14),' ');
select @result = REPLACE(@result, CHAR(15),' ');
select @result = REPLACE(@result, CHAR(16),' ');
select @result = REPLACE(@result, CHAR(17),' ');
select @result = REPLACE(@result, CHAR(18),' ');
select @result = REPLACE(@result, CHAR(19),' ');
select @result = REPLACE(@result, CHAR(20),' ');
select @result = REPLACE(@result, CHAR(21),' ');
select @result = REPLACE(@result, CHAR(22),' ');
select @result = REPLACE(@result, CHAR(23),' ');
select @result = REPLACE(@result, CHAR(24),' ');
select @result = REPLACE(@result, CHAR(25),' ');
select @result = REPLACE(@result, CHAR(26),' ');
select @result = REPLACE(@result, CHAR(27),' ');
select @result = REPLACE(@result, CHAR(28),' ');
select @result = REPLACE(@result, CHAR(29),' ');
select @result = REPLACE(@result, CHAR(30),' ');
select @result = REPLACE(@result, CHAR(31),' ');
select @result = REPLACE(@result, CHAR(127),' ');
--Replacing the pipe symbol
select @result = REPLACE(@result, CHAR(124),' ');
--Repalcing the NULs
select @result = Nullif(@result,'');
--Removing the Right Spaces
select @result = RTRIM(@result);
--Removing the Left Spaces
select @result = LTRIM(@result);
return @result;
end
or should we use CLF scalar UDF like this
using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using System.Collections.Generic;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ReplaceSplVarcharsBICD(SqlString s){
if (s.IsNull) return String.Empty;
//Removing the Right Spaces and Left Spaces
string s1 = s.ToString().Trim();
if (s1.Length == 0) return String.Empty;
StringBuilder tmpS = new StringBuilder(s1.Length);
//striping out the "control characters"
//Control characters are non-printing and formatting characters, such as ACK, BEL, CR, FF, LF, and VT. The Unicode standard assigns the following code points to control //characters: from \U0000 to \U001F, \U007F, and from \U0080 to \U009F. According to the Unicode standard, these values are to be interpreted as control characters unless //their use is otherwise defined by an application. Valid control characters are members of the UnicodeCategory.Control category.
//Replacing for Enter, newline 0D0A or \x000D (13) or \x000A (10)
//Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F' subset of control characters
//Replacing the pipe symbol \x007C (124)
//Repalcing the NULs 00
for (int i = 0; i <= s1.Length; i++)
{
if (!Char.IsControl(s1) & !s1 =='\x000D' & !s1 =='\x000A' & !s1 =='\x007C')
{
tmpS.Append(s1);
}
else
{
tmpS.Append(' ');
}
}
string result = tmpS.ToString().Trim();
return result;
}
}

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply

Sean Lange SSC Guru Points: 286573 More actions · Answer 1

It looks you have built both versions. Have you tested them? I would recommend nesting all of your replaces instead of doing them one at a time over and over. You can convert the entire function to a single statement. This will lend itself to becoming an iTVF pretty decently. It will take some effort to nest all those replaces but it will end up being faster. The CLR version may be faster. You will have to test that yourself in your system against your data. 😀

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

liujchen 26197 Valued Member Points: 50 More actions · Answer 2

I have written another UDF to use while loop shown as below:

create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)

with schemabinding

begin

if @s-2 is null

return null

declare @s2 varchar(8000)

set @s2 = ''

declare @l int

set @l = len(@s)

declare @p int

set @p = 1

while @p <= @l begin

declare @C int

set @C = ascii(substring(@s, @p, 1))

--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

if @C between 32 and 123 or @C between 125 and 126

set @s2 = @s2 + char(@c)

--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)

if @C between 0 and 31 or @C = 124 or @C= 127

set @s2 = @s2 + char(32)

set @p = @p + 1

end

if len(@s2) = 0

return null

return ltrim(rtrim(@s2))

end

go

Testing results:

1. If we don't call any udf, it took 30 minutes to dump all the data into a file

2. If we use the first udf, it took 2.5 hours to dump all the data into a file

3. If we use the second udf, it took 35 minutes to dump all the data into a file.

Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply

My current sql is shown as below:

SELECT 'D' AS RECORDTYPECODE

, a.[Id]

,a.[FundsMovementItemTypeId]

,a.[FundsMovementSetId]

,a.[FeeTypeId]

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])

,a.[IsSundryParties]

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])

,a.[IsSundryPartiesTraceACC]

,a.[IsAttractGST]

,a.[IsCustomerAccount]

,a.[Amount]

,a.[OnDateTime]

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])

,a.[U1]

,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])

,a.[U3]

,a.[U5]

,a.[genID]

FROM %DBNAME%.FundsMovementItem a

Sean Lange SSC Guru Points: 286573 More actions · Answer 3

liujchen 26197 (4/16/2013)
I have written another UDF to use while loop shown as below:
create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)
with schemabinding
begin
if @s-2 is null
return null
declare @s2 varchar(8000)
set @s2 = ''
declare @l int
set @l = len(@s)
declare @p int
set @p = 1
while @p <= @l begin
declare @C int
set @C = ascii(substring(@s, @p, 1))
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 32 and 123 or @C between 125 and 126
set @s2 = @s2 + char(@c)
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 0 and 31 or @C = 124 or @C= 127
set @s2 = @s2 + char(32)
set @p = @p + 1
end
if len(@s2) = 0
return null
return ltrim(rtrim(@s2))
end
go
Testing results:
1. If we don't call any udf, it took 30 minutes to dump all the data into a file
2. If we use the first udf, it took 2.5 hours to dump all the data into a file
3. If we use the second udf, it took 35 minutes to dump all the data into a file.
Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply
My current sql is shown as below:
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])
,a.[IsSundryParties]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])
,a.[U1]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a

I would RUN. don't walk, away from that looping version as fast as you can go. If you move away fast it will be the ONLY time that function will see anything go fast. The nested replace option in your original post is going to be your best choice here. It will be a little tedious to code it but if you make it an iTVF it will be super fast. To gain the benefit of an iTVF you need to keep it to one statement. That is why I suggest using nested replaces.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

liujchen 26197 Valued Member Points: 50 More actions · Answer 4

Thanks a lot for your reply!

Can you please guide me how to call a iTVF function

in the Select SQL I posted? Same syntax or I need

to use Cross Apply?

I don't mind to try the nested replace, I

have created one, but I can't execute the Select

SQL in the original way.

Thanks and Regards,

Jingchen

Sean Lange SSC Guru Points: 286573 More actions · Answer 5

liujchen 26197 (4/16/2013)
Thanks a lot for your reply!
Can you please guide me how to call a iTVF function
in the Select SQL I posted? Same syntax or I need
to use Cross Apply?
I don't mind to try the nested replace, I
have created one, but I can't execute the Select
SQL in the original way.
Thanks and Regards,
Jingchen

Well an iTVF returns a table. So you can use CROSS APPLY or you can join to it. I would think in your case that a CROSS APPLY is how I would handle it.

You should check out Paul White's article on using apply. A great two article series that will help understand how it works.

http://www.sqlservercentral.com/articles/APPLY/69953/[/url]

http://www.sqlservercentral.com/articles/APPLY/69954/[/url]

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

Here's an iTVF which might do what you want. You will need to check that the xml concatenation can deal with the characters you wish to include:

ALTER FUNCTION [dbo].[ifn_RemoveUnwantedChars]

(@strTarget VARCHAR(1000))

RETURNS TABLE

AS

RETURN

(

WITH E1(N) AS ( -- 10 rows

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

),

iTally(n) AS (

SELECT TOP (DATALENGTH(@strTarget))

n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

FROM E1 a, E1 b, E1 c) -- 1000 rows

SELECT

OutputString = (

SELECT '' + Eeuatkerrikta

FROM iTally t

CROSS APPLY (SELECT kerrikta = SUBSTRING(@strTarget,n,1)) d

CROSS APPLY (SELECT Eeuatkerrikta = CASE

WHEN ASCII(kerrikta) BETWEEN 0 AND 31 OR ASCII(kerrikta) IN (124,127) THEN ' '

ELSE kerrikta END) e

ORDER BY t.n

FOR XML PATH(''), TYPE).value('.', 'varchar(1000)'))

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

liujchen 26197 Valued Member Points: 50 More actions · Answer 7

I have read the cross apply articles

I made the following code to see if I understand it correctly.

SELECT 'D' AS RECORDTYPECODE

, a.[Id]

,a.[FundsMovementItemTypeId]

,a.[FundsMovementSetId]

,a.[FeeTypeId]

,AccountName = iTVF.result

,GLAccountName = iTVF1.result

,LodgementReference = iTVF2.result

,AccountNumber = iTVF3.result

,a.[IsSundryParties]

, TraceBSB = iTVF4.result

,TraceAccountName = iTVF5.result

,TraceAccountNumber = iTVF6.result

,a.[IsSundryPartiesTraceACC]

,a.[IsAttractGST]

,a.[IsCustomerAccount]

,a.[Amount]

,a.[OnDateTime]

,AccountBankCode = iTVF7.result

,a.[U1]

,U2 = iTVF8.result

,a.[U3]

,a.[U5]

,a.[genID]

FROM %DBNAME%.FundsMovementItem a

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountName) iTVF

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.GLAccountName) iTVF1

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.LodgementReference) iTVF2

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountNumber) iTVF3

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceBSB) iTVF4

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountName) iTVF5

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountNumber) iTVF6

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountBankCode) iTVF7

CROSS

APPLY dbo.ReplaceSplVarcharsBICD_test4(a.U2) iTVF8