Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase

Need advice on this string manipulate scalar UDF Expand / Collapse
Author
Message
Posted Wednesday, April 10, 2013 6:44 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, May 03, 2013 9:42 AM
Points: 4, Visits: 11
Dear all,

Can you please comment on this UDF, is there any better way to do it?
For example CLF scalar UDF or inline table UDF?


There is around 50 Million rows in TableABC and it will be around 1GB in size if it is exported as plain text .

we want to use this function in this way:
SELECT 'D' AS RECORDTYPECODE, 
a.[Id]
,a.[Col1]
,a.[Col2]
,a.[Col3]
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col4])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col5])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col6])
,a.[genID]
FROM %DBNAME%.TableABC a

The function is shown below:

USE [BPSUAT_BICD]
GO
/****** Object: UserDefinedFunction [dbo].[ReplaceSplVarcharsBICD] Script Date: 04/10/2013 16:08:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO


CREATE function [dbo].[ReplaceSplVarcharsBICD](@value varchar(8000))RETURNS varchar(8000)
as
begin
declare @result varchar(8000);

-- This function is used to Replace some Special characters with space for varchar and char datatypes.

-- Replacing for Enter, newline
select @result = REPLACE(REPLACE(REPLACE(@value, CHAR(10) + CHAR(13), ' '),CHAR(10),' '), CHAR(13),' ');

--Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F'

select @result = REPLACE(@result, CHAR(0),' ');
select @result = REPLACE(@result, CHAR(1),' ');
select @result = REPLACE(@result, CHAR(2),' ');
select @result = REPLACE(@result, CHAR(3),' ');
select @result = REPLACE(@result, CHAR(4),' ');
select @result = REPLACE(@result, CHAR(5),' ');
select @result = REPLACE(@result, CHAR(6),' ');
select @result = REPLACE(@result, CHAR(7),' ');
select @result = REPLACE(@result, CHAR(8),' ');
select @result = REPLACE(@result, CHAR(9),' ');

--select @result = REPLACE(@result, CHAR(10),' ');
select @result = REPLACE(@result, CHAR(11),' ');
select @result = REPLACE(@result, CHAR(12),' ');
--select @result = REPLACE(@result, CHAR(13),' ');
select @result = REPLACE(@result, CHAR(14),' ');
select @result = REPLACE(@result, CHAR(15),' ');
select @result = REPLACE(@result, CHAR(16),' ');
select @result = REPLACE(@result, CHAR(17),' ');
select @result = REPLACE(@result, CHAR(18),' ');
select @result = REPLACE(@result, CHAR(19),' ');
select @result = REPLACE(@result, CHAR(20),' ');


select @result = REPLACE(@result, CHAR(21),' ');
select @result = REPLACE(@result, CHAR(22),' ');
select @result = REPLACE(@result, CHAR(23),' ');
select @result = REPLACE(@result, CHAR(24),' ');
select @result = REPLACE(@result, CHAR(25),' ');
select @result = REPLACE(@result, CHAR(26),' ');
select @result = REPLACE(@result, CHAR(27),' ');
select @result = REPLACE(@result, CHAR(28),' ');
select @result = REPLACE(@result, CHAR(29),' ');
select @result = REPLACE(@result, CHAR(30),' ');
select @result = REPLACE(@result, CHAR(31),' ');
select @result = REPLACE(@result, CHAR(127),' ');

--Replacing the pipe symbol
select @result = REPLACE(@result, CHAR(124),' ');

--Repalcing the NULs
select @result = Nullif(@result,'');

--Removing the Right Spaces
select @result = RTRIM(@result);

--Removing the Left Spaces
select @result = LTRIM(@result);


return @result;

end

or should we use CLF scalar UDF like this
    using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using System.Collections.Generic;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ReplaceSplVarcharsBICD(SqlString s){
if (s.IsNull) return String.Empty;
//Removing the Right Spaces and Left Spaces
string s1 = s.ToString().Trim();
if (s1.Length == 0) return String.Empty;
StringBuilder tmpS = new StringBuilder(s1.Length);
//striping out the "control characters"
//Control characters are non-printing and formatting characters, such as ACK, BEL, CR, FF, LF, and VT. The Unicode standard assigns the following code points to control //characters: from \U0000 to \U001F, \U007F, and from \U0080 to \U009F. According to the Unicode standard, these values are to be interpreted as control characters unless //their use is otherwise defined by an application. Valid control characters are members of the UnicodeCategory.Control category.
//Replacing for Enter, newline 0D0A or \x000D (13) or \x000A (10)
//Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F' subset of control characters
//Replacing the pipe symbol \x007C (124)
//Repalcing the NULs 00
for (int i = 0; i <= s1.Length; i++)
{
if (!Char.IsControl(s1[i]) & !s1[i] =='\x000D' & !s1[i] =='\x000A' & !s1[i] =='\x007C')
{
tmpS.Append(s1[i]);
}
else
{
tmpS.Append(' ');
}
}
string result = tmpS.ToString().Trim();

return result;
}
}


Post #1440774
Posted Wednesday, April 10, 2013 7:58 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 11,970, Visits: 10,995
It looks you have built both versions. Have you tested them? I would recommend nesting all of your replaces instead of doing them one at a time over and over. You can convert the entire function to a single statement. This will lend itself to becoming an iTVF pretty decently. It will take some effort to nest all those replaces but it will end up being faster. The CLR version may be faster. You will have to test that yourself in your system against your data.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1440820
Posted Tuesday, April 16, 2013 4:33 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, May 03, 2013 9:42 AM
Points: 4, Visits: 11
I have written another UDF to use while loop shown as below:
create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)
with schemabinding
begin
if @s is null
return null
declare @s2 varchar(8000)
set @s2 = ''
declare @l int
set @l = len(@s)
declare @p int
set @p = 1
while @p <= @l begin
declare @c int
set @c = ascii(substring(@s, @p, 1))
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @c between 32 and 123 or @c between 125 and 126
set @s2 = @s2 + char(@c)
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @c between 0 and 31 or @c = 124 or @c= 127
set @s2 = @s2 + char(32)
set @p = @p + 1
end
if len(@s2) = 0
return null
return ltrim(rtrim(@s2))
end
go

Testing results:

1. If we don't call any udf, it took 30 minutes to dump all the data into a file
2. If we use the first udf, it took 2.5 hours to dump all the data into a file
3. If we use the second udf, it took 35 minutes to dump all the data into a file.

Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply

My current sql is shown as below:
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])
,a.[IsSundryParties]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])
,a.[U1]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a


Post #1442674
Posted Tuesday, April 16, 2013 7:30 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 11,970, Visits: 10,995
liujchen 26197 (4/16/2013)
I have written another UDF to use while loop shown as below:
create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)
with schemabinding
begin
if @s is null
return null
declare @s2 varchar(8000)
set @s2 = ''
declare @l int
set @l = len(@s)
declare @p int
set @p = 1
while @p <= @l begin
declare @c int
set @c = ascii(substring(@s, @p, 1))
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @c between 32 and 123 or @c between 125 and 126
set @s2 = @s2 + char(@c)
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @c between 0 and 31 or @c = 124 or @c= 127
set @s2 = @s2 + char(32)
set @p = @p + 1
end
if len(@s2) = 0
return null
return ltrim(rtrim(@s2))
end
go

Testing results:

1. If we don't call any udf, it took 30 minutes to dump all the data into a file
2. If we use the first udf, it took 2.5 hours to dump all the data into a file
3. If we use the second udf, it took 35 minutes to dump all the data into a file.

Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply

My current sql is shown as below:
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])
,a.[IsSundryParties]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])
,a.[U1]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a





I would RUN. don't walk, away from that looping version as fast as you can go. If you move away fast it will be the ONLY time that function will see anything go fast. The nested replace option in your original post is going to be your best choice here. It will be a little tedious to code it but if you make it an iTVF it will be super fast. To gain the benefit of an iTVF you need to keep it to one statement. That is why I suggest using nested replaces.


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1442734
Posted Tuesday, April 16, 2013 8:00 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, May 03, 2013 9:42 AM
Points: 4, Visits: 11
Thanks a lot for your reply!

Can you please guide me how to call a iTVF function
in the Select SQL I posted? Same syntax or I need
to use Cross Apply?

I don't mind to try the nested replace, I
have created one, but I can't execute the Select
SQL in the original way.

Thanks and Regards,
Jingchen
Post #1442751
Posted Tuesday, April 16, 2013 8:09 AM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Today @ 3:44 PM
Points: 11,970, Visits: 10,995
liujchen 26197 (4/16/2013)
Thanks a lot for your reply!

Can you please guide me how to call a iTVF function
in the Select SQL I posted? Same syntax or I need
to use Cross Apply?

I don't mind to try the nested replace, I
have created one, but I can't execute the Select
SQL in the original way.

Thanks and Regards,
Jingchen


Well an iTVF returns a table. So you can use CROSS APPLY or you can join to it. I would think in your case that a CROSS APPLY is how I would handle it.

You should check out Paul White's article on using apply. A great two article series that will help understand how it works.

http://www.sqlservercentral.com/articles/APPLY/69953/

http://www.sqlservercentral.com/articles/APPLY/69954/


_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Moden's splitter.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs
Understanding and Using APPLY (Part 1)
Understanding and Using APPLY (Part 2)
Post #1442761
Posted Tuesday, April 16, 2013 8:46 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Today @ 9:33 AM
Points: 6,754, Visits: 12,854
Here's an iTVF which might do what you want. You will need to check that the xml concatenation can deal with the characters you wish to include:

ALTER FUNCTION [dbo].[ifn_RemoveUnwantedChars] 
(@strTarget VARCHAR(1000))
RETURNS TABLE
AS
RETURN
(
WITH E1(N) AS ( -- 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
iTally(n) AS (
SELECT TOP (DATALENGTH(@strTarget))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E1 a, E1 b, E1 c) -- 1000 rows

SELECT
OutputString = (
SELECT '' + Eeuatkerrikta
FROM iTally t
CROSS APPLY (SELECT kerrikta = SUBSTRING(@strTarget,n,1)) d
CROSS APPLY (SELECT Eeuatkerrikta = CASE
WHEN ASCII(kerrikta) BETWEEN 0 AND 31 OR ASCII(kerrikta) IN (124,127) THEN ' '
ELSE kerrikta END) e
ORDER BY t.n
FOR XML PATH(''), TYPE).value('.', 'varchar(1000)'))



“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Exploring Recursive CTEs by Example Dwain Camps
Post #1442785
Posted Wednesday, April 17, 2013 4:08 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, May 03, 2013 9:42 AM
Points: 4, Visits: 11
I have read the cross apply articles

I made the following code to see if I understand it correctly.

SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,AccountName = iTVF.result
,GLAccountName = iTVF1.result
,LodgementReference = iTVF2.result
,AccountNumber = iTVF3.result
,a.[IsSundryParties]
, TraceBSB = iTVF4.result
,TraceAccountName = iTVF5.result
,TraceAccountNumber = iTVF6.result
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,AccountBankCode = iTVF7.result
,a.[U1]
,U2 = iTVF8.result
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountName) iTVF
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.GLAccountName) iTVF1
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.LodgementReference) iTVF2
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountNumber) iTVF3
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceBSB) iTVF4
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountName) iTVF5
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountNumber) iTVF6
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountBankCode) iTVF7
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.U2) iTVF8

Post #1443174
« Prev Topic | Next Topic »

Add to briefcase

Permissions Expand / Collapse