April 10, 2013 at 6:44 am
Dear all,
Can you please comment on this UDF, is there any better way to do it?
For example CLF scalar UDF or inline table UDF?
There is around 50 Million rows in TableABC and it will be around 1GB in size if it is exported as plain text .
we want to use this function in this way:
SELECT 'D' AS RECORDTYPECODE,
a.[Id]
,a.[Col1]
,a.[Col2]
,a.[Col3]
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col4])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col5])
,%DBNAME%.ReplaceSplVarcharsBICD(a.[Col6])
,a.[genID]
FROM %DBNAME%.TableABC a
The function is shown below:
USE [BPSUAT_BICD]
GO
/****** Object: UserDefinedFunction [dbo].[ReplaceSplVarcharsBICD] Script Date: 04/10/2013 16:08:15 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE function [dbo].[ReplaceSplVarcharsBICD](@value varchar(8000))RETURNS varchar(8000)
as
begin
declare @result varchar(8000);
-- This function is used to Replace some Special characters with space for varchar and char datatypes.
-- Replacing for Enter, newline
select @result = REPLACE(REPLACE(REPLACE(@value, CHAR(10) + CHAR(13), ' '),CHAR(10),' '), CHAR(13),' ');
--Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F'
select @result = REPLACE(@result, CHAR(0),' ');
select @result = REPLACE(@result, CHAR(1),' ');
select @result = REPLACE(@result, CHAR(2),' ');
select @result = REPLACE(@result, CHAR(3),' ');
select @result = REPLACE(@result, CHAR(4),' ');
select @result = REPLACE(@result, CHAR(5),' ');
select @result = REPLACE(@result, CHAR(6),' ');
select @result = REPLACE(@result, CHAR(7),' ');
select @result = REPLACE(@result, CHAR(8),' ');
select @result = REPLACE(@result, CHAR(9),' ');
--select @result = REPLACE(@result, CHAR(10),' ');
select @result = REPLACE(@result, CHAR(11),' ');
select @result = REPLACE(@result, CHAR(12),' ');
--select @result = REPLACE(@result, CHAR(13),' ');
select @result = REPLACE(@result, CHAR(14),' ');
select @result = REPLACE(@result, CHAR(15),' ');
select @result = REPLACE(@result, CHAR(16),' ');
select @result = REPLACE(@result, CHAR(17),' ');
select @result = REPLACE(@result, CHAR(18),' ');
select @result = REPLACE(@result, CHAR(19),' ');
select @result = REPLACE(@result, CHAR(20),' ');
select @result = REPLACE(@result, CHAR(21),' ');
select @result = REPLACE(@result, CHAR(22),' ');
select @result = REPLACE(@result, CHAR(23),' ');
select @result = REPLACE(@result, CHAR(24),' ');
select @result = REPLACE(@result, CHAR(25),' ');
select @result = REPLACE(@result, CHAR(26),' ');
select @result = REPLACE(@result, CHAR(27),' ');
select @result = REPLACE(@result, CHAR(28),' ');
select @result = REPLACE(@result, CHAR(29),' ');
select @result = REPLACE(@result, CHAR(30),' ');
select @result = REPLACE(@result, CHAR(31),' ');
select @result = REPLACE(@result, CHAR(127),' ');
--Replacing the pipe symbol
select @result = REPLACE(@result, CHAR(124),' ');
--Repalcing the NULs
select @result = Nullif(@result,'');
--Removing the Right Spaces
select @result = RTRIM(@result);
--Removing the Left Spaces
select @result = LTRIM(@result);
return @result;
end
or should we use CLF scalar UDF like this
using System;
using System.Data.Sql;
using System.Data.SqlTypes;
using System.Collections.Generic;
using Microsoft.SqlServer.Server;
using System.Text;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ReplaceSplVarcharsBICD(SqlString s){
if (s.IsNull) return String.Empty;
//Removing the Right Spaces and Left Spaces
string s1 = s.ToString().Trim();
if (s1.Length == 0) return String.Empty;
StringBuilder tmpS = new StringBuilder(s1.Length);
//striping out the "control characters"
//Control characters are non-printing and formatting characters, such as ACK, BEL, CR, FF, LF, and VT. The Unicode standard assigns the following code points to control //characters: from \U0000 to \U001F, \U007F, and from \U0080 to \U009F. According to the Unicode standard, these values are to be interpreted as control characters unless //their use is otherwise defined by an application. Valid control characters are members of the UnicodeCategory.Control category.
//Replacing for Enter, newline 0D0A or \x000D (13) or \x000A (10)
//Replacing other non printable characters from Hex 00 to Hex'1F' and Hex'7F' subset of control characters
//Replacing the pipe symbol \x007C (124)
//Repalcing the NULs 00
for (int i = 0; i <= s1.Length; i++)
{
if (!Char.IsControl(s1) & !s1 =='\x000D' & !s1 =='\x000A' & !s1 =='\x007C')
{
tmpS.Append(s1);
}
else
{
tmpS.Append(' ');
}
}
string result = tmpS.ToString().Trim();
return result;
}
}
April 10, 2013 at 7:58 am
It looks you have built both versions. Have you tested them? I would recommend nesting all of your replaces instead of doing them one at a time over and over. You can convert the entire function to a single statement. This will lend itself to becoming an iTVF pretty decently. It will take some effort to nest all those replaces but it will end up being faster. The CLR version may be faster. You will have to test that yourself in your system against your data. π
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 16, 2013 at 4:33 am
I have written another UDF to use while loop shown as below:
create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)
with schemabinding
begin
if @s-2 is null
return null
declare @s2 varchar(8000)
set @s2 = ''
declare @l int
set @l = len(@s)
declare @p int
set @p = 1
while @p <= @l begin
declare @C int
set @C = ascii(substring(@s, @p, 1))
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 32 and 123 or @C between 125 and 126
set @s2 = @s2 + char(@c)
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 0 and 31 or @C = 124 or @C= 127
set @s2 = @s2 + char(32)
set @p = @p + 1
end
if len(@s2) = 0
return null
return ltrim(rtrim(@s2))
end
go
Testing results:
1. If we don't call any udf, it took 30 minutes to dump all the data into a file
2. If we use the first udf, it took 2.5 hours to dump all the data into a file
3. If we use the second udf, it took 35 minutes to dump all the data into a file.
Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply
My current sql is shown as below:
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])
,a.[IsSundryParties]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])
,a.[U1]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a
April 16, 2013 at 7:30 am
liujchen 26197 (4/16/2013)
I have written another UDF to use while loop shown as below:
create function dbo.[ReplaceSplVarcharsBICD_test3] (@s varchar(8000)) returns varchar(8000)
with schemabinding
begin
if @s-2 is null
return null
declare @s2 varchar(8000)
set @s2 = ''
declare @l int
set @l = len(@s)
declare @p int
set @p = 1
while @p <= @l begin
declare @C int
set @C = ascii(substring(@s, @p, 1))
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 32 and 123 or @C between 125 and 126
set @s2 = @s2 + char(@c)
--any characters less than 32(dec,Space), or 124 (PIPE) or 127 (DEL) need to be replace with 32 (dec,Space)
if @C between 0 and 31 or @C = 124 or @C= 127
set @s2 = @s2 + char(32)
set @p = @p + 1
end
if len(@s2) = 0
return null
return ltrim(rtrim(@s2))
end
go
Testing results:
1. If we don't call any udf, it took 30 minutes to dump all the data into a file
2. If we use the first udf, it took 2.5 hours to dump all the data into a file
3. If we use the second udf, it took 35 minutes to dump all the data into a file.
Can you let me know if I rewrite the UDF into inline table level UDF, how to use the cross apply
My current sql is shown as below:
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[GLAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[LodgementReference])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountNumber])
,a.[IsSundryParties]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceBSB])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountName])
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[TraceAccountNumber])
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[AccountBankCode])
,a.[U1]
,%DBNAME%.ReplaceSplVarcharsBICD_test3(a.[U2])
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a
I would RUN. don't walk, away from that looping version as fast as you can go. If you move away fast it will be the ONLY time that function will see anything go fast. The nested replace option in your original post is going to be your best choice here. It will be a little tedious to code it but if you make it an iTVF it will be super fast. To gain the benefit of an iTVF you need to keep it to one statement. That is why I suggest using nested replaces.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 16, 2013 at 8:00 am
Thanks a lot for your reply!
Can you please guide me how to call a iTVF function
in the Select SQL I posted? Same syntax or I need
to use Cross Apply?
I don't mind to try the nested replace, I
have created one, but I can't execute the Select
SQL in the original way.
Thanks and Regards,
Jingchen
April 16, 2013 at 8:09 am
liujchen 26197 (4/16/2013)
Thanks a lot for your reply!Can you please guide me how to call a iTVF function
in the Select SQL I posted? Same syntax or I need
to use Cross Apply?
I don't mind to try the nested replace, I
have created one, but I can't execute the Select
SQL in the original way.
Thanks and Regards,
Jingchen
Well an iTVF returns a table. So you can use CROSS APPLY or you can join to it. I would think in your case that a CROSS APPLY is how I would handle it.
You should check out Paul White's article on using apply. A great two article series that will help understand how it works.
http://www.sqlservercentral.com/articles/APPLY/69953/[/url]
http://www.sqlservercentral.com/articles/APPLY/69954/[/url]
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 β Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 16, 2013 at 8:46 am
Here's an iTVF which might do what you want. You will need to check that the xml concatenation can deal with the characters you wish to include:
ALTER FUNCTION [dbo].[ifn_RemoveUnwantedChars]
(@strTarget VARCHAR(1000))
RETURNS TABLE
AS
RETURN
(
WITH E1(N) AS ( -- 10 rows
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
),
iTally(n) AS (
SELECT TOP (DATALENGTH(@strTarget))
n = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))
FROM E1 a, E1 b, E1 c) -- 1000 rows
SELECT
OutputString = (
SELECT '' + Eeuatkerrikta
FROM iTally t
CROSS APPLY (SELECT kerrikta = SUBSTRING(@strTarget,n,1)) d
CROSS APPLY (SELECT Eeuatkerrikta = CASE
WHEN ASCII(kerrikta) BETWEEN 0 AND 31 OR ASCII(kerrikta) IN (124,127) THEN ' '
ELSE kerrikta END) e
ORDER BY t.n
FOR XML PATH(''), TYPE).value('.', 'varchar(1000)'))
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 17, 2013 at 4:08 am
I have read the cross apply articles
I made the following code to see if I understand it correctly.
SELECT 'D' AS RECORDTYPECODE
, a.[Id]
,a.[FundsMovementItemTypeId]
,a.[FundsMovementSetId]
,a.[FeeTypeId]
,AccountName = iTVF.result
,GLAccountName = iTVF1.result
,LodgementReference = iTVF2.result
,AccountNumber = iTVF3.result
,a.[IsSundryParties]
, TraceBSB = iTVF4.result
,TraceAccountName = iTVF5.result
,TraceAccountNumber = iTVF6.result
,a.[IsSundryPartiesTraceACC]
,a.[IsAttractGST]
,a.[IsCustomerAccount]
,a.[Amount]
,a.[OnDateTime]
,AccountBankCode = iTVF7.result
,a.[U1]
,U2 = iTVF8.result
,a.[U3]
,a.[U5]
,a.[genID]
FROM %DBNAME%.FundsMovementItem a
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountName) iTVF
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.GLAccountName) iTVF1
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.LodgementReference) iTVF2
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountNumber) iTVF3
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceBSB) iTVF4
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountName) iTVF5
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.TraceAccountNumber) iTVF6
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.AccountBankCode) iTVF7
CROSS
APPLY dbo.ReplaceSplVarcharsBICD_test4(a.U2) iTVF8
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply