Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 12345»»»

How to Make Scalar UDFs Run Faster (SQL Spackle) Expand / Collapse
Author
Message
Posted Tuesday, July 31, 2012 9:57 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Today @ 6:45 AM
Points: 35,366, Visits: 31,902
Comments posted to this topic are about the item How to Make Scalar UDFs Run Faster (SQL Spackle)

--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1338291
Posted Wednesday, August 1, 2012 12:27 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, September 23, 2014 9:31 AM
Points: 32, Visits: 187
Great article. I hate it when someone says, "Never...." You've proved that there is always more to the picture than what the eye can see (or the statistics tell us). Thanks.
Post #1338319
Posted Wednesday, August 1, 2012 3:27 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Friday, September 5, 2014 11:36 AM
Points: 1, Visits: 66
Hmm, it sounds like: "Never use Scalar UDF, unless you're doing iterative string manipulation".

In which case, I recommend the CLR :o)

Also, thanks very much for the information regarding "SET STATISTICS ON"! That was extremely illuminating. Does the "Include Client Statistics" feature have an implicit STATISTICS TIME ON?

It's prompted me to have a deeper look into what this button actually does, and similarly the "Show Actual Execution Plan" button.
A bit of a tangent, but I remember a number of queries for which response times exploded when trying to view the Execution Plan.
Post #1338390
Posted Wednesday, August 1, 2012 3:49 AM


SSC Journeyman

SSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC JourneymanSSC Journeyman

Group: General Forum Members
Last Login: Friday, June 27, 2014 4:17 AM
Points: 75, Visits: 443
Jeff Moden,
Wonderful note about set statistics time! Never heard of it.

"Try to beat it using any form of "all in one query" code."
Here you go!
Sample Data (one million rows of strings, 1 to 10 word, 2-16 word length)
use tempdb;
go
if object_id('dbo.TestTable') is not null drop table dbo.TestTable;
go
select top(1000000) t.rnd_txt as s
into dbo.TestTable
from
(select char(convert(int,rand(checksum(newid()))*26+97))) c (rnd_char)
cross apply (select convert(int,rand(checksum(newid()))*15+2)) l(rnd_len)
cross apply (select replicate(c.rnd_char,l.rnd_len)+' ') w(rnd_word)
cross apply (select convert(int,rand(checksum(newid()))*10+1)) wc(rnd_wcount)
cross apply (select replicate(w.rnd_word,wc.rnd_wcount)+' ') t(rnd_txt)
cross join sys.all_columns o1
cross join sys.all_columns o2
;

here is function:
create FUNCTION dbo.InitialCapFaster(@String VARCHAR(8000))
RETURNS table
AS
return
with
c as
(
select
String = STUFF(LOWER(@String),1,1,UPPER(LEFT(@String,1))) COLLATE Latin1_General_Bin,
Position = PATINDEX('%[^A-Za-z''][a-z]%',@String COLLATE Latin1_General_Bin),
Step = 1
union all
select
String = STUFF(c.String,c.Position,2,UPPER(SUBSTRING(c.String,c.Position,2))) COLLATE Latin1_General_Bin,
Position = PATINDEX('%[^A-Za-z''][a-z]%',c.String COLLATE Latin1_General_Bin),
Step = c.Step+1
from c
where c.Position > 0
)
select top(1) String from c order by Step desc
go

Here is test
declare @s varchar(8000);
declare @ds datetime = getdate();
select @s = dbo.InitialCap(s) from testtable;
print 'InitialCap: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
go
declare @s varchar(8000);
declare @ds datetime = getdate();
select @s = f.String from dbo.testtable cross apply dbo.InitialCapFaster(s) f;
print 'InitialCapFaster: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
go
print @@version

Results
InitialCap: 25533 ms
InitialCapFaster: 12856 ms

Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (Intel X86)
Apr 2 2010 15:53:02
Copyright (c) Microsoft Corporation
Enterprise Edition on Windows NT 5.1 <X86> (Build 2600: Service Pack 3)

Almost 50% faster. I ran several tests, playing with test data.
Depending on row count, words count and words length I got 30%-60% percents faster.
But if there are not so many rows, 10 000 for example, than InitialCap wins, about 250 ms vs 600 ms in my experiments.
And 35 000 rows id the threshold where both functions are almost equal InitialCap: 893 ms vs InitialCapFaster: 860 ms
Try it on your data.
All the tests were made on my local machine with 2008R2 sql.



I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
Blog: http://somewheresomehow.ru
Twitter: @SomewereSomehow
Post #1338398
Posted Wednesday, August 1, 2012 4:28 AM
Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Tuesday, October 7, 2014 8:38 AM
Points: 1,070, Visits: 908
Interesting, well done.

Very concerned about that stats time issue.

Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.

Equally - are you sure it hadn't just cached a dodgy query plan for the changed session settings?



Post #1338412
Posted Wednesday, August 1, 2012 4:34 AM
Old Hand

Old HandOld HandOld HandOld HandOld HandOld HandOld HandOld Hand

Group: General Forum Members
Last Login: Friday, October 17, 2014 3:06 AM
Points: 330, Visits: 2,249
Nice find about the time measurement method so strongly affecting the results. I honestly never took much notice to this, even while aware of better ways of measuring IO statistics then "set statistics io on" and knowing that when adding query plan output, things really slowed down.

As for the concept of iSF...i think it does not exist and that the highlighted line in books online is just wrong.

Here is why:

According to the definition provided by books online, the type TABLE without definition of its contents, is what is defined as the return type in our fucntion. This is just like with all iTVFs and on odds with being scalar. Exposing all of the inner workings by using a single statement makes it thus an inline table valued function (iTVF) and I think so far you will agree with me.

What I do not see as anything special is the "presence" of just only one column in the returned TABLE. I dont think this makes anything about it scalar! It is still a table type and that means non-scalar by definition.

Things to consider:

1. You have to use the function not as a scalar function, but as a table function.

2. In the case we have two or more columns in the return table of a iTVF and just one is used by the invoking query, the performance should be proved to be different from one with just one column in it. Without such a difference (and i expect none) there is no distinction between the two!

3. iTVF have as alternate name "parameterised views" for a good reason. They both have their inner logic exposed as a single statement that can be merged with the SQL code that uses them. The difference being that iTVF can accept parameters, and views cannot. Being essentially views, columns that are not used are simply optimized away from the resulting execution plan.

Considering this, what then makes an iSF, what really sets it apart and makes it scalar or being processed as such?
Post #1338414
Posted Wednesday, August 1, 2012 6:04 AM
SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Monday, October 6, 2014 11:09 AM
Points: 847, Visits: 475
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?

------------
Buy the ticket, take the ride. -- Hunter S. Thompson
Post #1338449
Posted Wednesday, August 1, 2012 6:59 AM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Wednesday, January 23, 2013 9:53 AM
Points: 6, Visits: 39
Thanks for the detailed and METHODICAL analysis. I once was troubleshooting a performance problem in a large, complicated application and it came down to a "set quoted identifer on" statement in a scalar function. The statement was not needed, was probably generated by a developer who used a right-click-on-object to generate the "create" statement. The function was being called massive number of times so even the slightest performance difference was hugely magnified. I've been wary ever since but know that knowledge is power, so thanks for the article.
Post #1338483
Posted Wednesday, August 1, 2012 7:05 AM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Wednesday, February 13, 2013 7:48 AM
Points: 28, Visits: 193
Not to say that this was not a good article, which it was, it seems we are not covering the killer scalar UDF issue. a scalar function is reasonably fast EXCEPT when you put some type of data access within the function. When this happens it will kill performance. I havent tried your method yet involving that scenario but I will today:) that would be for a calendar function that accesses a calendar table.

When writing a small function that accesses a table to get one value, I find it better to just put the logic of the function directly in the calling select statement and not use a function at all and, even though it is basically the same code, it will run much faster. but then again you would lose the benefits of actually having a function that gives one place to manage the code from.


Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
Post #1338494
Posted Wednesday, August 1, 2012 7:42 AM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Friday, October 17, 2014 7:46 AM
Points: 5,364, Visits: 8,948
Nice article Jeff. And you're right... it is a long spackle article - but sometimes you need that to fully explain things.

Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1338536
« Prev Topic | Next Topic »

Add to briefcase 12345»»»

Permissions Expand / Collapse