Click here to monitor SSC
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


How to Make Scalar UDFs Run Faster (SQL Spackle)


How to Make Scalar UDFs Run Faster (SQL Spackle)

Author
Message
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)SSC Guru (52K reputation)

Group: General Forum Members
Points: 52181 Visits: 40323
Comments posted to this topic are about the item How to Make Scalar UDFs Run Faster (SQL Spackle)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
Although they tell us that they want it real bad, our primary goal is to ensure that we dont actually give it to them that way.
Although change is inevitable, change for the better is not.
Just because you can do something in PowerShell, doesnt mean you should. Wink

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
dbishop
dbishop
Valued Member
Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)Valued Member (59 reputation)

Group: General Forum Members
Points: 59 Visits: 248
Great article. I hate it when someone says, "Never...." You've proved that there is always more to the picture than what the eye can see (or the statistics tell us). Thanks.
milldyce
milldyce
Forum Newbie
Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)Forum Newbie (3 reputation)

Group: General Forum Members
Points: 3 Visits: 66
Hmm, it sounds like: "Never use Scalar UDF, unless you're doing iterative string manipulation".

In which case, I recommend the CLR :o)

Also, thanks very much for the information regarding "SET STATISTICS ON"! That was extremely illuminating. Does the "Include Client Statistics" feature have an implicit STATISTICS TIME ON?

It's prompted me to have a deeper look into what this button actually does, and similarly the "Show Actual Execution Plan" button.
A bit of a tangent, but I remember a number of queries for which response times exploded when trying to view the Execution Plan.
SomewhereSomehow
SomewhereSomehow
SSC-Enthusiastic
SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)SSC-Enthusiastic (108 reputation)

Group: General Forum Members
Points: 108 Visits: 469
Jeff Moden,
Wonderful note about set statistics time! Never heard of it.

"Try to beat it using any form of "all in one query" code."
Here you go!
Sample Data (one million rows of strings, 1 to 10 word, 2-16 word length)

use tempdb;
go
if object_id('dbo.TestTable') is not null drop table dbo.TestTable;
go
select top(1000000) t.rnd_txt as s
into dbo.TestTable
from
(select char(convert(int,rand(checksum(newid()))*26+97))) c (rnd_char)
cross apply (select convert(int,rand(checksum(newid()))*15+2)) l(rnd_len)
cross apply (select replicate(c.rnd_char,l.rnd_len)+' ') w(rnd_word)
cross apply (select convert(int,rand(checksum(newid()))*10+1)) wc(rnd_wcount)
cross apply (select replicate(w.rnd_word,wc.rnd_wcount)+' ') t(rnd_txt)
cross join sys.all_columns o1
cross join sys.all_columns o2
;



here is function:

create FUNCTION dbo.InitialCapFaster(@String VARCHAR(8000))
RETURNS table
AS
return
with
c as
(
select
String = STUFF(LOWER(@String),1,1,UPPER(LEFT(@String,1))) COLLATE Latin1_General_Bin,
Position = PATINDEX('%[^A-Za-z''][a-z]%',@String COLLATE Latin1_General_Bin),
Step = 1
union all
select
String = STUFF(c.String,c.Position,2,UPPER(SUBSTRING(c.String,c.Position,2))) COLLATE Latin1_General_Bin,
Position = PATINDEX('%[^A-Za-z''][a-z]%',c.String COLLATE Latin1_General_Bin),
Step = c.Step+1
from c
where c.Position > 0
)
select top(1) String from c order by Step desc
go



Here is test

declare @s varchar(8000);
declare @ds datetime = getdate();
select @s = dbo.InitialCap(s) from testtable;
print 'InitialCap: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
go
declare @s varchar(8000);
declare @ds datetime = getdate();
select @s = f.String from dbo.testtable cross apply dbo.InitialCapFaster(s) f;
print 'InitialCapFaster: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
go
print @@version



Results

InitialCap: 25533 ms
InitialCapFaster: 12856 ms

Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (Intel X86)
Apr 2 2010 15:53:02
Copyright (c) Microsoft Corporation
Enterprise Edition on Windows NT 5.1 <X86> (Build 2600: Service Pack 3)



Almost 50% faster. I ran several tests, playing with test data.
Depending on row count, words count and words length I got 30%-60% percents faster.
But if there are not so many rows, 10 000 for example, than InitialCap wins, about 250 ms vs 600 ms in my experiments.
And 35 000 rows id the threshold where both functions are almost equal InitialCap: 893 ms vs InitialCapFaster: 860 ms
Try it on your data.
All the tests were made on my local machine with 2008R2 sql.


I am really sorry for my poor gramma. And I hope that value of my answers will outweigh the harm for your eyes.
Blog: http://somewheresomehow.ru
Twitter: @SomewereSomehow
RichB
RichB
Ten Centuries
Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)Ten Centuries (1.2K reputation)

Group: General Forum Members
Points: 1173 Visits: 1023
Interesting, well done.

Very concerned about that stats time issue.

Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.

Equally - are you sure it hadn't just cached a dodgy query plan for the changed session settings?



peter-757102
peter-757102
Mr or Mrs. 500
Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)Mr or Mrs. 500 (531 reputation)

Group: General Forum Members
Points: 531 Visits: 2553
Nice find about the time measurement method so strongly affecting the results. I honestly never took much notice to this, even while aware of better ways of measuring IO statistics then "set statistics io on" and knowing that when adding query plan output, things really slowed down.

As for the concept of iSF...i think it does not exist and that the highlighted line in books online is just wrong.

Here is why:

According to the definition provided by books online, the type TABLE without definition of its contents, is what is defined as the return type in our fucntion. This is just like with all iTVFs and on odds with being scalar. Exposing all of the inner workings by using a single statement makes it thus an inline table valued function (iTVF) and I think so far you will agree with me.

What I do not see as anything special is the "presence" of just only one column in the returned TABLE. I dont think this makes anything about it scalar! It is still a table type and that means non-scalar by definition.

Things to consider:

1. You have to use the function not as a scalar function, but as a table function.

2. In the case we have two or more columns in the return table of a iTVF and just one is used by the invoking query, the performance should be proved to be different from one with just one column in it. Without such a difference (and i expect none) there is no distinction between the two!

3. iTVF have as alternate name "parameterised views" for a good reason. They both have their inner logic exposed as a single statement that can be merged with the SQL code that uses them. The difference being that iTVF can accept parameters, and views cannot. Being essentially views, columns that are not used are simply optimized away from the resulting execution plan.

Considering this, what then makes an iSF, what really sets it apart and makes it scalar or being processed as such?
Bryant McClellan
Bryant McClellan
SSC Eights!
SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)SSC Eights! (907 reputation)

Group: General Forum Members
Points: 907 Visits: 538
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?

------------
Buy the ticket, take the ride. -- Hunter S. Thompson
mmilodragovich
mmilodragovich
Grasshopper
Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)Grasshopper (12 reputation)

Group: General Forum Members
Points: 12 Visits: 39
Thanks for the detailed and METHODICAL analysis. I once was troubleshooting a performance problem in a large, complicated application and it came down to a "set quoted identifer on" statement in a scalar function. The statement was not needed, was probably generated by a developer who used a right-click-on-object to generate the "create" statement. The function was being called massive number of times so even the slightest performance difference was hugely magnified. I've been wary ever since but know that knowledge is power, so thanks for the article.
micber
micber
SSC Rookie
SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)SSC Rookie (32 reputation)

Group: General Forum Members
Points: 32 Visits: 193
Not to say that this was not a good article, which it was, it seems we are not covering the killer scalar UDF issue. a scalar function is reasonably fast EXCEPT when you put some type of data access within the function. When this happens it will kill performance. I havent tried your method yet involving that scenario but I will todaySmile that would be for a calendar function that accesses a calendar table.

When writing a small function that accesses a table to get one value, I find it better to just put the logic of the function directly in the calling select statement and not use a function at all and, even though it is basically the same code, it will run much faster. but then again you would lose the benefits of actually having a function that gives one place to manage the code from.

Michael B
Data Architect
MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA
WayneS
WayneS
SSCertifiable
SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)SSCertifiable (7K reputation)

Group: General Forum Members
Points: 6989 Visits: 10434
Nice article Jeff. And you're right... it is a long spackle article :-D - but sometimes you need that to fully explain things.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search