SSC Rookie
      
Group: General Forum Members
Last Login: Friday, May 17, 2013 11:48 AM
Points: 32,
Visits: 132
Great article. I hate it when someone says, "Never...." You've proved that there is always more to the picture than what the eye can see (or the statistics tell us). Thanks.
Forum Newbie
      
Group: General Forum Members
Last Login: Friday, March 08, 2013 10:18 AM
Points: 1,
Visits: 57
Hmm, it sounds like: "Never use scalar UDFs, unless you're doing iterative string manipulation".
In which case, I recommend the CLR :o)
Also, thanks very much for the information regarding "SET STATISTICS TIME ON"! That was extremely illuminating. Does the "Include Client Statistics" feature have an implicit STATISTICS TIME ON?
It has prompted me to take a deeper look into what this button actually does, and similarly the "Show Actual Execution Plan" button. A bit of a tangent, but I remember a number of queries whose response times exploded when I tried to view the execution plan.
SSC Journeyman
      
Group: General Forum Members
Last Login: Tuesday, May 14, 2013 10:52 PM
Points: 75,
Visits: 410
Jeff Moden, wonderful note about SET STATISTICS TIME! I had never heard of it.
"Try to beat it using any form of "all in one query" code." Here you go! Sample data (one million rows of strings, 1 to 10 words each, every word 2 to 16 characters long):
    use tempdb;
    go
    if object_id('dbo.TestTable') is not null drop table dbo.TestTable;
    go
    select top(1000000) t.rnd_txt as s
    into dbo.TestTable
    from (select char(convert(int,rand(checksum(newid()))*26+97))) c (rnd_char)
    cross apply (select convert(int,rand(checksum(newid()))*15+2)) l(rnd_len)
    cross apply (select replicate(c.rnd_char,l.rnd_len)+' ') w(rnd_word)
    cross apply (select convert(int,rand(checksum(newid()))*10+1)) wc(rnd_wcount)
    cross apply (select replicate(w.rnd_word,wc.rnd_wcount)+' ') t(rnd_txt)
    cross join sys.all_columns o1
    cross join sys.all_columns o2
    ;
Here is the function:
    create FUNCTION dbo.InitialCapFaster(@String VARCHAR(8000))
    RETURNS table
    AS
    return
    with c as
    (
        select
            String = STUFF(LOWER(@String),1,1,UPPER(LEFT(@String,1))) COLLATE Latin1_General_Bin,
            Position = PATINDEX('%[^A-Za-z''][a-z]%',@String COLLATE Latin1_General_Bin),
            Step = 1
        union all
        select
            String = STUFF(c.String,c.Position,2,UPPER(SUBSTRING(c.String,c.Position,2))) COLLATE Latin1_General_Bin,
            Position = PATINDEX('%[^A-Za-z''][a-z]%',c.String COLLATE Latin1_General_Bin),
            Step = c.Step+1
        from c
        where c.Position > 0
    )
    select top(1) String from c order by Step desc
    go
Here is the test:
    declare @s varchar(8000);
    declare @ds datetime = getdate();
    select @s = dbo.InitialCap(s) from testtable;
    print 'InitialCap: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
    go
    declare @s varchar(8000);
    declare @ds datetime = getdate();
    select @s = f.String from dbo.testtable cross apply dbo.InitialCapFaster(s) f;
    print 'InitialCapFaster: ' + convert(varchar(10),datediff(ms,@ds,getdate())) + ' ms'
    go
    print @@version
Results:
    InitialCap: 25533 ms
    InitialCapFaster: 12856 ms
    Microsoft SQL Server 2008 R2 (RTM) - 10.50.1600.1 (Intel X86)
    Apr 2 2010 15:53:02
    Copyright (c) Microsoft Corporation
    Enterprise Edition on Windows NT 5.1 <X86> (Build 2600: Service Pack 3)
Almost 50% faster. I ran several tests, varying the test data. Depending on row count, word count and word length, InitialCapFaster was 30%-60% faster. But if there are not so many rows, 10,000 for example, then InitialCap wins: about 250 ms vs 600 ms in my experiments. About 35,000 rows is the threshold where both functions are almost equal (InitialCap: 893 ms vs InitialCapFaster: 860 ms). Try it on your data. All the tests were run on my local machine with SQL Server 2008 R2.
I am really sorry for my poor grammar. And I hope that value of my answers will outweigh the harm for your eyes.
Blog: http://somewheresomehow.ru
Twitter: @SomewereSomehow
Ten Centuries
      
Group: General Forum Members
Last Login: Today @ 3:06 AM
Points: 1,026,
Visits: 751
Interesting, well done.
Very concerned about that stats time issue.
Have you tried looking at these with a server side trace? I'd be very curious to know whether that can have a similar problem.
Equally - are you sure it hadn't just cached a dodgy query plan for the changed session settings?
SSC Veteran
      
Group: General Forum Members
Last Login: Yesterday @ 3:04 AM
Points: 287,
Visits: 1,901
Nice find about the time measurement method so strongly affecting the results. I honestly never took much notice of this, even while aware of better ways of measuring IO statistics than "set statistics io on" and knowing that when adding query plan output, things really slowed down.
As for the concept of iSF... I think it does not exist and that the highlighted line in Books Online is just wrong.
Here is why:
According to the definition provided by Books Online, the type TABLE, without a definition of its contents, is what is defined as the return type in our function. This is just like all iTVFs and at odds with being scalar. Exposing all of the inner workings by using a single statement thus makes it an inline table valued function (iTVF), and I think so far you will agree with me.
What I do not see as anything special is the "presence" of only one column in the returned TABLE. I don't think this makes anything about it scalar! It is still a table type, and that means non-scalar by definition.
Things to consider:
1. You have to use the function not as a scalar function, but as a table function.
2. If an iTVF returns two or more columns and the invoking query uses just one of them, its performance would have to be shown to differ from that of an iTVF with only one column. Without such a difference (and I expect none) there is no distinction between the two!
3. iTVFs are also known as "parameterised views" for a good reason. Both have their inner logic exposed as a single statement that can be merged with the SQL code that uses them. The difference is that iTVFs can accept parameters, and views cannot. Being essentially views, columns that are not used are simply optimized away from the resulting execution plan.
Considering this, what then makes an iSF, what really sets it apart and makes it scalar or being processed as such?
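To make the three points above concrete, here is a minimal sketch. Everything in it is an assumption made up for illustration (the function names dbo.GrossScalar and dbo.GrossInline, the 20% rate, the sample values); it is not code from the article. It shows a true scalar UDF next to a two-column iTVF of which the calling query uses only one column.

```sql
-- Hypothetical functions for illustration only; names and the 20% rate are assumptions.

-- A true scalar UDF: BEGIN...END body, returns a single value.
CREATE FUNCTION dbo.GrossScalar (@Net DECIMAL(18,2))
RETURNS DECIMAL(18,2)
AS
BEGIN
    RETURN @Net * 1.20;
END;
GO

-- An inline table valued function: RETURNS TABLE, single SELECT, no BEGIN...END.
-- It exposes two columns so that the "unused column" case can be examined.
CREATE FUNCTION dbo.GrossInline (@Net DECIMAL(18,2))
RETURNS TABLE
AS
RETURN
    SELECT Gross = CAST(@Net * 1.20 AS DECIMAL(18,2)),
           Tax   = CAST(@Net * 0.20 AS DECIMAL(18,2));
GO

-- Scalar UDF: called as an expression in the SELECT list.
SELECT v.Net, dbo.GrossScalar(v.Net) AS Gross
FROM (VALUES (CAST(10.00 AS DECIMAL(18,2))), (25.50)) AS v (Net);

-- iTVF: has to be invoked as a table (point 1), here via CROSS APPLY.
-- Only Gross is referenced; Tax is never used by this query (points 2 and 3).
SELECT v.Net, f.Gross
FROM (VALUES (CAST(10.00 AS DECIMAL(18,2))), (25.50)) AS v (Net)
CROSS APPLY dbo.GrossInline(v.Net) AS f;
```

Comparing the actual execution plans of the last query with and without the Tax column in the function would be one way to check whether the unused column makes any measurable difference.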
Right there with Babe
      
Group: General Forum Members
Last Login: Friday, May 17, 2013 6:13 AM
Points: 799,
Visits: 422
While every testing method negatively affects the performance of the thing being tested by making the tester part of the experiment, some are obviously worse than others. I'd be interested to hear from some of the SQL Internals authors if they have some insight into how SET STATISTICS TIME ON works and how its use may or may not affect other situations where it is used. Paul? Kalen? Are you thinking about this yet?
------------ Buy the ticket, take the ride. -- Hunter S. Thompson
Forum Newbie
      
Group: General Forum Members
Last Login: Wednesday, January 23, 2013 9:53 AM
Points: 6,
Visits: 39
Thanks for the detailed and METHODICAL analysis. I once was troubleshooting a performance problem in a large, complicated application and it came down to a "set quoted identifier on" statement in a scalar function. The statement was not needed and was probably generated by a developer who used a right-click-on-object to generate the "create" statement. The function was being called a massive number of times, so even the slightest performance difference was hugely magnified. I've been wary ever since but know that knowledge is power, so thanks for the article.
SSC Rookie
      
Group: General Forum Members
Last Login: Wednesday, February 13, 2013 7:48 AM
Points: 28,
Visits: 193
This is not to say it was not a good article, which it was, but it seems we are not covering the killer scalar UDF issue. A scalar function is reasonably fast EXCEPT when you put some type of data access within the function. When this happens it will kill performance. I haven't tried your method with that scenario yet, but I will today :) It would be for a calendar function that accesses a calendar table.
When writing a small function that accesses a table to get one value, I find it better to put the logic of the function directly in the calling SELECT statement and not use a function at all; even though it is basically the same code, it runs much faster. But then again, you lose the benefit of having a function that gives one place to manage the code from.
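A rough sketch of the calendar scenario described above, for anyone who wants to try it. The tables dbo.Calendar and dbo.Orders, their columns, and both function names are hypothetical, invented only for this example. The inline version keeps the lookup in one place while still being written as a single SELECT; whether it is actually faster on a given system needs to be measured.

```sql
-- Hypothetical schema and sample rows, assumed for illustration only.
CREATE TABLE dbo.Calendar (TheDate DATE NOT NULL PRIMARY KEY, IsWorkday BIT NOT NULL);
CREATE TABLE dbo.Orders   (OrderID INT NOT NULL PRIMARY KEY, OrderDate DATE NOT NULL);
INSERT INTO dbo.Calendar (TheDate, IsWorkday)
VALUES ('2013-05-17', 1), ('2013-05-18', 0);
INSERT INTO dbo.Orders (OrderID, OrderDate)
VALUES (1, '2013-05-17'), (2, '2013-05-18');
GO

-- Scalar UDF with data access: the lookup runs inside the function body.
CREATE FUNCTION dbo.IsWorkdayScalar (@TheDate DATE)
RETURNS BIT
AS
BEGIN
    RETURN (SELECT c.IsWorkday FROM dbo.Calendar AS c WHERE c.TheDate = @TheDate);
END;
GO

-- Inline table valued function: same lookup as a single SELECT.
CREATE FUNCTION dbo.IsWorkdayInline (@TheDate DATE)
RETURNS TABLE
AS
RETURN
    SELECT c.IsWorkday FROM dbo.Calendar AS c WHERE c.TheDate = @TheDate;
GO

-- Scalar call in the SELECT list.
SELECT o.OrderID, dbo.IsWorkdayScalar(o.OrderDate) AS IsWorkday
FROM dbo.Orders AS o;

-- Inline call; OUTER APPLY keeps orders whose date is missing from the calendar
-- (IsWorkday comes back NULL), matching what the scalar version returns.
SELECT o.OrderID, w.IsWorkday
FROM dbo.Orders AS o
OUTER APPLY dbo.IsWorkdayInline(o.OrderDate) AS w;
```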
Michael B Data Architect MCT, MCITP(SQL DBA 2005/2008),MCTS, MCP,MCDBA