To those that don’t know me “Natural Born Killers (The Series)” sounds like a very strange choice for a SQL Server blog title and to be fair it is. Those who do know me, will know that I have been presenting a session this year titled “Natural Born Killers, performance issues to avoid” around the UK. The “Series” element of the title is there because I have decided to serialise the session for the benefit of those who have been unable to see it thus far.
The plan is to release one topic from the presentation each day of SQL Relay 2013. Today is the third day of SQL Relay, this leg is being held in Birmingham. If you have registered for the Birmingham, Norwich or Cardiff events you will be able to see this presentation live, so you might want to read this after the event as a reminder of everything that was covered.
I’ve written a couple of presentations that revolve around film titles, so I’m going to throw another film title at you now; “The good, the bad and the ugly”.
This really does sum up how I feel about Scalar UDF’s, so let’s break down the film title into its three parts so I can explain further.
To be honest it’s a struggle to find something good about these “things”. If I absolutely had to pick something it would be that it promotes the idea of re-usable code and that’s something we were all taught at some point as being a good thing. So there you go, that’s something good about them.
I fear I may have tipped my hand here and already presented a very biased view, but let’s stick with it, it’s a free world and you always have the right to disagree. If you do I’d love to hear your thoughts in the comments section.
In the original recording of this session that I did for Quest Software (under the title Learn expert tips for improving your SQL Server Performance) I covered 10 tips including RBAR vs Set based. RBAR is a term coined by Jeff Moden for when SQL Server iterates through a result set Row By Agonising Row. As it turns out Scalar UDF’s adhere to the RBAR way of doing things, that’s right a Scalar UDF runs once per record in the resultset which is definitely not great for performance.
In my opinion these things are really ugly, but don’t take just my word for it, there are other who think this way too. In fact I have a couple of demo’s lined up for you that prove that query plans and SET STATISTICS IO do too as they miss out key information from queries that use scalar UDF’s. So let’s tale a look…
Taking a peek under the covers
For this demonstration we’re going to need the database that I provided the code for on Tuesday.
What we’re going to do is to create a little Scalar Function. The function itself is immaterial in this example we really just need to create one in order to see how it performs. Before creating the function however we should take a quick look at the query we are going to be using inside it. If you could turn on “Include Actual Execution Plan” on now that would be really helpful. We’re also going to be utilising the IO statistics from turning on SET STATISTICS IO, I recently wrote a post about this feature here which talks about how you can use it to help benchmark performance.
Please note in this script I do flush the plan cache, so do not run this on a production server or you will not be popular!
Use TableDesign_Bad; GO --Flush the procedure cache for your database DECLARE @intDBID INTEGER SET @intDBID = ( SELECT dbid FROM master.dbo.sysdatabases WHERE name = 'TableDesign_Bad' ) DBCC FLUSHPROCINDB (@intDBID) GO SET STATISTICS IO ON; /* Show statistics IO for the following query */ SELECT AVG( CAST(RubbishBoolean AS INT) ) FROM dbo.SmallRubbishDataTypes t1 CROSS JOIN sys.tables st WHERE t1.RubbishID LIKE '%0%'; GO
Let’s take a look at the output from SET STATISTICS IO first, as you can see from the image below there were 4 objects involved in the query. For the purposes of this example let’s ignore the system object and keep a mental note of the 292 logical reads performed against the SmallRubbishDataTypes table:
This is what the actual execution plan looks like for our statement:
As I mentioned before, this is just an example, we’re not going to tune this query I just wanted you to see the shape of the physical plan so you understand what is happening when the function runs and runs and runs and runs…
If you’re playing along at home, here’s the code to run to create the function:
CREATE FUNCTION dbo.fn_RandomFunc (@ID UNIQUEIDENTIFIER) RETURNS INT AS BEGIN DECLARE @RetVar INT SELECT @RetVar = AVG( CAST(RubbishBoolean AS INT) ) FROM dbo.SmallRubbishDataTypes t1 CROSS JOIN sys.tables st WHERE t1.RubbishID LIKE '%0%'; RETURN @RetVar END; GO
Earlier in this post I stated that these Scalar User Defined Functions were “bad” and “ugly”. Time to put my money where my mouth (or keyboard) is and prove it. To do this we need to create a base statement that we’re going to compare the function against and it just so happens that I have one. So the next script will have our base statement and will then be followed by exactly the same statement, but adding in the Scalar UDF. Remember to have SET STATISTICS IO on and show actual execution plans!
SELECT TOP 10 RubbishID, RubbishDate, RubbishBoolean, RandomAttr1, RandomAttr2 FROM SmallRubbishDataTypes t1 WHERE t1.RubbishID LIKE '%0%'; GO SELECT TOP 10 RubbishID, RubbishDate, RubbishBoolean, RandomAttr1, RandomAttr2, dbo.fn_RandomFunc(RubbishID) FROM SmallRubbishDataTypes t1 WHERE t1.RubbishID LIKE '%0%' GO
Let’s take a look to see just how bad this execution plan is. I bet it’s bad, I bet it’s really bad, I bet it couldn’t be more bad than Michael Jackson singing bad.
Eh! Hang on, according to the properties of this Compute Scalar operator it only executed once. Didn’t I just say it executes once for every row in the result set? Weird!
Hmm okay, lets take a look at the output of statistics IO. I’m bound to be completely vindicated there, it’ll have a gazillion I/O’s if I’m right:
Hey, that can’t be right, 2 logical I/O’s!!! But wasn’t our original query 292 logical reads?
Something’s not right, two sneaky pieces of evidence have been presented by the defence that these Scalar UDF’s aren’t as “bad” or as “ugly” as I have said. I’m in real trouble here, I need to get myself a lawyer and seeing as I’m part way through watching series 3 of Breaking Bad there’s only one thing I can do. I’d better call Saul!
Under the advisement of my lawyer I would like to bring to the stand SQL Server Profiler:
It’s worth noting that profiler is something of an overhead to put it mildly and to catch this criminal Scalar UDF in the act we’re going to have to make a slight modification and also capture the SP:StmtCompleted event. So with this trace running let’s take a look at what is shown when we run our two queries again.
Now that’s more like it! Let’s interrogate the new witness.
Saul: Are you aware of the accused?
Profiler: Yes, we have had the misfortune of meeting.
Saul: On your most recent encounter with the accused how many times did the accused present himself to you to be executed?
Profiler: That would be ten.
Saul: And did any I/O occur on any of these occasions?
Profiler: On each of the ten occasions 355 I/O’s occurred.
Saul: So to be clear Mr Profiler, are you telling me that the accused Scalar User Defined Function did execute not once but ten times and 3,555 I/O’s occurred during these executions?
This new piece of evidence stuns the jury and proves beyond any doubt that the accused is indeed found to be “bad”. In a later ruling SET STATISTICS IO and the query plan were convicted for committing perjury in a statement they said that the UDF’s were just too “ugly” to be shown.