Usage of Functions in Stored Procedures
GSquared - Posted Tuesday, April 29, 2008 8:11 AM
I came up with a different plan from yours for "best solution":

set nocount on
go
-- test table with a calculated (computed) column Total = Qty + WeightedQty
CREATE TABLE CaseDetails
(CaseId INT IDENTITY(1,1) PRIMARY KEY,
Qty NUMERIC(14,4),
WeightedQty NUMERIC(14,4),
ShipmentId INT,
Total AS Qty + WeightedQty)
go
raiserror('First Insert', 10, 1) with nowait
go
-- 15,000 rows for shipment ID 1
insert into dbo.CaseDetails(qty, weightedqty, shipmentid)
select 200, 300, 1
from common.dbo.bignumbers
where number between 1 and 15000
go
raiserror('Second Insert', 10, 1) with nowait
go
-- roughly 100-million rows spread across shipment IDs 2 through 10000
insert into dbo.CaseDetails(qty, weightedqty, shipmentid)
select 200, 300, numbers.number
from common.dbo.numbers
cross join common.dbo.numbers n2
where numbers.number between 2 and 10000
go
raiserror('Index', 10, 1) with nowait
go
-- covering index on ShipmentId, including the computed column
create index IDX_CaseDetails_TotalQty on dbo.casedetails(shipmentid)
include (qty, weightedqty, total)
go
set statistics io on
set statistics time on

print 'First Method'

select qty, weightedqty, qty + weightedqty
from dbo.casedetails
where shipmentid = 1
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'CaseDetails'. Scan count 1, logical reads 82,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms, elapsed time = 231 ms.
*/
print 'Second Method'

declare @Total int

select @total = qty + weightedqty
from dbo.casedetails
where shipmentid = 1

select qty, weightedqty, @total
from dbo.casedetails
where shipmentid = 1
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'CaseDetails'. Scan count 1, logical reads 82,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 8 ms.
Table 'CaseDetails'. Scan count 1, logical reads 82,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 230 ms.
*/
print 'Third Method'

select qty, weightedqty, total
from dbo.casedetails
where shipmentid = 1
/*
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'CaseDetails'. Scan count 1, logical reads 82,
physical reads 0, read-ahead reads 0, lob logical reads 0,
lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 226 ms.
*/

As you'll see, I added a calculated column to the test table. Then I inserted 15,000 rows for shipping ID 1, and 100-million rows of data for 10,000 other shipping IDs. (I prefer tests in bigger tables.) Then I indexed the table.

On my computer, populating the tables with over 100-million rows of data took quite a while, but by using a Numbers table solution it wasn't too bad (under an hour). That might have been overkill, I have to admit. If you decide to test this, I recommend about 10-million rows. That will be easier to deal with and is perfectly valid for testing.

I didn't test the inline function, since I already know that will fail horribly.
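
For reference, the kind of function I'm talking about would look roughly like this (a hypothetical sketch of the data-accessing scalar UDF pattern, not the article's exact function):

create function dbo.fn_ShipmentTotal (@shipmentid int)
returns numeric(14,4)
as
begin
    -- the function queries the table itself, so calling it once per row multiplies the reads
    declare @total numeric(14,4)
    select @total = qty + weightedqty
    from dbo.casedetails
    where shipmentid = @shipmentid
    return @total
end
go

select qty, weightedqty, dbo.fn_ShipmentTotal(shipmentid)
from dbo.casedetails
where shipmentid = 1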

With the calculated column and the right index, the CPU time on these queries goes WAY down.

The first test was a straight-up calculation, without the calculated column. It used the index, and took 16 milliseconds of CPU time and 1 table scan.

The second test used the "assign the value to a variable" method, as outlined in the article. That used the index, took 31 milliseconds of CPU time, and took 2 table scans (1 to assign the variable, 1 to select the data).

The final test used the calculated column. That took under 1 millisecond of CPU time (too fast to measure) and 1 table scan.

(All of these "table scans" are just the IO stats. They're actually index seeks in the execution plan.)

Rather than using an inline function for this kind of calculation, I highly recommend non-persisted, indexed, calculated columns. I have used them for many things, and they are very fast and don't slow down insert/update/delete by much (if the calculation is complex enough and the insert/update is big enough, it might slow things down measurably, but I haven't seen that yet). Obviously, they do have some cost for those operations, and they do add to the size of the index, but that cost is small compared to the benefit.

This way, you don't have to count on the data being the same in all rows, you cut IO in half, and you get your select done so fast that "statistics time" can't really measure it. And that's in a table with 100-million rows.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #492142
d_sysuk - Posted Tuesday, April 29, 2008 8:56 AM
There is a big difference in writing:

-- Unoptimised SP method: lookup using a local variable
create procedure mysum (@myid int)
as
declare @myint int
set @myint = @myid
select x, y, z from mytable
where mylookupfilter = @myint

and:

-- Optimised: using the input parameter directly allows the select statement
-- to be properly cached
create procedure mysum (@myid int)
as
select x, y, z from mytable
where mylookupfilter = @myid

To test the true performance, you would need to structure the proc so it uses the input parameter directly in the WHERE clause filter; otherwise the query is not cached the same way. I suspect you will find the performance of the set-based SQL procedure will improve even further.
My mantra has always been: "functions, like cursors, should be avoided if at all possible".

I have certainly seen good benefits from removing functions and converting to pure, optimised, set-based SQL solutions.
Post #492204
GSquared - Posted Tuesday, April 29, 2008 9:05 AM
d_sysuk (4/29/2008)
There is a big difference in writing:

-- Unoptimised SP method: lookup using a local variable
create procedure mysum (@myid int)
as
declare @myint int
set @myint = @myid
select x, y, z from mytable
where mylookupfilter = @myint

and:

-- Optimised: using the input parameter directly allows the select statement
-- to be properly cached
create procedure mysum (@myid int)
as
select x, y, z from mytable
where mylookupfilter = @myid


The query plan on those two will be exactly the same. The first one just adds a fraction of a microsecond for allocating memory for a variable and assigning a value to it, which the second one doesn't have to do. The actual query is the same.

Maybe I'm misunderstanding what you're writing.

Edit: Just to be sure, I tested both of these concepts on my 100-million-row table. Two procs, exact same execution plan, exact same execution time, IO, etc. The first run on each took a few seconds; the second and subsequent runs on each took too little time to measure.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #492222
d_sysuk - Posted Tuesday, April 29, 2008 9:37 AM
There is a more subtle point I'm making about the query plan optimisation that is saved.

A greater man than I has explained this quite nicely; have a quick look at these (a Google search should pull up the resources):
MSDN: "I Smell a Parameter!" - Tips, Tricks, and Advice from the SQL Server Query Optimization Team
MSDN: Ken Henderson on plan stability in SQL Server 2000

It goes into a lot more detail than I can in this forum.

The example posted was quite cut down. Within the given example, a name lookup is used to get the PK, which is then used to get the result set back. That is two queries onto the source table, when it should really be a single set-based query using the name (as in the poster's article) to get the desired result set, without the need to look up the ID value, assign it to a local variable, and then use that value as the lookup.
Post #492279
Joe Celko - Posted Tuesday, April 29, 2008 9:51 AM
>> What I mean is you have a bunch (about a dozen) of tables with charts of tax rates, freight rates, etc. Each row will have different values from each table and then they get computed for the final result. We have functions for various lookups and calculations, some will be repeated, some won't. How can you improve this type of process? <<

You might want to get a copy of THINKING IN SETS, which deals with auxiliary tables for such things. Instead of trying to compute business days (which is impossible), you use a Calendar table and join to it. Instead of trying to compute shipping zone rates, look them up based on ZIP codes. Etc.

SQL is good at JOINs and bad at math. Functions cannot be parallelized or optimized; JOINs can.
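
As a rough sketch (the table and column names here are purely illustrative), counting business days becomes a join instead of a calculation:

CREATE TABLE Calendar
(cal_date DATE NOT NULL PRIMARY KEY,
 is_business_day CHAR(1) NOT NULL
   CHECK (is_business_day IN ('Y', 'N')));

-- business days between an order date and its ship date: just count matching rows
SELECT O.order_nbr,
       COUNT(C.cal_date) AS business_days
  FROM Orders AS O
       INNER JOIN Calendar AS C
       ON C.cal_date > O.order_date
      AND C.cal_date <= O.ship_date
 WHERE C.is_business_day = 'Y'
 GROUP BY O.order_nbr;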


Books in Celko Series for Morgan-Kaufmann Publishing
Analytics and OLAP in SQL
Data and Databases: Concepts in Practice
Data, Measurements and Standards in SQL
SQL for Smarties
SQL Programming Style
SQL Puzzles and Answers
Thinking in Sets
Trees and Hierarchies in SQL
Post #492297
GSquared - Posted Tuesday, April 29, 2008 11:13 AM
d_sysuk (4/29/2008)
There is a more subtle point I'm making about the query plan optimisation that is saved.

A greater man than I has explained this quite nicely; have a quick look at these (a Google search should pull up the resources):
MSDN: "I Smell a Parameter!" - Tips, Tricks, and Advice from the SQL Server Query Optimization Team
MSDN: Ken Henderson on plan stability in SQL Server 2000

It goes into a lot more detail than I can in this forum.

The example posted was quite cut down. Within the given example, a name lookup is used to get the PK, which is then used to get the result set back. That is two queries onto the source table, when it should really be a single set-based query using the name (as in the poster's article) to get the desired result set, without the need to look up the ID value, assign it to a local variable, and then use that value as the lookup.


I think you may have partially misinterpreted the articles you cite. I just read them, and what you outlined in your example isn't actually what's gone over in the articles.

Yes, setting a default can override certain aspects of query optimization. My preferred method for doing this is creating separate procs called by a "meta-proc".
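
Roughly, that pattern looks like this (the proc names and the threshold are made up for illustration, against my test table from earlier):

create procedure dbo.GetCaseDetails_SmallShipment (@shipmentid int)
as
select qty, weightedqty, total
from dbo.casedetails
where shipmentid = @shipmentid
go

create procedure dbo.GetCaseDetails_LargeShipment (@shipmentid int)
as
select qty, weightedqty, total
from dbo.casedetails
where shipmentid = @shipmentid
go

-- the "meta-proc" just routes the call; each sub-proc keeps its own cached plan,
-- so a plan compiled for a huge shipment never gets reused for a tiny one
create procedure dbo.GetCaseDetails (@shipmentid int)
as
if (select count(*) from dbo.casedetails where shipmentid = @shipmentid) > 100000 -- arbitrary threshold, for illustration
    exec dbo.GetCaseDetails_LargeShipment @shipmentid
else
    exec dbo.GetCaseDetails_SmallShipment @shipmentid
go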

What you have in your example doesn't actually accomplish this. I tested it, and the execution plans are identical, except for the name of the variable in the scalar operator in the seek predicates.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #492349
GSquared - Posted Tuesday, April 29, 2008 11:23 AM
Matt Minnis (4/29/2008)
What about matrix type calculations?
How can they be accelerated?
What I mean is you have a bunch (about a dozen) of tables with charts of tax rates, freight rates, etc.
Each row will have different values from each table and then they get computed for the final result.
We have functions for various lookups and calculations, some will be repeated, some won't.
How can you improve this type of process?
I can see your technique for the 1 function, but I'm not sure how it will scale in a large stored proc with lots of function calls and computations.


Pre-compute the values as much as possible, put them in tables, and join to them. Using inline scalar functions makes the proc operate row-by-row, which is inherently slow. Joining to pre-computed values is inherently fast.

If the calculation is too complex for that, then move it out of the database into something that's more efficient at it, or move it to a CLR UDF, which will (if built correctly) be better at it than SQL is.

But it's very, very rare that building tables for these things can't get you the right results, and do it fast.

Let's take tax rates for example. Maybe sales tax.

You could build a function that calculates sales tax based on the state, county, and city of the merchant and the state, county, and city of the customer, and look it up that way. Or you could build a table of state-to-state rates, including counties and cities, and just join to that. Do you end up with a larger table to join to? Yep. Does your update code have to do more work when tax rates or tax laws change? Yep. But the cost of those things is tiny compared to the fact that joining to that table will be potentially thousands of times faster than an inline UDF. You build the table once. You update it every few years (in most cases). You might query it a few thousand times per hour. Put your speed on the query, in that case.
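
A rough sketch of that kind of lookup table (the table and column names here are purely illustrative):

create table dbo.SalesTaxRates
(MerchantState char(2) not null,
CustomerState char(2) not null,
CustomerCounty varchar(50) not null,
CustomerCity varchar(50) not null,
TaxRate numeric(6,4) not null,
constraint PK_SalesTaxRates
    primary key (MerchantState, CustomerState, CustomerCounty, CustomerCity))
go

-- the query joins to the pre-computed rates instead of calling a UDF per row
select o.OrderId, o.Subtotal, o.Subtotal * r.TaxRate as SalesTax
from dbo.Orders o
join dbo.SalesTaxRates r
    on r.MerchantState = o.MerchantState
    and r.CustomerState = o.CustomerState
    and r.CustomerCounty = o.CustomerCounty
    and r.CustomerCity = o.CustomerCity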


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #492359
Mike C - Posted Tuesday, April 29, 2008 2:37 PM
Confucius247 (4/29/2008)
My attitude with functions is to avoid them like the plague, as they break the rule of thinking in sets while programming SQL.


Table-Valued Functions.
Post #492492
Timothy A Wiseman - Posted Tuesday, April 29, 2008 10:10 PM
There are definite places where functions are of value, but it is important that they be written carefully to avoid replicating work. In one of the examples listed, the function runs a select statement against the table and is then used inside a select statement against that same table, so it is doing vastly more reads than it needs to. On the other hand, something like complex manipulation of a single string, which will be done the same way regardless of the string's source, can usefully be put into a function with a minimal performance hit, plus better testability and code reuse.
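
For example, a purely computational string function along these lines (a hypothetical sketch) touches no tables, so it adds essentially no extra reads:

create function dbo.fn_DigitsOnly (@input varchar(50))
returns varchar(50)
as
begin
    -- strips everything except digits; no table access, so no extra reads
    declare @output varchar(50)
    declare @i int
    set @output = ''
    set @i = 1
    while @i <= len(@input)
    begin
        if substring(@input, @i, 1) like '[0-9]'
            set @output = @output + substring(@input, @i, 1)
        set @i = @i + 1
    end
    return @output
end
go

select dbo.fn_DigitsOnly('(555) 123-4567')  -- returns 5551234567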

---
Timothy A Wiseman
SQL Blog: http://timothyawiseman.wordpress.com/
Post #492609
Posted Wednesday, April 30, 2008 3:58 AM
Mike C (4/29/2008)

Table-Valued Functions.


And if that were what we were talking about...
Post #492765