
Table Valued Functions vs Scalar Valued Functions
Posted Thursday, June 26, 2008 12:48 PM


SSCoach


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
I currently use two table-valued functions. Both are high-performance, heavily reused code, and neither uses any table variables (they're inline, single-select functions). That means they can actually access table/index statistics meaningfully, and can be used in complex, set-based solutions. I've tested, and I get a 0.1% performance increase when I incorporate their code directly into procs instead of calling the function, but giving up that bit of speed is worth it for the ease of code reuse in these two cases.

(One of them resolves complex hierarchies, the other splits delimited strings.)
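For illustration only, an inline (single-statement) splitter along those lines might look like the following. This is a hedged sketch, not the actual function, and it assumes a numbers table named dbo.Tally with a single column N:

CREATE FUNCTION dbo.SplitDelimitedString
    (@List VARCHAR(8000), @Delimiter CHAR(1))
RETURNS TABLE
AS
RETURN
    -- Each row of dbo.Tally marks a character position; keep the positions that sit
    -- right after a delimiter and cut out the substring up to the next delimiter.
    SELECT Item = SUBSTRING(@List, t.N, CHARINDEX(@Delimiter, @List + @Delimiter, t.N) - t.N)
    FROM dbo.Tally t
    WHERE t.N <= LEN(@List)
      AND SUBSTRING(@Delimiter + @List, t.N, 1) = @Delimiter;

Because the body is a single SELECT, the optimizer expands it into the calling query instead of treating it as a black box, which is what lets it use statistics in a way a multi-statement function can't.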

In almost all other cases, incorporating the code into a proc directly has been the better option.

I don't currently use any multi-statement table-valued functions. I used to use them a lot, but only because my Access front-end couldn't deal with procs that did the same thing, though it could deal with functions that did. (I never did get a good answer from MS on why it was built that way.) Now I avoid them as much as possible. They're generally performance killers, and they can almost always be replaced with code in the proc and a good temp table.
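As a hedged sketch of that replacement pattern (every object name here is invented for illustration), the proc version looks roughly like this:

-- Instead of: SELECT ... FROM dbo.mstvf_ActiveOrders(@CustomerID) ...
CREATE PROCEDURE dbo.GetActiveOrderDetails
    @CustomerID INT
AS
BEGIN
    SET NOCOUNT ON;

    -- Materialize the intermediate result in a temp table, which (unlike the table
    -- variable inside a multi-statement TVF) carries real statistics.
    CREATE TABLE #ActiveOrders (OrderID INT PRIMARY KEY, OrderDate DATETIME);

    INSERT INTO #ActiveOrders (OrderID, OrderDate)
    SELECT o.OrderID, o.OrderDate
    FROM dbo.Orders AS o
    WHERE o.CustomerID = @CustomerID
      AND o.Status = 'Active';

    SELECT a.OrderID, a.OrderDate, d.ProductID, d.Quantity
    FROM #ActiveOrders AS a
    JOIN dbo.OrderDetails AS d
      ON d.OrderID = a.OrderID;
END;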

I have a few scalar functions that get used on very small recordsets where the complex calculations can't readily be done in a set-based fashion. All three of these have been tested against multi-step set-based solutions, and found to be better performers, on the size of recordset they get used on.

Otherwise, I've found that tearing inline scalars apart into set-based, multi-step updates is generally MUCH faster and better than the inline scalars.

For example, I regularly have to calculate the distances between all records in two data sets. One set might be anywhere from 5,000 to 3 million rows, all with latitude and longitude, and the other set might be anywhere from 100 to 5,000 rows. (The sizes are generally proportionate: larger first sets go with larger second sets.)

One option, which was tested, was having an inline scalar that takes the lat and long from list one, and compares it to the lat and long from list two, and gives the distance on a row-by-row basis. It was very convenient and easy to build. And horribly slow.
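A hedged sketch of that kind of scalar function (spherical law of cosines; the names and the 3,963-mile Earth radius are my assumptions, not the original code):

CREATE FUNCTION dbo.fn_DistanceMiles
    (@Lat1 FLOAT, @Long1 FLOAT, @Lat2 FLOAT, @Long2 FLOAT)
RETURNS FLOAT
AS
BEGIN
    -- 57.29577951 converts degrees to radians; no guard here against ACOS rounding
    -- just past 1.0 for near-identical points, since this is only a sketch.
    RETURN 3963.0 * ACOS(
            SIN(@Lat1 / 57.29577951) * SIN(@Lat2 / 57.29577951)
          + COS(@Lat1 / 57.29577951) * COS(@Lat2 / 57.29577951)
          * COS((@Long2 - @Long1) / 57.29577951));
END;
GO

-- Called once per row of the join, which is exactly what makes it crawl at scale:
SELECT a.AddressID, b.AddressID,
       dbo.fn_DistanceMiles(a.Latitude, a.Longitude, b.Latitude, b.Longitude) AS DistanceMiles
FROM dbo.ListOne AS a
CROSS JOIN dbo.ListTwo AS b;   -- in practice limited by the business-rule filters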

The other option was taking each line of the calculation, adding a column to a join table between the two sets, and calculating one column at a time on the whole set of the join table. This was ugly code, and nowhere near so simple to maintain. But it was amazingly fast.

The difference between the two solutions: on the largest lists, 4-6 hours versus 30-60 seconds; on the smaller ones, 1-2 hours versus under 10 seconds.

Since as many as five or six of these comparisons need to be done in a day sometimes, the faster solution was necessary. Like I say, ugly, not as easy to read, not as easy to maintain, required adding a whole bunch of columns to an otherwise unnecessary table, and so on. But the speed is worth it.

Test various solutions. Try things that may not be obvious. Test them under load and with realistic record set sizes. What works in one case may not work well in another.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #524500
Posted Thursday, June 26, 2008 5:07 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 4:24 PM
Points: 36,995, Visits: 31,522
The other option was taking each line of the calculation, adding a column to a join table between the two sets, and calculating one column at a time on the whole set of the join table. This was ugly code, and nowhere near so simple to maintain. But it was amazingly fast.


How many rows did you end up with in that table, Gus? Was it a full Cartesian product or just a triangular join ((2-1) = ABS(1-2))?


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #524637
Posted Friday, June 27, 2008 7:30 AM


SSCarpal Tunnel


Group: General Forum Members
Last Login: Today @ 8:23 AM
Points: 4,352, Visits: 6,168
At each of my last three SQL 2005 performance tuning/analysis client gigs, I have addressed tremendous performance issues by removing the use of TVFs (and also table variables). It is amazing how inefficient these features can make queries and how frequently they lead the optimizer to an inefficient plan! Nested loops on million-plus-row tables are soooo ugly.
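For example (a hedged sketch with made-up table names), the classic trap looks like this:

DECLARE @Ids TABLE (ID INT PRIMARY KEY);

INSERT INTO @Ids (ID)
SELECT ID FROM dbo.BigTable;          -- imagine this loads a few million rows

-- SQL 2005 estimates roughly one row for @Ids, so it will happily pick a
-- nested-loops join against another large table; a #temp table with real
-- statistics usually gets a far more sensible plan here.
SELECT b.*
FROM @Ids AS t
JOIN dbo.OtherBigTable AS b
  ON b.ID = t.ID;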

Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #524981
Posted Friday, June 27, 2008 8:09 AM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 4:24 PM
Points: 36,995, Visits: 31,522
TheSQLGuru (6/27/2008)
At each of my last three SQL 2005 performance tuning/analysis client gigs, I have addressed tremendous performance issues by removing the use of TVFs (and also table variables). It is amazing how inefficient these features can make queries and how frequently they lead the optimizer to an inefficient plan! Nested loops on million-plus-row tables are soooo ugly.


I absolutely agree, Kevin... not-so-hidden RBAR on steroids!


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #525021
Posted Friday, June 27, 2008 8:11 AM
Valued Member


Group: General Forum Members
Last Login: Today @ 9:34 AM
Points: 74, Visits: 517
Would you guys share some real examples where you stopped using functions in order to improve performance? I mean, could you share the function code and the SQL statement where you used a proc instead?

I ask this because I also had bad performance using functions, and so I stopped using them. But after more study, I realized I didn't know how to use them properly. I also know a couple of good and experienced DBAs who were making the same mistake. After learning the right way to use functions, I saw no drop in performance. I'm not saying the performance dropped just a little bit and that it was acceptable; I'm saying I experienced no drop in performance at all.

Thanks a lot,
Luiz
Post #525023
Posted Friday, June 27, 2008 8:23 AM


SSCertifiable


Group: General Forum Members
Last Login: Saturday, August 30, 2014 5:24 PM
Points: 7,139, Visits: 15,191
Have any of you had a chance to try out some of the stuff Adam Machanic did with inline TVFs? Just curious, because the perf differences are fairly remarkable.

I still don't think it covers every circumstance, and I also suspect it's a fairly narrow window where this would be as good or better, but still, an interesting read nonetheless...

Here's the blog post I had in mind.

http://sqlblog.com/blogs/adam_machanic/archive/2006/08/04/scalar-functions-inlining-and-performance-an-entertaining-title-for-a-boring-post.aspx

I adapted this to a 10M row test, and the inline BEAT the "straight" group by scenario.
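The general shape of the trick, as I understand it (my own hedged sketch, not the blog's exact code), is to wrap the scalar expression in an inline TVF and invoke it with CROSS APPLY, so the expression gets expanded into the calling query's plan instead of being executed per row as a scalar UDF. Using the distance example from earlier in this thread:

CREATE FUNCTION dbo.itvf_DistanceMiles
    (@Lat1 FLOAT, @Long1 FLOAT, @Lat2 FLOAT, @Long2 FLOAT)
RETURNS TABLE
AS
RETURN
    SELECT DistanceMiles = 3963.0 * ACOS(
            SIN(@Lat1 / 57.29577951) * SIN(@Lat2 / 57.29577951)
          + COS(@Lat1 / 57.29577951) * COS(@Lat2 / 57.29577951)
          * COS((@Long2 - @Long1) / 57.29577951));
GO

SELECT a.AddressID, b.AddressID, d.DistanceMiles
FROM dbo.ListOne AS a
CROSS JOIN dbo.ListTwo AS b
CROSS APPLY dbo.itvf_DistanceMiles(a.Latitude, a.Longitude, b.Latitude, b.Longitude) AS d;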


----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
Post #525033
Posted Friday, June 27, 2008 9:05 AM


SSCoach


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Jeff Moden (6/26/2008)
The other option was taking each line of the calculation, adding a column to a join table between the two sets, and calculating one column at a time on the whole set of the join table. This was ugly code, and nowhere near so simple to maintain. But it was amazingly fast.


How many rows did you end up with in that table, Gus? Was it a full Cartesian product or just a triangular join ((2-1) = ABS(1-2))?


Limited Cartesian join, depending on business rules. For example, the 5-million-to-1,500 join had about 300 million rows. (To be more precise, it was an inner theta join on a range of data, not a full cross join, but definitely many-to-many within pre-specified limits.)

Then I broke up each part of a spherical distance calculation into a column. For example, the first thing you calculate is SIN(Latitude1/57.29577951), so I had a SinLat1 column and had it run that as one calculation; next is SIN(Latitude2/57.29577951), same treatment. Instead of trying to run the whole calculation at once (which I did test), I had it run each piece on each row and store the result in the column. Doing it as calculated columns ended up being slower in this case, as did trying to run the whole thing at once.

The main problem was that 300 million floating-point operations take a LOT of RAM, and there are 12 such operations per distance calculation. That's 3.6 billion calculations, and each takes a minimum of 16 bytes (two float data types), plus whatever overhead is used for the actual calculation. That meant it had to push the larger sets onto the hard drive, since I simply didn't have that much RAM available. (That's a minimum of just about 57 gig for this alone, not including what the OS, SQL Server, etc., need, if I'm calculating it correctly. As opposed to 4.8 gig per calculation when broken up.)

Breaking it up like this meant each column-at-a-time calculation darn near pegged the RAM, but didn't end up in the swap file/tempdb.
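As a rough, hedged illustration of that shape (the work table and column names are invented, and only a few of the twelve terms are shown):

ALTER TABLE dbo.DistanceWork ADD
    SinLat1       FLOAT,
    SinLat2       FLOAT,
    CosLat1       FLOAT,
    CosLat2       FLOAT,
    CosDeltaLong  FLOAT,
    DistanceMiles FLOAT;
GO

-- One set-based pass per intermediate term, so only one column's worth of float
-- math (about 4.8 gig at 300 million rows) is in flight at a time.
UPDATE dbo.DistanceWork SET SinLat1      = SIN(Latitude1 / 57.29577951);
UPDATE dbo.DistanceWork SET SinLat2      = SIN(Latitude2 / 57.29577951);
UPDATE dbo.DistanceWork SET CosLat1      = COS(Latitude1 / 57.29577951);
UPDATE dbo.DistanceWork SET CosLat2      = COS(Latitude2 / 57.29577951);
UPDATE dbo.DistanceWork SET CosDeltaLong = COS((Longitude2 - Longitude1) / 57.29577951);

-- Final combination of the stored pieces.
UPDATE dbo.DistanceWork
SET    DistanceMiles = 3963.0 * ACOS(SinLat1 * SinLat2 + CosLat1 * CosLat2 * CosDeltaLong);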

Another option would have been doing whole calculations atomically, but on a smaller subset of the rows at a time. That means a cursor/loop, and I just plain don't like messing around with those if I can avoid it.

As mentioned, this is a heavy-lifting option that probably doesn't apply in very many cases, but it sure as heck was better than an inline function in this case, which is the point.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #525074
Posted Saturday, June 28, 2008 12:22 PM


SSC-Dedicated


Group: General Forum Members
Last Login: Today @ 4:24 PM
Points: 36,995, Visits: 31,522
about 300-million rows


That's kinda what I thought you were going to say.

What I was suggesting is that the distance from Point A to Point B is exactly the same as the distance from Point B to Point A... wouldn't that cut your row count about in half if you did it like half a multiplication table for each "group"?
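For illustration (names made up), the triangular version of a self-join is just:

-- Each unordered pair appears once instead of twice.
SELECT a.AddressID AS FromID, b.AddressID AS ToID
FROM dbo.Addresses AS a
JOIN dbo.Addresses AS b
  ON b.AddressID > a.AddressID;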


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #525530
Posted Monday, June 30, 2008 8:03 AM


SSCoach


Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 15,444, Visits: 9,596
Jeff Moden (6/28/2008)
about 300-million rows


That's kinda what I thought you were going to say.

What I was suggesting is that the distance from Point A to Point B is exactly the same as the distance from Point B to Point A... wouldn't that cut your row count about in half if you did it like half a multiplication table for each "group"?


It's already one-way. The two tables have completely different data in them; no self-join at all. A to B or B to A, it's the same number of calculations in what I'm doing. And I'm already filtering out lat/longs that fall outside the possible distance range, etc. (otherwise it would be a full Cartesian product). It's a question of taking a large number of addresses in one list and, for each one, finding the four closest addresses in a different list. There's no overlap between the two lists.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #525951