• I currently use two table-valued functions. Both are high-performance, heavily re-used code, and neither of them uses any table variables (they're inline, single-select functions). That means they can actually use table/index statistics meaningfully, and can be part of complex, set-based solutions. I've tested, and I get a 0.1% performance increase when I incorporate their code directly into procs instead of calling the function, but that small loss in speed is worth it for the ease of code re-use in these two cases.

    (One of them resolves complex hierarchies, the other splits delimited strings.)
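
    For reference, this is roughly the shape of the inline (single-select) kind of function meant above: a minimal sketch of a delimited-string splitter, not my actual function. It assumes a dbo.Numbers tally table (a single column N running 1, 2, 3, ... up to the maximum string length) already exists.

        -- Minimal sketch of an inline table-valued function: one SELECT, no table variable.
        -- Assumes dbo.Numbers exists with column N covering 1 through at least 8000.
        CREATE FUNCTION dbo.SplitString (@List VARCHAR(8000), @Delim CHAR(1))
        RETURNS TABLE
        AS
        RETURN
        (
            SELECT Item = SUBSTRING(@List, n.N, CHARINDEX(@Delim, @List + @Delim, n.N) - n.N)
            FROM dbo.Numbers AS n
            WHERE n.N <= LEN(@List)
              AND SUBSTRING(@Delim + @List, n.N, 1) = @Delim
        );

    Because it's a single SELECT, the optimizer can expand it into the calling query's plan, and it can be joined or used with CROSS APPLY like any other table source, e.g. CROSS APPLY dbo.SplitString(o.TagList, ',') against a hypothetical orders table.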

    In almost all other cases, incorporating the code into a proc directly has been the better option.

    I don't currently use any multi-statement table-valued functions. I used to use them a lot, but that was because my Access front-end couldn't deal with procs that did the same thing, though it could deal with functions that did. (I never did get a good answer from MS on why it was built that way.) Now I avoid them as much as possible. They're generally performance killers, and can almost always be replaced with code in the proc and a good temp table, as in the sketch below.
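
    To make "code in the proc and a good temp table" concrete, here's a hedged sketch with made-up object names: instead of joining to a multi-statement function that returns a table variable with no statistics, do the same work inside the proc against a #temp table, which does get statistics and can be indexed.

        -- Hypothetical replacement for a multi-statement TVF such as dbo.mstvf_ActiveCustomers(@Region).
        CREATE PROCEDURE dbo.usp_ReportActiveCustomers
            @Region VARCHAR(50)
        AS
        BEGIN
            SET NOCOUNT ON;

            -- The temp table gets real statistics, unlike a multi-statement
            -- function's return table variable, so later joins can be costed sensibly.
            CREATE TABLE #ActiveCustomers
            (
                CustomerID   INT          NOT NULL PRIMARY KEY,
                CustomerName VARCHAR(100) NOT NULL
            );

            INSERT INTO #ActiveCustomers (CustomerID, CustomerName)
            SELECT c.CustomerID, c.CustomerName
            FROM dbo.Customers AS c
            WHERE c.Region = @Region
              AND c.IsActive = 1;

            SELECT ac.CustomerID, ac.CustomerName, o.OrderTotal
            FROM #ActiveCustomers AS ac
            INNER JOIN dbo.Orders AS o
                ON o.CustomerID = ac.CustomerID;
        END;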

    I have three scalar functions that get used on very small recordsets where the complex calculations can't readily be done in a set-based fashion. All three have been tested against multi-step set-based solutions and found to be the better performers at the recordset sizes they get used on.

    Otherwise, I've found that tearing apart inline scalar functions into set-based, multi-step updates is generally MUCH faster and better.

    For example, I have to regularly calculate the distances between all records in two data sets. One set might be anywhere from 5,000 to 3 million rows, all with latitude and longitude, and the other set might be anywhere from 100 to 5,000 rows. (The sizes of the two are generally proportionate: larger first sets go with larger second sets.)

    One option, which was tested, was an inline scalar function that takes the lat and long from a row in list one and the lat and long from a row in list two, and returns the distance, row by row. It was very convenient and easy to build. And horribly slow.
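
    That slow version looked roughly like the sketch below (hypothetical names, with a standard great-circle formula standing in for the actual calculation): a scalar function evaluated once per row pair.

        -- Hedged sketch of the convenient-but-slow option: a scalar UDF
        -- called once for every pair of rows.
        CREATE FUNCTION dbo.fn_DistanceMiles
        (
            @Lat1 FLOAT, @Long1 FLOAT,
            @Lat2 FLOAT, @Long2 FLOAT
        )
        RETURNS FLOAT
        AS
        BEGIN
            DECLARE @dLat  FLOAT = RADIANS(@Lat2 - @Lat1);
            DECLARE @dLong FLOAT = RADIANS(@Long2 - @Long1);
            DECLARE @a     FLOAT =
                  SIN(@dLat / 2) * SIN(@dLat / 2)
                + COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2))
                  * SIN(@dLong / 2) * SIN(@dLong / 2);
            RETURN 3958.8 * 2 * ATN2(SQRT(@a), SQRT(1 - @a));  -- 3958.8 = Earth's radius in miles
        END;

        -- Used like this, it runs once per pair: easy to write, painful at volume.
        -- SELECT l1.ID, l2.ID, dbo.fn_DistanceMiles(l1.Lat, l1.Long, l2.Lat, l2.Long)
        -- FROM dbo.ListOne AS l1 CROSS JOIN dbo.ListTwo AS l2;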

    The other option was to take each line of the calculation, add a column for it to a join table between the two sets, and calculate one column at a time across the whole join table. It was ugly code, and nowhere near as simple to maintain. But it was amazingly fast.
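
    A hedged sketch of that faster approach (again, hypothetical names, with a temp table standing in for the join table described above): materialize the pairs into a work table with one column per intermediate term, then run one set-based UPDATE per term and a final UPDATE for the distance.

        -- Hedged sketch of the ugly-but-fast option: stage the join once,
        -- then compute each piece of the formula in a set-based pass.
        CREATE TABLE #Pairs
        (
            ID1 INT NOT NULL,
            ID2 INT NOT NULL,
            Lat1 FLOAT NOT NULL, Long1 FLOAT NOT NULL,
            Lat2 FLOAT NOT NULL, Long2 FLOAT NOT NULL,
            dLat FLOAT NULL,
            dLong FLOAT NULL,
            A FLOAT NULL,
            DistanceMiles FLOAT NULL,
            PRIMARY KEY (ID1, ID2)
        );

        INSERT INTO #Pairs (ID1, ID2, Lat1, Long1, Lat2, Long2)
        SELECT l1.ID, l2.ID, l1.Lat, l1.Long, l2.Lat, l2.Long
        FROM dbo.ListOne AS l1
        CROSS JOIN dbo.ListTwo AS l2;

        -- One whole-set pass per column, no per-row function calls.
        UPDATE #Pairs SET dLat  = RADIANS(Lat2 - Lat1);
        UPDATE #Pairs SET dLong = RADIANS(Long2 - Long1);

        UPDATE #Pairs
        SET A = SIN(dLat / 2) * SIN(dLat / 2)
              + COS(RADIANS(Lat1)) * COS(RADIANS(Lat2))
              * SIN(dLong / 2) * SIN(dLong / 2);

        UPDATE #Pairs
        SET DistanceMiles = 3958.8 * 2 * ATN2(SQRT(A), SQRT(1 - A));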

    The difference between the two solutions: the largest lists went from 4-6 hours down to 30-60 seconds, and the smaller ones from 1-2 hours down to under 10 seconds.

    Since as many as five or six of these comparisons sometimes need to be done in a day, the faster solution was necessary. Like I say, it's ugly, not as easy to read or maintain, it required adding a whole bunch of columns to an otherwise unnecessary table, and so on. But the speed is worth it.

    Test various solutions. Try things that may not be obvious. Test them under load and with realistic record set sizes. What works in one case may not work well in another.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon