I've actually built very performant scalar UDFs. The problem is when you start using them in join criteria or Where clauses, which kills any possibility of index use.
Of course, if you build them complex enough (say, for example, with multiple table variables), you can make anything suck.
But if you build a scalar UDF that's basically a select statement, and use it as a calculated column in a Select list, it can work quite nicely. Cross Apply is better, but that isn't available in Compat 80 or on SQL 2000.
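To make the distinction concrete, here's a sketch of what I mean. The table and function names (dbo.fn_OrderTotal, dbo.Orders, dbo.OrderDetails) are made up for illustration:

```sql
-- Illustrative scalar UDF that's "basically a select statement"
CREATE FUNCTION dbo.fn_OrderTotal (@OrderID INT)
RETURNS MONEY
AS
BEGIN
    RETURN (SELECT SUM(Quantity * UnitPrice)
            FROM dbo.OrderDetails
            WHERE OrderID = @OrderID);
END;
GO

-- Acceptable: the UDF appears only in the Select list,
-- so indexes on Orders are still usable
SELECT o.OrderID, dbo.fn_OrderTotal(o.OrderID) AS OrderTotal
FROM dbo.Orders AS o;

-- Avoid: the UDF in the Where clause kills index use
-- SELECT o.OrderID FROM dbo.Orders AS o
-- WHERE dbo.fn_OrderTotal(o.OrderID) > 100;

-- On SQL 2005+ (not Compat 80 / SQL 2000), Cross Apply
-- inlines the same calculation instead:
SELECT o.OrderID, t.OrderTotal
FROM dbo.Orders AS o
CROSS APPLY (SELECT SUM(od.Quantity * od.UnitPrice) AS OrderTotal
             FROM dbo.OrderDetails AS od
             WHERE od.OrderID = o.OrderID) AS t;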
How did you determine that your scalar UDFs were "very performant"? Have you run Profiler at the batch/SP completed level to see that the actual IOs are way higher than what SET STATISTICS IO ON returns? Or that CPU usage is actually significantly higher than the SSMS actual query plan reports? Have you run Profiler with statement completed on to see the data being processed row-by-row under the covers, with absolutely no notification to SSMS of that fact? Have you compared a true set-based solution (not a triangular join, etc.) to actual UDF performance?
Yes. I routinely keep traces running on dev databases that track all T-SQL batches and commands, specifically for performance comparison. In many cases, I keep them running on production databases and review them routinely to track performance issues. I don't do this in Profiler; I use server-side tracing, with no GUI to slow it all down, saving the data to files to minimize the impact on the server. But it's the same concept as what you asked.
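For anyone who hasn't set one up, a minimal server-side trace looks something like this. The event and column IDs come from the SQL Trace documentation (12 = SQL:BatchCompleted; 1 = TextData, 13 = Duration, 16 = Reads, 18 = CPU); the file path is just an example:

```sql
DECLARE @TraceID INT,
        @MaxFileSize BIGINT,
        @On BIT;
SET @MaxFileSize = 50;  -- MB per rollover file
SET @On = 1;

-- Option 2 = TRACE_FILE_ROLLOVER; SQL Server appends .trc to the path
EXEC sp_trace_create @TraceID OUTPUT, 2,
     N'C:\Traces\BatchPerf', @MaxFileSize, NULL;

-- Capture text, duration, reads, and CPU for each completed batch
EXEC sp_trace_setevent @TraceID, 12, 1,  @On;  -- TextData
EXEC sp_trace_setevent @TraceID, 12, 13, @On;  -- Duration
EXEC sp_trace_setevent @TraceID, 12, 16, @On;  -- Reads
EXEC sp_trace_setevent @TraceID, 12, 18, @On;  -- CPU

EXEC sp_trace_setstatus @TraceID, 1;  -- start the trace

-- Later, load the trace files back into a table for review:
-- SELECT * FROM fn_trace_gettable(N'C:\Traces\BatchPerf.trc', DEFAULT);
```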
I'm not sure where you get the idea that the UDFs I'm talking about involve triangular joins or any other form of "faux set-based" processing.
Very specifically, I've used scalar UDFs in situations where complex calculations were needed on each of the returned rows of a set of data. Various solutions were tested, and in a few cases, scalar UDFs were either the best performer, or were only outperformed by code that was so complex it would be nearly impossible to document and maintain.
In each case, it was in the Select clause, not in any of the Join/Where math.
I had to take lists of locations where events were being held, and large mailing lists of invitees, and assign each invitee to the five closest locations, sometimes with additional business rules about crossing county or state lines, sometimes not. This could have been done with a UDF to calculate the five closest, but that would have meant placing one in the Where clause or Join clause. Instead, I created a table that did a semi-cross-join based on maximum distance and simple differences in latitude and longitude, then performed the necessary math on all rows at the same time. Very set-based, no UDFs, and as fast as the floating point calculations could be performed by the CPU. That was a place to avoid them, so I did.
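The shape of that solution was roughly as follows. This is a sketch, not the original code: the table and column names, the bounding-box threshold, and the use of ROW_NUMBER (which assumes SQL 2005+) are all my assumptions here:

```sql
-- The crude lat/long box in the join prunes the cross join down to
-- plausible pairs before the expensive distance math runs on every row.
SELECT ranked.InviteeID, ranked.LocationID, ranked.Distance
FROM (
    SELECT d.InviteeID, d.LocationID, d.Distance,
           ROW_NUMBER() OVER (PARTITION BY d.InviteeID
                              ORDER BY d.Distance) AS rn
    FROM (
        SELECT i.InviteeID, l.LocationID,
               -- spherical law of cosines, Earth radius in miles
               3959.0 * ACOS(
                   SIN(RADIANS(i.Lat)) * SIN(RADIANS(l.Lat)) +
                   COS(RADIANS(i.Lat)) * COS(RADIANS(l.Lat)) *
                   COS(RADIANS(l.Long - i.Long))) AS Distance
        FROM dbo.Invitees AS i
        JOIN dbo.Locations AS l
          ON ABS(i.Lat  - l.Lat)  < 2.0   -- rough "maximum distance" box
         AND ABS(i.Long - l.Long) < 2.0
    ) AS d
) AS ranked
WHERE ranked.rn <= 5;  -- five closest locations per invitee
```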
Then, once we had results back in for the mailings, it was necessary to generate reports on who went to which events and how far they traveled. In this case, there was nothing in the Join or Where clause; it was simply necessary to take the lat and long of the attendee's address and the lat and long of the location they attended, and calculate the distance. While it was possible to include all that math directly in the query, it (a) was needed in a dozen or so reports, each with its own query, and (b) made the queries almost unreadable because of the complex math involved. Thus, a UDF that performed this calculation was created, tested, and found to be quite acceptable in terms of performance, reusability, and code maintenance. There, I did use it.
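An illustrative version of that kind of UDF, with the names and the great-circle formula standing in for the original code:

```sql
CREATE FUNCTION dbo.fn_DistanceMiles
    (@Lat1 FLOAT, @Long1 FLOAT, @Lat2 FLOAT, @Long2 FLOAT)
RETURNS FLOAT
AS
BEGIN
    -- Spherical law of cosines; 3959 = Earth radius in miles
    RETURN 3959.0 * ACOS(
        SIN(RADIANS(@Lat1)) * SIN(RADIANS(@Lat2)) +
        COS(RADIANS(@Lat1)) * COS(RADIANS(@Lat2)) *
        COS(RADIANS(@Long2 - @Long1)));
END;
GO

-- Used only in the Select list, never in the Join or Where clause,
-- so it doesn't block index use; it just hides ugly math:
SELECT a.AttendeeID,
       dbo.fn_DistanceMiles(a.Lat, a.Long, e.Lat, e.Long) AS MilesTraveled
FROM dbo.Attendees AS a
JOIN dbo.Events AS e ON e.EventID = a.EventID;
```

The reusability is the point: a dozen report queries can all call the one tested function instead of repeating the trigonometry.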
Those illustrate the places to use one, and the places to avoid one.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon