The Case for Scalar-valued, User-defined Functions in T-SQL

Question

The Case for Scalar-valued, User-defined Functions in T-SQL

Dwain Camps

SSC Guru

Points: 86908
More actions
May 11, 2015 at 10:36 pm

#297966

Comments posted to this topic are about the item The Case for Scalar-valued, User-defined Functions in T-SQL
My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply

Dwain Camps SSC Guru Points: 86908 More actions · Answer 1

Before anyone gets the chance to call me on this point, I ran across this bit of information after submitting this article, provided by Paul White in his blog:

"Any reference to a table (or view) with a computed column that uses a T-SQL scalar function will result in a serial plan, even if the problematic column is not referenced in the query."

Forcing a Parallel Query Execution Plan

Great information in that blog about the things that can cause SQL to generate serial instead of parallel plans, including the use of a SVF in a computed column.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Nathan Davis-411378 Say Hey Kid Points: 679 More actions · Answer 2

I'm intrigued by the use of the scalar UDF in a check constraint.

However, I'm not convinced that the check constraint will be 100% accurately enforced - due to the effects of various transaction isolation levels.

For example, I've used triggers in the past to check that modifications to a hierarchy would not cause problems - but I've always understood that you are *not* guaranteed 100% that such triggers would enforce the condition (unless you were under the SERIALIZABLE isolation level) - due to such things as phantom reads, non-repeatable reads, etc.

Do check constraints using UDFs have the same potential issues with different transaction isolation levels? (If not, then I assume that a scalar udf in a check constraint could potentially increase locking in the table?)

Dwain Camps SSC Guru Points: 86908 More actions · Answer 3

Nathan Davis-411378 (5/11/2015)
Do check constraints using UDFs have the same potential issues with different transaction isolation levels? (If not, then I assume that a scalar udf in a check constraint could potentially increase locking in the table?)

I'd have to say that is a very interesting question. Unfortunately I can't answer it.

One would expect that a check constraint failing within a transaction, regardless of whether the failure is induced by a function call or a deterministic check, would drive the error behavior in the same direction.

Now could you overload the check constraint by a really heavy function - that answer is probably a resounding yes. In which case, you could probably expect locking and maybe even deadlocking.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 4

Oh yes. I'm simply reporting here how these things can be done.

Other than the notation I made above from Paul White, I can't report on any of the dangers (other than function overload which can happen anywhere). So anybody with any evidence when these ideas should not be used, please chime in.

Those are stories we'd like to hear about!

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

peter-757102 SSCertifiable Points: 6877 More actions · Answer 5

It might be my memory...

I distinctly remember that in the past any function used in a constraint needed to be deterministic.

Querying a table violates that rule, obviously.

When did this change..or am I simply mistaken?

Alan Burstein SSC Guru Points: 61136 More actions · Answer 6

peter-757102 (5/12/2015)
It might be my memory...
I distinctly remember that in the past any function used in a constraint needed to be deterministic.
Querying a table violates that rule, obviously.
When did this change..or am I simply mistaken?

I don't believe that is true. The sUDF used for a constraint in Dwain's article is not Schema-Bound and therefore is not deterministic. According to Microsoft's article on Deterministic and Nondeterministic Functions:

" User-defined functions are deterministic if: The function is schema-bound. ..."

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

Sporidium SSCommitted Points: 1550 More actions · Answer 7

I've hit a few problems in the past using scalar UDFs in CHECK constraints.

The main one is that the constraint didn't always fire.

If the column(s) passed into the UDF as parameter(s) hadn't changed, it seemed like SQL Server decided it was unnecessary to reapply the constraint.

That may not be precisely what was happening, but changing the UDF call to include the columns that had changed (and whose values were queried separately in the UDF) fixed the problem for me.

For example, changing the call from Func(Id) to Func(Id + Status - Status).

This was in SQL Server 2008.

Alan Burstein SSC Guru Points: 61136 More actions · Answer 8

I love this article Dwain. Very good work as always. 5 stars from me. I particularly enjoy how your hierarchies example - very clever.

I do want to add - I have played around with Paul White's make_parallel function and I have never seen it improve the performance of a sUDF. That's just my experience though. I have seen it improve the performance of an iTVF.

It's also worth noting that, to index a computed column that uses a sUDF the column must be persisted and the sUDF must be deterministic. I know that you know this but I wanted to include that for others who might not (I learned this so recently it feels like yesterday).

"I cant stress enough the importance of switching from a sequential files mindset to set-based thinking. After you make the switch, you can spend your time tuning and optimizing your queries instead of maintaining lengthy, poor-performing code."

-- Itzik Ben-Gan 2001

akljfhnlaflkj SSC Guru Points: 76202 More actions · Answer 9

akljfhnlaflkj

SSC Guru

Points: 76202

May 12, 2015 at 8:51 am

#1797301

Thanks for the lesson.

Dwain Camps SSC Guru Points: 86908 More actions · Answer 10

Gary Harding (5/12/2015)
I've hit a few problems in the past using scalar UDFs in CHECK constraints.
The main one is that the constraint didn't always fire.
If the column(s) passed into the UDF as parameter(s) hadn't changed, it seemed like SQL Server decided it was unnecessary to reapply the constraint.
That may not be precisely what was happening, but changing the UDF call to include the columns that had changed (and whose values were queried separately in the UDF) fixed the problem for me.
For example, changing the call from Func(Id) to Func(Id + Status - Status).
This was in SQL Server 2008.

I'm a bit confused about this, unless you mean that "if the column(s) passed into the UDF as parameter(s) hadn't changed" was part of an UPDATE statement where you used DEFAULT.

Perhaps if you had an example to share, someone could figure out what is causing this.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Dwain Camps SSC Guru Points: 86908 More actions · Answer 11

Alan.B (5/12/2015)
I love this article Dwain. Very good work as always. 5 stars from me. I particularly enjoy how your hierarchies example - very clever.
I do want to add - I have played around with Paul White's make_parallel function and I have never seen it improve the performance of a sUDF. That's just my experience though. I have seen it improve the performance of an iTVF.
It's also worth noting that, to index a computed column that uses a sUDF the column must be persisted and the sUDF must be deterministic. I know that you know this but I wanted to include that for others who might not (I learned this so recently it feels like yesterday).

Thanks Alan. As to your last paragraph, yes I did and I was recently involved in an interesting give and take related to it in the discussion thread of this article:

Self-maintaining, Contiguous Effective Dates in Temporal Tables[/url]

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Sporidium SSCommitted Points: 1550 More actions · Answer 12

dwain.c (5/12/2015)
Gary Harding (5/12/2015)
I've hit a few problems in the past using scalar UDFs in CHECK constraints.
The main one is that the constraint didn't always fire.
If the column(s) passed into the UDF as parameter(s) hadn't changed, it seemed like SQL Server decided it was unnecessary to reapply the constraint.
That may not be precisely what was happening, but changing the UDF call to include the columns that had changed (and whose values were queried separately in the UDF) fixed the problem for me.
For example, changing the call from Func(Id) to Func(Id + Status - Status).
This was in SQL Server 2008.
I'm a bit confused about this, unless you mean that "if the column(s) passed into the UDF as parameter(s) hadn't changed" was part of an UPDATE statement where you used DEFAULT.
Perhaps if you had an example to share, someone could figure out what is causing this.

I don't have the constraints to hand at the moment, but I'll try and post something early next week.

Sporidium SSCommitted Points: 1550 More actions · Answer 13

Gary Harding (5/13/2015)
dwain.c (5/12/2015)
Gary Harding (5/12/2015)
I've hit a few problems in the past using scalar UDFs in CHECK constraints.
The main one is that the constraint didn't always fire.
If the column(s) passed into the UDF as parameter(s) hadn't changed, it seemed like SQL Server decided it was unnecessary to reapply the constraint.
That may not be precisely what was happening, but changing the UDF call to include the columns that had changed (and whose values were queried separately in the UDF) fixed the problem for me.
For example, changing the call from Func(Id) to Func(Id + Status - Status).
This was in SQL Server 2008.
I'm a bit confused about this, unless you mean that "if the column(s) passed into the UDF as parameter(s) hadn't changed" was part of an UPDATE statement where you used DEFAULT.
Perhaps if you had an example to share, someone could figure out what is causing this.
I don't have the constraints to hand at the moment, but I'll try and post something early next week.

OK, I've tracked down the CHECK constraints.

The reason they needed to use a UDF was because the checks involve querying another table.

Basically, I need to maintain consistency between a column in the first table, T1 (the one on which the constraint is defined) and a similar column in a related second table, T2 (for which a FK is stored in T1).

To keep the UDF interface clean, I just pass the PK of T1 into the function, which then does the T1-T2 JOIN and compares the two column values from T1 and T2 to ensure they remain compatible.

It seems that the fact that I'm only passing the PK into the UDF can cause the constraint not to fire.

Possibly the optimizer is assuming that the UDF is deterministic and, as the PK is static, nothing can have changed since the last time the row was created or updated. Referencing the relevant column in the UDF call did seem to ensure the constraint was applied reliably on all updates.

I appreciate that this is not an ideal way to implement such a constraint, and maybe a trigger might be a more robust and RBAR-less way to do it. It's just that I don't recall seeing any caveats on whether constraints will fire or not, and wanted to share my experiences.