Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase «««1234»»

SQL CLR Data Types and Performance Expand / Collapse
Author
Message
Posted Thursday, July 22, 2010 12:21 PM


SSCarpal Tunnel

SSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal TunnelSSCarpal Tunnel

Group: General Forum Members
Last Login: Yesterday @ 7:58 PM
Points: 4,133, Visits: 5,848
mzak (7/21/2010)
Kevin Rathgeber (7/21/2010)
mzak (7/21/2010)
Kevin Rathgeber (7/21/2010)
I have a comment regarding your T-SQL

In trying to keep all things equal, Shouldn't you declare a variable at the beginning of your function to hold the value of PI() / 180 instead of calculating it 4 times.

That was the first item I noticed.


Yep!

The point of the article wasn't to pit SQL CLR against T-SQL implementations (I debated whether to include the T-SQL results with it's current implementation in the article). Rather, it was to show that you don't have to use a sql data type in your SQL CLR (you can use native .net data types) and that doing so may result in performance gains - in my particular example, rather significant performance gains.

I figured Jeff would mention this shortly after I posted the requested T-SQL so I was anticipating it .

Good catch Kevin.

It should also be stated that this wouldn't make 'all things equal' per se. The SQL CLR implementation doesn't perform the calculation once, let alone 4 times with each call. By declaring a variable to eliminate the T-SQL redundancy, it still must be performed once with each call.


Oops....posted without saying anything.

Good point that it wouldn't be equal, just a little more equal

Sorry to detract from your article, but in case you were interested, I just did some testing creating two functions. The first function calculates "PI() / 180" 4 times (your T-SQL example). The 2nd function assigns "PI() / 180" to a variable and then uses that variable 4 times (my suggestion). Surpisingly the 2nd one (my suggestion) was slower. The amount it was slower is pretty much insignificant.....by about 4/10ths of a second over 1million iterations. Long story short, ignore my comment


That's interesting. Although I wonder what the "cost" of declaring a variable is with respect to this scenario? If I get some time, I'll rewrite the T-SQL udf based on your suggestion and see what results.

Based on your test, I'd also be curious if changing it to eliminate some variable declarations would have much impact (remove @dDistance, @EarthRadius, etc). Also, there's no reason I couldn't assign a reasonable constant for pi() / 180.0 which essentially would get us to equality .


I too was wondering about the need (or more correctly the lack of need) for all those variables in the tsql code. Just inline everything you can and see what happens. I also wonder if things can REALLY be sped up if you can make the entire compuation a single statement (even computing the pi/180 thing every time to ensure NO variable declarations exist). If you can get to that, then try simply inlining that statement in your SELECT and avoiding the UDF altogether and see if that is even faster.


Best,

Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru at GMail
Post #957459
Posted Thursday, July 22, 2010 12:31 PM


SSC Eights!

SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!SSC Eights!

Group: General Forum Members
Last Login: Thursday, February 06, 2014 12:59 PM
Points: 801, Visits: 1,962
If you do in-line everything then contrast the execution plan with using a variable.

ATB

Charles Kincaid

Post #957476
Posted Thursday, July 22, 2010 1:07 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, April 14, 2011 7:04 AM
Points: 20, Visits: 177
Some interesting results!

I first tested the suggested hypothesis of declaring/setting the pi/180 as a constant in the T-SQL udf.

Average time = 1444.4 milliseconds

Then I tested using no declared variables within the T-SQL udf.

Average time = 1456.3 milliseconds

Finally, I tested no udf call at all (inline expression).

Average time = 664.4 milliseconds!

Of course, the inline expression would be a nightmare to maintain so there are some tradeoffs.
Post #957497
Posted Thursday, July 22, 2010 1:38 PM
Forum Newbie

Forum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum NewbieForum Newbie

Group: General Forum Members
Last Login: Thursday, January 16, 2014 6:03 AM
Points: 6, Visits: 33
It would be very interesting to see the differences using the profiler. In a demo I saw recently the client statistics showed a performance gain but when they ran profiler to compare the t-sql and clr, there was a lot higher cpu utilization on the server.

I think you are really onto something with converting the data types to .net, excellent article! I learned a lot.
Post #957531
Posted Thursday, July 22, 2010 1:49 PM
SSC Rookie

SSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC RookieSSC Rookie

Group: General Forum Members
Last Login: Tuesday, January 14, 2014 10:21 AM
Points: 36, Visits: 152
mzak (7/22/2010)
Some interesting results!

I first tested the suggested hypothesis of declaring/setting the pi/180 as a constant in the T-SQL udf.

Average time = 1444.4 milliseconds

Then I tested using no declared variables within the T-SQL udf.

Average time = 1456.3 milliseconds

Finally, I tested no udf call at all (inline expression).

Average time = 664.4 milliseconds!

Of course, the inline expression would be a nightmare to maintain so there are some tradeoffs.


Now that is something that one fights with in his head. I am an application developer (GIS and Web based) and I must admit my raw SQL skills have not been keeping up. My head is more stuck in the .Net Object Oriented world and because of this I tend to want to keep things clean and readable and create reuse. That was one of my main reasons for suggesting the variable to hold the PI()/ 180 value.......for reuse.

The second reason is I seriously thought that caculating once and referencing the variable over again would create a performance improvement. In my tests I did not see any and in fact the variable declare was slightly worse but minimal. In the slightly worse but minimal scenario, I would opt for readability over performance. In the case you just showed above, doing everything inline, I would have to go the other way and as a developer this bugs the hell out of me lol.


EDIT: Just want to clarify - I am not saying that you can't keep things clean/readable/reusable in SQL.....the fact that it is more top down vs object oriented makes it a less cleaner/readable by nature. Because of this I find myself not caring so much about readability as I do about performance at this tier.
Post #957539
Posted Thursday, July 22, 2010 1:58 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
You should be able to access the SqlDecimal values directly using the .Value property of the SqlDecimal variables. The reason people recommend using SqlDecimal instead of "native" .NET Decimal is simple: NULL. Try passing a NULL into your functions; without using the nullable types your function won't be able to handle them.

My advice would be to try this:

* Use SqlDecimal for your parameters to your function
* First check for NULLs and handle appropriately, first thing in the function
* Then assign your SqlDecimal parameter values to local Decimal variables
* Do all your calculations using the non-nullable (standard) .NET types like Decimal
* Return your result by creating a new SqlDecimal instance (i.e., return new SqlDecimal(3956.0 * c))

I wouldn't use standard non-nullable .NET types for parameters and return values, but I also wouldn't use the nullable types for calculations internal to the function. Using the above I'd expect your testing time to be somewhat more than your best time, but significantly less than your worst time.

Mike C
Post #957551
Posted Thursday, July 22, 2010 2:19 PM


Ten Centuries

Ten CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen CenturiesTen Centuries

Group: General Forum Members
Last Login: Monday, April 14, 2014 4:18 PM
Points: 1,276, Visits: 1,132
Oh yeah, almost forgot to mention, your performance difference between the version with global constants and the one with no global constants is probably due to implicit conversions from Decimal and/or SqlDecimal to the .NET Double type.

That is to say when you define Decimal constants like EarthRadiusInMiles = 3956.0, .NET implicitly converts 3956.0 from Double to Decimal for you. If you define it at the top, it only has to do the implicit conversion once. If you just use 3956.0 in the code in a calculation the implicit conversion would have to take place for every reference. I tried to modify your code, as shown below, to do all internal calculations using the Double data type and convert from/to SqlDecimal only at the beginning and end of the calculation. Please forgive if I have any typos below, I usually write SQL and C#.

' Code begins here
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Math

Partial Public Class UserDefinedFunctions
Public Const EarthRadiusInMiles As Double = 3956.0
Public Const RadianConversionConst As Double = Math.PI / 180.0
<SqlFunction()> _
Public Shared Function Distance( _
ByVal Latitude1 As SqlDecimal, _
ByVal Longitude1 As SqlDecimal, _
ByVal Latitude2 As SqlDecimal, _
ByVal Longitude2 As SqlDecimal) As SqlDecimal
' Return NULL automatically if any nulls are passed in
' Use OrElse for short-circuit logic
If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then
Return SqlDecimal.Null
End If
' Explicitly cast all inputs to Double and do all calculations as Doubles
Dim Lat1 As Double = CType(Latitude1.Value, Double)
Dim Long1 As Double = CType(Longitude1.Value, Double)
Dim Lat2 As Double = CType(Latitude2.Value, Double)
Dim Long2 As Double = CType(Longitude2.Value, Double)
Dim Latitude1Radian As Double = Lat1 * RadianConversionConst
Dim Longitude1Radian As Double = Long1 * RadianConversionConst
Dim Latitude2Radian As Double = Lat2 * RadianConversionConst
Dim Longitude2Radian As Double = Long2 * RadianConversionConst
Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian
Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian
'intermediate result a
Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)
'intermediate result c (great circle distance in radians)
Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))
Return New SqlDecimal(EarthRadiusInMiles * c)
End Function
End Class
' Code ends here

Mike C
Post #957564
Posted Thursday, July 22, 2010 2:31 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, April 14, 2011 7:04 AM
Points: 20, Visits: 177
Mike C (7/22/2010)
Oh yeah, almost forgot to mention, your performance difference between the version with global constants and the one with no global constants is probably due to implicit conversions from Decimal and/or SqlDecimal to the .NET Double type.

That is to say when you define Decimal constants like EarthRadiusInMiles = 3956.0, .NET implicitly converts 3956.0 from Double to Decimal for you. If you define it at the top, it only has to do the implicit conversion once. If you just use 3956.0 in the code in a calculation the implicit conversion would have to take place for every reference. I tried to modify your code, as shown below, to do all internal calculations using the Double data type and convert from/to SqlDecimal only at the beginning and end of the calculation. Please forgive if I have any typos below, I usually write SQL and C#.

' Code begins here
Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server
Imports System.Math

Partial Public Class UserDefinedFunctions
Public Const EarthRadiusInMiles As Double = 3956.0
Public Const RadianConversionConst As Double = Math.PI / 180.0
<SqlFunction()> _
Public Shared Function Distance( _
ByVal Latitude1 As SqlDecimal, _
ByVal Longitude1 As SqlDecimal, _
ByVal Latitude2 As SqlDecimal, _
ByVal Longitude2 As SqlDecimal) As SqlDecimal
' Return NULL automatically if any nulls are passed in
' Use OrElse for short-circuit logic
If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then
Return SqlDecimal.Null
End If
' Explicitly cast all inputs to Double and do all calculations as Doubles
Dim Lat1 As Double = CType(Latitude1.Value, Double)
Dim Long1 As Double = CType(Longitude1.Value, Double)
Dim Lat2 As Double = CType(Latitude2.Value, Double)
Dim Long2 As Double = CType(Longitude2.Value, Double)
Dim Latitude1Radian As Double = Lat1 * RadianConversionConst
Dim Longitude1Radian As Double = Long1 * RadianConversionConst
Dim Latitude2Radian As Double = Lat2 * RadianConversionConst
Dim Longitude2Radian As Double = Long2 * RadianConversionConst
Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian
Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian
'intermediate result a
Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)
'intermediate result c (great circle distance in radians)
Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))
Return New SqlDecimal(EarthRadiusInMiles * c)
End Function
End Class
' Code ends here

Mike C


I was going to implement your suggestions and call it Distance_MikeC_Style!

Good catch on the null values. I shouldn't have any passed into this based on how it's currently implemented in our environment but the function should be able to handle it eloquently. It bothers me somewhat that I didn't think about this and provide for it - let alone have a disclaimer in the article in case wanderers miss this discussion. I'll see if that can be updated (but likely next week as I'm about gone for the day!)

I need to read up more on .net data types in reference to differences between double's and decimal's and admittedly should have done that before posting this. Thanks for sharing your knowledge with respect to this.

I appreciate the code representation of your suggestions. I will test this "Distance_MikeC_Style" and post the results!
Post #957577
Posted Thursday, July 22, 2010 2:51 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, April 14, 2011 7:04 AM
Points: 20, Visits: 177
As per Mike C's insight, the following is an update to the SQL CLR distance function using global constants where possible and handling the SqlDecimal -> Double -> SqlDecimal processing correctly.

Average execution time using the same 30 iteration test was still impressive at 784.9 milliseconds.

Thanks Mike. Much appreciated!

Imports System.Data.SqlTypes
Imports Microsoft.SqlServer.Server

Partial Public Class UserDefinedFunctions

Public Const EarthRadiusInMiles As Double = 3956.0
Public Const RadianConversionConst As Double = Math.PI / 180.0

<SqlFunction()> _
Public Shared Function Distance_MikeC_Style( _
ByVal Latitude1 As SqlDecimal, _
ByVal Longitude1 As SqlDecimal, _
ByVal Latitude2 As SqlDecimal, _
ByVal Longitude2 As SqlDecimal) As SqlDecimal

' Return NULL automatically if any nulls are passed in
' Use OrElse for short-circuit logic
If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then
Return SqlDecimal.Null
End If
' Explicitly cast all inputs to Double and do all calculations as Doubles
Dim Lat1 As Double = CType(Latitude1.Value, Double)
Dim Long1 As Double = CType(Longitude1.Value, Double)
Dim Lat2 As Double = CType(Latitude2.Value, Double)
Dim Long2 As Double = CType(Longitude2.Value, Double)

Dim Latitude1Radian As Double = Lat1 * RadianConversionConst
Dim Longitude1Radian As Double = Long1 * RadianConversionConst
Dim Latitude2Radian As Double = Lat2 * RadianConversionConst
Dim Longitude2Radian As Double = Long2 * RadianConversionConst

Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian
Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian

'intermediate result a
Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)

'intermediate result c (great circle distance in radians)
Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))

Return New SqlDecimal(EarthRadiusInMiles * c)
End Function

Post #957594
Posted Thursday, July 22, 2010 3:09 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Thursday, April 14, 2011 7:04 AM
Points: 20, Visits: 177
Incidently, I tested the "Mike C style" function with minimal variable dimensioning (basically a hybrid of what we were trying on the T-SQL side based on the Kevins' postings).

It results in a very long return new sqldecimal( ... ) line!

Average execution time over 30 iterations was slightly better than non-"minimal variable dimensioning" at 777.6 milliseconds.

Again, I'd rather have more maintainable code than such a subtle performance improvement but in some instances, that may not be a luxury!
Post #957603
« Prev Topic | Next Topic »

Add to briefcase «««1234»»

Permissions Expand / Collapse