SQL CLR Data Types and Performance

  • Thanks Matt for the comparisons. My experience has been similar with floating point calculations, between 50% to 90% improvement. For text (regex-type functions), the improvements were many fold. Of course the comparable T-SQL functions could themselves be improved but for the quickest bang-for-the-buck, CLRs really shine when the functions are well-defined. In a couple of cases, I took some previously existing functions written in C++ and made them into CLR functions so the calculations came out identical to external apps. This avoided a lot of discrepancies between T-SQL and external apps.

  • Kevin Rathgeber (7/21/2010)


    I have a comment regarding your T-SQL

    In trying to keep all things equal, Shouldn't you declare a variable at the beginning of your function to hold the value of PI() / 180 instead of calculating it 4 times.

    That was the first item I noticed.

    Yep!

    The point of the article wasn't to pit SQL CLR against T-SQL implementations (I debated whether to include the T-SQL results with it's current implementation in the article). Rather, it was to show that you don't have to use a sql data type in your SQL CLR (you can use native .net data types) and that doing so may result in performance gains - in my particular example, rather significant performance gains.

    I figured Jeff would mention this shortly after I posted the requested T-SQL so I was anticipating it :-).

    Good catch Kevin.

    It should also be stated that this wouldn't make 'all things equal' per se. The SQL CLR implementation doesn't perform the calculation once, let alone 4 times with each call. By declaring a variable to eliminate the T-SQL redundancy, it still must be performed once with each call.

  • mzak (7/21/2010)


    Kevin Rathgeber (7/21/2010)


    I have a comment regarding your T-SQL

    In trying to keep all things equal, Shouldn't you declare a variable at the beginning of your function to hold the value of PI() / 180 instead of calculating it 4 times.

    That was the first item I noticed.

    Yep!

    The point of the article wasn't to pit SQL CLR against T-SQL implementations (I debated whether to include the T-SQL results with it's current implementation in the article). Rather, it was to show that you don't have to use a sql data type in your SQL CLR (you can use native .net data types) and that doing so may result in performance gains - in my particular example, rather significant performance gains.

    I figured Jeff would mention this shortly after I posted the requested T-SQL so I was anticipating it :-).

    Good catch Kevin.

    It should also be stated that this wouldn't make 'all things equal' per se. The SQL CLR implementation doesn't perform the calculation once, let alone 4 times with each call. By declaring a variable to eliminate the T-SQL redundancy, it still must be performed once with each call.

    Oops....posted without saying anything.

    Good point that it wouldn't be equal, just a little more equal 🙂

    Sorry to detract from your article, but in case you were interested, I just did some testing creating two functions. The first function calculates "PI() / 180" 4 times (your T-SQL example). The 2nd function assigns "PI() / 180" to a variable and then uses that variable 4 times (my suggestion). Surpisingly the 2nd one (my suggestion) was slower. The amount it was slower is pretty much insignificant.....by about 4/10ths of a second over 1million iterations. Long story short, ignore my comment 🙂

  • Kevin Rathgeber (7/21/2010)


    mzak (7/21/2010)


    Kevin Rathgeber (7/21/2010)


    I have a comment regarding your T-SQL

    In trying to keep all things equal, Shouldn't you declare a variable at the beginning of your function to hold the value of PI() / 180 instead of calculating it 4 times.

    That was the first item I noticed.

    Yep!

    The point of the article wasn't to pit SQL CLR against T-SQL implementations (I debated whether to include the T-SQL results with it's current implementation in the article). Rather, it was to show that you don't have to use a sql data type in your SQL CLR (you can use native .net data types) and that doing so may result in performance gains - in my particular example, rather significant performance gains.

    I figured Jeff would mention this shortly after I posted the requested T-SQL so I was anticipating it :-).

    Good catch Kevin.

    It should also be stated that this wouldn't make 'all things equal' per se. The SQL CLR implementation doesn't perform the calculation once, let alone 4 times with each call. By declaring a variable to eliminate the T-SQL redundancy, it still must be performed once with each call.

    Oops....posted without saying anything.

    Good point that it wouldn't be equal, just a little more equal 🙂

    Sorry to detract from your article, but in case you were interested, I just did some testing creating two functions. The first function calculates "PI() / 180" 4 times (your T-SQL example). The 2nd function assigns "PI() / 180" to a variable and then uses that variable 4 times (my suggestion). Surpisingly the 2nd one (my suggestion) was slower. The amount it was slower is pretty much insignificant.....by about 4/10ths of a second over 1million iterations. Long story short, ignore my comment 🙂

    That's interesting. Although I wonder what the "cost" of declaring a variable is with respect to this scenario? If I get some time, I'll rewrite the T-SQL udf based on your suggestion and see what results.

    Based on your test, I'd also be curious if changing it to eliminate some variable declarations would have much impact (remove @dDistance, @EarthRadius, etc). Also, there's no reason I couldn't assign a reasonable constant for pi() / 180.0 which essentially would get us to equality :-).

  • I enjoyed reading this article.

    Thank you for taking the time to do this.

  • mzak (7/21/2010)


    Kevin Rathgeber (7/21/2010)


    mzak (7/21/2010)


    Kevin Rathgeber (7/21/2010)


    I have a comment regarding your T-SQL

    In trying to keep all things equal, Shouldn't you declare a variable at the beginning of your function to hold the value of PI() / 180 instead of calculating it 4 times.

    That was the first item I noticed.

    Yep!

    The point of the article wasn't to pit SQL CLR against T-SQL implementations (I debated whether to include the T-SQL results with it's current implementation in the article). Rather, it was to show that you don't have to use a sql data type in your SQL CLR (you can use native .net data types) and that doing so may result in performance gains - in my particular example, rather significant performance gains.

    I figured Jeff would mention this shortly after I posted the requested T-SQL so I was anticipating it :-).

    Good catch Kevin.

    It should also be stated that this wouldn't make 'all things equal' per se. The SQL CLR implementation doesn't perform the calculation once, let alone 4 times with each call. By declaring a variable to eliminate the T-SQL redundancy, it still must be performed once with each call.

    Oops....posted without saying anything.

    Good point that it wouldn't be equal, just a little more equal 🙂

    Sorry to detract from your article, but in case you were interested, I just did some testing creating two functions. The first function calculates "PI() / 180" 4 times (your T-SQL example). The 2nd function assigns "PI() / 180" to a variable and then uses that variable 4 times (my suggestion). Surpisingly the 2nd one (my suggestion) was slower. The amount it was slower is pretty much insignificant.....by about 4/10ths of a second over 1million iterations. Long story short, ignore my comment 🙂

    That's interesting. Although I wonder what the "cost" of declaring a variable is with respect to this scenario? If I get some time, I'll rewrite the T-SQL udf based on your suggestion and see what results.

    Based on your test, I'd also be curious if changing it to eliminate some variable declarations would have much impact (remove @dDistance, @EarthRadius, etc). Also, there's no reason I couldn't assign a reasonable constant for pi() / 180.0 which essentially would get us to equality :-).

    I too was wondering about the need (or more correctly the lack of need) for all those variables in the tsql code. Just inline everything you can and see what happens. I also wonder if things can REALLY be sped up if you can make the entire compuation a single statement (even computing the pi/180 thing every time to ensure NO variable declarations exist). If you can get to that, then try simply inlining that statement in your SELECT and avoiding the UDF altogether and see if that is even faster.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • If you do in-line everything then contrast the execution plan with using a variable.

    ATBCharles Kincaid

  • Some interesting results!

    I first tested the suggested hypothesis of declaring/setting the pi/180 as a constant in the T-SQL udf.

    Average time = 1444.4 milliseconds

    Then I tested using no declared variables within the T-SQL udf.

    Average time = 1456.3 milliseconds

    Finally, I tested no udf call at all (inline expression).

    Average time = 664.4 milliseconds!

    Of course, the inline expression would be a nightmare to maintain so there are some tradeoffs.

  • It would be very interesting to see the differences using the profiler. In a demo I saw recently the client statistics showed a performance gain but when they ran profiler to compare the t-sql and clr, there was a lot higher cpu utilization on the server.

    I think you are really onto something with converting the data types to .net, excellent article! I learned a lot.

  • mzak (7/22/2010)


    Some interesting results!

    I first tested the suggested hypothesis of declaring/setting the pi/180 as a constant in the T-SQL udf.

    Average time = 1444.4 milliseconds

    Then I tested using no declared variables within the T-SQL udf.

    Average time = 1456.3 milliseconds

    Finally, I tested no udf call at all (inline expression).

    Average time = 664.4 milliseconds!

    Of course, the inline expression would be a nightmare to maintain so there are some tradeoffs.

    Now that is something that one fights with in his head. I am an application developer (GIS and Web based) and I must admit my raw SQL skills have not been keeping up. My head is more stuck in the .Net Object Oriented world and because of this I tend to want to keep things clean and readable and create reuse. That was one of my main reasons for suggesting the variable to hold the PI()/ 180 value.......for reuse.

    The second reason is I seriously thought that caculating once and referencing the variable over again would create a performance improvement. In my tests I did not see any and in fact the variable declare was slightly worse but minimal. In the slightly worse but minimal scenario, I would opt for readability over performance. In the case you just showed above, doing everything inline, I would have to go the other way and as a developer this bugs the hell out of me lol.

    EDIT: Just want to clarify - I am not saying that you can't keep things clean/readable/reusable in SQL.....the fact that it is more top down vs object oriented makes it a less cleaner/readable by nature. Because of this I find myself not caring so much about readability as I do about performance at this tier.

  • You should be able to access the SqlDecimal values directly using the .Value property of the SqlDecimal variables. The reason people recommend using SqlDecimal instead of "native" .NET Decimal is simple: NULL. Try passing a NULL into your functions; without using the nullable types your function won't be able to handle them.

    My advice would be to try this:

    * Use SqlDecimal for your parameters to your function

    * First check for NULLs and handle appropriately, first thing in the function

    * Then assign your SqlDecimal parameter values to local Decimal variables

    * Do all your calculations using the non-nullable (standard) .NET types like Decimal

    * Return your result by creating a new SqlDecimal instance (i.e., return new SqlDecimal(3956.0 * c))

    I wouldn't use standard non-nullable .NET types for parameters and return values, but I also wouldn't use the nullable types for calculations internal to the function. Using the above I'd expect your testing time to be somewhat more than your best time, but significantly less than your worst time.

    Mike C

  • Oh yeah, almost forgot to mention, your performance difference between the version with global constants and the one with no global constants is probably due to implicit conversions from Decimal and/or SqlDecimal to the .NET Double type.

    That is to say when you define Decimal constants like EarthRadiusInMiles = 3956.0, .NET implicitly converts 3956.0 from Double to Decimal for you. If you define it at the top, it only has to do the implicit conversion once. If you just use 3956.0 in the code in a calculation the implicit conversion would have to take place for every reference. I tried to modify your code, as shown below, to do all internal calculations using the Double data type and convert from/to SqlDecimal only at the beginning and end of the calculation. Please forgive if I have any typos below, I usually write SQL and C#.

    ' Code begins here

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Imports System.Math

    Partial Public Class UserDefinedFunctions

    Public Const EarthRadiusInMiles As Double = 3956.0

    Public Const RadianConversionConst As Double = Math.PI / 180.0

    <SqlFunction()> _

    Public Shared Function Distance( _

    ByVal Latitude1 As SqlDecimal, _

    ByVal Longitude1 As SqlDecimal, _

    ByVal Latitude2 As SqlDecimal, _

    ByVal Longitude2 As SqlDecimal) As SqlDecimal

    ' Return NULL automatically if any nulls are passed in

    ' Use OrElse for short-circuit logic

    If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then

    Return SqlDecimal.Null

    End If

    ' Explicitly cast all inputs to Double and do all calculations as Doubles

    Dim Lat1 As Double = CType(Latitude1.Value, Double)

    Dim Long1 As Double = CType(Longitude1.Value, Double)

    Dim Lat2 As Double = CType(Latitude2.Value, Double)

    Dim Long2 As Double = CType(Longitude2.Value, Double)

    Dim Latitude1Radian As Double = Lat1 * RadianConversionConst

    Dim Longitude1Radian As Double = Long1 * RadianConversionConst

    Dim Latitude2Radian As Double = Lat2 * RadianConversionConst

    Dim Longitude2Radian As Double = Long2 * RadianConversionConst

    Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian

    Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian

    'intermediate result a

    Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)

    'intermediate result c (great circle distance in radians)

    Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))

    Return New SqlDecimal(EarthRadiusInMiles * c)

    End Function

    End Class

    ' Code ends here

    Mike C

  • Mike C (7/22/2010)


    Oh yeah, almost forgot to mention, your performance difference between the version with global constants and the one with no global constants is probably due to implicit conversions from Decimal and/or SqlDecimal to the .NET Double type.

    That is to say when you define Decimal constants like EarthRadiusInMiles = 3956.0, .NET implicitly converts 3956.0 from Double to Decimal for you. If you define it at the top, it only has to do the implicit conversion once. If you just use 3956.0 in the code in a calculation the implicit conversion would have to take place for every reference. I tried to modify your code, as shown below, to do all internal calculations using the Double data type and convert from/to SqlDecimal only at the beginning and end of the calculation. Please forgive if I have any typos below, I usually write SQL and C#.

    ' Code begins here

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Imports System.Math

    Partial Public Class UserDefinedFunctions

    Public Const EarthRadiusInMiles As Double = 3956.0

    Public Const RadianConversionConst As Double = Math.PI / 180.0

    <SqlFunction()> _

    Public Shared Function Distance( _

    ByVal Latitude1 As SqlDecimal, _

    ByVal Longitude1 As SqlDecimal, _

    ByVal Latitude2 As SqlDecimal, _

    ByVal Longitude2 As SqlDecimal) As SqlDecimal

    ' Return NULL automatically if any nulls are passed in

    ' Use OrElse for short-circuit logic

    If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then

    Return SqlDecimal.Null

    End If

    ' Explicitly cast all inputs to Double and do all calculations as Doubles

    Dim Lat1 As Double = CType(Latitude1.Value, Double)

    Dim Long1 As Double = CType(Longitude1.Value, Double)

    Dim Lat2 As Double = CType(Latitude2.Value, Double)

    Dim Long2 As Double = CType(Longitude2.Value, Double)

    Dim Latitude1Radian As Double = Lat1 * RadianConversionConst

    Dim Longitude1Radian As Double = Long1 * RadianConversionConst

    Dim Latitude2Radian As Double = Lat2 * RadianConversionConst

    Dim Longitude2Radian As Double = Long2 * RadianConversionConst

    Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian

    Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian

    'intermediate result a

    Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)

    'intermediate result c (great circle distance in radians)

    Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))

    Return New SqlDecimal(EarthRadiusInMiles * c)

    End Function

    End Class

    ' Code ends here

    Mike C

    I was going to implement your suggestions and call it Distance_MikeC_Style!

    Good catch on the null values. I shouldn't have any passed into this based on how it's currently implemented in our environment but the function should be able to handle it eloquently. It bothers me somewhat that I didn't think about this and provide for it - let alone have a disclaimer in the article in case wanderers miss this discussion. I'll see if that can be updated (but likely next week as I'm about gone for the day!)

    I need to read up more on .net data types in reference to differences between double's and decimal's and admittedly should have done that before posting this. Thanks for sharing your knowledge with respect to this.

    I appreciate the code representation of your suggestions. I will test this "Distance_MikeC_Style" and post the results!

  • As per Mike C's insight, the following is an update to the SQL CLR distance function using global constants where possible and handling the SqlDecimal -> Double -> SqlDecimal processing correctly.

    Average execution time using the same 30 iteration test was still impressive at 784.9 milliseconds.

    Thanks Mike. Much appreciated!

    Imports System.Data.SqlTypes

    Imports Microsoft.SqlServer.Server

    Partial Public Class UserDefinedFunctions

    Public Const EarthRadiusInMiles As Double = 3956.0

    Public Const RadianConversionConst As Double = Math.PI / 180.0

    <SqlFunction()> _

    Public Shared Function Distance_MikeC_Style( _

    ByVal Latitude1 As SqlDecimal, _

    ByVal Longitude1 As SqlDecimal, _

    ByVal Latitude2 As SqlDecimal, _

    ByVal Longitude2 As SqlDecimal) As SqlDecimal

    ' Return NULL automatically if any nulls are passed in

    ' Use OrElse for short-circuit logic

    If (Latitude1.IsNull OrElse Latitude2.IsNull OrElse Longitude1.IsNull OrElse Longitude2.IsNull) Then

    Return SqlDecimal.Null

    End If

    ' Explicitly cast all inputs to Double and do all calculations as Doubles

    Dim Lat1 As Double = CType(Latitude1.Value, Double)

    Dim Long1 As Double = CType(Longitude1.Value, Double)

    Dim Lat2 As Double = CType(Latitude2.Value, Double)

    Dim Long2 As Double = CType(Longitude2.Value, Double)

    Dim Latitude1Radian As Double = Lat1 * RadianConversionConst

    Dim Longitude1Radian As Double = Long1 * RadianConversionConst

    Dim Latitude2Radian As Double = Lat2 * RadianConversionConst

    Dim Longitude2Radian As Double = Long2 * RadianConversionConst

    Dim LongSpanRadian As Double = Longitude2Radian - Longitude1Radian

    Dim LatSpanRadian As Double = Latitude2Radian - Latitude1Radian

    'intermediate result a

    Dim a As Double = Math.Pow(Math.Sin(LatSpanRadian / 2.0), 2) + Math.Cos(Latitude1Radian) * Math.Cos(Latitude2Radian) * Math.Pow(Math.Sin(LongSpanRadian / 2.0), 2)

    'intermediate result c (great circle distance in radians)

    Dim c As Double = 2.0 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1.0 - a))

    Return New SqlDecimal(EarthRadiusInMiles * c)

    End Function

  • Incidently, I tested the "Mike C style" function with minimal variable dimensioning (basically a hybrid of what we were trying on the T-SQL side based on the Kevins' postings).

    It results in a very long return new sqldecimal( ... ) line!

    Average execution time over 30 iterations was slightly better than non-"minimal variable dimensioning" at 777.6 milliseconds.

    Again, I'd rather have more maintainable code than such a subtle performance improvement but in some instances, that may not be a luxury!

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply