SUM of FLOAT inconsistency

  • Comments posted to this topic are about the item SUM of FLOAT inconsistency

  • Great QotD! One of the rare ones that isn't easy to cheat on.

    Thumbs up!

    Ronald

    Ronald Hensbergen

    Help us, help yourself... Post data so we can read and use it: http://www.sqlservercentral.com/articles/Best+Practices/61537/

    2+2=5 for significant large values of 2

  • 3 of the options seemed very unlikely, and the answer is easily verified by changing the order of @float* inserts. Cheating rules ok 😉

  • That's true:

    DECLARE @SumA float, @SumB float

    DECLARE @MyFloat1 float, @MyFloat2 float, @MyFloat3 float

    DECLARE @MyTable table

    (

    ID int primary key identity,

    NumA float,

    NumB float

    )

    SET @MyFloat1 = 10000000000020000

    SET @MyFloat2 = -10000000000010000

    SET @MyFloat3 = 1

    INSERT INTO @MyTable

    SELECT @MyFloat1, CAST(@MyFloat3 AS FLOAT)

    UNION

    SELECT @MyFloat2, @MyFloat1

    UNION ALL

    SELECT @MyFloat3, @MyFloat2

    SELECT SUM(NumA), SUM(NumB) FROM

    (select top 100 * from @MyTable

    order by 1

    ) AS A

    SELECT SUM(NumA), SUM(NumB) FROM

    (select top 100 * from @MyTable

    order by 2

    ) AS B

    Result:

    (3 row(s) affected)

    10001   10000

    (1 row(s) affected)

    10000   10001

    (1 row(s) affected)
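    For what it's worth, the discrepancy is pure IEEE 754 rounding, not a SQL Server quirk. At the magnitude of @MyFloat1 (about 10^16), consecutive FLOAT values are 2 apart, so a "+ 1" applied before the two large values cancel each other is lost to rounding, while a "+ 1" applied after the cancellation survives. A minimal sketch of just the rounding step, reusing the values above (the variable names are only illustrative):

    ```sql
    -- At magnitude 1e16 the gap between adjacent FLOAT (IEEE double)
    -- values is 2, so "+ 1" is absorbed unless the big terms cancel first.
    DECLARE @Big float, @Neg float, @One float;
    SELECT @Big = 10000000000020000,
           @Neg = -10000000000010000,
           @One = 1;

    SELECT (@Big + @One) + @Neg AS AddOneFirst, -- 10000: the 1 was rounded away
           (@Big + @Neg) + @One AS CancelFirst; -- 10001: the 1 survives
    ```

    Those are the same two values the SUMs produce; which one you get depends on the order in which the aggregate happens to accumulate the rows.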

  • Were the declarations of @SumA and @SumB supposed to confuse us on this one? I found them to be completely unnecessary.

    Thanks for posting. I have often found SQL calculations that create inconsistent or incorrect results as a misunderstanding of datatype size and rounding issues which can often be fixed by simply calculating in the appropriate order.

    Of course, even the most experienced query writers can miss this sort of thing sometimes. If the calculations are critical, it is usually a good idea to put in some sort of validation check so that if there are problems, they can be found early and corrected quickly.

    Main rule is never to use float, because of its unwanted effects.

    Here is the version with decimal. It is always correct:

    DECLARE @SumA float, @SumB float

    DECLARE @MyFloat1 float, @MyFloat2 float, @MyFloat3 float

    DECLARE @MyTable table

    (

    ID int primary key identity,

    NumA DECIMAL(17,0),

    NumB DECIMAL(17,0)

    )

    SET @MyFloat1 = 10000000000020000

    SET @MyFloat2 = -10000000000010000

    SET @MyFloat3 = 1

    INSERT INTO @MyTable

    SELECT @MyFloat1, CAST(@MyFloat3 AS FLOAT)

    UNION

    SELECT @MyFloat2, @MyFloat1

    UNION ALL

    SELECT @MyFloat3, @MyFloat2

    SELECT SUM(NumA), SUM(NumB) FROM

    (select top 100 * from @MyTable

    order by 1

    ) AS A

    SELECT SUM(NumA), SUM(NumB) FROM

    (select top 100 * from @MyTable

    order by 2

    ) AS B

    This is my first QotD, and I'm pleased that it's gone quite well. It was prompted by outrage at missing out on points for another QotD (Sep 23, 2008, Accessing and changing data 2008), where it was deemed that the SQL Server 2008 GROUP BY GROUPING SETS was not the same as the equivalent GROUP BY code because "Aggregates on floating-point numbers might return slightly different results." I protested but didn't get my 2 points back. So I wrote this QotD to prove my point: identical aggregate code on an identical set of FLOAT numbers can ALSO produce "slightly different results", depending on the order of execution (which *should* be irrelevant in a perfect theoretical world without truncation).

    BTW, playing with UNION and UNION ALL can reverse the results, but in my experience they will always produce the two differing results. Others noticed similar effects by sorting, etc.

    Carlo Romagnano (10/13/2008)


    Main rule is never to use float, because of its unwanted effects.

    Here is the version with decimal. It is always correct.

    It always is correct WITH THIS SET OF DATA. Try adding a couple of zero(e)s at the end of each of the 3 numbers and watch "Msg 8115, Level 16, State 6, Line 12 - Arithmetic overflow error converting float to data type numeric." appear.

    The FLOAT still works, and produces even stranger (but predictable) results of 1000164 and 1000192.

    That's the point - if the numbers were coming from an external source (and may have decimal places), you might not have the luxury of knowing the full range of numbers used. As a chemical engineer I learnt about floating point numbers before integers, and converted all measurements to Megafurlongs per microfortnight for consistency (the speed of light is just above 1.8 with those units - far more manageable).
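    The overflow is easy to reproduce in isolation: the original values have 17 digits, so two extra zeros make 19, which no longer fits in DECIMAL(17,0), while a wider target copes. A sketch (not part of the original script):

    ```sql
    -- 10000000000020000 has 17 digits; two more zeros make 19 digits,
    -- which overflows DECIMAL(17,0) but fits easily in DECIMAL(38,0).
    DECLARE @Scaled float;
    SET @Scaled = 1000000000002000000;

    SELECT CAST(@Scaled AS decimal(38,0)); -- succeeds
    SELECT CAST(@Scaled AS decimal(17,0)); -- Msg 8115: arithmetic overflow
    ```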

    Chris.Strolia-Davis (10/13/2008)


    Were the declarations of @SumA and @SumB supposed to confuse us on this one? I found them to be completely unnecessary.

    The declarations were not intended to confuse; as a software developer for many decades (mainly with C# for the last 6 years, with some T-SQL on the side), I like to define all my variables of a type that I decide. I don't want to rely on some box of blacklegging binary bits* to make the decision for me. With the latest releases of dotNet allowing implicit declaration types, I'm a bit nervous actually.

    Brewmanz

    aka Bryan White, NZ

    * Thanks to Douglas Adams for this term

  • brewmanz (10/13/2008)


    Chris.Strolia-Davis (10/13/2008)


    Were the declarations of @SumA and @SumB supposed to confuse us on this one? I found them to be completely unnecessary.

    The declarations were not intended to confuse; as a software developer for many decades (mainly with C# for the last 6 years, with some T-SQL on the side), I like to define all my variables of a type that I decide. I don't want to rely on some box of blacklegging binary bits* to make the decision for me. With the latest releases of dotNet allowing implicit declaration types, I'm a bit nervous actually.

    All I am saying is that I don't see you actually using @SumA or @SumB in this query. They were declared, but don't seem to have been used. If you comment out that line altogether, it changes nothing ... that I'm aware of ;).

  • Chris.Strolia-Davis (10/13/2008)


    All I am saying is that I don't see you actually using @SumA or @SumB in this query. They were declared, but don't seem to have been used. If you comment out that line altogether, it changes nothing ... that I'm aware of ;).

    Doh! You are quite right.

    When I re-read the QotD text, I thought that maybe it had been editorially modified as I *did* use them in my earlier playing, assigning SUM(Num%) to the @Sum% variables. Maybe they'd changed things.

    But no; I've checked my draft pre-submission, and the editors are not to blame.

    Mea culpa.

    Please don't tell my daughter. I take great delight in proofreading her work and finding mistakes (she works for a brochure publishing company). Sadly, it seems that I am capable of making mistakes, too. Soon I'll be into double figures this century 😉

    Regards

    brewmanz

    Great QotD, from another Douglas Adams fan and SQL-using Kiwi! There can't be many of us! 🙂

    Nice question...

  • brewmanz (10/13/2008)


    It always is correct WITH THIS SET OF DATA. Try adding a couple of zero(e)s at the end of each of the 3 numbers and watch "Msg 8115, Level 16, State 6, Line 12 - Arithmetic overflow error converting float to data type numeric." appear.

    The FLOAT still works, and produces even stranger (but predictable) results of 1000164 and 1000192.

    I prefer an overflow error to wrong data. You can specify a precision of up to 38 digits.

    From BOL:

    Numeric data types that have fixed precision and scale.

    decimal[ (p[ , s] )] and numeric[ (p[ , s] )]

    Fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1. The SQL-92 synonyms for decimal are dec and dec(p, s). numeric is functionally equivalent to decimal.

    p (precision)

    The maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision of 38. The default precision is 18.

    s (scale)

    The maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through p. Scale can be specified only if precision is specified. The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.
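    Putting the BOL excerpt into practice: with a wide enough precision, decimal arithmetic stays exact for this range, and the sum no longer depends on accumulation order. A sketch using the scaled-up values from earlier in the thread (the choice of precision 28 is arbitrary within the 38-digit limit):

    ```sql
    -- Exact DECIMAL arithmetic: order of accumulation cannot change the result.
    DECLARE @A decimal(28,0), @B decimal(28,0), @C decimal(28,0);
    SELECT @A = 1000000000002000000,
           @B = -1000000000001000000,
           @C = 100;

    SELECT (@A + @C) + @B AS AddSmallFirst, -- 1000000100
           (@A + @B) + @C AS CancelFirst;   -- 1000000100, identical
    ```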

  • brewmanz (10/13/2008)


    Please don't tell my daughter. I take great delight in proofreading her work and finding mistakes (she works for a brochure publishing company). Sadly, it seems that I am capable of making mistakes, too. Soon I'll be into double figures this century 😉

    Ha ha, it happens to the best of us my friend. Seems like life enjoys dishing out the humble pie every now and then.

    Thanks again for the article. It's good for developers to be aware of, and judging by the results of the quiz, there are quite a few out there who could use the enlightenment.

  • Brewmanz... keep them coming. That was great, and exactly what we all need! Good explanation, too. I found that it was dependent upon order, but could not figure out why. A+++ QotD!

    Todd Carrier
    MCITP - Database Administrator (SQL 2008)
    MCSE: Data Platform (SQL 2012)

    You can validate this using Excel as well; the same behavior shows up there (refer to the attachment).

    Excellent post!

    Thanks & Regards,
    Nakul Vachhrajani.
    http://nakulvachhrajani.com

    Follow me on
    Twitter: @sqltwins

Viewing 14 posts - 1 through 14 (of 14 total)
