Exact and Approximate

  • Frank Hamersley (1/4/2012)


    How weird stuff like this survives in a commercial offering beggars belief (well mine if no-one elses)!

    I'm with you! It baffled me when I first came across it, and it baffles me still.

  • Frank Hamersley (1/4/2012)


    Whomever the bright sparks were that decided an arbitary 6 digit minimum is to be forced on the resultant without any other consideration except adding up the number of significant digits on the input side....I am still gobsmacked that anyone would think this is viable. Slack in my book.

    Two points: first DECIMAL(77,40) into 38 precision just won't go. Second, a strongly-typed language requires a type that has a defined precision and scale before the computation occurs. As far as the decision to choose 6 is concerned, here's a quote from the SQL Server Programmability Team:

    [font="Verdana"]...we try to avoid truncating the integral part of the value by reducing the scale (thus truncating the decimal part of the value instead). How much scale should be sacrificed? There is no right answer. If we preserve too much, and the result of the multiplication of large numbers will be way off. If we preserve too little, multiplication of small numbers becomes an issue.[/font]

    http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx

  • Frank Hamersley (1/4/2012)


    How weird stuff like this survives in a commercial offering beggars belief (well mine if no-one elses)!

    My implicit expectation - formed in earlier days (aka Fortran IV and CDC ASM albeit with some gray cells "preserved" by vitamin B) - that whilst decimal precision would not neccesarily be increased by multiplication ... it would _never_ be diminished.

    Whomever the bright sparks were that decided an arbitary 6 digit minimum is to be forced on the resultant without any other consideration except adding up the number of significant digits on the input side....I am still gobsmacked that anyone would think this is viable. Slack in my book.

    I guess I am living in earlier days where the computation unit used accumulators that greatly exceeded the accuracy of the inputs for precisely this reason - and this has not been encoded in SQL Server. That said I accept no apologists - as how hard would it be to lock in the decimals and trap any digits overflow occuring (in the significant digits)?

    </rant>

    I too regard it as crazy nonsense, but there are some better ways of addressing it than the one you suggest:

    (1) for multiplication, compute exact result (allowing up to 77 digits in the result); eliminate leading zeroes; if integer part of the result won't fit in 28 digits, throw an error; if it will, and there are fewer than 38 digits including fractional part, return exact result; if there are more than 38 digits, round the fractional part to get down to 38 and if this results in 0 throw an underflow error. Probably "lost significance" errors should be introduced for cases where the integer part is 0 and such a long string of initial zeroes is required after the decimal point that rounding causes a significant proportional error. For division, calculate to 38 significant digits, detecting whether this includes the whole of the integer part (if it doesn't, throw an overflow error) or there are 38 zeroes immediately after the decimal point (if so throw an underflow error). This is rather like your solution but avoids throwing an error when trivial rounding is needed to fit into 38 digits. Of course some programs may rely on the bizarre rounding behaviour of the current numeric type(s) so there would have to be an option to switch that behavious on (a per connection option, not a global one), but of course that option should be deprecated immediately.

    (2) throw away the whole numeric nonsense and introduce base 10 floating point (mantissa and exponent are both binary, but the exponent indicates a power of 10 not of 2) as per the version of the IEEE floating point standard ratified in 2008 as its replacement. Of course this only works if all the hardware is going to support IEEE 754-2008, or there is software support of that standard whenever the hardware doesn't support it. It also means that SQL has to wake up and decide to treat the exception conditions (including lost significance error as well as overflow and underflow), infinities, and NaN of the standard instead of just ignoring them (and the platform below the RDBMS's data engine has to allow it to do so). So it probably can't happen any time soon, which is a great pity. In any case, converting to it could potentially make certain programs which rely on "rounding as you go" (usually with the money datatype, rather than with the numeric type on which it is based) stop working {because, for example, they rely on inequalities like (10.00/3)*5.00 <> (10.00*5.00)/3.00} unless they were rewritten to do any "premature" rounding they want explicitly (or we could have a "rounded decimal float" type, which used decimal float format but did the extra rounding behind the scenes); although I would like to think that no-one writes code like that I suspect people probably do.

    (3) introduce a binary floating point based on the binary128 format of IEEE 754-2008 and get rid of the numeric nonsense, without initially introducing proper exception, NaN, and infinity handling. This would vastly reduce the representation error caused in converting from decimal string format to binary floating point format, and would be vastly less development that 2 above. As with 2, converting to it from exact numerics might involve work to reproduce the strange rounding behaviour of numeric types for sufficiently bizarre programs. This should of course be seen as a first step on the road to 2.

    In everything above, "get rid of" should of course be read as "deprecate for a couple of releases and then remove support".

    Tom

  • SQL Kiwi (1/4/2012)


    Two points: first DECIMAL(77,40) into 38 precision just won't go.

    Clearly not.

    Second, a strongly-typed language requires a type that has a defined precision and scale before the computation occurs.

    There I disagree with you on two counts. Firstly, strong typing no requirement that the precision and scale be fixed before computation occurs (indeed all the languages I know that impose such a requirement are at best weakly typed). Secondly, you appear to be claiming that SQL is strongly typed.

    As far as the decision to choose 6 is concerned, here's a quote from the SQL Server Programmability Team:

    [font="Verdana"]...we try to avoid truncating the integral part of the value by reducing the scale (thus truncating the decimal part of the value instead). How much scale should be sacrificed? There is no right answer. If we preserve too much, and the result of the multiplication of large numbers will be way off. If we preserve too little, multiplication of small numbers becomes an issue.[/font]

    http://blogs.msdn.com/b/sqlprogrammability/archive/2006/03/29/564110.aspx

    Well, they got that wrong. If they ever truncate the integral part of the value instead of raising an error, which seems to me to be implied by their statement that preserving too much scale makes the multiplication of large numbers produce way off results, the have a hopelessly broken type. If they think that they have to determine the scale before carrying out computation, they are hopelesly deluded; and if they don't think that, what is the basis for picking a fixed scale at all in the cases where the precision as calculated by their rules (the only basis I can think of is a decision that doing it right would be too much work)?

    In previous discussions on this topic we've noticed that with the utterly bizarre rules we have here it's possible to obtain an overflow error by multiplying a numeric value by 1, and also by dividing a numeric value by 1. We can do this even if the 1 we are using is typed decimal(p,0). Surely no one can imagine that such bizarre results are remotely acceptable?

    edit: If anyone wants a good survey of modern type theory, there's an excellent paper by Luca Cardelli and Peter Wegner which, although it was written 26 years ago, is still very highly regarded. You can get it from Luca's website, PDF at either here for A4 format or here for American Letter format.

    Tom

  • SQL Kiwi (1/3/2012)


    steven.malone (1/3/2012)


    Very interesting.

    Declaring the decimal variables as DECIMAL(35,20) gives the correct result.

    But declaring the decimal variables as DECIMAL(35,19) gives the rounded result.

    Two DEC(35,20) multiplied gives a result of (71,40). Precision 71 exceeds the available 38 by 33, so scale is reduced by 33 to 7. Result is DEC(38,7) and the result is correct.

    Two DEC(35,19) multiplied gives a result of (71,38). Precision 71 exceeds the available 38 by 33, so scale is reduced by 33 to 5. However, minimum scale is 6, so the result is DEC(38,6) with a risk of very large values causing an error, and the result is rounded to 6 decimal places.

    @paul-2

    I need a clarification here.

    Microsoft article states that the minimum scale of a decimal can be 0.

    http://msdn.microsoft.com/en-us/library/ms187746.aspx

    So I don't get the point where the minimum scale is defined as 6

  • very good question!!!

    thanks Paul!!!


    [font="Times New Roman"]rfr.ferrari[/font]
    DBA - SQL Server 2008
    MCITP | MCTS

    remember is live or suffer twice!
    the period you fastest growing is the most difficult period of your life!
  • prudhviraj.sql (1/4/2012)


    @Paul

    I need a clarification here.

    Microsoft article states that the minimum scale of a decimal can be 0.

    http://msdn.microsoft.com/en-us/library/ms187746.aspx

    So I don't get the point where the minimum scale is defined as 6

    6 isn't the minimum scale of a numeric type, it is the minimum scale of the type of a numeric value which is the direct result of the multiplication of two numeric values whose scales add up to 6 or more. Of course if the result is placed immediately into a variable or a column with a predefined type you never see the type of the direct result. It's also the minimum scale of the type of a numeric which is the direct result of a division where the precision of the divisor plus the scale of the dividend is greater than 4 and also of the type of teh direct result of an addition or subtraction or modulus where either of the two scales of the things being added/subtracted/remaindered is greater than 5.

    See http://msdn.microsoft.com/en-us/library/ms190476.aspx

    Tom

  • Toreador (1/4/2012)

    ... It baffled me when I first came across it, and it baffles me still.

    I put it down to expediency prevailing over sensibility.

    It was seen as too hard to build an ALU (in code) to support the SQL numeric data type that manages overflow and underflow in a consistent and intuative way.

    They were probably concerned about performance first (accuracy second) and so limited it to simple arithmetic on the total number of significant digits - with a minima of 6 decimals thown in "because we reckon no-one will care/notice/perhaps it will get fixed by a later release".

  • Just to jazz things up ... I have refactored the code to make it work on another SQL dialect as follows ...

    DECLARE @n1 DECIMAL(38,20)

    , @n2 DECIMAL(38,20)

    , @n3 REAL

    , @n4 REAL

    , @n5 DOUBLE PRECISION

    , @n6 DOUBLE PRECISION

    SELECT @n1 = 123.4567

    , @n2 = 0.001

    SELECT @n3 = @n1

    , @n4 = @n2

    , @n5 = @n1

    , @n6 = @n2

    SELECT n_decimal = CONVERT(VARCHAR, @n1 * @n2)

    , n_real = CONVERT(VARCHAR, @n3 * @n4)

    , n_double = CONVERT(VARCHAR, @n5 * @n6)

    GO

    .... which produces ....

    (1 row affected)

    (1 row affected)

    n_decimal n_real n_double

    ------------------------------ ------------------------------ ------------------------------

    0.1234567000000000000000000000 .12345670908689499 .1234567

    (1 row affected)

    ... which is what I expected and suggests that a competent solution is possible if you care enough!

    FWIW if I get a chance I will run it on some other (mainstream) platforms at hand.

  • The various schemes for encoding numbers all have advantages and disadvantages. SQL Server uses a decimal type that has fixed precision and scale. All expressions have a well-defined type, and for SQL Server that means fixed precision and scale if that type is decimal. For example, the computed column in the following table has a type of DECIMAL(19,8):

    CREATE TABLE dbo.Example

    (

    col1 DECIMAL(9,4) NULL,

    col2 DECIMAL(9,4) NULL,

    col3 AS col1 * col2

    )

    GO

    EXECUTE sys.sp_columns

    @table_owner = N'dbo',

    @table_name = N'Example',

    @column_name = N'col3'

    There are many quirks to SQL Server, including the way it handles rounding, truncation, and conversions. In many cases these quirks are preserved to avoid breaking existing applications. That's a purely practical matter, and doesn't imply that everyone is happy about the state of affairs, or wouldn't approach things differently if done again. On that note, the proper place to suggest improvements or alternatives is Connect. By and large, the people that make decisions about future product directions do not carefully read QotD comments.

    The point of this QotD is very much to emphasise that using excessive precision or scale can have unintended consequences. Very few real-world uses would require anything like the DECIMAL(38,20) types specified in the question. Using appropriate types (which in SQL Server can include both precision and scale) is important.

    As far as I recall, the issue with multiplying or dividing by one was a bug in type inference, which has since been fixed.

  • SQL Kiwi (1/4/2012)

    There are many quirks to SQL Server

    A bad thing tm!

    SQL Kiwi (1/4/2012)

    The point of this QotD is very much to emphasise that using excessive precision or scale can have unintended consequences.

    And has done so in spades!

    SQL Kiwi (1/4/2012)

    Very few real-world uses would require anything like the DECIMAL(38,20) types specified in the question.

    I beg to differ - Financial Markets where big $ times %ages times year fractions - happens all the time. "Consequences" like this one would be real pain - so thanks again for raising it - fortunately it doesn't happen in the platforms I commonly use!

    BTW - even 11.9.2 behaves per my prior post!

  • Frank,

    My prior post was written before I saw your latest, so it wasn't aimed at the points you raised. I'm going to decline getting into a debate about whether DECIMAL(38,20) is common or not (though I do work at a hedge fund right now, and we use it in a couple of places, though most stuff uses lower numbers, and floats (double precision) are also common), because that discussion generally goes nowhere productive fast. BTW 11.9.2 means nothing to me, just so you know 🙂

  • SQL Kiwi (1/4/2012)


    Frank,

    My prior post was written before I saw your latest, so it wasn't aimed at the points you raised. I'm going to decline getting into a debate about whether DECIMAL(38,20) is common or not (though I do work at a hedge fund right now, and we use it in a couple of places, though most stuff uses lower numbers, and floats (double precision) are also common), because that discussion generally goes nowhere productive fast. BTW 11.9.2 means nothing to me, just so you know 🙂

    G'day Paul, No problems about the uses (or not) - I would guess in many FM situations these days computation may well be done outside the db and simply stored (thin) ... reducing the "risk" this QOTD will materialise. A recent engagement of mine saw the database sproc side was fat with lots of number crunching for numeric datatypes. Of course in risk rather than accounting float/double may well be good enough.

    11.9.2 is Sybase ASE - which is the oldest version that I can lay hands on as in indicator of the common code base that became SQL Server and I was curious to see if it also (mis)behaved. Given it didn't it seems the 6 decimal minima is a Redmond specific enhancement.

  • Frank Hamersley (1/5/2012)


    11.9.2 is Sybase ASE - which is the oldest version that I can lay hands on as in indicator of the common code base that became SQL Server and I was curious to see if it also (mis)behaved. Given it didn't it seems the 6 decimal minima is a Redmond specific enhancement.

    Interesting. The documentation looks very similar between ASE 15 and SQL Server:

    http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.blocks/html/blocks/blocks14.htm

    http://msdn.microsoft.com/en-us/library/ms190476.aspx

    The magic number 6 appears for division in both, but not multiplication.

  • SQL Kiwi (1/4/2012)


    As far as I recall, the issue with multiplying or dividing by one was a bug in type inference, which has since been fixed.

    Well, I'm on an XP platform so I can't test if it is fixed in Denali, but it certainly isn't fixed in SQL 2008 R2 with latest updates.

    Maybe someone could try this in Denali:

    use tempdb

    set nocount off

    go

    declare @t table (A decimal(38,5) primary key, B decimal(2,1))

    insert @t(A,B) values((1234567890.0*10000000000.0*10000000000*1000)+0.12345,1.0)

    select A,B from @t

    select A*B from @t

    and see if they get this result in the messages tab of ssms:-

    (1 row(s) affected)

    (1 row(s) affected)

    Msg 8115, Level 16, State 2, Line 4

    Arithmetic overflow error converting expression to data type numeric.

    which clearly shows multiplication by one generating overflow. And the similar overflow for division can be demonstrated by changing "*" in the last line of that code to "/".

    You are right that it's a type inference problem, of course - if you have types where the precision and scale (in the T-SQL sense; T-SQL makes awfully inaccurate use of both those terms, of course, so this is far from the normal sense) are part of the type definition and you also allow type inference to decide the type of a computed column in a table definition so that the definition doesn't have to specify that type you will have to have a mechanism for allowing a special type inference to decide precision and scale; that becomes a problem when you allow that same special mechanism to rule when the result of the computation is being assigned to a column or variable of known type, because overflow should be determined by whether the value resulting can be represented in that known type, not on whether some rather pointless kludge to avoid explicitly defining the type of a computed column when defining the table that contains it would implicitly produce a type that would allow the value to be represented.

    Tom

Viewing 15 posts - 31 through 45 (of 55 total)

You must be logged in to reply to this topic. Login to reply