SQL for 8-year-old Math…But Be Careful of Implicit Type Conversion

,

Solving a Math Problem

The Guardian published a story (http://gu.com/p/494q6/stw) with a math problem, supposedly for Vietnamese eight-year-olds, where the solution required filling in a digit from 1-9 in each of 9 boxes (using each numeral only once), where the resulting total from the operations as specified by the other boxes equaled the specified value. (The : indicates a division operation. Standard order of operations must be followed.)

Math problem from theguardian.com
From article at theguardian.com

Since the solution seems to require a brute-force sort of approach, I thought it would be fun to write a simple SQL script to generate the solution.  It seemed like a Cartesian join would be a useful tool to generate the possible values to try.

First, working from the top left I decided to refer to each box as:   a, b, c, d, e, f, g, h, i.

Next, I decided to create a numbers table to hold the numerals 1-9.  (I know there are lots of ways to create a number table without looping, but for readability I am intentionally using looping here.)

CREATE TABLE #num(n float)
DECLARE @i int
SET @i = 1
WHILE @i < 10 BEGIN
  INSERT INTO #num (n) SELECT @i
  SET @i = @i + 1
END

Then I created the SELECT query:

SELECT
  a.n AS a,
  b.n AS b,
  c.n AS c,
  d.n AS d,
  e.n AS e,
  f.n AS f,
  g.n AS g,
  h.n AS h,
  i.n AS i
FROM
  #num a
  JOIN #num b ON
    b.n NOT IN (a.n)
  JOIN #num c ON
    c.n NOT IN (a.n, b.n)
  JOIN #num d ON
    d.n NOT IN (a.n, b.n, c.n)
  JOIN #num e ON
    e.n NOT IN (a.n, b.n, c.n, d.n)
  JOIN #num f ON
    f.n NOT IN (a.n, b.n, c.n, d.n, e.n)
  JOIN #num g ON
    g.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n)
  JOIN #num h ON
    h.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n, g.n)
  JOIN #num i ON
    i.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n, g.n, h.n)
WHERE
  a.n + 13 * b.n / c.n + d.n + 12 * e.n - f.n - 11 + g.n * h.n / i.n - 10 = 66

Essentially, we are joining to the numbers table 9 times, and using the ON criteria of the join to prevent numerals from being used more than once.

In the WHERE clause, we have simply stated the original problem, using  a.n, b.n and so forth to represent the empty boxes we are trying to fill.

Executing this query returns 128 rows, something like this:

a b c d e f g h i
5 3 1 7 2 6 9 8 4
5 3 1 7 2 6 8 9 4
6 3 1 9 2 5 8 7 4

....with 125 additional rows.  In other words, SQL says that there are 128 possible solutions to the problem.  (Note: this includes 16 rows that do total 66.00000000, but involve irrational numbers. Technically these are only "approximately" right. See below for an explanation.)

We could check this out by testing the values that were returned.  For the first row:

PRINT (5) + 13 * (3) /(1) + (7) + 12 * (2) - (6) - 11 +(9) * (8) / (4) - 10

Sure enough--the result is 66.

Note that the parenthesis are not needed: I include them here just to show where we substituted values that were returned in each row.

Conclusion

SQL is a pretty good way to solve problems like this that require a brute force, trial-and-error approach.

By using a number table with JOINs we can easily generate a set of numbers that conform to the specified criteria: in this case, using each number only once. And by using a WHERE clause, we can easily select the combination of numbers that produced the desired solution.

This query tested 362,880 possible combinations of digits...in just a few seconds.

Surprise #1: Implicit Type Conversion

An associate of mine says that "Programming is an art form that fights back." I often find this to be true. This math problem is one example of that.

Originally, when I created my numbers table I used a data type of int. This makes sense, because in this problem we are only allowed to use integers.

But when I ran the query, I was surprised to see erroneous results. The query returned 2672 rows, the first one of which was:

a b c d e f g h i
1 2 3 4 6 5 7 9 8

When we test this, this is what we see:

PRINT (1) + 13 * (2) /(3) + (4) + 12 * (6) - (5) - 11 +(7) * (9) / (8) - 10

SQL does say the answer to this is 66. But if we use a calculator, we get a result of 67.5417! Yikes! If we had printed out our results and turned it in to the teacher we would have failed the assignment. What is going on?

The problem here is that SQL uses implicit data type conversion. To illustrate, consider this simple example:

DECLARE @x varchar(100)
SET @x = '1'
PRINT @x + 2

This returns the result of 3. Notice that @x is really a character string...and you can't really perform an arithmetic operation on a character string. SQL was kind enough to automatically (implicitly) convert the string to a number for us before performing the calculation. Very convenient and intuitive, right?

Well, not always. Watch what happens if we make a small change to the query:

DECLARE @x varchar(100)
SET @x = '1.1'
PRINT @x + 2

This returns an error!

Msg 245, Level 16, State 1, Line 3

Conversion failed when converting the varchar value '1.1' to data type int.

What? Why is SQL trying to convert '1.1' to an integer? It is obvious that this is a floating point value. Furthermore we haven't indicated in our query that SQL ought to be working with integers instead of floating point values.

Well, actually, we have indicated that SQL ought to be working with integers. SQL sees the "+ 2", and says "Hmm. 2 is an integer. Therefore the caller wants the operation to be performed on integers. Therefore I must cast the string '1.1' to an integer...but that results in an error". The type of integer is implied (hence the phrase "implicit type conversion").

If we change the query slightly one more time, we can make it work more like we expected:

DECLARE @x varchar(100)
SET @x = '1.1'
PRINT @x + 2.0

This returns the result of 3.1. The only thing we did different here was to use 2.0 instead of 2. But this was enough to imply to SQL that we wanted to work with floating point values instead of integer, so then SQL's implicit type conversion converted our string '1.1' to the floating point value 1.1 as we desired.

What is more, when SQL performs an operation on two integers, if the result contains a fraction, the fractional portion is discarded. The result is NOT rounded. For example:

PRINT 2 / 3

This returns a 0. We know that 2/3 really equals .666667, and we know that if we were to round to the nearest integer the result of 2/3 would be 1. But for integer-based calculations SQL simply discards the fractional portion, so 0.666667 gets truncated to simply 0.

If we change the query slightly, SQL will return a floating point result:

PRINT 2.0 / 3

This returns 0.666666 But wait! Shouldn't this be rounded to 0.666667? Well, even when working with floating point values SQL will not round unless we explicitly tell it to.

DECLARE @a decimal(10,1)
DECLARE @b decimal(10,1)
SET @a = 2
SET @b = 3
PRINT @a / @b

Result: 0.666666666666 (even though our data type of decimal(10,1) would seem to imply we only want a single decimal place).

DECLARE @a decimal(10,1)
DECLARE @b decimal(10,1)
SET @a = 2
SET @b = 3
PRINT ROUND(@a / @b, 1)

Result: 0.700000000000 This time SQL rounded, but still returns a bunch of trailing zeros.

DECLARE @a decimal(10,1)
DECLARE @b decimal(10,1)
SET @a = 2
SET @b = 3
PRINT CAST(ROUND(@a / @b, 1) AS decimal(10,1))

Result: 0.7 This is time SQL omitted the trailing zeros, because we explicitly casted the result to decimal(10,1)

How about if we try to tell SQL what we want the data type of the result to be, such as:

PRINT CAST(2/3 AS float)

Nope. This returns 0. Why? Because implicit type conversion indicates that the arithmetic operation is to be performed on integers (resulting in 0), and we then cast 0 to a floating point value.

From these tests, we can identify the following rules to follow when performing math calculations:

  • Rule 1: SQL will always try to determine the data type involved in each arithmetic operation, and will perform the calculation according to that type.
  • Rule 2: If multiple data types are involved, SQL may use a different data type than you expect, thus causing unexpected results
  • Rule 2: SQL will always truncate (and not round) results of arithmetic operations, unless rounding is explicitly requested.
  • Rule 3: SQL may increase the precision of the data type for the result of an operation. This may mean that you need to explicitly cast the result back to the desired data type to achieve the desired number of decimals.
  • Rule 4: Explicit typecasting of the result of an arithmetic operation does not affect implicit type conversion of the values in the operation itself.
  • Recommendation #1: It is best to explicitly indicate the type of each value involved in an arithmetic operation, either via data type of the column or variable, or via an explicit CAST or CONVERT. Failing to do this can lead to unexpected results
  • Recommendation #2: For greatest safety, always use variables or columns instead of literal values.  This will help eliminate unexpected type conversion surprises.

Surprise #2: Different data types produce different results

In our math problem, if we use a data type of int, SQL performs each operation as an integer operation. Effectively this means that each operation that would result in a decimal value has the fractional portion of the number discarded (not even rounded)...which results in the unexpected results our query returned.  It makes sense that using an integer type would produce incorrect results.

It also makes sense that mixing data types within the calculation could produce unexpected results due to implicit type conversion.  But here we are carefully using consistent, explicit data types and still seeing strange results.

It would seem that a float data type, as well as all flavors of decimal data types should yield the correct results...but this is not the case.

The problem is not as simple as just rounding.  The data type used makes a big difference. as we will see below.

Try changing int to other data type:

IF OBJECT_ID('tempdb..#num') IS NOT NULL BEGIN
  DROP TABLE #num
END
CREATE TABLE #num(n int) --Replace int with other types and re-run to see the difference!
DECLARE @i int
SET @i = 1
WHILE @i < 10 BEGIN
  INSERT INTO #num (n) SELECT @i
  SET @i = @i + 1
END
DECLARE @C1 int --Replace int with other types and re-run to see the difference!
DECLARE @C2 int --Replace int with other types and re-run to see the difference!
DECLARE @C3 int --Replace int with other types and re-run to see the difference!
DECLARE @C4 int --Replace int with other types and re-run to see the difference!
DECLARE @R  int --Replace int with other types and re-run to see the difference!
SET @C1 = 13
SET @C2 = 12
SET @C3 = 11
SET @C4 = 10
SET @R = 66
SELECT COUNT(*)
FROM
  #num a
  JOIN #num b ON
    b.n NOT IN (a.n)
  JOIN #num c ON
    c.n NOT IN (a.n, b.n)
  JOIN #num d ON
    d.n NOT IN (a.n, b.n, c.n)
  JOIN #num e ON
    e.n NOT IN (a.n, b.n, c.n, d.n)
  JOIN #num f ON
    f.n NOT IN (a.n, b.n, c.n, d.n, e.n)
  JOIN #num g ON
    g.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n)
  JOIN #num h ON
    h.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n, g.n)
  JOIN #num i ON
    i.n NOT IN (a.n, b.n, c.n, d.n, e.n, f.n, g.n, h.n)
WHERE
  a.n + @C1 * b.n / c.n + d.n + @C2 * e.n - f.n - @C3 + g.n * h.n / i.n - @C4 = @R

I wrote a dynamic SQL query to do that for many numeric data types.  (See attached file for the actual query.)  Note that I am using MSSQL 2008R2.  This is what I found:

Accuracy by Data Type
Data Type Rows Returned Notes
int 2672 Incorrect:  Returns lots of rows that do not match the requirements due to rounding.  Behavior makes sense.
float 128 Imprecise:  Returns 16 extra rows that contain irrational numbers (repeating decimals) like .666666 that may not exactly match the requirements--though they do approximately match the requirements. (These do total 66.00000000. Confusing, but consistent with what Microsoft says:  "Avoid using float or real columns in WHERE clause search conditions, especially the = and <> operators."

decimal:

too small

0

Overflow:  If the data type is too small to hold the number, we expect to get an overflow error.  Makes sense.  Happens any time Precision - Scale < 2, because our variables neet to store 2-digit integer values.

decimal:

working

112

Correct.  These are the results that we expect ALL properly-sized decimal types to return.

money 112 Correct.  These are the results that we expect.

decimal:

approximate

136 INCORRECT.  Some decimal types that have too small a scale in comparison to their precision incorrectly return 24 extraneous rows.  In other words these data types behave kind of like "approxamate" float, but allow even more rows. Why??
decimal:

broken

260 INCORRECT.  Two decimal types, (18,0) and (38,1) return 148 extraneous rows. The values in these extra rows don't even total 66.00 Why?

decimal:

really broken

2900 INCORRECT.  Decimals with precision greater than 18 and scale of 0 are wrong...kind of like int behavior, but worse.  Again, these values don't even total 66.00  Why are they included?

Float

Regarding the float type, the 16 additional rows that are returned when we use the float data type all involve infinite repeating decimals (known as irrational numbers).  This is why these rows do not "exactly" match our requirements, though they do "approximately" match (i.e. they do total 66.0000000).

Inexact Decimal Types

The really strange results involve the so-called "exact" decimal data type, but seem to return "approximate" matches. Some flavors of this type, for example decimal(16,0). result in more rows than we expect.

When we look at the specific numbers returned for one of these types such as decimal(16,0), we see that they are sort of like float, in that they return numbers that do total out to 66.00000000, but they do involve some irrational numbers.

For example, one such row is:

1 8 3 7 4 5 2 6 9

If we perform the mutiplication and division first (as per standard order of operations), and then look at the intermediate results before we add and subtract, we see these numbers result in:
1.00000000 34.66666667 7.00000000 48.00000000 -5.00000000 -11.00000000 1.33333333 -10.00000000
When added together, these intermediate values do total 66.00000000.  But it seems that if the 16 additional "approximate" rows from using the float type vs. most of the decimal types are excluded from the "exact" results, it seems that these extraneous rows should be excluded whenver we use the decimal types, including decimal(16,0)
Incorrect Decimal Types
Still worse, some of what I labed the "broken" types above actualy return "wrong" results that do not even total 66.00
For example, look at an extraneous row returned by decimal(38,1):
2 8 9 5 6 4 3 1 7

When I cacluate this, the total is not 66, but is rather 65.98412698  The intermediate results from the calculation are:
2.00000000 11.55555556 5.00000000 72.00000000 -4.00000000 -11.00000000 0.42857143 -10.00000000
I understand that if we cast 65.98412698 to a decimal(38,1) we will get 66.0  But the same would be true if we were casting to a decimal(16,1)...and the decimal(16,1) excludes incorrect rows such as this one.  The decimal(38,1) includes them.  Why the different behavior?

To help depict this, I created this graph in Excel.  The horizontal axis shows the precision of the decimal data type, while the vertical axis shows the scale.  The color indicates the accuracy:  black shows the overflow conditions that resulted in an error; green shows correct results; red shows the unexpected error results.  (See attached Excel file for details.)

Decimal Data Type Accuracy Chart

To read this chart, from the lower left it indicates that the following data types result in incorrect results:  decimal(16,0), decimal(17,0), decimal(17,1), decimal(18,0), decimal(18,1), decimal(18.2), etc.

Surprise #3:  I'm actually giving up

I set out to write a simple article about using SQL to solve a grade school math problem, and then I found myself deep into implicit data type conversion issues.  Worse, I ended up getting stuck trying to figure out why 72 flavors of the decimal data type result in the incorrect results while 225 flavors work fine.

Microsoft says:  "The decimal data type stores an exact representation of the number; there is no approximation of the stored value."  This page also clarifies the distinction between exact and approximate data types.  Yet in my testing it seems that some decimal types yield "approximate" results, and some yield inaccurate results.

So I am stumped.  I'm sure you readers will have some good insight as to what is going on here.

Summary

SQL can be a good way to solve trial-and-error kinds of problems. But, be careful: SQL's implicit type conversion can result in confusing or erroneous answers. Getting in the habit of explicitly indicating the desired data type of each value involved in an arithmetic calculation will help avoid errors.  Even so, sometimes mysteries arise.

Resources

Rate

4.57 (37)

Share

Share

Rate

4.57 (37)