SQLServerCentral Article

Celko's Summer SQL Stumpers: Prime numbers

,

Joe Celko kicks off our series of Summer SQL Stumpers with a challenge to improve on his solution to calculating the prime numbers between 1 and 10000. Once the various solutions have been contributed and judged, the winner of a $100 Amazon Voucher will be announced. The competition will be run on Simple-Talk and SQL Server Central together.

A PRIME SQL PUZZLE

I was teaching SQL classes for YAPC-10 (“Yet Another PERL Conference” #10) at Carnegie Mellon University at the end of June 2009.  For the record, I have never used PERL and had to Google up an overview before I went; it is a very different creature from SQL. 

One of my students asked if you could write an SQL statement to generate the prime numbers less than 1000 (or any other limit) that scales well.  He was bothered by the lack of loops in SQL and a Prime Number sieve is a common PERL programming exercise.  You can Google it and see an animation at Eratosthenes' sieve and some PERL code at Sieve of Eratosthenes with closures

My immediate answer was “sure, but you might have to use a recursive CTE to replace the loop.  Later I realized that was a really bad answer; you don’t need recursion, just a little math.  There are two useful facts from Number Theory:

  1. The prime factors of a given number (n) cannot be greater than ceiling (vn).  Think about it; by definition (vn * vn)) = n, and by definition, ceiling (vn) >= floor(vn) so integer rounding up will be safe.  This says that if I look at (a * b = c) where (a < b), then I don’t have to look at (b * a = c), so I can start searching for prime factors with small values. 

  2. All primes are of the form (6 * n ± 1), but not all number of that form are Primes.  For example (n = 1) gives us {5, 7} and they are both primes.  But for (n = 4) gives us {23, 25} where (25 = 5 * 5).  What this does is remove the multiples of 2 and 3 from consideration.

Let’s get all of that into SQL statements.  Let’s start with a table for the primes:

CREATE TABLE Primes
(p INTEGER NOT NULL PRIMARY KEY
  CHECK
(p > 1));

Now, your puzzle is to fill the table up to some limit, say 1000 just to keep it simple. 

 ANSWERS:

Let’s assume we have a table named Sequence with integers from 1 to (n) that we can use.  This is a common SQL programming idiom, so you don’t have to feel bad about using it.

CREATE TABLE Sequence
(seq INTEGER NOT NULL PRIMARY KEY
CHECK (seq  > 0));

There are lots of ways of filling this table, but here is one I like:

WITH Digits(i)
AS (SELECT i
  
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i))
INSERT INTO Sequence(seq)
SELECT
(D3.i * 1000 + D2.i * 100 + D1.i * 10 + D0.i + 1) AS seq
    
FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3;

This template is easy to extend and the “.. + 1” gets rid of the zero. 

ANSWER #1

For the first attempt, let’s load the Primes table with candidate numbers using math fact #2 from above. 

INSERT INTO Primes (p)
(
SELECT (6 * seq) + 1
  
FROM Sequence
WHERE (6 * seq) + 1 <= 1000
UNION ALL
SELECT (6 * seq) - 1
  
FROM Sequence
WHERE (6 * seq) + 1 <= 1000);

An improvement which gets rid of the UNION ALL uses a table constant:

INSERT INTO Primes (p)
SELECT (6 * seq) + S.switch
  
FROM Sequence
      
CROSS JOIN
     
(SELECT switch
        
FROM (VALUES (-1), (+1))
      
AS F(switch))S
  
WHERE (6 * seq) + 1 <= 1000;

Now we have too many rows in Primes and need to remove the non-primes.  Now math fact #1 can come into play; test the set of numbers less than the square root to see if there is a factor among them. 

DELETE FROM Primes
WHERE EXISTS
  (
SELECT *
    
FROM Primes AS P1
    
WHERE P1.p <= CEILING (SQRT (Primes.p))
      AND
(Primes.p % P1.p) = 0);

ANSWER #2

Another way to load the candidates into Primes is to have the first few known primes hardwired into a query.  This is a generalization of the math fact #2, which dealt with multiples of only 2 and 3. 

INSERT INTO Primes (p)
SELECT seq
  
FROM Sequence
  
WHERE 0 NOT IN (seq % 2, seq % 3, seq % 5, seq % 7, .. );

The idea is that if we can limit the candidate set for Primes, performance will improve.  At the extreme, if the list of “MOD (seq, <prime>)” expressions goes to a value equal or higher than the upper limit we are looking at, we get the answer immediately.

This is a good trick; many SQL programmers think that an IN() list can only be constants.  You might also want to look at how many values it can hold –It is larger than you think. 

Another candidate pruning trick is based on the math fact that integers with final digits {2, 4, 6, 8, 0} are even numbers; those with final digits {5, 0} are multiples of five. Let’s not look at them when we build a candidate table.

WITH Digits(i)
AS (SELECT i
  
FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i)
   )
INSERT INTO Sequence(seq)
SELECT (D3.i * 1000 + D2.i * 100 + D1.i * 10 + Units.i)
  
FROM (SELECT i
  
FROM (VALUES (1), (3), (7), (9)) AS X(i)) AS Units,
      
Digits AS D1, Digits AS D2, Digits AS D3

ANSWER #3

Another approach is to generate all of the non-primes and remove them from the Sequence table.

INSERT INTO Primes (p)
(SELECT seq FROM Sequence WHERE seq <= 1000)
EXCEPT
(SELECT (F1.seq * F2.seq) AS composite_nbr
  
FROM Sequence AS F1, Sequence AS F2
WHERE F1.seq BETWEEN 2 AND CEILING (SQRT (1000))
  AND
F2.seq BETWEEN 2 AND CEILING (SQRT (1000))
  AND
F1.seq <= F2.seq
  
AND (F1.seq * F2.seq) <= 1000)

Obviously, the Sequence table in the left hand clause could be anyone of the trimmed candidate tables we previously constructed.  

What answers to do you have? As a hint, there are faster but more complicated algorithms, like the Sieve of Atkin and the various Wheel Sieves.

We have attached files containing code that runs in SQL 200 SQL 2005 and SQL 2008. Joe's code in the article  has been fixed to run in SQL 2008.

The best answer to each stumper will be given a prize of a $100 Amazon voucher. The stumper will be run simultaneously on  SQL Server Central and Simple-Talk. To see all the comments so far, you will need to visit both sites. We will take entries for a week after the first Monday of publication,  posted as comments to the articles. Two weeks after the challenge is sent out, the judge's decision and comments will be sent out in the newsletter, and published on the site.

Joe Celko and Phil Factor will judge the answers to this puzzle. Your answer should :
   1) Solve the problem -- Duh!
   2) Avoid proprietary features in SQL Server that will not port or be good across
        all releases, present and future.
   3) Use Standard features in SQL Server that will be good across all releases,
        present and future. Extra points for porting code.
   4) Be clever but not obscure.
   5) Explain how you got your answer.  

Resources

Rate

3.6 (5)

You rated this post out of 5. Change rating

Share

Share

Rate

3.6 (5)

You rated this post out of 5. Change rating