/* A PRIME SQL PUZZLE I was teaching SQL classes for YAPC-10 (“Yet Another PERL Conference” #10) at Carnegie Mellon University at the end of June 2009. For the record, I have never used PERL and had to Google up an overview before I went; it is a very different creature from SQL. One of my students asked if you could write an SQL statement to generate the prime numbers less than 1000 (or any other limit) that scales well. He was bothered by the lack of loops in SQL and a Prime Number sieve is a common PERL programming exercise. You can Google it and see an animation http://www.hbmeyer.de/eratosiv.htm and some PERL code at http://www.perlmonks.org/?node_id=276103. My immediate answer was “sure, but you might have to use a recursive CTE to replace the loop. Later I realized that was a really bad answer; you don’t need recursion, just a little math. There are two useful facts from Number Theory: 1) The prime factors of a given number (n) cannot be greater than ceiling (?n). Think about it; by definition (?n * ?n)) = n, and by definition, ceiling (?n) >= floor(?n) so integer rounding up will be safe. This says that if I look at (a * b = c) where (a < b), then I don’t have to look at (b * a = c), so I can start searching for prime factors with small values. 2) All primes are of the form (6 * n ± 1), but not all number of that form are Primes. For example (n = 1) gives us {5, 7} and they are both primes. But for (n = 4) gives us {23, 25} where (25 = 5 * 5). What this does is remove the multiples of 2 and 3 from consideration. Let’s get all of that into SQL statements. Let’s start with a table for the primes: */ CREATe TABLE Primes (p INTEGER NOT NULL PRIMARY KEY CHECK (p > 1)); /* Now, your puzzle is to fill the table up to some limit, say 1000 just to keep it simple. ANSWERS: Let’s assume we have a table named Sequence with integers from 1 to (n) that we can use. This is a common SQL programming idiom, so you don’t have to feel bad about using it. */ CREATE TABLE Sequence (seq INTEGER NOT NULL PRIMARY KEY CHECK (seq > 0)); /* There are lots of ways of filling this table, but here is one I like: */ WITH Digits(i) AS (SELECT i FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i)) INSERT INTO Sequence(seq) SELECT (D3.i * 1000 + D2.i * 100 + D1.i * 10 + D0.i + 1) AS seq FROM Digits AS D0, Digits AS D1, Digits AS D2, Digits AS D3; /* This template is easy to extend and the “.. + 1” gets rid of the zero. ANSWER #1 For the first attempt, let’s load the Primes table with candidate numbers using math fact #2 from above. */ INSERT INTO Primes (p) (SELECT (6 * seq) + 1 FROM Sequence WHERE (6 * seq) + 1 <= 1000 UNION ALL SELECT (6 * seq) - 1 FROM Sequence WHERE (6 * seq) + 1 <= 1000); --An improvement which gets rid of the UNION ALL uses a table constant: --TRUNCATE TABLE primes INSERT INTO Primes (p) SELECT (6 * seq) + S.switch FROM Sequence CROSS JOIN (SELECT switch FROM (VALUES (-1), (+1)) as F(switch)) AS S WHERE (6 * seq) + 1 <= 1000; --Now we have too many rows in Primes and need to remove the non-primes. Now math fact #1 can come into play; test the set of numbers less than the square root to see if there is a factor among them. DELETE FROM Primes WHERE EXISTS (SELECT * FROM Primes AS P1 WHERE P1.p <= CEILING (SQRT (Primes.p)) AND (Primes.p % P1.p) = 0); /* ANSWER #2 Another way to load the candidates into Primes is to have the first few known primes hardwired into a query. This is a generalization of the math fact #2, which dealt with multiples of only 2 and 3. */ INSERT INTO Primes (p) SELECT seq FROM Sequence WHERE 0 NOT IN (seq %2,Seq % 3) /* The idea is that if we can limit the candidate set for Primes, performance will improve. At the extreme, if the list of “MOD (seq, )” expressions goes to a value equal or higher than the upper limit we are looking at, we get the answer immediately. This is a good trick; many SQL programmers think that an IN() list can only be constants. You might also want to look at how many values it can hold –It is larger than you think. ANSWER #3 Another candidate pruning trick is based on the math fact that integers with final digits {2, 4, 6, 8, 0} are even numbers; those with final digits {5, 0} are multiples of five. Let’s not look at them when we build a candidate table. */ INSERT INTO Primes (p) ( WITH Digits(i) AS (SELECT i FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i) SELECT (D3.i * 1000 + D2.i * 100 + D1.i * 10 + Units.i) FROM (SELECT i FROM (VALUES (1), (3), (7), (9))) AS X(i) AS Units, Digits AS D1, Digits AS D2, Digits AS D3); ANSWER #4 Another approach is to generate all of the non-primes and remove them from the Sequence table. INSERT INTO Primes (p) (SELECT seq FROM Sequence WHERE seq <= 1000) EXCEPT (SELECT (F1.seq * F2.seq) AS composite_nbr FROM Sequence AS F1, Sequence AS F2 WHERE F1.seq BETWEEN 2 AND CEILING (SQRT (1000)) AND F2.seq BETWEEN 2 AND CEILING (SQRT (1000)) AND F1.seq <= F2.seq AND (F1.seq * F2.seq) <= 1000) Obviously, the Sequence table in the left hand clause could be anyone of the trimmed candidate tables we previously constructed. What answers to do you have? As a hint, there are faster but more complicated algorithms, like the Sieve of Atkin and the various Wheel Sieves. TRUNCATE TABLE sequence go WITH Digits(i) AS (SELECT i FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (0)) AS X(i) ) INSERT INTO Sequence(seq) SELECT (D3.i * 1000 + D2.i * 100 + D1.i * 10 + Units.i) FROM (SELECT i FROM (VALUES (1), (3), (7), (9)) AS X(i)) AS Units, Digits AS D1, Digits AS D2, Digits AS D3