SQLServerCentral Article

Calculating Prime Numbers With One Query

,

Introduction

I want to share the solution to the problem of searching for prime numbers using T-SQL. The problem was proposed on a reputable educational web resource. The problem is defined as follows: write all prime numbers less than or equal to 1000.

Of course, you can create a loop where numbers are sequentially checked against the definition of prime numbers and then displayed. But this is ugly and not interesting. It would be nice to formulate a single query that will solve the problem. Let's try to do this.

Getting a set of numbers

How do I get a set of numbers that will then be checked to see if they are prime numbers? Tables are not suitable - after all, we need to solve the problem with only one query. How to do it? There is such a magic thing as recursive common table expression (CTE). This construction allows you to generate datasets, according to given rules. You can generate not only sets of numbers. In my projects, I used CTE to generate a table of dates with specific tokens, but that's a separate story.

Here's what the generator of a sequence of numbers from 1 to 1000 inclusive looks like:

WITH NUMBER_SEQUENCE(C0) AS
(
SELECT 1 AS C0
UNION ALL
SELECT C0 + 1 AS C0 FROM NUMBER_SEQUENCE WHERE NUMBER_SEQUENCE.C0 < 1000
)
SELECT C0 FROM NUMBER_SEQUENCE
ORDER BY C0
OPTION (MAXRECURSION 0);

Calculating a set of numbers and their divisors

A number is prime if the remainder of the division of the number by any permissible divisor is not zero. Permissible divisors are such numbers that are not equal to 1, the number itself checked and less than or equal to half of the number checked (in terms of integer division).

First, let's form a set of numbers and their admissible divisors. The set must contain lines of verifiable numbers by the number of permissible divisors, i.e. one verifiable number corresponds to n lines of the number of permissible divisors.

Here's how we can get a set of numbers (from 2 to 100 inclusive, because 2 is the first prime number) and their permissible divisors using the previously described number generator:

WITH NUMBER_SEQUENCE(C0) AS
(
SELECT 2 AS C0
UNION ALL
SELECT C0 + 1 AS C0 FROM NUMBER_SEQUENCE WHERE NUMBER_SEQUENCE.C0 < 100
)
SELECT T1.C0 AS C0, T0.C0 AS C1
FROM NUMBER_SEQUENCE AS T0, NUMBER_SEQUENCE AS T1
WHERE T1.[C0] / 2 >= T0.C0 OR T1.C0 = 2 OR T1.C0 = 3
ORDER BY T1.C0, T0.C0
OPTION (MAXRECURSION 0);

The basic idea is to get the Cartesian product of two sets (CROSS JOIN) obtained by a number generator:

FROM NUMBER_SEQUENCE AS T0, NUMBER_SEQUENCE AS T1

Table T0 contains divisors, and table T1 contains checkable numbers. But we must somehow limit the set of divisors, select only valid divisors. Divisors equal to 1 are not automatically included in the resulting set, because the generator starts with 2. Dividers equal to the test number are also excluded, because only divisors less than or equal to half of the test number are selected:

WHERE T1.[C0] / 2 >= T0.C0

But there are two magic equations in the WHERE clause:

T1.C0 = 2 OR T1.C0 = 3

It turns out that numbers 2 and 3 are exceptions when using this method to determine prime numbers, so these two numbers are included in the resulting set explicitly.

In the FROM Clause, instead of CROSS JOIN, we can more elegantly structure this to obtain the set of numbers to be checked and their admissible divisors:

FROM NUMBER_SEQUENCE AS T0 INNER JOIN NUMBER_SEQUENCE AS T1 ON T1.[C0] / 2 >= T0.C0 OR T1.C0 = 2 OR T1.C0 = 3

Solution

We only need to determine the remainder of the integer division of the tested numbers by their possible divisors and determine whether there is a remainder equal to 0. If there is, then the number is not prime. The final query looks like this:

WITH NUMBER_SEQUENCE(C0) AS
(
SELECT 2 AS C0
UNION ALL
SELECT C0 + 1 AS C0 FROM NUMBER_SEQUENCE WHERE NUMBER_SEQUENCE.C0 < 20
)
SELECT C0
FROM (
SELECT C0, CASE WHEN (REST <> 0 OR C0 = 2 OR C0 = 3) THEN 0 ELSE 1 END AS REST_CONDITION
FROM (
SELECT T.C0 AS C0, T.C1, T.C0 % T.C1 AS REST
FROM (
SELECT T1.C0 AS C0, T0.C0 AS C1
FROM NUMBER_SEQUENCE AS T0, NUMBER_SEQUENCE AS T1
WHERE T1.[C0] / 2 >= T0.C0 OR T1.C0 = 2 OR T1.C0 = 3
) AS T
) AS T
) AS T
GROUP BY C0
HAVING SUM(REST_CONDITION) = 0
OPTION (MAXRECURSION 0);

The key here is to determine whether there are any residuals from division for a given number to be checked:

CASE WHEN (REST <> 0 OR C0 = 2 OR C0 = 3) THEN 0 ELSE 1 END AS REST_CONDITION

If the remainder of the division of the given tested number for any divisor is 0, then the calculated field REST_CONDITION will be equal to one, and, consequently, the subsequent check the sum of the field REST_CONDITION to 0 for the given tested number:

HAVING SUM(REST_CONDITION) = 0

will give a negative result, i.e. the number is not prime. If all the remainders of division by all permissible divisors for the given tested number are non-zero, then the condition of checking the sum in the REST_CONDITION field for equality to 0 will be satisfied, i.e. the number is prime. Again, the magic equals to numbers 2 and 3 - for the same reasons previously described.

Conclusion

Remarks on the query performance. Below are the results of rough measures of query execution time, depending on the range of numbers studied. The number of primes calculated is listed first, and then the rough time after the hyphen.

  • 10 - instantaneously
  • 100 - instantaneously
  • 1,000 - instantaneously
  • 10,000 - ~50 seconds
  • 100,000 - ~1 hour
  • 1,000,000 - ~4 days 16 hours (I ran this query just that out of curiosity. After all, the server works, not me)

As you can see, this can be an intensive calculation.

Rate

4.8 (5)

You rated this post out of 5. Change rating

Share

Share

Rate

4.8 (5)

You rated this post out of 5. Change rating