Calculating Prime Numbers With One Query

  • Jonathan AC Roberts wrote:

    Jeff Moden wrote:

    Peter "Peso" Larsson wrote a nasty fast Prime Numbers function years ago and I tested it to confirm it.  I'll see if I can find it in my archives this week if someone else doesn't find it first.

    I think this must be it, it is in the comments on this page https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/celkos-summer-sql-stumpers-prime-numbers/

    That's the one.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • A simplified version is found here

    https://www.sqltopia.com/mathematics/prime-number-generation/

     


    N 56°04'39.16"
    E 12°55'05.25"

  • A simplified version is found here

    https://www.sqltopia.com/mathematics/prime-number-generation/


    N 56°04'39.16"
    E 12°55'05.25"

  • SwePeso wrote:

    A simplified version is found here https://www.sqltopia.com/mathematics/prime-number-generation/%5B/quote%5D

    Very good and a clear explanation of how it works.

  • Jonathan AC Roberts wrote:

    Each number you test for primality needs O(sqrt(n)*n) calculations, so it will slow down for larger numbers. Also, if you increase @n to 10 times its previous value each run you would expect the elapsed time to go up by a factor of over 30.

    Makes sense.  The SQL environment I use for online stuff isn't really set up to test absolute performance.  My laptop starts gasping on Youtube videos and Azure SQL is using 50 DTU's which is much less cpu than my laptop.  At the higher search counts I wasn't getting any results at all with that earlier query.  For sure I didn't delete anything.  The takeaway was to give it a go in C# which I never did

    The SSC topic says "one query" which some of the scripts aren't exactly.  Even so performance-wise the allocations are likely to not help and could be counterproductive.  Suppose you try to speed it up by getting rid of the 2nd tally function and allocating a temp table to use in its place in the correlated subquery.  It makes it slower afaik

    This is 2x slower

    declare @n      int=1000000;

    declare @lkup table(n int primary key clustered);

    insert @lkup(n)
    select fn.N*2+1 from dbo.fnTally(1, sqrt(@n/2)) fn;

    select count(*)
    from dbo.fnTally(1, @n/2-1) fn
    where not exists(select 1
    from @lkup l
    where (fn.N*2+1)%l.N=0
    and (fn.N*2+1)<>l.N);

    versus just this

    declare @n      int=1000000;

    select count(*)
    from dbo.fnTally(1, @n/2-1) fn
    where not exists(select 1
    from dbo.fnTally(1, sqrt(fn.N/2)) ffn
    where (fn.N*2+1)%(ffn.N*2+1)=0
    and (fn.N*2+1)<>(ffn.N*2+1));

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Here is a "one query" solution 🙂

    Finished in about 10 seconds for prime numbers up to 1 million.

    DECLARE @max_prime INT = 1000000;

    -- swePeso
    SELECT value
    FROM (
    VALUES (2),
    (3),
    (5),
    (7)
    ) AS x(value)
    WHERE value <= @max_prime

    UNION ALL

    SELECT x.value
    FROM GENERATE_SERIES(30, @max_prime + 29, 30) AS pc
    CROSS APPLY (
    VALUES (pc.value - 23),
    (pc.value - 19),
    (pc.value - 17),
    (pc.value - 13),
    (pc.value - 11),
    (pc.value - 7),
    (pc.value - 1),
    (pc.value + 1)
    ) AS x(value)
    CROSS APPLY GENERATE_SERIES(3, CAST(SQRT(pc.value + 29) AS INT), 2) AS d
    WHERE x.value <= @max_prime
    GROUP BY x.value
    HAVING MIN(x.value % d.value) >= 1;

    N 56°04'39.16"
    E 12°55'05.25"

  • Here's another one query solution.

    Finished in about 3 seconds for prime numbers up to 1 million (on my machine).

    I can't test it against SwePeso's solution as I don't have SQL Server 2022 installed so can't use GENERATE_SERIES

    DECLARE @Max int = 1000000;

    WITH TO_SQRT AS
    (
    SELECT TOP(CAST(SQRT(@Max)/2 AS bigint))
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*2+1 N
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T1(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T2(N)
    ),
    S AS
    (
    SELECT N FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59)) T(N) -- Add the primes we factored out
    UNION ALL
    SELECT n
    FROM TO_SQRT
    WHERE N <= CAST(SQRT(@Max) AS bigint)
    AND N%6 IN(1,5)
    AND 0 NOT IN (N%5,N%7,N%11,N%13,N%17,N%19,N%23,N%29,N%31,N%37,N%41,N%43,N%47,N%53,N%59)
    ),
    TO_MAX AS
    (
    SELECT TOP(CAST(@Max/2 AS int))
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))*2+1 N
    FROM (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T1(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T2(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T3(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T4(N)
    CROSS JOIN (VALUES (0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS T5(N)
    )
    SELECT N
    FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59)) T(N) -- Add the primes we factored out
    WHERE T.N <= @Max
    UNION ALL
    SELECT p.N
    FROM TO_MAX p
    WHERE p.N <= @Max
    AND p.N%6 IN(1,5)
    AND 0 NOT IN (p.N%5,p.N%7,p.N%11,p.N%13,p.N%17,p.N%19,p.N%23,p.N%29,p.N%31,p.N%37,p.N%41,p.N%43,p.N%47,p.N%53,p.N%59)
    AND NOT EXISTS(SELECT *
    FROM S
    WHERE p.N%S.N=0
    AND S.N <= SQRT(p.N))
    ;

    • This reply was modified 10 months, 3 weeks ago by  Jonathan AC Roberts. Reason: Changed last line from [code]AND S.N < SQRT(p.N)[/code] to [code]AND S.N
  • @jonathon-2 - I just tested your method and found that it is including 151 additional non-prime numbers in the results.  I also tested both version on one of my systems, yours took 8 seconds and @SwePeso's took 6 seconds.

    Inserting the data into temp tables instead of to the screen - yours took 3 seconds vs 2 seconds for Peter's.

     

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Jeffrey Williams wrote:

    @Jonathon - I just tested your method and found that it is including 151 additional non-prime numbers in the results.  I also tested both version on one of my systems, yours took 8 seconds and @SwePeso's took 6 seconds.

    Inserting the data into temp tables instead of to the screen - yours took 3 seconds vs 2 seconds for Peter's. 

    I appreciate the comparison. I have made the necessary correction to my SQL. In the last line, I replaced the "<" operator with "<=" to ensure that no squares are included in the results.

     

  • I haven't yet installed SQL Server 2022 and thought it would be possible to write a TVF with the same functionality as SQL 2022's GENERATE_SERIES so I could get SwePeso's SQL Statement to work. Then I thought why not get ChatGPT to write the TVF. So I gave ChatGPT a copy of Jeff Moden's fnTally code and asked it to create a TVF that uses similar way of generating a large quantity of numbers to create a TVF with exactly the same functionality as Microsoft's GENERATE_SERIES function. I pasted in the functional specifications, I had to tell it to make a few corrections but this is what it came up with:

    CREATE FUNCTION [dbo].[GENERATE_SERIES]
    (
    @Start BIGINT,
    @Stop BIGINT,
    @Step BIGINT = 1
    )
    RETURNS TABLE WITH SCHEMABINDING AS
    RETURN WITH
    H2(N) AS ( SELECT 1
    FROM (VALUES
    (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    ,(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)
    )V(N)) --16^2 or 256 rows
    , H4(N) AS (SELECT 1 FROM H2 a, H2 b) --16^4 or 65,536 rows
    , H8(N) AS (SELECT 1 FROM H4 a, H4 b) --16^8 or 4,294,967,296 rows
    , Tally AS (
    SELECT N = ROW_NUMBER() OVER (ORDER BY N)
    FROM H8
    )
    , StepTally AS (
    SELECT value = @Start + (@Step * (N - 1))
    FROM Tally
    )
    SELECT TOP(ABS(@Stop - @Start) / ABS(@Step) + 1) value
    FROM StepTally
    WHERE @Step <> 0
    GO

    Then to get SwePeso's code to work I just had to change all the GENERATE_SEQUENCE to  dbo.GENERATE_SEQUENCE

  • Why mess with fnTally when what we really need is for Jeffrey to rewrite the query?  Ha, then we also get a baseline of the functions.  Also how does the 2 tally method with SQRT compare to SwePeso's code?

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    Why mess with fnTally when what we really need is for Jeffrey to rewrite the query?  Ha, then we also get a baseline of the functions.  Also how does the 2 tally method with SQRT compare to SwePeso's code?

    I think SQL Server's GENERATE_SERIES is very efficient and fast. You can try a comparison yourself with the dbo function, ChatGPT and I with a bit of help from Jeff Moden's fnTally wrote, and see for yourself.

  • At the moment I don't have access to an instance of SQL 2022 to test with.  That could change in the near future tho.  For now I was just fishing for information 🙂

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Steve Collins wrote:

    At the moment I don't have access to an instance of SQL 2022 to test with.  That could change in the near future tho.  For now I was just fishing for information 🙂

    Yes, SQL Server 2022 is going on my personal machine today.

  • I installed SQL Server 2022 today and changed my version to use GENERATE_SERIES. I also amended Steve Collins algorithm to use it and remove all the multiplications by "*2+1" as GENERATE_SERIES can go up in steps so no need for them, it was also missing the number 2 from the results so I added that. Here is a script of all the single query algorithms submitted in this thread with their timings.

    SET STATISTICS TIME ON

    DROP TABLE IF EXISTS #swePeso;
    DROP TABLE IF EXISTS #JRoberts;
    DROP TABLE IF EXISTS #SCollinsfnTally;
    DROP TABLE IF EXISTS #SCollinsGENERATE_SERIES;

    GO

    DECLARE @max_prime INT = 1000000;

    PRINT 'Method swePeso **************************************************************************************************'
    -- swePeso
    SELECT value
    into #swePeso
    FROM (
    SELECT value
    FROM (
    VALUES (2),
    (3),
    (5),
    (7)
    ) AS x(value)
    WHERE value <= @max_prime
    UNION ALL
    SELECT x.value
    FROM GENERATE_SERIES(30, @max_prime + 29, 30) AS pc
    CROSS APPLY (
    VALUES (pc.value - 23),
    (pc.value - 19),
    (pc.value - 17),
    (pc.value - 13),
    (pc.value - 11),
    (pc.value - 7),
    (pc.value - 1),
    (pc.value + 1)
    ) AS x(value)
    CROSS APPLY GENERATE_SERIES(3, CAST(SQRT(pc.value + 29) AS INT), 2) AS d
    WHERE x.value <= @max_prime
    GROUP BY x.value
    HAVING MIN(x.value % d.value) >= 1
    ) x
    ;

    GO

    PRINT 'JRoberts **************************************************************************************************';
    DECLARE @Max int = 1000000;
    WITH S AS
    (

    SELECT value FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59)) T(value) -- Add the primes we factored out
    UNION ALL
    SELECT value N
    FROM GENERATE_SERIES(3, CAST(SQRT(@Max) AS int) , 2)
    WHERE value%6 IN(1,5)
    AND 0 NOT IN (value%5,value%7,value%11,value%13,value%17,value%19,value%23,value%29,value%31,value%37,value%41,value%43,value%47,value%53,value%59)
    ),
    A AS
    (
    SELECT N
    FROM (VALUES(2),(3),(5),(7),(11),(13),(17),(19),(23),(29),(31),(37),(41),(43),(47),(53),(59)) T(N) -- Add the primes we factored out
    WHERE N <= @Max
    UNION ALL
    SELECT value N
    FROM GENERATE_SERIES(61, @Max, 2) p
    WHERE value%6 IN(1,5)
    AND 0 NOT IN (value%5,value%7,value%11,value%13,value%17,value%19,value%23,value%29,value%31,value%37,value%41,value%43,value%47,value%53,value%59)
    AND NOT EXISTS(SELECT *
    FROM S
    WHERE p.value%S.value=0
    AND S.value <= SQRT(p.value))
    )
    SELECT N value
    INTO #JRoberts
    FROM A

    GO

    PRINT 'Steve Collins fnTally **************************************************************************************************'
    declare @n int=1000000;
    select fn.N*2+1 value
    into #SCollinsfnTally
    from dbo.fnTally(1, @n/2-1) fn
    where not exists(select 1
    from dbo.fnTally(1, sqrt(fn.N/2)) ffn
    where (fn.N*2+1)%(ffn.N*2+1)=0
    and (fn.N*2+1)<>(ffn.N*2+1));
    GO
    PRINT 'Steve Collins amended to use GENERATE_SERIES and get rid of calculations **************************************************************************************************'
    DECLARE @Max int = 1000000;
    SELECT value
    INTO #SCollinsGENERATE_SERIES
    FROM (values(2)) p(value)
    WHERE value <= @Max
    UNION all
    SELECT value
    FROM GENERATE_SERIES(3, @Max, 2) p
    WHERE NOT EXISTS(SELECT *
    FROM GENERATE_SERIES(3, CAST(SQRT(p.value) AS int), 2) f
    WHERE p.value%f.value=0
    AND p.value <> f.value)
    ;

    MethodCPU time (ms)Elapsed time (ms)Rows affected
    Steve Collins amended to use GENERATE_SERIES and get rid of calculations1969204478498
    JRoberts2734277378498
    Steve Collins fnTally4906495378497
    swePeso7922805178498

    The victorious algorithm belongs to Steve, which I refined by incorporating GENERATE_SERIES and eliminating certain calculations. As a result, it achieved completion of 2 seconds for all primes less than 1 million. It's worth noting that my machine may exhibit unique behaviour, so I encourage others to give it a try and share their results.

Viewing 15 posts - 16 through 30 (of 77 total)

You must be logged in to reply to this topic. Login to reply