Performance challenge

  • Hello guys out there,

    here is a simple sounding but yet interesting challenge.

    Create a database with a single table named "numbers" with a single column of type int (or bigint).

    create table numbers (number int not null)

    create unique clustered index abc on numbers (number)

    1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.

    2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)

    3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

    1.) and 2.) are quite easy. I bet many of you will find a solution easily.

    But 3.) is a challenge

    Let's see if someone can find a solution for 3.) I did not find one yet.

  • The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

    IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

    DROP TABLE #Numbers;

    CREATE TABLE #Numbers (Number INT PRIMARY KEY);

    INSERT INTO #Numbers (Number)

    SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

    FROM sys.all_columns AS T1

    CROSS JOIN sys.all_columns AS T2;

    Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)

    The third part, will depend almost completely on the CPU.

    On a low-end machine, I got this to run in 13 seconds:

    SELECT Number

    FROM #Numbers AS N1

    WHERE Number > 1

    AND NOT EXISTS

    (SELECT 1

    FROM #Numbers AS N2

    WHERE N1.Number%N2.Number = 0

    AND N2.Number > 1

    AND N2.Number <= N1.Number/2);

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared, I used your query and got 6 seconds on a 8GB SQL2008 SP2 machine. πŸ™‚

  • The code on the thread below uses the Sieve of Eratosthenes and Sieve of Aitken to find prime numbers.

    Tests for the Sieve of Aitken returned all primes below 100,000 in 0.6 seconds and all primes below 5,000,000 in 55 seconds. I believe this was on an old desktop computer. The Sieve of Eratosthenes took over twice as long.

    Prime Number Table Function

    http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=69646

  • GSquared (12/27/2010)


    The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

    IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

    DROP TABLE #Numbers;

    CREATE TABLE #Numbers (Number INT PRIMARY KEY);

    INSERT INTO #Numbers (Number)

    SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

    FROM sys.all_columns AS T1

    CROSS JOIN sys.all_columns AS T2;

    Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]

    If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (12/27/2010)


    GSquared (12/27/2010)


    The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

    IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

    DROP TABLE #Numbers;

    CREATE TABLE #Numbers (Number INT PRIMARY KEY);

    INSERT INTO #Numbers (Number)

    SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

    FROM sys.all_columns AS T1

    CROSS JOIN sys.all_columns AS T2;

    Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]

    If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.

    Yep.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • hreisterp (12/27/2010)


    Hello guys out there,

    here is a simple sounding but yet interesting challenge.

    Create a database with a single table named "numbers" with a single column of type int (or bigint).

    create table numbers (number int not null)

    create unique clustered index abc on numbers (number)

    1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.

    2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)

    3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

    1.) and 2.) are quite easy. I bet many of you will find a solution easily.

    But 3.) is a challenge

    Let's see if someone can find a solution for 3.) I did not find one yet.

    So which "contest" are you entering? Or is this homework? Also, if it's taking 6 seconds to gen only 60.000 rows even on an old machine, you should figure out what's wrong with the machine. πŸ˜‰

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hello,

    great post so far. The idea of using row_num is really smart.

    I did this unsing standard SQL in the following way:

    insert into numbers values (1)

    insert into numbers select (select MAX(number) from numbers)+number from numbers

    Repeat this 15 times! It will double the number of row on each iteration.

    So already 1 point for the community πŸ™‚

    For the select I started with

    select * from numbers a

    where

    a.number>1 and

    not exists (select null from numbers b

    where b.number < a.number

    and b.number>1

    and a.number % b.number = 0)

    order by a.number desc

    OK. If you delete the 1 from the table (delete from numbers where number = 1) then you can remove the two > 1 constraints.

    But surprisingly it takes 9 seconds on my dual core machine. Compared to the classic sieves this is PAINFULLY slow. The plan looks not very good (attached it).

    It does a nest loop join and both times used the clustered index. It does an index scan on the outer loop (fine) and tries an index seek on the innner loop. Here it expects a hit count of 1 but ends up with seekin the complete index (that's why it is so slow).

    One common optimization would be to change the condition from b.number < a.number to b.number*b.number<=a.number (once the square of the number is larger a the number to check we can stop). But this doesn't help at all. It acutally causes an overflow error if you use type int for number. Changing to bigint solves this issue - but still performance is sad.

    Then I though - SQL Server issue?? Tried it on a colleagues local Oracle 9 intallation - it ran for 10 minutes (then we stopped it)! OK - maybe on this machine something was wrong πŸ˜‰ I'll try it again on my machine today.

    This could actually be a good test for performance of the underlying query execution system of the DB. The problem does not depend much on the I/O system as the amount of data is minimal (<1 MB).

    I tried a couple of other things:

    - use a table variable : declare @numbers table (number int) ... --> surprisingly MUCH slower

    - used LINQ (this is for sure in memory. For table variables it is not 100% sure that they will not go to file system). Here's my code (VS 2010 Express compiles this)

    using System;

    using System.Collections.Generic;

    using System.Linq;

    using System.Text;

    namespace ConsoleApplication1

    {

    class Program

    {

    static void Main(string[] args)

    {

    const int N = 65536;

    int[] data = new int[N];

    for (int i = 0; i < N; ++i) data = i + 1; // should have used foreach ...

    DateTime d = DateTime.Now;

    // BEAUTIFUL !! Sometimes I luv myself for doing these things...

    var q = from s in data where !(from o in data where (o < s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

    Console.WriteLine(DateTime.Now - d);

    // Console.ReadKey();

    StringBuilder sb = new StringBuilder();

    foreach (var item in q)

    {

    sb.Append(item + "");

    }

    Console.WriteLine(DateTime.Now - d);

    Console.ReadKey();

    Console.WriteLine(sb.ToString());

    Console.ReadKey();

    }

    }

    }

    Runs a little faster (8 sec) on my machine. But that is most likely due to the network data transport system of SQL Server. That does not matter for me.

    Actually there is an issue in C# here. If you use the square break conditon in C# it yield the wrong result!

    var q = from s in data where !(from o in data where (o*o <= s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

    It obviously uses int as data type for s and o. As there is no overflow check in C# (which acutally sucks) this will end somewhere around 40.000. Actually the compiler should be smart enough to figure out that for computing o*o it is required to use long int instead.... Maybe i should post a bug...

    I'll keep trying..

    P.S. We all know that the classical sieves do the job ultra fast. But those are speciallized algorithms. SQL MUST be smart enough to do this at least 10 times faster.

  • OK. Gave it a try in the "FAST" MySQL -> out of the box: 50 sec on the same machine. Silly. MySQL is ridiculously :laugh:

    Switching from InnoDB engine to MyISAM at least improves it to 18 sec. Only twice as slow as SQL Server (anyway: MyISAM is not a suitable engine at all...)

    The SUPER DUPER InMemory engine ... forget it: 33 seconds

    Done with MySQL ...

  • it take me 8 secunds on my laptop

    IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

    DROP TABLE #Numbers;

    CREATE TABLE #Numbers (Number INT PRIMARY KEY);

    INSERT INTO #Numbers (Number)

    SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

    FROM sys.all_columns AS T1

    CROSS JOIN sys.all_columns AS T2;

  • The performance challenge is not to insert the data but to select the prime number. πŸ™‚

  • OK. Another try... πŸ˜‰

    Used then outstanding LINQPad [/url]tool to do the same.

    int[] data=data=Enumerable.Range(2,65536).ToArray(); // learned this today πŸ™‚

    var q = from s in data where !(from o in data where (o < s) && (s % o == 0) select 0).Any() select s;

    q.Dump(); // that is a LINQPad extension...

    3 seconds!

    That is almost twice as fast as running it from withing VCS 2010 Express. Hmmmhhhh ... that is a big surprise.

    And it outperformns SQL server by a factor of two... This needs further investigation.

  • Thanks for your post. But that was not the challenge. Actually loading all prime number from a pre-computed list is MUCH faster (if it is not huge number you need you could simply define the list as code).

  • Oracle is such an amzing source of annoyance:

    - downlad -> MUST register

    - register. Does not respect international aspects. My first name contains an ΓΆ. For Oracle this seems to be a challenge

    - download. .EXE extension is missing

    - install to c:\program files (x86)\orcale -> can not install. Path contains space characters. ...

    :angry:

  • After a while of using OracleXE (this a creapled 10g version) the results could not be any worth....

    1.) The thing installs an HTTP server!! Without even asking.

    2.) there is no acceptable logon. You have to use a predefined user SYS -> welcome Hacker. You already know my name

    3.) The browser based GUI is a productivity NIGHTMARE. Even MySQL's community GUI is much better

    4.) create table numbers (number <- error reserved word???

    5.) Performance is a DESASTER. If you run the query in the browser it will first only show the 10 first items - did I ask to show only 10 itmes??? Once you turn this off and allow 100.000 items it takes MINUTES until the result is there. I thought this because the browser needs a long time to render the HTML result. So back to basic: SQL Plus. After having learned how to logon as SYS (took me ten minutes) i ran the query again but stopped it after a while. Would have taken too long. Then changed the query to only get count(*) instead of the real numbers.

    Result: 2,5 MINUTES

    Maybe the XE version is somehow creepled - is it?

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply