Performance challenge

Question

Post reply

Performance challenge

hreisterp

SSC Veteran

Points: 288
More actions
December 27, 2010 at 12:14 pm

#231498

Hello guys out there,
here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)
1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.
1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)

SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

FROM sys.all_columns AS T1

CROSS JOIN sys.all_columns AS T2;

Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)

The third part, will depend almost completely on the CPU.

On a low-end machine, I got this to run in 13 seconds:

SELECT Number

FROM #Numbers AS N1

WHERE Number > 1

AND NOT EXISTS

(SELECT 1

FROM #Numbers AS N2

WHERE N1.Number%N2.Number = 0

AND N2.Number > 1

AND N2.Number <= N1.Number/2);

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Wildcat SSCarpal Tunnel Points: 4774 More actions · Answer 2

GSquared, I used your query and got 6 seconds on a 8GB SQL2008 SP2 machine. 🙂

Michael Valentine Jones SSC Guru Points: 64818 More actions · Answer 3

The code on the thread below uses the Sieve of Eratosthenes and Sieve of Aitken to find prime numbers.

Tests for the Sieve of Aitken returned all primes below 100,000 in 0.6 seconds and all primes below 5,000,000 in 55 seconds. I believe this was on an old desktop computer. The Sieve of Eratosthenes took over twice as long.

Prime Number Table Function

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=69646

WayneS SSC Guru Points: 95409 More actions · Answer 4

GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.
IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;
CREATE TABLE #Numbers (Number INT PRIMARY KEY);
INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;
Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]

If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

GSquared SSC Guru Points: 260824 More actions · Answer 5

WayneS (12/27/2010)
GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.
IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;
CREATE TABLE #Numbers (Number INT PRIMARY KEY);
INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;
Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]
If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.

Yep.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Jeff Moden SSC Guru Points: 1003863 More actions · Answer 6

hreisterp (12/27/2010)
Hello guys out there,
here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)
1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.
1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.

So which "contest" are you entering? Or is this homework? Also, if it's taking 6 seconds to gen only 60.000 rows even on an old machine, you should figure out what's wrong with the machine. 😉

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

hreisterp SSC Veteran Points: 288 More actions · Answer 7

Hello,

great post so far. The idea of using row_num is really smart.

I did this unsing standard SQL in the following way:

insert into numbers values (1)

insert into numbers select (select MAX(number) from numbers)+number from numbers

Repeat this 15 times! It will double the number of row on each iteration.

So already 1 point for the community 🙂

For the select I started with

select * from numbers a

where

a.number>1 and

not exists (select null from numbers b

where b.number < a.number

and b.number>1

and a.number % b.number = 0)

order by a.number desc

OK. If you delete the 1 from the table (delete from numbers where number = 1) then you can remove the two > 1 constraints.

But surprisingly it takes 9 seconds on my dual core machine. Compared to the classic sieves this is PAINFULLY slow. The plan looks not very good (attached it).

It does a nest loop join and both times used the clustered index. It does an index scan on the outer loop (fine) and tries an index seek on the innner loop. Here it expects a hit count of 1 but ends up with seekin the complete index (that's why it is so slow).

One common optimization would be to change the condition from b.number < a.number to b.number*b.number<=a.number (once the square of the number is larger a the number to check we can stop). But this doesn't help at all. It acutally causes an overflow error if you use type int for number. Changing to bigint solves this issue - but still performance is sad.

Then I though - SQL Server issue?? Tried it on a colleagues local Oracle 9 intallation - it ran for 10 minutes (then we stopped it)! OK - maybe on this machine something was wrong 😉 I'll try it again on my machine today.

This could actually be a good test for performance of the underlying query execution system of the DB. The problem does not depend much on the I/O system as the amount of data is minimal (<1 MB).

I tried a couple of other things:

- use a table variable : declare @numbers table (number int) ... --> surprisingly MUCH slower

- used LINQ (this is for sure in memory. For table variables it is not 100% sure that they will not go to file system). Here's my code (VS 2010 Express compiles this)

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

namespace ConsoleApplication1

{

class Program

{

static void Main(string[] args)

{

const int N = 65536;

int[] data = new int[N];

for (int i = 0; i < N; ++i) data = i + 1; // should have used foreach ...

DateTime d = DateTime.Now;

// BEAUTIFUL !! Sometimes I luv myself for doing these things...

var q = from s in data where !(from o in data where (o < s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

Console.WriteLine(DateTime.Now - d);

// Console.ReadKey();

StringBuilder sb = new StringBuilder();

foreach (var item in q)

{

sb.Append(item + "");

}

Console.WriteLine(DateTime.Now - d);

Console.ReadKey();

Console.WriteLine(sb.ToString());

Console.ReadKey();

}

Runs a little faster (8 sec) on my machine. But that is most likely due to the network data transport system of SQL Server. That does not matter for me.

Actually there is an issue in C# here. If you use the square break conditon in C# it yield the wrong result!

var q = from s in data where !(from o in data where (o*o <= s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

It obviously uses int as data type for s and o. As there is no overflow check in C# (which acutally sucks) this will end somewhere around 40.000. Actually the compiler should be smart enough to figure out that for computing o*o it is required to use long int instead.... Maybe i should post a bug...

I'll keep trying..

P.S. We all know that the classical sieves do the job ultra fast. But those are speciallized algorithms. SQL MUST be smart enough to do this at least 10 times faster.

hreisterp SSC Veteran Points: 288 More actions · Answer 8

OK. Gave it a try in the "FAST" MySQL -> out of the box: 50 sec on the same machine. Silly. MySQL is ridiculously :laugh:

Switching from InnoDB engine to MyISAM at least improves it to 18 sec. Only twice as slow as SQL Server (anyway: MyISAM is not a suitable engine at all...)

The SUPER DUPER InMemory engine ... forget it: 33 seconds

Done with MySQL ...

roi.reuven SSCommitted Points: 1670 More actions · Answer 9

it take me 8 secunds on my laptop

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL

DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)

SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

FROM sys.all_columns AS T1

CROSS JOIN sys.all_columns AS T2;

hreisterp SSC Veteran Points: 288 More actions · Answer 10

The performance challenge is not to insert the data but to select the prime number. 🙂

hreisterp SSC Veteran Points: 288 More actions · Answer 11

OK. Another try... 😉

Used then outstanding LINQPad [/url]tool to do the same.

int[] data=data=Enumerable.Range(2,65536).ToArray(); // learned this today 🙂

var q = from s in data where !(from o in data where (o < s) && (s % o == 0) select 0).Any() select s;

q.Dump(); // that is a LINQPad extension...

3 seconds!

That is almost twice as fast as running it from withing VCS 2010 Express. Hmmmhhhh ... that is a big surprise.

And it outperformns SQL server by a factor of two... This needs further investigation.

hreisterp SSC Veteran Points: 288 More actions · Answer 12

Thanks for your post. But that was not the challenge. Actually loading all prime number from a pre-computed list is MUCH faster (if it is not huge number you need you could simply define the list as code).

hreisterp SSC Veteran Points: 288 More actions · Answer 13

Oracle is such an amzing source of annoyance:

- downlad -> MUST register

- register. Does not respect international aspects. My first name contains an ö. For Oracle this seems to be a challenge

- download. .EXE extension is missing

- install to c:\program files (x86)\orcale -> can not install. Path contains space characters. ...

:angry:

hreisterp SSC Veteran Points: 288 More actions · Answer 14

After a while of using OracleXE (this a creapled 10g version) the results could not be any worth....

1.) The thing installs an HTTP server!! Without even asking.

2.) there is no acceptable logon. You have to use a predefined user SYS -> welcome Hacker. You already know my name

3.) The browser based GUI is a productivity NIGHTMARE. Even MySQL's community GUI is much better

4.) create table numbers (number <- error reserved word???

5.) Performance is a DESASTER. If you run the query in the browser it will first only show the 10 first items - did I ask to show only 10 itmes??? Once you turn this off and allow 100.000 items it takes MINUTES until the result is there. I thought this because the browser needs a long time to render the HTML result. So back to basic: SQL Plus. After having learned how to logon as SYS (took me ten minutes) i ran the query again but stopped it after a while. Would have taken too long. Then changed the query to only get count(*) instead of the real numbers.

Result: 2,5 MINUTES

Maybe the XE version is somehow creepled - is it?