SQL Clone
SQLServerCentral is supported by Redgate
 
Log in  ::  Register  ::  Not logged in
 
 
 


Performance challenge


Performance challenge

Author
Message
hreisterp
hreisterp
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 13
Hello guys out there,

here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)

1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.
GSquared
GSquared
SSC-Insane
SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)

Group: General Forum Members
Points: 23371 Visits: 9730
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;



Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)

The third part, will depend almost completely on the CPU.

On a low-end machine, I got this to run in 13 seconds:

SELECT Number
FROM #Numbers AS N1
WHERE Number > 1
AND NOT EXISTS
(SELECT 1
FROM #Numbers AS N2
WHERE N1.Number%N2.Number = 0
AND N2.Number > 1
AND N2.Number <= N1.Number/2);



- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Wildcat
Wildcat
Mr or Mrs. 500
Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)Mr or Mrs. 500 (571 reputation)

Group: General Forum Members
Points: 571 Visits: 1444
GSquared, I used your query and got 6 seconds on a 8GB SQL2008 SP2 machine. :-)
Michael Valentine Jones
Michael Valentine Jones
SSCertifiable
SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)SSCertifiable (5.7K reputation)

Group: General Forum Members
Points: 5714 Visits: 11771
The code on the thread below uses the Sieve of Eratosthenes and Sieve of Aitken to find prime numbers.

Tests for the Sieve of Aitken returned all primes below 100,000 in 0.6 seconds and all primes below 5,000,000 in 55 seconds. I believe this was on an old desktop computer. The Sieve of Eratosthenes took over twice as long.

Prime Number Table Function
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=69646
WayneS
WayneS
SSCrazy Eights
SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)SSCrazy Eights (9.8K reputation)

Group: General Forum Members
Points: 9783 Visits: 10569
GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;



Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]


If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings

GSquared
GSquared
SSC-Insane
SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)SSC-Insane (23K reputation)

Group: General Forum Members
Points: 23371 Visits: 9730
WayneS (12/27/2010)
GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;



Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]


If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.


Yep.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Jeff Moden
Jeff Moden
SSC Guru
SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)SSC Guru (85K reputation)

Group: General Forum Members
Points: 85521 Visits: 41081
hreisterp (12/27/2010)
Hello guys out there,

here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)

1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.


So which "contest" are you entering? Or is this homework? Also, if it's taking 6 seconds to gen only 60.000 rows even on an old machine, you should figure out what's wrong with the machine. ;-)

--Jeff Moden

RBAR is pronounced ree-bar and is a Modenism for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column.
If you think its expensive to hire a professional to do the job, wait until you hire an amateur. -- Red Adair

Helpful Links:
How to post code problems
How to post performance problems
Forum FAQs
hreisterp
hreisterp
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 13
Hello,

great post so far. The idea of using row_num is really smart.
I did this unsing standard SQL in the following way:
insert into numbers values (1)

insert into numbers select (select MAX(number) from numbers)+number from numbers


Repeat this 15 times! It will double the number of row on each iteration.
So already 1 point for the community :-)

For the select I started with

select * from numbers a
where
a.number>1 and
not exists (select null from numbers b
where b.number < a.number
and b.number>1
and a.number % b.number = 0)
order by a.number desc



OK. If you delete the 1 from the table (delete from numbers where number = 1) then you can remove the two > 1 constraints.
But surprisingly it takes 9 seconds on my dual core machine. Compared to the classic sieves this is PAINFULLY slow. The plan looks not very good (attached it).
It does a nest loop join and both times used the clustered index. It does an index scan on the outer loop (fine) and tries an index seek on the innner loop. Here it expects a hit count of 1 but ends up with seekin the complete index (that's why it is so slow).

One common optimization would be to change the condition from
b.number < a.number  to b.number*b.number<=a.number

(once the square of the number is larger a the number to check we can stop). But this doesn't help at all. It acutally causes an overflow error if you use type int for number. Changing to bigint solves this issue - but still performance is sad.

Then I though - SQL Server issue?? Tried it on a colleagues local Oracle 9 intallation - it ran for 10 minutes (then we stopped it)! OK - maybe on this machine something was wrong ;-) I'll try it again on my machine today.

This could actually be a good test for performance of the underlying query execution system of the DB. The problem does not depend much on the I/O system as the amount of data is minimal (<1 MB).

I tried a couple of other things:
- use a table variable :
declare @numbers table (number int)

... --> surprisingly MUCH slower

- used LINQ (this is for sure in memory. For table variables it is not 100% sure that they will not go to file system). Here's my code (VS 2010 Express compiles this)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const int N = 65536;
int[] data = new int[N];
for (int i = 0; i < N; ++i) data[i] = i + 1; // should have used foreach ...
DateTime d = DateTime.Now;
// BEAUTIFUL !! Sometimes I luv myself for doing these things...
var q = from s in data where !(from o in data where (o < s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;
Console.WriteLine(DateTime.Now - d);
// Console.ReadKey();
StringBuilder sb = new StringBuilder();
foreach (var item in q)
{
sb.Append(item + "");
}
Console.WriteLine(DateTime.Now - d);
Console.ReadKey();
Console.WriteLine(sb.ToString());
Console.ReadKey();
}
}
}



Runs a little faster (8 sec) on my machine. But that is most likely due to the network data transport system of SQL Server. That does not matter for me.
Actually there is an issue in C# here. If you use the square break conditon in C# it yield the wrong result!

var q = from s in data where !(from o in data where (o*o <= s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;


It obviously uses int as data type for s and o. As there is no overflow check in C# (which acutally sucks) this will end somewhere around 40.000. Actually the compiler should be smart enough to figure out that for computing o*o it is required to use long int instead.... Maybe i should post a bug...
I'll keep trying..

P.S. We all know that the classical sieves do the job ultra fast. But those are speciallized algorithms. SQL MUST be smart enough to do this at least 10 times faster.
Attachments
Unbenannt.JPG (7 views, 40.00 KB)
hreisterp
hreisterp
Grasshopper
Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)Grasshopper (14 reputation)

Group: General Forum Members
Points: 14 Visits: 13
OK. Gave it a try in the "FAST" MySQL -> out of the box: 50 sec on the same machine. Silly. MySQL is ridiculously Laugh
Switching from InnoDB engine to MyISAM at least improves it to 18 sec. Only twice as slow as SQL Server (anyway: MyISAM is not a suitable engine at all...)
The SUPER DUPER InMemory engine ... forget it: 33 seconds
Done with MySQL ...
roi.reuven
roi.reuven
SSC-Enthusiastic
SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)SSC-Enthusiastic (120 reputation)

Group: General Forum Members
Points: 120 Visits: 195
it take me 8 secunds on my laptop
IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;
Go


Permissions

You can't post new topics.
You can't post topic replies.
You can't post new polls.
You can't post replies to polls.
You can't edit your own topics.
You can't delete your own topics.
You can't edit other topics.
You can't delete other topics.
You can't edit your own posts.
You can't edit other posts.
You can't delete your own posts.
You can't delete other posts.
You can't post events.
You can't edit your own events.
You can't edit other events.
You can't delete your own events.
You can't delete other events.
You can't send private messages.
You can't send emails.
You can read topics.
You can't vote in polls.
You can't upload attachments.
You can download attachments.
You can't post HTML code.
You can't edit HTML code.
You can't post IFCode.
You can't post JavaScript.
You can post emoticons.
You can't post or upload images.

Select a forum

































































































































































SQLServerCentral


Search