Click here to monitor SSC
SQLServerCentral is supported by Red Gate Software Ltd.
 
Log in  ::  Register  ::  Not logged in
 
 
 
        
Home       Members    Calendar    Who's On


Add to briefcase 123»»»

Performance challenge Expand / Collapse
Author
Message
Posted Monday, December 27, 2010 12:14 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, April 16, 2012 4:14 AM
Points: 12, Visits: 13
Hello guys out there,

here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)

1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.
Post #1039538
Posted Monday, December 27, 2010 12:58 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;

Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)

The third part, will depend almost completely on the CPU.

On a low-end machine, I got this to run in 13 seconds:

SELECT Number
FROM #Numbers AS N1
WHERE Number > 1
AND NOT EXISTS
(SELECT 1
FROM #Numbers AS N2
WHERE N1.Number%N2.Number = 0
AND N2.Number > 1
AND N2.Number <= N1.Number/2);



- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1039554
Posted Monday, December 27, 2010 1:40 PM
SSC-Addicted

SSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-AddictedSSC-Addicted

Group: General Forum Members
Last Login: Tuesday, March 18, 2014 1:26 PM
Points: 405, Visits: 1,431
GSquared, I used your query and got 6 seconds on a 8GB SQL2008 SP2 machine.
Post #1039568
Posted Monday, December 27, 2010 1:47 PM
Hall of Fame

Hall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of FameHall of Fame

Group: General Forum Members
Last Login: Thursday, September 18, 2014 12:39 AM
Points: 3,105, Visits: 11,494
The code on the thread below uses the Sieve of Eratosthenes and Sieve of Aitken to find prime numbers.

Tests for the Sieve of Aitken returned all primes below 100,000 in 0.6 seconds and all primes below 5,000,000 in 55 seconds. I believe this was on an old desktop computer. The Sieve of Eratosthenes took over twice as long.

Prime Number Table Function
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=69646






Post #1039570
Posted Monday, December 27, 2010 2:06 PM


SSCertifiable

SSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiableSSCertifiable

Group: General Forum Members
Last Login: Yesterday @ 8:59 PM
Points: 5,357, Visits: 8,912
GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;

Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]


If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.


Wayne
Microsoft Certified Master: SQL Server 2008
If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
Links: For better assistance in answering your questions, How to ask a question, Performance Problems, Common date/time routines,
CROSS-TABS and PIVOT tables Part 1 & Part 2, Using APPLY Part 1 & Part 2, Splitting Delimited Strings
Post #1039576
Posted Monday, December 27, 2010 2:20 PM


SSChampion

SSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampionSSChampion

Group: General Forum Members
Last Login: Friday, June 27, 2014 12:43 PM
Points: 13,872, Visits: 9,596
WayneS (12/27/2010)
GSquared (12/27/2010)
The first part is easy. It can be done in one Insert...Select if you just use the Row_Number() function.

IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)
FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;

Keep adding references to the system view till you get at least 60k rows. (Could be one instance, could be more. Will depend on the database you're doing this in.)[/code]


If you change the code to use the master.sys.all_columns, there is (thru SQL 2008R2) > 5000 rows in this view. When cross-joined to itself one time, this gives > 25 million rows.


Yep.


- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Post #1039586
Posted Monday, December 27, 2010 10:04 PM


SSC-Dedicated

SSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-DedicatedSSC-Dedicated

Group: General Forum Members
Last Login: Yesterday @ 10:28 PM
Points: 35,215, Visits: 31,666
hreisterp (12/27/2010)
Hello guys out there,

here is a simple sounding but yet interesting challenge.
Create a database with a single table named "numbers" with a single column of type int (or bigint).
create table numbers (number int not null)
create unique clustered index abc on numbers (number)

1.) Use as few as possible insert statements to add at least the numbers 1 to 60.000 (no gaps). I can do it in 15 insert statements.
2.) Use a standard SELECT query (no cursors etc) to select all prime numbers (prime number: can only be devided by 1 or itself without rest, 1 is not a prime number)
3.) Find a way to get the query result in less then 1 second on a normal PC (not a server with 36 CPU#s). Dual core machine with 4 GB RAM, 64 bit version of SQL Server 2008.

1.) and 2.) are quite easy. I bet many of you will find a solution easily.
But 3.) is a challenge
Let's see if someone can find a solution for 3.) I did not find one yet.


So which "contest" are you entering? Or is this homework? Also, if it's taking 6 seconds to gen only 60.000 rows even on an old machine, you should figure out what's wrong with the machine.


--Jeff Moden
"RBAR is pronounced "ree-bar" and is a "Modenism" for "Row-By-Agonizing-Row".

First step towards the paradigm shift of writing Set Based code:
Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

(play on words) "Just because you CAN do something in T-SQL, doesn't mean you SHOULDN'T." --22 Aug 2013

Helpful Links:
How to post code problems
How to post performance problems
Post #1039625
Posted Monday, December 27, 2010 11:43 PM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, April 16, 2012 4:14 AM
Points: 12, Visits: 13
Hello,

great post so far. The idea of using row_num is really smart.
I did this unsing standard SQL in the following way:
insert into numbers values (1)

insert into numbers select (select MAX(number) from numbers)+number from numbers

Repeat this 15 times! It will double the number of row on each iteration.
So already 1 point for the community

For the select I started with
select * from numbers a 
where
a.number>1 and
not exists (select null from numbers b
where b.number < a.number
and b.number>1
and a.number % b.number = 0)
order by a.number desc

OK. If you delete the 1 from the table (delete from numbers where number = 1) then you can remove the two > 1 constraints.
But surprisingly it takes 9 seconds on my dual core machine. Compared to the classic sieves this is PAINFULLY slow. The plan looks not very good (attached it).
It does a nest loop join and both times used the clustered index. It does an index scan on the outer loop (fine) and tries an index seek on the innner loop. Here it expects a hit count of 1 but ends up with seekin the complete index (that's why it is so slow).

One common optimization would be to change the condition from
b.number &lt; a.number  to b.number*b.number&lt;=a.number

(once the square of the number is larger a the number to check we can stop). But this doesn't help at all. It acutally causes an overflow error if you use type int for number. Changing to bigint solves this issue - but still performance is sad.

Then I though - SQL Server issue?? Tried it on a colleagues local Oracle 9 intallation - it ran for 10 minutes (then we stopped it)! OK - maybe on this machine something was wrong I'll try it again on my machine today.

This could actually be a good test for performance of the underlying query execution system of the DB. The problem does not depend much on the I/O system as the amount of data is minimal (<1 MB).

I tried a couple of other things:
- use a table variable :
declare @numbers table (number int)

... --> surprisingly MUCH slower

- used LINQ (this is for sure in memory. For table variables it is not 100% sure that they will not go to file system). Here's my code (VS 2010 Express compiles this)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
const int N = 65536;
int[] data = new int[N];
for (int i = 0; i < N; ++i) data[i] = i + 1; // should have used foreach ...
DateTime d = DateTime.Now;
// BEAUTIFUL !! Sometimes I luv myself for doing these things...
var q = from s in data where !(from o in data where (o < s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;
Console.WriteLine(DateTime.Now - d);
// Console.ReadKey();
StringBuilder sb = new StringBuilder();
foreach (var item in q)
{
sb.Append(item + "");
}
Console.WriteLine(DateTime.Now - d);
Console.ReadKey();
Console.WriteLine(sb.ToString());
Console.ReadKey();
}
}
}

Runs a little faster (8 sec) on my machine. But that is most likely due to the network data transport system of SQL Server. That does not matter for me.
Actually there is an issue in C# here. If you use the square break conditon in C# it yield the wrong result!
var q = from s in data where !(from o in data where (o*o <= s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

It obviously uses int as data type for s and o. As there is no overflow check in C# (which acutally sucks) this will end somewhere around 40.000. Actually the compiler should be smart enough to figure out that for computing o*o it is required to use long int instead.... Maybe i should post a bug...
I'll keep trying..

P.S. We all know that the classical sieves do the job ultra fast. But those are speciallized algorithms. SQL MUST be smart enough to do this at least 10 times faster.


  Post Attachments 
Unbenannt.JPG (2 views, 40.57 KB)
Post #1039642
Posted Tuesday, December 28, 2010 1:00 AM
Grasshopper

GrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopperGrasshopper

Group: General Forum Members
Last Login: Monday, April 16, 2012 4:14 AM
Points: 12, Visits: 13
OK. Gave it a try in the "FAST" MySQL -> out of the box: 50 sec on the same machine. Silly. MySQL is ridiculously
Switching from InnoDB engine to MyISAM at least improves it to 18 sec. Only twice as slow as SQL Server (anyway: MyISAM is not a suitable engine at all...)
The SUPER DUPER InMemory engine ... forget it: 33 seconds
Done with MySQL ...
Post #1039657
Posted Tuesday, December 28, 2010 1:33 AM


Valued Member

Valued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued MemberValued Member

Group: General Forum Members
Last Login: Tuesday, January 18, 2011 4:52 AM
Points: 54, Visits: 195
it take me 8 secunds on my laptop
IF OBJECT_ID(N'tempdb..#Numbers') IS NOT NULL
DROP TABLE #Numbers;

CREATE TABLE #Numbers (Number INT PRIMARY KEY);

INSERT INTO #Numbers (Number)
SELECT TOP 60000 ROW_NUMBER() OVER (ORDER BY T1.OBJECT_ID)

FROM sys.all_columns AS T1
CROSS JOIN sys.all_columns AS T2;

Post #1039658
« Prev Topic | Next Topic »

Add to briefcase 123»»»

Permissions Expand / Collapse