RE: Performance challenge – SQLServerCentral

hreisterp

SSC Veteran

Points: 288

December 27, 2010 at 11:43 pm

#1266623

Hello,

great post so far. The idea of using row_num is really smart.

I did this unsing standard SQL in the following way:

insert into numbers values (1)

insert into numbers select (select MAX(number) from numbers)+number from numbers

Repeat this 15 times! It will double the number of row on each iteration.

So already 1 point for the community 🙂

For the select I started with

select * from numbers a

where

a.number>1 and

not exists (select null from numbers b

where b.number < a.number

and b.number>1

and a.number % b.number = 0)

order by a.number desc

OK. If you delete the 1 from the table (delete from numbers where number = 1) then you can remove the two > 1 constraints.

But surprisingly it takes 9 seconds on my dual core machine. Compared to the classic sieves this is PAINFULLY slow. The plan looks not very good (attached it).

It does a nest loop join and both times used the clustered index. It does an index scan on the outer loop (fine) and tries an index seek on the innner loop. Here it expects a hit count of 1 but ends up with seekin the complete index (that's why it is so slow).

One common optimization would be to change the condition from b.number < a.number to b.number*b.number<=a.number (once the square of the number is larger a the number to check we can stop). But this doesn't help at all. It acutally causes an overflow error if you use type int for number. Changing to bigint solves this issue - but still performance is sad.

Then I though - SQL Server issue?? Tried it on a colleagues local Oracle 9 intallation - it ran for 10 minutes (then we stopped it)! OK - maybe on this machine something was wrong 😉 I'll try it again on my machine today.

This could actually be a good test for performance of the underlying query execution system of the DB. The problem does not depend much on the I/O system as the amount of data is minimal (<1 MB).

I tried a couple of other things:

- use a table variable : declare @numbers table (number int) ... --> surprisingly MUCH slower

- used LINQ (this is for sure in memory. For table variables it is not 100% sure that they will not go to file system). Here's my code (VS 2010 Express compiles this)

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

namespace ConsoleApplication1

{

class Program

{

static void Main(string[] args)

{

const int N = 65536;

int[] data = new int[N];

for (int i = 0; i < N; ++i) data = i + 1; // should have used foreach ...

DateTime d = DateTime.Now;

// BEAUTIFUL !! Sometimes I luv myself for doing these things...

var q = from s in data where !(from o in data where (o < s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

Console.WriteLine(DateTime.Now - d);

// Console.ReadKey();

StringBuilder sb = new StringBuilder();

foreach (var item in q)

{

sb.Append(item + "");

}

Console.WriteLine(DateTime.Now - d);

Console.ReadKey();

Console.WriteLine(sb.ToString());

Console.ReadKey();

}

Runs a little faster (8 sec) on my machine. But that is most likely due to the network data transport system of SQL Server. That does not matter for me.

Actually there is an issue in C# here. If you use the square break conditon in C# it yield the wrong result!

var q = from s in data where !(from o in data where (o*o <= s) && (o > 1) && (s % o == 0) select 0).Any() && s > 1 select s;

It obviously uses int as data type for s and o. As there is no overflow check in C# (which acutally sucks) this will end somewhere around 40.000. Actually the compiler should be smart enough to figure out that for computing o*o it is required to use long int instead.... Maybe i should post a bug...

I'll keep trying..

P.S. We all know that the classical sieves do the job ultra fast. But those are speciallized algorithms. SQL MUST be smart enough to do this at least 10 times faster.