At work we had a little challenge in December to see who could come up with a program to count the number of primes between any two numbers (0 to 2^32-1 inclusive) as fast as possible. To this end I wrote this optimized Sieve of Eratosthenes (algorithm) that counts all the primes from 0 to 4,294,967,295 in about 13.75 seconds on my (admittedly fast) development machine at work.

If you come up with any improvements to it, let me know!

(2/2/2007 - I'm going to go ahead and add the link for the multicore Sieve of Eratosthenes here. On a 3GHz dual core Xeon, 7.15s!)

## 8 comments:

Hi Paul, cool blog!

Just thought I'd add that there are several known prime counting functions, Pi(x), that return the number of primes less (and/or equal) to x. One such function is Lehmer's Formula:

http://mathworld.wolfram.com/LehmersFormula.html

These functions are pretty straightforward to implement and would be much faster than a sieve. Given an implementation of such a function, the asked for value is directly given by Pi(upper limit) - Pi(lower limit).

Thank you for this information, that's interesting! One question - do you know if that's exact or approximate? I know a variety of prime count estimation functions exist, but I did not believe an exact function existed.

One of my colleagues had an approximation approach that relied on long splines which was interested.

A little late but, the above formula gives an upper boundary on the number of primes less than x, so it's not an exact number of primes.

Alexander

primegen uses the Sieve of Aitken, the following is output on a P4 3.2Hgz:

203280221 primes up to 4294967295.

Timings are in ticks. Nanoseconds per tick: approximately 0.313275.

Overall seconds: approximately 7.785068.

I'm somewhat surprised that Sieve of Eratosthenes is that fast, but it won't compile under gcc, I think because of the inline assembly. After uncommenting the C before the inline asm and fixing some scope errors, I got it to compile cleanly, but it produces incorrect results (always returns 1), so I guess something is messed up. Anyway, if your work computer is comparable to mine, it looks like the Sieve of Aitken is about 2x faster, without any inline assembly.

http://cr.yp.to/primegen.html

I checked out the Sieve of Atkin on Wikipedia and I can't say that I really understood it. Anyhow I have a multicore version of my SofE that runs in 7.15s on my dual core work machine - kind of cheating. ;)

This Sieve of Eratosthenes implementation can be that fast in this case because:

A.) It's limited to 32 bits.

B.) It uses every trick in the book to make it run like hell.

It really is just an example of a very simple brute force algorithm being optimized to the point of absurdity.

This is a very late response, but yes, the prime counting functions are exact (despite anonymous statement to the contrary). See e.g.:

http://mathworld.wolfram.com/PrimeCountingFunction.html

These funcions have been discussed ad nauseum in sci.math in the various "JSH" threads over the last N years.

Here's one of the posts which contain a lot of useful information (if you're into this sort of thing):

http://groups.google.com/group/sci.physics/msg/fc3424e2de072253?dmode=source&hl=en&output=gplain

BTW, it would be helpful if the comments were date stamped, and listed somewhere on the front page (so you knew there was a reply). I just happened to slip on a banana peel again, ending up here for the second time! ;)

Recompiled the multi-core Sleeve of Eratosthenes, with a tiny improvement and came in at 1.640 seconds on quadcore 9650 at 3.67ghz running in 64-bit mode using intel compiler. Don't have any of your blogger id's but call me Thomas

I realize this thread is rather old, but I stumbled on it and while in a curious mood, modified senzee.c to compile on gcc. It wasn't hard to port the ASM code to gcc's syntax, but it wasn't necessary since gcc -O3 gives quite a bit faster code with the for loop than with using the hand-rolled ASM. It's really hard to beat optimizing compilers these days. I also made it a bit faster by using a lookup table of byte masks rather than computing 1 << (i & 7). This is a trick I learned somewhere when I wrote my own sieve. I have no problem sharing the modified code, but I don't know how to attach something to a post...

Also, while your sieve is nicely optimized and fast, there is one significant optimization lacking: small prime pre-sieving using a wheel. I use this in my sieve for a speedup of over 4x. The code isn't supremely documented, but is available as part of the YAFU integer factorization package at: http://sites.google.com/site/bbuhrow/

Post a Comment