Okay, this is a lot of fascinating information, but let's get down to running code. How do these different algorithms actually compare to each other in practice?
That's a great question. Let's look at that.
Once again, below is a table that spawns a set of browser worker threads and runs different algorithms to count primes. But this time, in addition to the algorithms we developed here, I'll include single-threaded implementations of a few other, better-known prime counting algorithms.
Which ones?
Well, I'll include four, and let me preface this by saying that this isn't intended as an industrial-strength, high-precision speed comparison. This is a rough, approximate affair.
So, the first approach is a plain iterative implementation of trial division, which must be about the simplest way to count primes.
Sure - I think it's taught in nearly every intro to CS class.
Trial division runs in something like $O(n^\frac{3}{2})$ time and uses a trivial amount of space.
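Just to fix ideas, here's a minimal sketch of what I mean — not the exact implementation wired into the table below, just the basic shape of the approach:

```javascript
// Count the primes <= n by trial division: test each odd candidate k
// for divisors up to sqrt(k). Roughly O(n^(3/2)) time, O(1) space.
function countPrimesTrialDivision(n) {
  let count = n >= 2 ? 1 : 0; // account for 2, then check odd numbers only
  for (let k = 3; k <= n; k += 2) {
    let isPrime = true;
    for (let d = 3; d * d <= k; d += 2) {
      if (k % d === 0) { isPrime = false; break; }
    }
    if (isPrime) count++;
  }
  return count;
}

// countPrimesTrialDivision(100) === 25
```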
Okay. Will you also include that implementation of the Legendre formula for counting primes from before?
I will. That will be the second approach I include.
The third approach will be a somewhat optimized JavaScript implementation of the Sieve of Eratosthenes, which I didn't write - the implementation was taken from here, written by StackOverflow user Ovinus Real.
Okay.
Excluding log factors, the Sieve of Eratosthenes runs in roughly $O(n)$ time and $O(n)$ space, but if you use a segmented sieve, the memory bound drops to around $O(n^\frac{1}{2})$.
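For comparison, the textbook, non-segmented form of the sieve looks something like this — a bare-bones sketch, not Ovinus Real's optimized version:

```javascript
// Sieve of Eratosthenes: cross off multiples of each prime p starting
// at p*p, then count the survivors. O(n) space, O(n log log n) time.
function countPrimesSieve(n) {
  if (n < 2) return 0;
  const composite = new Uint8Array(n + 1); // 0 means "not yet crossed off"
  for (let p = 2; p * p <= n; p++) {
    if (!composite[p]) {
      for (let m = p * p; m <= n; m += p) composite[m] = 1;
    }
  }
  let count = 0;
  for (let k = 2; k <= n; k++) {
    if (!composite[k]) count++;
  }
  return count;
}
```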
And the final entry, considerably more complicated, is a JavaScript implementation of the Lagarias-Miller-Odlyzko algorithm, which I also didn't write. That implementation was also taken from here, written by StackOverflow user GordonBGood. And again excluding log factors, that algorithm runs in very roughly $O(n^\frac{2}{3})$ time and $O(n^\frac{1}{3})$ space.
And variants of that have been the workhorse setting actual prime counting records, right?
You are indeed correct. Anyway, to see these live and in action, just click that "Test Algorithms" button and see what happens. Alternatively, if you click on the table headings, you'll see the actual JavaScript implementations that are being run live, too.
Okay. So largely as expected, of the new algorithms, trial division is by far the slowest, and the complicated Lagarias-Miller-Odlyzko is very clearly the fastest.
Right, exactly as expected.
And it seems like the fastest version we developed here is nearly as fast as that last algorithm. So that's promising, right? On my middling computer, our version computes $\pi(10^{15})$ in about 58 seconds, and the LMO version seems to compute it in around 56 seconds. So is it competitive?
Well, it certainly shows that the constant-factor performance of the approach we developed here seems quite favorable. And that's not a bad thing!
But this is a case where looks can be deceiving, and it's why rigorous algorithmic analysis is crucial. For an algorithm that runs in roughly $O(n^\frac{2}{3})$ time, multiplying the input by 10 ought to multiply the running time by very roughly $10^\frac{2}{3} \approx 4.64$. Even without dipping our toes into more formal analysis of these algorithms, that's not what we're seeing in our version.
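One crude way to test that expectation empirically is to time the same function at $n$ and $10n$ and look at the ratio. A rough sketch, where `countPrimes` stands in for any of the counting functions above:

```javascript
// Crude empirical scaling check. For an O(n^(2/3)) algorithm the ratio
// should land very roughly near 10^(2/3) ≈ 4.64; for O(n), near 10.
// (Single runs only — real benchmarking would warm up and average.)
function scalingRatio(countPrimes, n) {
  const t0 = performance.now();
  countPrimes(n);
  const t1 = performance.now();
  countPrimes(10 * n);
  const t2 = performance.now();
  return (t2 - t1) / (t1 - t0);
}
```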
More generally, we're really hitting the limits of working in JavaScript in a browser context, although I'll admit to being surprised it works as well and as quickly as it does. JavaScript's native number type can only represent integers exactly up to $2^{53} - 1 \approx 9 \times 10^{15}$. So we really can't run these algorithms usefully for much larger inputs, even just to get timings. And turning to BigInts opens its own can of worms...
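You can see that limit directly in the console:

```javascript
// Above 2^53 - 1, adjacent integers are no longer distinguishable
// in JavaScript's double-precision numbers:
console.log(Number.MAX_SAFE_INTEGER);                // 9007199254740991
console.log(9007199254740992 === 9007199254740993);  // true
console.log(10 ** 16 === 10 ** 16 + 1);              // true
```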
I suppose this is why JavaScript isn't the main language people work in for these topics, yeah.
True enough.
But if you'd like to see what industrial-strength implementations of these algorithms look like, you'll have to leave the web browser sandbox and look to projects like Kim Walisch's primecount project. Such projects are much more complicated, but they're also much more able to exploit multi-threading and native hardware features, along with really clever caching and memoization techniques, for significant constant-factor speedups.
Obviously this is all very rough timing, so keep that in the back of your mind, but what do you see if you run that program on the same machine you've been using?
Well, you noted before that the basic prime counting identity we've been building algorithms from is something like an alternative to Legendre's original identity.
Right.
So if I use that program and run its implementation of Legendre's original algorithm, the results look something like this:
It's hard to tell how its timing grows. It's quite a lot faster, but I assume the C++ code can use multi-threading, hardware intrinsics, and other helpful features to attain that speed, along with very smart and aggressive caching and memoization. Just eyeballing execution times, though, it doesn't seem radically out of line with the best algorithm we developed.
Perhaps not. But of course, keep in mind that Legendre's original algorithm runs in something like $O(n)$ time and $O(n^\frac{1}{2})$ space.
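As a reminder of its shape: with $a = \pi(x^{1/2})$, Legendre's identity says

$$\pi(x) = \phi(x, a) + a - 1,$$

where $\phi(x, a)$ counts the integers up to $x$ divisible by none of the first $a$ primes, computed through the recurrence $\phi(x, a) = \phi(x, a-1) - \phi(\lfloor x/p_a \rfloor, a-1)$. The cost of evaluating that recurrence is what drives the roughly $O(n)$ running time.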
Oh, right. Meanwhile, if I run that program with its most refined algorithms, which are subtle evolutions of the LMO approach, I get timings that look like this:
And?
Ah. Yeah, that's really, blazingly fast. And I see what you mean more generally: even more striking than its raw speed, its execution time seems to grow at a slower rate as the input increases.
Right. It's pretty clearly faster asymptotically than the approach we developed here.
So what does that mean?
Well, let's turn to that question.