This series of articles evolves a surprisingly fast method to count prime numbers that uses almost no memory at all.
The approach here is hopefully a good springboard to even faster approaches, too; the articles end with a brief overview of one such approach, running in $O(n^{\frac{2}{3}+\epsilon})$ time and $O(n^{\frac{1}{3}+\epsilon})$ space.
There'll be a good survey of other fast prime counting techniques, and there'll be lots of running JavaScript code showing off all these ideas and some striking animations too.
Although I'll do my best to make them work on phones, some of the interactive tests and animations and code really, really benefit from a larger screen.
These articles are in the form of dialogues between unnamed voices.
I accept this is a bit, well, unusual. But whether you're into math, animation, programming, or just intellectually curious, I hope you find it engaging.
Okay, let me begin...
Here's a succinct, rather curious math puzzle.
What's that?
Want to hazard a guess as to what the following fairly dense bit of JavaScript does?
p=(n,k=1,m=1,j=2)=>n<j?0:1/k-p(n,k+1,j,j+j)+p(n,k,m,m+j)
And you would run it as p(100).
Show some mercy for your poor audience! That's got to be way more compact than absolutely necessary.
Well, I'll admit I'm amusing myself a touch writing the idea this way. But yes, if you have some fixed constant $n$, this idea can be rewritten, with a bit more empathy towards the reader, as $$p_k(m,j)=\begin{cases} \frac{1}{k}-p_{k+1}(j,j+j) + p_k (m, m+j) & \text{if } j \le n \\ 0 & \text{otherwise} \end{cases}$$ with the value of interest being $$p(n) = p_1(1,2)$$
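If it helps, here's that same recursion written out longhand in JavaScript. This is just a mechanical expansion of the one-liner above; nothing new is added.

```javascript
// The one-liner, expanded: p_k(m, j) with the fixed n
// threaded through as the first argument.
function p(n, k = 1, m = 1, j = 2) {
  if (n < j) return 0;             // recursion bottoms out once j exceeds n
  return 1 / k                     // the 1/k term
       - p(n, k + 1, j, j + j)     // minus p_{k+1}(j, j + j)
       + p(n, k, m, m + j);        // plus p_k(m, m + j)
}
```

It behaves identically to the compressed form; p(100) still runs the same way.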
So, what do you suppose p(n) does?
I have no idea. It's just some fairly opaque recursive function. It looks like it adds some terms up, cycling between addition and subtraction. Whatever it's doing can't be too complicated, at any rate, right?
Okay. Well I'll pause now and strongly encourage you to explore it a bit before I go on.
But if you've had your fill of that, at least consider running the following very trivial experiment.
Here's a bit of JavaScript code that actually lists the behavior of p(n)-p(n-1) for several small values of $n$.
There's nothing tricky about it; just the function and then a code scaffold to print out some of those differences.
p=(n,k=1,m=1,j=2)=>n<j?0:1/k-p(n,k+1,j,j+j)+p(n,k,m,m+j)
checkZero = n => Math.abs(n) < 1e-13 ? 0:n
for(let j = 2; j <= 40; j++ )
console.log(j + ": " + checkZero( p(j)-p(j-1) ))
If you have a JavaScript environment handy, consider running this right now. At the time of this conversation, most web browsers can run JavaScript in their interactive consoles with a simple tap of F12.
Well, okay. I'll copy and paste that code snippet after tapping F12, and I get...
...huh.
Wait, really?
So p(n)-p(n-1) equals $1$ at primes for some reason, and $0$ for most composites, unless...
...unless $n$ has only one prime factor.
Specifically, if $n=p^k$, then p(n)-p(n-1) equals $\frac{1}{k}$.
Which is to say, p(n) counts all the primes raised to whole number powers (with a weighting factor) that are $\le n$.
If you want to explore the function, it looks like this when graphed up to a million...
Another way to put this is that if $\pi(n)$ is the notation for the count of primes (this has nothing to do with 3.14159... and is, oddly, standard notation), then
$$p(n) = \pi(n) + \frac{1}{2}\pi(n^\frac{1}{2}) + \frac{1}{3}\pi(n^\frac{1}{3}) + \frac{1}{4}\pi(n^\frac{1}{4}) + \dots$$
and so p(n) computes the Riemann Prime Counting function.
Note that because $\pi(1) = 0$ and only changes values at whole numbers, the sum on the right only has $\lfloor \log_2 n \rfloor$ non-zero terms.
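If you want to sanity-check that identity numerically, here's a quick sketch that computes the right-hand side directly from a simple trial-division prime counter. The helper names here (isPrime, pi, riemannPi) are my own scaffolding, not anything from the function above.

```javascript
// The mystery function, as before.
const p = (n, k = 1, m = 1, j = 2) =>
  n < j ? 0 : 1 / k - p(n, k + 1, j, j + j) + p(n, k, m, m + j);

// A basic prime test and prime counter, fine for small n.
const isPrime = x => {
  if (x < 2) return false;
  for (let d = 2; d * d <= x; d++) if (x % d === 0) return false;
  return true;
};
const pi = x => {
  let count = 0;
  for (let i = 2; i <= x; i++) if (isPrime(i)) count++;
  return count;
};

// The Riemann prime counting function, summed directly; the small
// epsilon guards against floating-point error in n ** (1/k).
const riemannPi = n => {
  let total = 0;
  for (let k = 1; 2 ** k <= n; k++)
    total += pi(Math.floor(n ** (1 / k) + 1e-9)) / k;
  return total;
};

console.log(p(100), riemannPi(100)); // the two values should agree closely
```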
There must be some trick here. Aren't there formulas that "count" primes by checking each number, one by one, to see if it's prime, then adding up the results? Is that the case here?
Oh, you mean a formula like $$\pi(n) = -1 + \sum_{j=3}^n \left( (j-2)!-j \cdot \lfloor \frac{(j-2)!}{j} \rfloor \right)$$ given by Hardy and Wright, applying Wilson's Theorem term by term, I suppose?
Sure. Is it doing something like that?
Okay... well, this is a bit of a digression, but hopefully an amusing one. Let's take a look. We can actually rewrite that sum recursively as $$ g(n, j, p) = \begin{cases} p - j \cdot \lfloor \frac{p}{j} \rfloor + g(n, j+1, p \cdot (j-1)) & \text{if } n \geq j \\ 0 & \text{otherwise } \end{cases} $$ and then we can count primes, as long as $n>3$, with $$\pi(n) = -1+g(n, 3, 1)$$ And so code doing that (with a bit of rewriting) looks like
function g(n, j = 3, p = 1) { // invariant: p = (j-2)!
if(n < j) return 0;
return p % j + g(n, j + 1, p * (j - 1)); // (j-2)! mod j is 1 for prime j, 0 for composite j > 4 (and 2 for j = 4)
}
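To see it in action, here's that function with a thin wrapper (the name piWilson is mine). Keep in mind the $n>3$ restriction, and that plain JavaScript floating point keeps the factorials exact only up to about $n=20$, since $18!$ is the last factorial below $2^{53}$.

```javascript
// Count primes via Wilson's theorem, term by term.
function g(n, j = 3, p = 1) { // invariant: p = (j-2)!
  if (n < j) return 0;
  return p % j + g(n, j + 1, p * (j - 1));
}

// Valid for 3 < n <= 20 in plain JavaScript numbers.
const piWilson = n => -1 + g(n, 3, 1);

console.log(piWilson(20)); // 8 -- the primes 2, 3, 5, 7, 11, 13, 17, 19
```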
We can actually use an idea like that to make a pretty compelling animation, where you can watch primes get counted in real time.
So that works, but...
But what? That looks pretty interesting and tidy, doesn't it?
Well, first, that's obviously not even remotely how $p_k(m,j)$ from above works. The definition for $p_k(m,j)$ doesn't even involve any multiplication, let alone a calculation of remainders from integer division. And, tidy though this form may seem, precision issues are really vexing here. In JavaScript, this only counts primes up to around $n=20$ before you need to upgrade to special kinds of numbers. And even ignoring precision, factorials grow really, really fast. The numbers involved become otherworldly, impossibly large far too quickly for this to be useful. At any rate, you can clearly see the primality test in the original sum, as $(j-2)!-j \cdot \lfloor \frac{(j-2)!}{j} \rfloor $, which is the entire trick here.
Oh.
But you might well wonder if there is some other way to write such a recursive function that relies on something more standard than Wilson's theorem and its factorials...
Right. What about trial division, where you try to divide by all the numbers up to $\sqrt{n}$, and if none of them divide $n$ cleanly, then $n$ is prime? Is that perhaps what we're doing above with $p_k(m,j)$?
Well, let's look at such an approach. In fact, the following bit of JavaScript, taken from here and then optimized a bit to be fair to trial division, works exactly that way, working through trial division recursively to count primes.
h=(n,j=Math.floor(n**.5+1.000000001))=>n>1&&(--j<2)+(n%j?h(n,j):h(n-1))
And although that bit of code is intentionally hyper-compressed by trollish design, the role of the divisibility test driving the logic inside it - n%j? - is pretty clearly visible.
This bit of code (which I didn't originally write) is actually fun to puzzle out in its own right.
But if you want to see the logic in more normal math notation, try this.
If you define the following general characteristic function, $$ \mathbf{1}_P = \begin{cases} 1 & \text{ condition } P \text{ is true} \\ 0 & \text{ otherwise } \end{cases} $$ you can rewrite that bit of JavaScript as $$h(n, j) = \begin{cases} \mathbf{1}_{j-1 < 2} + h(n, j-1) & \text{if } n > 1 \text{ and } j > 1 \text{ and } n \bmod (j-1) \neq 0 \\ \mathbf{1}_{j-1 < 2} + h(n-1, \lfloor\sqrt{n-1}\rfloor + 1) & \text{if } n > 1 \text{ and } j > 1 \text{ and } n \bmod (j-1) = 0 \\ 0 & \text{if } n \leq 1\end{cases}$$ and then you can count primes with $$ \pi(n) = h(n, \lfloor\sqrt{n}\rfloor + 1) $$
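And if the compressed form is still too much, here's a behavior-equivalent unpacking. This restructuring is mine: the inner trial division is an ordinary loop, and only the march from $n$ down to $2$ stays recursive.

```javascript
// Count primes <= n by trial division: test each candidate against
// divisors from floor(sqrt(candidate)) down to 2. The small epsilon
// guards against sqrt landing just under an integer.
function countPrimes(n) {
  if (n <= 1) return 0;
  for (let j = Math.floor(Math.sqrt(n) + 1e-9); j >= 2; j--) {
    if (n % j === 0) return countPrimes(n - 1); // composite: no tally
  }
  return 1 + countPrimes(n - 1);                // no divisor found: n is prime
}

console.log(countPrimes(100)); // 25
```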
Go down this road, and you can arrive at an even more appealing animation (although I have to rewrite the recursive functions a bit to make the animation coherent). And this time the recursive form captures the process of counting primes via trial division.
Okay. So right, this one counts primes by applying trial division to each number, one by one, and doing this up to the square root of each number. If the square root is crossed, then the number is prime, which the animation colors green. And after a while, it's clear the primes themselves are the slowest part of this process. So that makes sense; I can see the outlines of that approach here, sure...
And?
And that doesn't seem to be what we're doing with our original $p_k(m,j)$. This exploration of trial division is nice, but it's only leaving me more unclear.