In comparison to the trial division idea we just explored, I can't figure out how the expression we started with, $p_k(m,j)$, tests the divisibility of our range of numbers, one by one. I'm just not seeing it. There's literally not even any multiplication or division in there, aside from the $\frac{1}{k}$ terms...
And it's not hard to confirm that $k$ is never larger than $\log_2 n$, and so, for example, if $n$ were a billion, the largest $k$ would be 29. So that's far too small to be testing divisibility...
...right. It's just behaving like a constant anyway, probably a coefficient. It's not testing anything. So I can't possibly see any way numbers are being tested for divisibility in there... right? I mean, look at it. $$p_k(m,j)=\rec{\frac{1}{k}-p_{k+1}(j,j+j) + p_k (m, m+j)}$$ Where is the primality test?
You don't see it? Good! You shouldn't be able to find anything of the sort. Because there isn't any such logic. It's counting primes (or, well, prime powers), but it's not testing and identifying individual primes to do so, at all. Instead, something deep and combinatorial is at work.
It must be pretty deep, because I'm not seeing it...
Well, let's finally turn to an animation of $p_k(n,j)$ itself.
And hopefully that animation will provide some intuition about what's going on combinatorially, in addition to being intriguing in its own right.
First, let's rename the weighted count of prime powers, the function $p(n)$ from above, to something a bit more standard. We will notate it as $\Pi(n)$, which is
$$\Pi(n) = \pi(n) + \frac{1}{2}\pi(n^\frac{1}{2}) + \frac{1}{3}\pi(n^\frac{1}{3}) + \frac{1}{4}\pi(n^\frac{1}{4}) + \dots$$
Okay, so basically a simple renaming.
Right. And then I'll rewrite the recursive idea slightly. The core combinatorial idea here can be expressed in a number of different forms. The form I started with is fascinating, at least to me, and it does work, but it employs a technique programmers refer to as strength reduction, which converts multiplication into addition in certain special circumstances. And that makes the bigger idea a bit harder to reason about and to implement.
Ah, okay.
So for the rest of these articles, I'll work with another, exactly equivalent form of the combinatorial idea. This one tends to be more convenient to use in programming. And so I'll use that now for this animation. The basic recursive form I will turn to instead looks like this:
$$p_k(n,j)=\rec{\frac{1}{k}-p_{k+1}(\frac{n}{j},2) + p_k (n, j+1)}$$ $$\Pi(n)=p_1(n,2)$$And that does the exact same thing, somehow?
It does. Both of these forms, harnessed correctly, generate the exact same recursive structure.
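By the way, before turning to the animation, it's easy to poke at that claim numerically. Here's a minimal Python sketch of that second recursive form (the helper names are mine). Two details are my own reading rather than anything stated above: I assume the recursion simply bottoms out at $0$ once $j$ exceeds $n$, and I treat $\frac{n}{j}$ as floor division. With those assumptions, $p_1(n,2)$ reproduces $\Pi(n)$ exactly for small $n$; the direct recursion is far too slow for large $n$, so this is purely a sanity check.

```python
from fractions import Fraction

def p(k, n, j):
    # Assumed base case: once j exceeds n, there is nothing left to count.
    if j > n:
        return Fraction(0)
    # 1/k - p_{k+1}(n/j, 2) + p_k(n, j+1), with n/j taken as floor division.
    return Fraction(1, k) - p(k + 1, n // j, 2) + p(k, n, j + 1)

def Pi_direct(n):
    """Pi(n) straight from the definition: every prime power q^m <= n contributes 1/m,
    which is the same as pi(n) + pi(n^(1/2))/2 + pi(n^(1/3))/3 + ..."""
    total = Fraction(0)
    for q in range(2, n + 1):
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):  # q is prime
            m, power = 1, q
            while power <= n:
                total += Fraction(1, m)
                m, power = m + 1, power * q
    return total

for n in range(1, 40):
    assert p(1, n, 2) == Pi_direct(n)

print(p(1, 30, 2))  # Fraction(149, 12): that's 10 + 3/2 + 2/3 + 1/4
```

The exact `Fraction` arithmetic is only there to keep the $\frac{1}{k}$ weights from drowning in floating-point noise.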
So, using that form, with a bit of cleverness, produces the following animation.
The circles in this animation correspond directly to recursive function calls of $p_k(n,j)$. So you can mouse over them to see what function call each one represents. Or, on a mobile device, you can tap them for the same information.
Okay. Oh, wait though... So from this animation, it looks like that recursive function is walking through every single possible solution to $j_1 \cdot j_2 \cdot j_3 \cdots \le n$ where each $j_i>1$, right? And as $n$ grows, that count of solutions grows bigger and bigger.
Right.
And the counting is weighting each solution by how many terms it contains. And then, somehow, combinatorially, the sum of those weighted counts changes at primes or prime powers, and it stays 0 otherwise. So it truly isn't testing if any numbers are prime at all.
That's exactly right. It's counting those solutions, and then by adding them up the right way, a weighted count of prime powers comes out the other side.
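To make "the right way" a bit more concrete, here's a small Python sketch of the reading I have in mind: enumerate every ordered solution of $j_1 \cdot j_2 \cdots \le n$ with each term greater than $1$, and let a solution with $\ell$ terms contribute $\frac{(-1)^{\ell+1}}{\ell}$. The alternating sign is my spelling-out of the weighting, not something stated above, but with it the total matches $\Pi(n)$.

```python
from fractions import Fraction

def weighted_solution_count(n):
    """Enumerate all ordered tuples (j1, j2, ...) with each ji >= 2 and
    j1*j2*... <= n.  A tuple with L terms contributes (-1)^(L+1) / L."""
    total = Fraction(0)

    def extend(product, length):
        nonlocal total
        if length >= 1:                   # every non-empty tuple counts once
            total += Fraction((-1) ** (length + 1), length)
        j = 2
        while product * j <= n:           # try appending one more term
            extend(product * j, length + 1)
            j += 1

    extend(1, 0)
    return total

def Pi_direct(n):
    """Same direct Pi(n) helper as in the earlier sketch."""
    total = Fraction(0)
    for q in range(2, n + 1):
        if all(q % d for d in range(2, int(q ** 0.5) + 1)):  # q is prime
            m, power = 1, q
            while power <= n:
                total += Fraction(1, m)
                m, power = m + 1, power * q
    return total

for n in range(1, 60):
    assert weighted_solution_count(n) == Pi_direct(n)

print(weighted_solution_count(30))  # Fraction(149, 12), the same as Pi(30)
```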
This animation does seem to emphasize the lack of primality testing. And if I let it run for a while, you can see the outlines of something like a discrete hyperbolic fractal emerging, which is really intriguing. But I still don't understand how prime-ness, so to speak, is connected to any of this.
Well, it's exactly that seeming lack of connection that makes this approach fruitful. And that's because determining which numbers divide some given number is generally slow. In fact, to get another perspective on how $p_k(n,j)$ functions, watch what happens if we sort the divisors a different way, attaching them to numbers they are factors of. We'll have the exact same divisors being added at the same time, but we'll have a different graph if we do this. To explore this, the animation uses the following notation. $\frac{\Lambda(n)}{\log n}$ is just $\Pi(n) - \Pi(n-1)$. It equals $\frac{1}{k}$ if $n = p^k$, and 0 otherwise, so it's a weighted prime power characteristic function.
Arranged this way, it looks much more like primality testing, I think.
That's exactly right. Here, we take each number, find all sets of numbers it can be decomposed into as products of various divisors, and then add up the counts of those sets in a very specific way. I won't provide a tidy formula that does this, but if I were to, the formula would function quite a lot like trial division... but it would be computationally worse, actually. It would be extremely slow. Still, that can be done.
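For what it's worth, here is one deliberately slow way of carrying out that per-number counting, as a Python sketch. The weights are the same alternating $\frac{(-1)^{\ell+1}}{\ell}$ that I've been assuming for the tuple counting; the output should be exactly the weighted prime power characteristic function $\frac{\Lambda(n)}{\log n}$ from the animation's notation.

```python
from fractions import Fraction

def weighted_factorization_count(n):
    """For a single n, enumerate every ordered factorization n = f1 * f2 * ...
    with each factor >= 2, weight a factorization with L factors by
    (-1)^(L+1) / L, and add them up.  The result should be Lambda(n)/log(n):
    1/k if n = p^k, and 0 otherwise."""
    def factorizations(m):
        # yields every ordered tuple of factors >= 2 whose product is exactly m
        if m == 1:
            yield ()
            return
        for f in range(2, m + 1):
            if m % f == 0:
                for rest in factorizations(m // f):
                    yield (f,) + rest

    total = Fraction(0)
    for tup in factorizations(n):
        if tup:
            L = len(tup)
            total += Fraction((-1) ** (L + 1), L)
    return total

print(weighted_factorization_count(12))  # 0    (12 is not a prime power)
print(weighted_factorization_count(8))   # 1/3  (8 = 2^3)
print(weighted_factorization_count(9))   # 1/2  (9 = 3^2)
print(weighted_factorization_count(7))   # 1    (7 is prime)
```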
In fact, to heighten that comparison, let's return to a slightly altered animation of trial division again.
They do resemble each other, but trial division uses boolean logic whereas this other approach uses arithmetic on the counts of these divisors, right?
Exactly.
The approach you started with and trial division are like inversions of each other, though, right? In trial division, primes force the longest chain of operations as they're tested for divisibility by all numbers up to their square root. In the approach you started with, it's exactly the opposite. Numbers that decompose into a lot of divisors have the most products that need to be counted, whereas primes don't have any.
Those are sharp observations, yes.
Doesn't it seem like this divisor counting approach produces far more recursive nodes, though? Shouldn't that be a lot slower?
That's a reasonable guess, but this is where the visualization breaks down. Every single node in the trial division animation requires a divisibility test, and thus a division operation. That adds up. The nodes in our divisor counting graph, on the other hand, can often be tallied in bulk if we count well, so in practice it can be vastly faster, computationally. We will use this observation aggressively in the next several pages to good effect. At any rate, I hope this digression provided suggestive visual evidence for how this approach might not be so far off, in spirit, from something like trial division.
Well, it would be if you reordered how you counted, right?
Right. Which is exactly why we won't do that. Let's turn back to our original case, where we counted divisors without ever touching a single divisibility test. This specific combinatorial feature is at the heart of this idea's significance, if counting the number of primes up to some really large value of $n$ interests you.
Why is that?
Well, if you think about it, with a bit of recourse to big O notation, identifying prime numbers one by one can never possibly be faster than, very roughly, $O(n)$. If you increase your input by a factor of ten, it's going to take at least ten times as long to check the new range for primes one-by-one.
Oh, sure.
So if you want to count primes faster than $O(n)$, you need a way to count primes that doesn't rely on testing and identifying individual primes one by one. You need a way to count primes that doesn't involve divisibility tests.
Wait, is that really possible? I thought the distribution of prime numbers among the whole numbers was notoriously unpredictable.
Surprising, isn't it.
Well, certainly at an individual level they are. But prime numbers, collectively, do have properties that, with a lot of cleverness and some serious mental sweat, can be harnessed for faster counting. Later, I will provide a short survey, with striking animations and running code, of a few other major approaches to prime counting that can be evolved to count faster than $O(n)$ (so faster than identifying primes one by one) by harnessing some powerful ideas. These techniques are often really fascinating but also sometimes pretty challenging to understand. Many of them start with an arithmetic form of the Sieve of Eratosthenes and then evolve clever techniques from there. Those approaches have been the workhorses of prime counting records for roughly the last two centuries.
But the entire goal of this series of dialogues is to explain and actively demonstrate (with live running code in the browser) how the core mathematical idea behind $p_k(n,j)$, above, can be evolved into the basis for surprisingly fast prime counting algorithms, both in terms of constant factors and in terms of their asymptotic bounds... and especially in terms of extremely low memory usage.
So are you saying this approach can be made to run faster than $O(n)$?
With some serious work, definitely. And I have even more optimism for it in the future.
Are these approaches the fastest way to count primes, then?
Well, that's a surprisingly complicated question, so I'll tantalizingly hold off on answering it. I will return to it, because it deserves a nuanced answer! But let's use this idea to count some primes quickly first.
Okay, but one quick objection... I'm trying to follow along with the basic idea we started with. But I thought we wanted to count actual primes, that is, $$\displaystyle \pi(n) = \sum_{p \le n} 1$$ And yet you said the recursive identity we'll be working with is this: $$p_k(n,j)=\rec{\frac{1}{k}-p_{k+1}(\frac{n}{j},2) + p_k (n, j+1)}$$ And that computes a weighted sum of prime counting functions instead, as $$p_1(n,2) = \sum_{k=1} \frac{\pi(n^\frac{1}{k})}{k}$$ Isn't that a problem?
Oh, you mean this difference?
Well spotted... but no, not really. Going from $p_1(n,2)$ back to the prime counting function is actually quite simple with just a touch of ingenuity. The technique to do that will stay constant through all the examples, so I'll cover it quickly and then use it without comment afterwards. This is a very common, well-trod, unobjectionable technique.
Suppose, as before, I name that sum of prime counting functions as
$$\Pi(n) = \sum_{k=1} \frac{\pi(n^{\frac{1}{k}})}{k} $$Which means $\Pi(n)$ equals $p_1(n,2)$, using that recursive formula $p_k(n,j)$ from before, right?
Right.
So we have $\Pi(n)$ in terms of $\pi(n).$ But what we really want is $\pi(n)$ in terms of $\Pi(n)$.
Meaning we have to invert the sum somehow?
Precisely.
So, start with
$$\Pi(n) = \pi(n) + \frac{1}{2}\pi(n^\frac{1}{2})+ \frac{1}{3}\pi(n^\frac{1}{3})+\frac{1}{4}\pi(n^\frac{1}{4})+\dots$$Now, the goal is to isolate one $\pi(n)$ term on the right side of the equation.
So, our first step is to take that equation, multiply it by $\frac{1}{2}$, and replace the variable $n$ with $n^\frac{1}{2}$. And that gives us
$$\rr{\frac{1}{2}}\Pi(n\rr{^\frac{1}{2}}) = \rr{\frac{1}{2} \cdot}\pi(n\rr{^\frac{1}{2}}) + \rr{\frac{1}{2} \cdot} \frac{1}{2}\pi(n^{\rr{\frac{1}{2}\cdot}\frac{1}{2}})+ \rr{\frac{1}{2} \cdot} \frac{1}{3}\pi(n^{\rr{\frac{1}{2}\cdot}\frac{1}{3}})+\rr{\frac{1}{2} \cdot} \frac{1}{4}\pi(n^{\rr{\frac{1}{2}\cdot}\frac{1}{4}})+\dots$$Now subtract that from the original equation, leaving
$$\Pi(n)-\frac{1}{2}\Pi(n^\frac{1}{2}) = \pi(n) + \frac{1}{3}\pi(n^\frac{1}{3})+\frac{1}{5}\pi(n^\frac{1}{5})+\frac{1}{7}\pi(n^\frac{1}{7})+\dots$$Ah! So that removed the term $\frac{1}{2} \pi(n^\frac{1}{2})$ from the right side.
Exactly. Then we repeat the process again, taking that very first equation, multiplying it by $\frac{1}{3}$, and replacing the variable $n$ with $n^\frac{1}{3}$. And that gives us
$$\rr{\frac{1}{3}}\Pi(n^{\rr{\frac{1}{3}}}) = \rr{\frac{1}{3}}\pi(n^{\rr{\frac{1}{3}}}) + \rr{\frac{1}{3} \cdot} \frac{1}{2}\pi(n^{\rr{\frac{1}{3}\cdot}\frac{1}{2}})+ \rr{\frac{1}{3} \cdot} \frac{1}{3}\pi(n^{\rr{\frac{1}{3}\cdot}\frac{1}{3}})+\rr{\frac{1}{3} \cdot} \frac{1}{4}\pi(n^{\rr{\frac{1}{3}\cdot}\frac{1}{4}})+\dots$$Subtract that, too, and now we have
$$\Pi(n)-\frac{1}{2}\Pi(n^\frac{1}{2})-\frac{1}{3}\Pi(n^\frac{1}{3}) = \pi(n) + \frac{1}{5}\pi(n^\frac{1}{5}) - \frac{1}{6}\pi(n^\frac{1}{6})+\frac{1}{7}\pi(n^\frac{1}{7})+\dots$$And so that removed the term $\frac{1}{3} \pi(n^\frac{1}{3})$ from the right side, too.
Yes.
Huh... I thought I understood the pattern, but there's no $\frac{1}{4} \pi(n^\frac{1}{4})$ term to remove. And also, too many $\pi(n^\frac{1}{6})$ terms have been subtracted, so I suppose one will have to be added back in. What exactly is the deeper pattern here?
Well, if we continue this process, it'll eventually terminate after $\lfloor \log_2 n \rfloor$ steps, because $\pi(n)=0$ if $n < 2$. And we will eventually have
$$\Pi(n)-\frac{1}{2}\Pi(n^\frac{1}{2})-\frac{1}{3}\Pi(n^\frac{1}{3})-\frac{1}{5}\Pi(n^\frac{1}{5})+\frac{1}{6}\Pi(n^\frac{1}{6})+\dots = \pi(n)$$This might seem unfamiliar, but this combinatorial inclusion-exclusion process, called Möbius inversion, is actually quite standard.
The sequence of coefficients that ends up being used here has a special name and notation: $\mu(n)$, the Möbius function. Its value is $0$ whenever $n$ has a repeated prime factor, and otherwise $+1$ or $-1$ according to whether $n$ has an even or odd number of prime factors, which is exactly the pattern of coefficients showing up above.
So, finally, using that function, we can rewrite $\pi(n)$ in terms of $\Pi(n)$ as
$$\pi(n) = \sum_{j=1} \frac{\mu(j)}{j} \cdot \Pi(n^{\frac{1}{j}})$$Alright.
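Just to close the loop computationally, here is a minimal Python sketch of that inversion (all names are mine): it computes $\Pi$ straight from the prime powers, computes $\mu(j)$ by trial factorization, and recovers $\pi(n)$ as $\sum_j \frac{\mu(j)}{j}\Pi(\lfloor n^{1/j}\rfloor)$, truncating once $n^{1/j} < 2$ since those terms vanish. Since $\Pi$ only changes at integers, taking the integer part of $n^{1/j}$ changes nothing. Nothing here is meant to be fast; it just checks that the inversion works.

```python
from fractions import Fraction

def is_prime(q):
    return q >= 2 and all(q % d for d in range(2, int(q ** 0.5) + 1))

def Pi(x):
    """Pi(x) straight from the definition: each prime power p^m <= x adds 1/m."""
    total = Fraction(0)
    for q in range(2, x + 1):
        if is_prime(q):
            m, power = 1, q
            while power <= x:
                total += Fraction(1, m)
                m, power = m + 1, power * q
    return total

def mobius(j):
    """mu(j): 0 if j has a repeated prime factor, else (-1)^(number of prime factors)."""
    mu, m, p = 1, j, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0
            mu = -mu
        p += 1
    if m > 1:
        mu = -mu
    return mu

def integer_root(n, j):
    """Largest integer r with r**j <= n (avoiding floating-point edge cases)."""
    r = int(round(n ** (1.0 / j)))
    while r ** j > n:
        r -= 1
    while (r + 1) ** j <= n:
        r += 1
    return r

def prime_count(n):
    """pi(n) = sum_{j >= 1} mu(j)/j * Pi(n^(1/j)); terms with n^(1/j) < 2 vanish."""
    total, j = Fraction(0), 1
    while 2 ** j <= n:
        total += Fraction(mobius(j), j) * Pi(integer_root(n, j))
        j += 1
    return total

for n in range(2, 60):
    assert prime_count(n) == sum(1 for q in range(2, n + 1) if is_prime(q))

print(prime_count(50))  # 15
```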
As an interesting bit of history, Riemann used exactly this technique with exactly these functions in his original 1859 paper that includes the Riemann hypothesis. The inversion, with somewhat different notation, shows up in the picture here.
That's Möbius inversion. Let's return to prime counting...