Two important ideas really speed up this particular prime counting technique.
I'll introduce each in turn, because the code grows somewhat more complex with both applied at once. It'll be easier to see one at a time. So let me introduce the first, which is more straightforward.
Go ahead.
The idea is to only use divisors that aren't themselves divisible by the first several primes. For example, if we discarded numbers divisible by the first 3 primes, the recursive weighted prime counting function becomes
$$g(j)= \begin{cases} 0 & \text{if } 2 | j \textrm{ or } 3 | j \textrm{ or } 5 \\ 1 & \text{otherwise} \end{cases} $$ $$p_k(n,j)=\rec{\rr{g(j) \cdot }(\frac{1}{k}-p_{k+1}(\frac{n}{j},2)) + p_k (n, j+1)}$$Do that, and we'll see, with a bit of work, that we can still get the weighted count of prime powers, now as
$$\Pi(n) = H_{\lfloor \log_2 n \rfloor} + H_{\lfloor \log_3 n \rfloor} + H_{\lfloor \log_5 n \rfloor} + p_1(n,7) $$And here, $H_n$ are harmonic numbers, defined as $H_n = \sum_{k=1}^n \frac{1}{k}$. An animation of this idea in action might help make this approach clearer.
Oh, okay. So clearly there is a lot less to count this way. Thus it's much faster.
That's the idea, yes. But discarding 3 primes isn't nearly ambitious enough; this approach is very sensitive to wheel factorization. So instead, here, we'll discard numbers divisible by the first 8 primes.
So the smallest possible prime factor will be $23$.
Right. So the function we'll work with looks like this:
$$h(j)= \begin{cases} 0 & \text{if } 2 | j \textrm{ or } 3 | j \textrm{ or } 5 | j \textrm{ or } 7 | j \textrm{ or } 11 | j \textrm{ or } 13| j \textrm{ or } 17| j \textrm{ or } 19 | j \\ 1 & \text{otherwise} \end{cases} $$ $$p_k(n,j)=\rec{\rr{h(j) \cdot }(\frac{1}{k}-p_{k+1}(\frac{n}{j},2)) + p_k (n, j+1)}$$Do that, and we'll see this new function actually still counts primes, with a bit of adjustment, as
$$\pi(n) = \pi(19) + \sum_{j=1} \frac{\mu(j)}{j}p_1(n^\frac{1}{j},23)$$Interesting. From the new recursive definition, I can see why it must execute much more quickly. That smallest value of $j$ it uses in the recursive definition is $23$, followed by $29$. As long as recursion is stopped whenever $h(j)=0$, it ends up skipping way past most small numbers.
Exactly. And it gets even better - here, I'm manually testing the first $8$ primes to keep the code simple and the technique foregrounded. But later, caching and proper wheel factorization will mean we can step through all the numbers not divisible by the first few primes quickly without needing those divisibility tests.
Do I get to watch an animation of this, too?
Well, this wheel factorization is so extreme that the first composite number still used as a divisor is $23^2 = 529$. So there's really not much to see at the scales these animations thrive at.
Ah.
Want to see how this code works?
Sure.
Okay. And keep in mind I'm writing this in the most straightforward way I can think of.
If you ran the algorithm, did you notice how much faster it ran than the previous, non-wheel version?
Sure. On my perfectly serviceable laptop, the non-wheel algorithm could compute $\pi(10^5)$ in about 2.6 seconds. This version, by comparison, computes $\pi(10^5)$ in 3.8 milliseconds, and it computes $\pi(10^8)$ in about 3.8 seconds. So... yeah, that's vastly faster.
These algorithms are strikingly sensitive to wheel factorization. It's not unusual for wheel factorization, in the context of other prime-related algorithms, to offer a speed up of, say, x6, which is certainly worth it. But here, it seems more like x1000.
It's pretty extreme.
Well, just wait. But to introduce the second technique, I'll need to reorganize our approach a bit. So let's turn to that.