Counting Primes: Using Symmetry Reduction for Speed

Okay. Now let me introduce another valuable idea that speeds up this prime counting approach.

Go ahead.

The trick is to pay attention to symmetries and redundancies in how we count.

What do you mean?

Well, an example should help point in the right direction, I think. Suppose we're computing $D_3'(300)$. What is that function counting, conceptually?

That's the number of solutions to $m_1 \cdot m_2 \cdot m_3 \le 300$, where $m_k > 1$.

Good. Now, when computing that answer, how many times does the triple of numbers $4 \cdot 5 \cdot 6$ multiplied together get counted?

Well, let's see. We would have $$4 \cdot 5 \cdot 6$$ $$4 \cdot 6 \cdot 5$$ $$5 \cdot 4 \cdot 6$$ $$5 \cdot 6 \cdot 4$$ $$6 \cdot 4 \cdot 5$$ $$6 \cdot 5 \cdot 4$$ So it looks like that product shows up $6$ times.

And?

Well, that's basic combinatorics, right? The count is $6$ because there are 3 items, and so there are $3!$ total variations.

That's right.

So what does that mean?

Well, as you say, that's basic combinatorics. So, in the case of $D_3'(n)$, if $m_1$, $m_2$, and $m_3$ are all distinct numbers, they will be counted $6$ times. If, on the other hand, $m_1 = m_2 = m_3$ - like $6 \cdot 6 \cdot 6$ - then they will only be counted by $D_3'(n)$ once. And finally, if two of the terms are the same but the third is different - like $4 \cdot 4 \cdot 3$, then they will by counted 3 times by $D_3'(n)$.

Huh. Okay, I think I follow that. But given that, what can we do with it?

Well the trick is this. We add an extra useful constraint to our counting that radically speeds it up. So, as an example, now when computing $D_3'(n)$, we add a new constraint for counting: $m_1 \le m_2 \le m_3$. And then we manually add how many times that product shows up by using combinatorics.

Ah. So instead of counting each variation of $4 \cdot 5 \cdot 6$ by hand and counting it 6 times, we instead count in such a way that we only ever iterate over the triple $4 \cdot 5 \cdot 6$ once, and then we manually note it shows up $6$ times.

Precisely.

And I suppose that means that for $D_3'(n)$, counting this new way, $m_1$ can never be larger than $n^{\frac{1}{3}}$. And $m_2$ can never by larger than $ \sqrt{\frac{n}{m_1}} $.

Exactly. This vastly pares down how many numbers we count through in our loops.

So let me define a new family of functions that uses combinatorics to determine how many times a set of divisors gets counted: $f(m_1, m_2, ... m_k)$ Then, using this new function, we can rewrite our divisor count functions as

$$D'_{ 1}( n ) = \sum_{\substack{m_1=2}}^n 1$$ $$D'_{ 2 }( n ) = \sum_{\substack{ m_1 \cdot m_2 \le n \\ 1 < m_1 \le m_2}} f(m_1,m_2)$$ $$D'_{ 3 }( n ) = \sum_{\substack{m_1 \cdot m_2 \cdot m_3 \le n \\ 1 < m_1 \le m_2 \le m_3}} f(m_1,m_2,m_3)$$

and so on.

Of course, we want to shift to a more computational frame of mind here. So we can rewrite those sums more explicitly (with somewhat less clarity) as

$$D'_{ 1}( n ) = \sum_{\substack{m_1=2}}^n 1$$ $$D'_{ 2 }( n ) = \sum_{\substack{m_1=2}}^{\lfloor n^\frac{1}{2} \rfloor} \sum_{\substack{m_2=m_1}}^{\lfloor \frac{n}{m_1} \rfloor} f(m_1,m_2)$$ $$D'_{ 3 }( n ) = \sum_{\substack{m_1=2}}^{\lfloor n^\frac{1}{3} \rfloor} \sum_{\substack{m_2=m_1}}^{\lfloor (\frac{n}{m_1})^\frac{1}{2} \rfloor} \sum_{\substack{m_3=m_2}}^{\lfloor \frac{n}{m_1 \cdot m_2} \rfloor} f(m_1,m_2,m_3)$$

and so on.

Okay. Just from those definitions, you've convinced me this ought to result in a serious speed increase. But how is that function $f(n)$ defined?

Well, this really is basic combinatorics. If $m$ is some array of integer values sorted from lowest to highest, then $$f(m) = \frac{k!}{\prod_{i} k_i!} = \binom{k}{k_1, k_2, k_3, ...}$$ where $k$ is the length of the array $m$, and $k_i$ is the multiplicity (number of occurrences) of the $i$-th distinct value in $m$. This is the multinomial coefficient. This function can be computed with the following bit of JavaScript code

Can you follow that?

I suppose that doesn't seem too bad in the grand scheme of things.

I hope not. At any rate, given those coefficients, we can now, as I just said, restrict our counting to the case where $m_1 \le m_2 \le \dots \le m_k$. And if we do that, our transformed code now looks something like this.

I apologize in advance for the function D_2, which has had a few more basic optimizations applied to it for the sake of performance. I will be hand-waving past that. I recognize that might be a bit perplexing, although it just amounts to inlining theoretical (here non-existent) specialized D_1and D_0 functions, all of which can be derived from applying specific constants as parameters to D_k.

Anyway, how does running this algorithm, which does not have wheel factorization applied to it, look?

Well, once again, it is vastly faster than the prior version that didn't include this optimization. The earlier, unoptimized version computed $\pi(10^5)$ in around 1.56 seconds. Here, this algorithm on my run-of-the-mill machine computes $\pi(10^5)$ in around 1.2 milliseconds. And it computes $\pi(10^9)$ in around .87 seconds. So this is blazingly faster.

Do those ratios sound familiar?

Yeah, they are comparable to the speed-up that came from wheel factorization, right?

They seem to be, don't they?.

Wait, but is there anything that conflicts between the two methods?

That's an interesting question, isn't it? Let's pursue that further.