A Prime Counting Gallery

3-1: Sampling and Smoothing

Okay. I hope this has been puzzling and intriguing so far. Now let's change gears a bit and see how the style of identity we've been exploring can shed light on a specific, tough problem. We'll finally have the opportunity, or maybe the necessity, to dip our toes into some real calculus along the way.

What tough problem is that?

It's the question of approximating the count of primes.

So to clarify, you're suggesting the style of prime counting we've been working with can do that as well?

With the right ideas, yes. It is, at least to my eyes, quite natural to do. And in fact, we will work through a few approaches to this challenge. But to do that, let's return, as always, to our basic definition. We will need to alter it in a new way.


Changing the Sample Size

So are we going to take the same approach as before? Will we add a new helper function to the recursive definition again?

As a matter of fact... no - or, at least, not for this first approach. We need to adopt a slightly different plan of attack.

What does that entail?

Well, think about this. If we had wanted to, we could have written our original definition as

$$ f_k(n,j) = \rr{1} \cdot( \frac{1}{k}-f_{k+1}(\frac{n}{j},1+\rr{1}) ) + f_k (n, j+\rr{1}) $$
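As a concrete sketch of this recursion, here is a direct Python translation. The base case (that $f_k(n,j)$ is $0$ once $j$ exceeds $n$) is an assumption made explicit here, and the helper that tallies the weighted count of prime powers $\Pi(n)$ directly is included only for comparison:

```python
from fractions import Fraction

def f(k, n, j):
    """The recursion f_k(n, j) = 1*(1/k - f_{k+1}(n/j, 1+1)) + f_k(n, j+1),
    with the tail recursion over j written as a loop.
    Assumed base case: the value is 0 once j exceeds n."""
    total = Fraction(0)
    while j <= n:
        total += Fraction(1, k) - f(k + 1, n / j, 2)
        j += 1
    return total

def weighted_prime_power_count(n):
    """Direct computation of Pi(n): each prime power p^k <= n contributes 1/k."""
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))
    total = Fraction(0)
    for p in range(2, n + 1):
        if is_prime(p):
            power, k = p, 1
            while power <= n:
                total += Fraction(1, k)
                power, k = power * p, k + 1
    return total

print(f(1, Fraction(30), 2), weighted_prime_power_count(30))  # both print 149/12
```

Exact rational arithmetic via `Fraction` keeps the $\frac{1}{k}$ weights from accumulating floating point error.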

And then if we had graphed $f_1(n,1+\rr{1})$, we would have seen the same results we've already explored so much, right?

Sure, that makes sense. Although it wouldn't have been particularly useful.

That's true. But! It does give me a new seam, so to speak, to tug on. In fact, now I can change those $1$ values to another number, $\frac{1}{2}$, like so $$ g_k(n,j) = \rr{\frac{1}{2}} \cdot( \frac{1}{k}-g_{k+1}(\frac{n}{j},1+\rr{\frac{1}{2}}) ) + g_k (n, j+\rr{\frac{1}{2}}) $$

And here, our base function is now $g_1(n, 1+\rr{\frac{1}{2}})$

Huh. What does that do?

Well, if we examine $g_1(n, 1 +\rr{\frac{1}{2}})$, and compare it to our original prime counting function $f_1(n, 2)$, then we'll see this:

That seems strange and, well, messy. It no longer seems related to prime numbers in any meaningful sense, that's for sure.

That's certainly true.

The new function is no longer pegged to integers, even, which is the first time we've seen that in these discussions.

But can you see why we need that?

Well, I suppose if we are interested in approximating the weighted count of prime powers with a smooth curve, we really should expect that we won't just be working with functions that only change at integers. At any rate, it changes a lot more often. So this new variation is kind of interesting, I suppose, but it's hard to see it as pointing in a useful direction.

Do you think it looks smoother, in some sense?

I don't know. Maybe.

Fair enough. Let's push this idea further and see if it grows more useful. To do that, let's generalize this new function. The generalization will look like this: $$ h_k(n,j) = \rr{d} \cdot( \frac{1}{k}-h_{k+1}(\frac{n}{j},1+\rr{d}) ) + h_k (n, j+\rr{d}) $$
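A quick sketch of how this parameter behaves, assuming the natural generalization of the recursion (every red $1$ replaced by $\rr{d}$, with the same assumed base case of $0$ once $j$ exceeds $n$):

```python
from fractions import Fraction

def h(k, n, j, d):
    """Assumed generalization: h_k(n, j) = d*(1/k - h_{k+1}(n/j, 1+d)) + h_k(n, j+d),
    taken to be 0 once j exceeds n. At d = 1 this reduces to the original recursion."""
    total = Fraction(0)
    while j <= n:
        total += d * (Fraction(1, k) - h(k + 1, n / j, 1 + d, d))
        j += d
    return total

# Shrinking the step d changes the function the recursion computes:
n = Fraction(10)
for d in (Fraction(1), Fraction(1, 2), Fraction(1, 3)):
    print(d, float(h(1, n, 1 + d, d)))
```

At $d=1$ this reproduces the weighted count of prime powers exactly; smaller $d$ values sample the same structure on a finer grid.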

Oh. So this generalization encompasses both the functions we were just graphing, I see. If $d=1$, then we get our weighted prime power counting function. And if $d=\frac{1}{2}$, we get the jagged, messy function we were just looking at.

Exactly right. But now that I've added this parameter $d$, here's a natural thought, a particular case that I hope your curiosity is drawn to. What do you suppose happens when $d = 0$ - or, rather, when we take that limit, really?

The limit when $d=0$? Well, $d=\frac{1}{2}$ was very jagged and messy. Does it get even more so? I'm really not sure.

That's fine. Let's just graph it and see. If you examine $h_1(n, 1 + d)$ when $d$ approaches $0$, and compare that graph to $h_1(n, 1+d)$ when $d=1$, you'll see this:

So what do you see?

Okay, so obviously this is much, much smoother. It actually looks like a really good approximation of the weighted count of prime powers. Like, out near $x=1,000,000$, the smooth curve is around $y=78620$, while the weighted prime counting function is around $y=78592$. So they only differ by $28$ around $x=1,000,000$.

Yeah, it's pretty interesting, isn't it? The distribution of the primes seems so unpredictable on a local level, the scale where we routinely interact with them, and yet they do seem to behave with this larger scale predictability.

Apparently they do?

There's something conceptual I hope to emphasize here. The above graph is, basically, the plot of a single concept with two different $d$ parameters, right?

Sure, that seems right. The two parameter values are $d=1$ and then the limit as $d \to 0$.

How is that parameter functioning? What is it doing?

Well, it reminds me of the $d$ in an integral definition, I suppose. So it looks very much like we're quantizing some smooth curves, and $d$ is the size of the samples with which we are quantizing. As $d$ gets closer to $0$, the samples are getting infinitely small. Is that what you're getting at?

Yes, that is roughly how I see it, too, at least here.

So the amount the count of primes differs from this other smooth curve is, conceptually, almost like an aliasing error or something. I mean, if we're thinking about sampling.

I don't think it's too outlandish a thought.

Are you going to animate this as well, with emerging circles showing the recursive nature of the function?

No... or at least, perhaps another time. Such an animation could be really fascinating, but it would also be much trickier to do for performance reasons, so I'll have to pass on it right now. Consider this an invitation to make such an animation yourself!

On that subject, it seems like animating as $d$ goes back and forth between $1$ and $0$ for some fixed $n$ would be enlightening as well.

Definitely. I'd love to see that myself.

Okay, well what exactly is this curve, then?

That's a good question. This will require a bit of calculus, and I won't actually explain or justify these steps despite that being warranted, but the following can be worked through:

$$ \begin{aligned}\lim_{ d \to 0} h_1(n,1+\rr{d}) &= \int_1^x \, dt - \frac{1}{2} \int_1^x \int_1^{\frac{x}{t}} \, du \, dt + \frac{1}{3} \int_1^x \int_1^{\frac{x}{t}} \int_1^{\frac{x}{t \cdot u}} \, dv \, du \, dt - \cdots \\ &= \int_1^x \, dt - \frac{1}{2} \int_1^x \frac{\log t}{1} \, dt + \frac{1}{3} \int_1^x \frac{\log^2 t}{2!} \, dt - \frac{1}{4} \int_1^x \frac{\log^3 t}{3!} \, dt + \cdots \\ &= \int_1^x \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} \cdot \frac{\log^{k-1}t}{(k-1)!} \, d t \\ &= \int_1^x \left( \frac{1}{\log t}-\frac{1}{t \cdot \log t} \right) \, d t \\ &= \mathrm{li}( x) - \log \log x - \gamma\end{aligned}$$

Here, $\mathrm{li}(x)$ is the logarithmic integral, and $\gamma$ is the Euler-Mascheroni constant. So this smoothed sum is (because $\log \log x$ is nearly a flat line) almost entirely just the logarithmic integral.
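That closed form can be spot-checked numerically. The sketch below leans on the standard series $\mathrm{li}(x) = \gamma + \log \log x + \sum_{k \ge 1} \frac{\log^k x}{k \cdot k!}$ for $x > 1$, so that $\mathrm{li}(x) - \log \log x - \gamma$ is just the sum, and compares it against Simpson-rule integration of the combined integrand $\frac{1 - 1/t}{\log t}$, which stays finite at $t = 1$:

```python
import math

def smoothed_series(x, terms=60):
    # li(x) - log(log x) - gamma  ==  sum_{k>=1} log(x)^k / (k * k!)
    L, total, power, fact = math.log(x), 0.0, 1.0, 1.0
    for k in range(1, terms + 1):
        power *= L   # power is now log(x)^k
        fact *= k    # fact is now k!
        total += power / (k * fact)
    return total

def smoothed_integral(x, steps=20000):
    # Simpson's rule for the integral of (1/log t - 1/(t log t)) = (1 - 1/t)/log t
    # over [1, x]; the combined integrand tends to 1 as t -> 1, so no singularity remains.
    def g(t):
        return 1.0 if t == 1.0 else (1.0 - 1.0 / t) / math.log(t)
    h = (x - 1.0) / steps
    total = g(1.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(1.0 + i * h)
    return total * h / 3.0

print(smoothed_series(10.0), smoothed_integral(10.0))  # both ≈ 4.754
```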

Okay. So let me make sure I'm clear here. You're trying to suggest that, at least from one perspective, the weighted count of prime powers and the logarithmic integral are more or less the same set of curves being sampled at two different sample scales, right? And we can think of the differing sampling scales as responsible for their difference.

From one perspective, yes.

Is that useful?

I wouldn't say that. I just personally find it conceptually compelling, that's all.

Speaking of compelling, the particular ideas here can be rewritten in another, essentially equivalent recursive form that I've been ignoring in these discussions.

And then here, given some fixed whole number $n$, $$ \lim_{d \to \rr{1}} H_1( 1+d, d) = \Pi(n) $$ and $$ \lim_{d \to \rr{0}} H_1( 1+d, d) = \mathrm{li}(n) - \log \log n - \gamma $$

This form isn't quite as useful for exploring all the variants we've been working through, so I haven't brought it up. But I have to admit it catches my fancy that, if $d=1$, this is a method for computing the weighted count of prime powers $\Pi(n)$ that essentially only relies on addition and subtraction, without relying on multiplication or division (aside from the constant $\frac{1}{k}$ coefficients, of which a mere $\log_2 n$ terms will be used for any given $n$).

Okay. Perhaps these ideas are conceptually compelling. But how good of an approximation is the logarithmic integral, anyway? It's not clear to me if this way of approximating the weighted count of prime powers sheds light on that question, even if it does admittedly seem pretty natural.

Well, let's turn to that briefly.

So, the behavior of the difference between $\lim_{d \to \rr{1}} h_1(x,1 + d)$ and $\lim_{d \to \rr{0}} h_1(x,1 + d)$ is the subject of arguably the most famous open problem in mathematics at the time of this conversation.

Wait, huh?

Riemann's Explicit Formula for $\Pi(x)$

The question here is about the long term behavior differences between the logarithmic integral and the weighted count of prime powers.

Right. How good of an approximation is the logarithmic integral?

Well, we just looked at a conceptual way those two functions are linked (or, if you can accept my argument, the two can be thought of as something like two parameters of the same function).

Sure.

That's one way to think about the difference. But there is a different, famously important method to describe the difference between those two functions. And it's much more challenging to tackle.

And what is that?

Well, in 1859, Bernhard Riemann published a very significant paper. In it, he introduced the Riemann zeta function, the Riemann zeta zeros, and the Riemann hypothesis. And in that paper, Riemann provides a formula for an explicit, exact relationship between the weighted count of prime powers and the logarithmic integral.

Oh.

Right. But it's not at all simple. And that formula is, in fact, $$\Pi(x)=\mathrm{li}(x)-\log 2 + \int_x^\infty \frac{1}{t(t^2-1) \log t} \, d t + \lim_{T \to \infty} \sum _{\substack{k=-T \\ k\neq 0}}^{ T } \rr{E_1\left(-\log x\cdot \rho _k\right)}$$

Here, $\rho_k$ are the non-trivial zeros of the Riemann zeta function, ordered from smallest imaginary part to largest, and $E_1(x)$ is the first order exponential integral - you'll often find each such term written instead as $-\mathrm{li}(x^{\rho_k})$. The other terms contribute almost nothing to the final difference, so the difference between $\Pi(n)$ and $\mathrm{li}(n)$ is captured almost entirely by the sum in red, the one over the non-trivial zeros of the zeta function.

So how does that even work? Can we experiment with this here?

Well, unfortunately, we've entered territory where current JavaScript isn't ideal to experiment with, I'm afraid. But if you happen to have a copy of Wolfram Mathematica handy, this equation can be explored with the following code.

(* Memoize numerical values of the zeta zeros; ZetaZero[-j] gives the conjugate zero *)
ZetaZeroCached[j_]:=ZetaZeroCached[j]=N[ZetaZero[j]]

(* Approximate the explicit formula: li(x) - log 2, plus a finite number of
   trivial zero terms standing in for the integral term, plus paired
   non-trivial zero terms *)
PrimePowerFromExplicit[x_,numberOfZeros_, trivialZeros_]:=
	LogIntegral[x]-Log[2]+Sum[ExpIntegralE[1,-Log[x](-2j)],{j,1,trivialZeros}]+
	Sum[ExpIntegralE[1,-Log[x]ZetaZeroCached[j]]+ExpIntegralE[1,-Log[x] ZetaZeroCached[-j]],{j,1,numberOfZeros}]

But that code doesn't look quite the same as the above equation, does it?

Well, the limit of Sum[ExpIntegralE[1,-Log[x](-2j)],{j,1,trivialZeros}] is $\int_x^\infty \frac{1}{t(t^2-1) \log t} \, d t$, if you use all the trivial zeros of the Riemann zeta function. But keep in mind we are approximating here. As I say, this part contributes almost nothing to the total, so I'll glide past it without more comment.
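That relationship between the trivial zero terms and the integral can be spot-checked without any special function library. In the sketch below, $E_1(u) = \int_u^\infty \frac{e^{-s}}{s} \, ds$ for real $u > 0$, so the trivial zero at $-2j$ contributes $E_1(2j \log x)$, and both sides are approximated with Simpson's rule:

```python
import math

def E1(u, steps=4000, cutoff=40.0):
    # E_1(u) = integral of exp(-s)/s from u to infinity, for real u > 0;
    # Simpson's rule on [u, u + cutoff] -- the tail beyond that is negligible.
    f = lambda s: math.exp(-s) / s
    h = cutoff / steps
    total = f(u) + f(u + cutoff)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(u + i * h)
    return total * h / 3.0

def trivial_zero_sum(x, terms=50):
    # Sum over trivial zeros -2j of E_1(-log(x) * (-2j)) = E_1(2j log x)
    return sum(E1(2 * j * math.log(x)) for j in range(1, terms + 1))

def integral_term(x, steps=200000, upper=10000.0):
    # Integral from x to infinity of dt / (t (t^2 - 1) log t), truncated at `upper`;
    # the integrand decays like 1/(t^3 log t), so the truncated tail is tiny.
    f = lambda t: 1.0 / (t * (t * t - 1.0) * math.log(t))
    h = (upper - x) / steps
    total = f(x) + f(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(x + i * h)
    return total * h / 3.0

print(trivial_zero_sum(10.0), integral_term(10.0))  # both ≈ 0.0018
```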

At any rate, suppose you used the first $400$ non-trivial zeta zeros. If you were to do that, and then graphed for $x$ from $\frac{3}{2}$ to $120$, you'd see the following approximations of the formula.

A Mathematica plot of the approximation of the explicit formula for the weighted count of prime powers, with 400 non-trivial zeros, from 2 to 10
An approximation of the explicit formula for the weighted count of prime powers, with $400$ non-trivial zeros, for $x$ from $\frac{3}{2}$ to $10$
A Mathematica plot of the approximation of the explicit formula for the weighted count of prime powers, with 400 non-trivial zeros, from 2 to 120
An approximation of the explicit formula for the weighted count of prime powers, with $400$ non-trivial zeros, for $x$ from $\frac{3}{2}$ to $120$

Okay, sure enough, that certainly seems to be producing the weighted count of prime powers.

The Difference Between $\lim_{d \to \rr{1}} h_1(x,1 + d)$ and $\lim_{d \to \rr{0}} h_1(x,1 + d)$

It does seem that way. Well, if we rearrange it all slightly, we can use it to ask how much the weighted count of prime powers and the logarithmic integral differ from each other. $$\Pi(x)-\mathrm{li}(x)=-\log 2 + \int_x^\infty \frac{1}{t(t^2-1) \log t} \, d t + \lim_{T \to \infty} \sum _{\substack{k=-T \\ k\neq 0}}^{ T } \rr{E_1\left(-\log x\cdot \rho _k\right)}$$

ZetaZeroCached[j_]:=ZetaZeroCached[j]=N[ZetaZero[j]]

PrimePowerDifference[x_,numberOfZeros_, trivialZeros_]:=
	-Log[2]+Sum[ExpIntegralE[1,-Log[x](-2j)],{j,1,trivialZeros}]+
	Sum[ExpIntegralE[1,-Log[x]ZetaZeroCached[j]]+ExpIntegralE[1,-Log[x] ZetaZeroCached[-j]],{j,1,numberOfZeros}]

And if we plot that, we'll see something like this.

A Mathematica plot of an approximation of the difference between the weighted count of prime powers and the logarithmic integral, using 400 non-trivial zeros
An approximation of the difference between the weighted count of prime powers and the logarithmic integral, with $400$ non-trivial zeros, for $x$ from $\frac{3}{2}$ to $10$

So, to reiterate, we seem to have expressed the difference of the two with this sum over the zeta zeros.

Okay, I suppose I follow all of that, or follow it enough anyway, but what does this all mean about our original function $h_1(x,1 + d)$?

Well, because the formula we just looked at for the difference between the two relies on the zeros of the Riemann zeta function, the behavior of the formula (meaning the difference between $\lim_{d \to \rr{1}} h_1(x,1 + d)$ and $\lim_{d \to \rr{0}} h_1(x,1 + d)$, essentially) is in turn deeply affected by the conjectured behavior of those zeros.

Ah, okay.

This obviously gets much deeper into complex analysis than I want this conversation to be. Still, we can immediately say something much more straightforward and useful about your question about the Riemann hypothesis, and what it means about the difference between $\lim_{d \to \rr{1}} h_1(x,1 + d)$ and $\lim_{d \to \rr{0}} h_1(x,1 + d)$.

Oh? What's that?

Well, look at this graph of the difference between the two...

Do you see those two smooth curves that bound the difference?

Well, obviously. What's special about them, though?

Well, you asked how good of an approximation the logarithmic integral is to the weighted count of prime powers. Those curves give a very precise answer.

Wait, what answer is that?

If you scan along the number line of this graph, the difference between $\lim_{d \to \rr{1}} h_1(x,1 + d)$ and $\lim_{d \to \rr{0}} h_1(x,1 + d)$ never penetrates either of those curves after around $x=105$, right?

Sure. I don't think the difference even looks like it gets close.

Well, we're only graphing up to around $x=1,000,000$, which is not such a big number compared to the infinity of integers mathematicians tend to be interested in.

Fine.

But if the Riemann hypothesis is true, that jagged, erratic line, that difference, provably never penetrates those lines ever again. Ever. It stays bounded by them. For the derivation of that, see Lowell Schoenfeld's 1975 article, Sharper bounds for the Chebyshev functions $\theta(x)$ and $\psi(x)$.
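For a concrete sense of what such bounding curves look like, Schoenfeld's conditional result is often quoted for the unweighted count as $|\pi(x) - \mathrm{li}(x)| < \frac{\sqrt{x} \log x}{8 \pi}$ for $x \geq 2657$, assuming the Riemann hypothesis. A small sketch spot-checks that bound at a few values, computing $\mathrm{li}$ from its standard series:

```python
import math

def li(x, terms=200):
    # li(x) = gamma + log(log x) + sum_{k>=1} log(x)^k / (k * k!), for x > 1
    gamma = 0.5772156649015329
    L = math.log(x)
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        term *= L / k          # term is now log(x)^k / k!
        total += term / k
    return gamma + math.log(L) + total

def prime_count(n):
    # pi(n) via a simple sieve of Eratosthenes
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(n ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)

def schoenfeld_bound(x):
    # The conditional bound sqrt(x) * log(x) / (8 pi), valid for x >= 2657 under RH
    return math.sqrt(x) * math.log(x) / (8 * math.pi)

for x in (3000, 10000, 100000):
    gap = abs(prime_count(x) - li(x))
    print(x, round(gap, 2), round(schoenfeld_bound(x), 2), gap < schoenfeld_bound(x))
```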

Okay. But, in terms of a bigger picture, what does this all mean?

It means that the logarithmic integral is as good of an approximation to the count of primes as we can hope, and the behavior of the primes is likewise about as tidy and orderly as we can hope. If the hypothesis fails, the primes are more unruly, and the logarithmic integral is a worse approximation.

Obviously, there is no shortage of other popular writing about the Riemann zeta function and the Riemann hypothesis, so let's move on, okay?

And an Aside about a Slight Variation

Well, wait a moment. Something here has caught my eye, and I must admit I'm both a bit confused and a bit curious.

What's that?

Our smoothing formula above ended up being $\mathrm{li}(x) - \log \log x - \gamma$.

True.

Meanwhile, Riemann's formula just seems to be about $\mathrm{li}(x)$, the logarithmic integral. Does that difference matter?

Actually, that observation grows even more acute if you rewrite the first order exponential integrals, the ones using zeta zeros that we wrote as $\rr{E_1\left(-\log x\cdot \rho _k\right)}$, as logarithmic integrals too - if you read around, you'll find that is in fact very frequently done, as $-\mathrm{li}\left(x^ {\rho _k}\right)$.

Okay. So conceptually, what accounts for that difference?

Well, the simplest answer is this. Start with $\mathrm{li}(x) - \log \log x - \gamma$ and take the limit as $x \to 1$. Now, what do you get?

If I graph that, the limit appears to be $0$ when $x=1$.

Exactly. Normally, the logarithmic integral shoots off to $-\infty$ at $x=1$ and causes all sorts of trouble. The extra $\log \log x + \gamma$ usefully cancels out that divergence. Because we are talking about multiplicative ideas, where $x=1$ is the identity element, this is handy.

Is this similar to the way that $\log(1+x)$ is frequently useful for Taylor series, rather than $\log x$? Come to think of it, $\log x = 0$ when $x=1$ as well...

I don't think that's a crazy observation. In fact, the main identity we're using here, for both the weighted count of prime powers and the logarithm, is a partial sum convolution equivalent of that Taylor series for logarithms. Suppose we defined the following two functions, $$S'_{k}(n)= \displaystyle \underbrace{\int \int \dots \int}_{\substack{x_1 \cdot x_2 \cdot ... \cdot x_k \leq n \\ \textrm{ x is a \rr{real} number} > 1}} \, d x_k \dots \, d x_2 \, d x_1 $$ and $$D'_{k}(n)= \displaystyle \sum_{\substack{ m_1 \cdot m_2 \cdot ... \cdot m_k \le n \\ \textrm{ m is a \rr{whole} number} > 1 }} 1 $$

And so $S'_{k}(n)$ is just a real valued version of the discrete $D'_{k}(n)$.

Right. With those, we could have just as easily rewritten $\lim_{d \to {0}} h_1(x,1 + d)$ as $$\lim_{d \to {0}} h_1(x,1 + d) = \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} \cdot S'_k(n) = \mathrm{li}(n) - \log \log n - \gamma$$ and $\lim_{d \to {1}} h_1(x,1 + d)$ as $$\lim_{d \to {1}} h_1(x,1 + d) = \sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} \cdot D'_k(n) = \Pi(n)$$
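Since $D'_k(n)$ only depends on integers (each factor is a whole number greater than $1$), it satisfies a simple recursion, and the alternating series can be checked directly. A sketch, using the fact that $D'_k(n) = 0$ once $2^k > n$:

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def D(k, n):
    # Number of k-tuples of whole numbers > 1 whose product is at most n
    if k == 0:
        return 1  # the empty product
    return sum(D(k - 1, n // m) for m in range(2, n + 1))

def series(n):
    # sum_{k>=1} (-1)^(k+1)/k * D'_k(n); the terms vanish once 2^k > n
    total, k = Fraction(0), 1
    while 2 ** k <= n:
        total += Fraction((-1) ** (k + 1), k) * D(k, n)
        k += 1
    return total

print(series(10))  # → 16/3, the weighted count of prime powers up to 10
```

For $n=10$ the terms are $D'_1 = 9$, $D'_2 = 8$, $D'_3 = 1$, giving $9 - \frac{8}{2} + \frac{1}{3} = \frac{16}{3}$, which matches $\Pi(10)$: four primes, two squares weighted by $\frac{1}{2}$, and one cube weighted by $\frac{1}{3}$.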

And so that's obviously analogous to the Taylor series for logarithms, $$\sum_{k=1}^\infty \frac{(-1)^{k+1}}{k} \cdot x^k = \log(1+x)$$ But I don't quite see where you're going with this...

Well, I want to note a different way to arrive at the logarithmic integral from here, one that doesn't include the $\log \log x + \gamma$ terms.

Oh.

So let's set aside that Taylor series for a moment and look more closely at $S'_k(n)$ and $D'_k(n)$.

Okay.

The trick here is to generalize these definitions for $S'_k(n)$ and $D'_k(n)$ from a whole number $k$ to a complex number $z$.

How do we do that?

Well, it's not much work to see that $S'_k(n)$ can be defined as $S'_k(n) = \int_1^n \frac{\log^{k-1} t}{(k-1)!} \, dt$. Given that, as long as we generalize the factorial to the gamma function as $z! = \Gamma(z+1)$, which is pretty standard, we can just generalize our definition as $$S'_z(n) = \int_1^n \frac{\log^{z-1} t}{(z-1)!} \, dt$$

Okay.

Generalizing $D'_k(n)$ to some complex parameter $z$, on the other hand, is not at all obvious. So I'll just obscurely claim the following is not an unreasonable generalization, owing to deep connections between $D'_k(n)$ and binomial coefficients. And that definition, which I won't really justify and relies on the definition $\mathrm{sinc}(z) = \frac{\sin z}{z}$, $\mathrm{sinc}(0) = 1$, is $$D'_z(n) = \sum_{j=0}^\infty \mathrm{sinc}(\pi\cdot(z-j)) \cdot D_j'(n) $$

Which I suppose does, at the least, give correct values if $z$ is a positive whole number. So what is interesting about these definitions?

Well, if you take their derivatives with respect to $z$, and then take the limit at $z=0$, I promise that you'll find, if you put in the work, that you have $$ \lim_{z \to 0} \frac{\partial}{\partial z} S'_z(n) = \int_1^n \frac{1}{\log t} \, dt$$ and $$ \lim_{z \to 0} \frac{\partial}{\partial z} D'_z(n) = \Pi(n) $$
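For the $D'_z$ case, a sketch of that work: differentiating the sinc-based definition term by term, $$\left. \frac{d}{dz} \mathrm{sinc}(\pi \cdot (z-j)) \right|_{z=0} = \begin{cases} 0 & j = 0 \\ \frac{(-1)^{j+1}}{j} & j \geq 1 \end{cases}$$ and so $$\lim_{z \to 0} \frac{\partial}{\partial z} D'_z(n) = \sum_{j=1}^\infty \frac{(-1)^{j+1}}{j} \cdot D'_j(n) = \Pi(n)$$ with the last equality being exactly the alternating series for the weighted count of prime powers.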

Ah, okay. But of course, that integral doesn't even converge...

Too true. But then, the normal logarithmic integral always makes trouble at $x=1$, so we should expect that. It's the specific reason $\mathrm{li}(x) - \log \log x - \gamma$ is nice.

At any rate, as I say, this is all just perhaps a curious aside. I have another interesting lens about the difference between the count of primes and the logarithmic integral, so let's move on to that.