Nov 03, 2020

The idea is that you want to split your data into a sum of three parts:

(a) a smooth function that is not interesting

(b) noise

(c) a big unexpected peak in an interesting part

There is a nice preprint posted a few days ago: https://arxiv.org/abs/2010.09761 https://arxiv.org/abs/2010.09761.pdf

They reanalyze the same data as in the phosphine paper.

---

If you look at figure 3, the data is the skyscraper-like line, and they use a polynomial of degree 3 to approximate the signal; that is the curved, unhappy-looking smooth line.

The smooth line is (a). When you subtract this smooth function, you get the other part of figure 3, which is the noise (b). There are some high and low parts, but nothing so high or low that it looks special. So their conclusion is that there is no interesting part (c).
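To make the subtraction step concrete, here is a minimal sketch on synthetic data (not the telescope data). The cubic baseline, the noise level of 0.1, and the 200 sample points are all made-up assumptions for illustration.

```python
import numpy as np

# Synthetic stand-in for the spectrum: all numbers here are assumptions.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200)
smooth = 1.0 + 0.5 * x - 0.8 * x**2 + 0.3 * x**3   # (a) uninteresting smooth part
noise = rng.normal(0.0, 0.1, x.size)               # (b) noise
data = smooth + noise                              # no peak (c) was added

# Fit a degree-3 polynomial to the data and subtract it.
coeffs = np.polynomial.polynomial.polyfit(x, data, deg=3)
baseline = np.polynomial.polynomial.polyval(x, coeffs)
residual = data - baseline

# Compare the largest excursion with the typical noise level.
noise_rms = residual.std()
largest = np.abs(residual).max()
significance = largest / noise_rms
print(f"noise rms = {noise_rms:.3f}, largest excursion = {significance:.1f} sigma")
```

If the largest excursion is only a few times the noise level, as you would expect from pure noise, there is no reason to claim a part (c).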

---

If you look at figure 2, top left, they use a slightly smaller interval, but now they use a polynomial of degree 12 instead of a polynomial of degree 3. This is the smooth function (a).

This is a reconstruction of the process in the original paper. They fit the polynomial using the data, but excluding the central part.

The problem with the polynomial of degree 12 is that it has too much freedom, so it fits the actual smooth curve, but it also fits the noise.

The polynomial of degree 3 has to go somewhat in the middle of the data, because it can't go up and down too many times. The polynomial of degree 12 can follow the local bumps and fit the noise.

When you subtract the polynomial of degree 12 you get the graph in the third row, with the noise that is (b). It is copied in Figure 2, and it is very similar to the graph in the original paper.

Since the polynomial of degree 12 fits part of the noise, the leftover noise is too small, so you underestimate the noise level.

And since the central part was skipped in the fit, in some cases you get a big bump like the one here. It is bigger than the apparent noise level, so it looks like an unexpected peak (c).

But the problem here is that you are comparing the peak with the surrounding noise level, and that noise level is underestimated because the polynomial of degree 12 overfits.
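The effect can be sketched with synthetic data that contains no real peak. This is only an illustration, not the procedure from either paper: the parabolic baseline, the noise level, the number of points, and the width of the excluded window are all assumptions.

```python
import numpy as np

# Synthetic data with a smooth part (a) and noise (b), but no real peak (c).
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 120)
sigma = 0.1                                   # assumed true noise level
data = 1.0 - 0.8 * x**2 + rng.normal(0.0, sigma, x.size)

outside = np.abs(x) > 0.15                    # points used in the fit
inside = ~outside                             # central part, skipped in the fit

def residual_after_fit(deg):
    """Fit a polynomial of the given degree outside the window, subtract it everywhere."""
    poly = np.polynomial.Polynomial.fit(x[outside], data[outside], deg)
    return data - poly(x)

def rms(values):
    return float(np.sqrt(np.mean(values**2)))

results = {}
for deg in (3, 12):
    res = residual_after_fit(deg)
    noise_est = rms(res[outside])             # apparent noise level around the window
    peak = float(np.abs(res[inside]).max())   # biggest bump inside the skipped window
    results[deg] = (noise_est, peak)
    print(f"degree {deg:2d}: apparent noise = {noise_est:.4f}, "
          f"central bump = {peak / noise_est:.1f} x apparent noise")
```

On the fitted points the degree-12 residual can never be larger than the degree-3 one, because every polynomial of degree 3 is also a polynomial of degree 12. So the apparent noise only goes down as the degree goes up, while nothing constrains the polynomial inside the skipped window. Trying different seeds gives a feel for how often a fake bump shows up.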

---

They repeat the same kind of analysis in other regions, and they get a few additional fake peaks. These are the other 5 graphs at the top of Figure 3.

Nov 01, 2020

I was more convinced that the original paper may have problems by https://arxiv.org/abs/2010.09761 From that preprint:

> We find that the 12th-order polynomial fit to the spectral passband utilised in the published study leads to spurious results.

Using a 12th-order polynomial fit is very suspicious. And I think that it is very interesting to see Figure 2, top-left. The signal is very noisy and the polynomial fit is not convincing. That graph is a very big red flag.

Oct 24, 2020

[A study](https://arxiv.org/abs/2010.09761) posted 2020-10-19 challenges the claim that phosphine was detected in Venus's atmosphere.

[The original claim](https://www.nature.com/articles/s41550-020-1174-4), published 2020-09-14.