Aug 08, 2016

Did you guys click on the paper itself? This is the scientific equivalent of clickbait.

As I read it, they didn't even measure working hours directly because "we have another issue in examining the effects of labor hours on cognitive functioning, that is, labor hours are censored (that is, retirees report zero working hours). Rather than directly using variables which correlate with labor hours, but do not correlate with cognitive functioning, we use these variables for creating the fitted values for squared of working hours and working hours as instruments."

And they run a regression to infer working hours from the region they live in, whether their parents are alive, the number of children, and so forth.
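To make the criticism concrete: the "fitted values as instruments" idea amounts to a two-stage procedure like the toy sketch below. This is not the paper's actual regression (they use many instruments and a quadratic second stage); it's a minimal pure-Python illustration with entirely made-up numbers, using a single hypothetical instrument (number of children).

```python
def ols_slope_intercept(xs, ys):
    """Simple least-squares line y ~ a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

# Toy data, entirely made up for illustration.
children  = [0, 1, 2, 3, 0, 2, 4, 1]
hours     = [45, 38, 30, 22, 50, 28, 10, 40]
cognition = [10, 11, 12, 11, 9, 12, 8, 11]

# Stage 1: predict working hours from the instrument, not from
# reported hours themselves.
a1, b1 = ols_slope_intercept(children, hours)
fitted_hours = [a1 + b1 * c for c in children]

# Stage 2: regress cognition on the *fitted* hours from stage 1.
a2, b2 = ols_slope_intercept(fitted_hours, cognition)
print("stage-1 slope:", b1, "stage-2 slope:", b2)
```

The point is that cognition is never regressed on measured hours at all, only on hours as predicted from demographic proxies, so any quirk in stage 1 propagates into stage 2.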

Let me just get to the main point: I think they're just picking up a decline in cognition from lack of sleep. If someone is stressed out with all these obligations, then longer work hours will cut into their sleep.

Also, I don't see how they figure out which way the causality runs: could it be that those with good cognitive function can afford to work fewer hours than those without?

Jul 18, 2016

I'm sorry, I am trying to explain something that is very clear in my head, and I'm pretty sure that I'm right, but I haven't had bio-stats training specifically so I do not know the language precisely. My background is in physics, computer science, and epistemology.

The functions I am referring to are estimators of cognitive indicators (like backwards digit span, say, that they use in the paper), as a function of working hours.

Take a look at page 21 for some plots. For each of the cognitive indicators, the estimator is a downward parabola as a function of working hours. What I am saying is that this is an artifact of the analysis. The true shape could be far different -- in fact this could be a bad case of curve fitting. Additionally -- why not just plot the data directly, as a scatter plot or as binned averages of the cognitive indicators for bins of, say, 20-25 hours, 25-30 hours, etc.? Then at least we could see whether the parabolas are close to the data...
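For what it's worth, the binned-average check I'm suggesting is trivial to compute -- something like this (with obviously made-up toy numbers, since we don't have their data):

```python
from collections import defaultdict

def binned_means(hours, scores, width=5):
    """Average score per work-hour bin [0, width), [width, 2*width), ..."""
    sums = defaultdict(lambda: [0.0, 0])
    for h, s in zip(hours, scores):
        bin_start = int(h // width) * width
        sums[bin_start][0] += s
        sums[bin_start][1] += 1
    return {b: total / n for b, (total, n) in sorted(sums.items())}

# Hypothetical data: if the per-bin means came out roughly flat for
# everyone who works, that would argue against a parabola.
hours  = [0, 3, 22, 24, 31, 38, 41, 55]
scores = [9, 8, 10, 11, 10, 11, 10, 10]
print(binned_means(hours, scores, width=10))
```

Plotting these bin means next to the fitted parabola would immediately show whether the quadratic shape is coming from the data or from the model.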

Jul 18, 2016

I'll try to state in this language what I think is the conceptual flaw.

The stated conclusion is "there is an optimum number of work hours, and it is less than 40."

However, their method of analysis is this:

"We fit a non-linear model, namely a quadratic model of cognitive ability, y ~ a x + b x^2, where x is work hours per week, and we found a statistical optimum around 25 hours."

The problem with this is that you are finding the best fit of a parabola to the data, and you have tons of samples where the work hours are very few or zero (the unemployed). Because there is a fairly strong correlation between unemployment and cognitive indicators, the parabola is already being "forced down" near hours worked = 0.

Now, in this parabola model of estimated cognitive indicators vs. work hours, either you get a minimum -- and the curve goes to infinity as working hours -> infinity (of course in real life it cannot, because there are only so many hours in a week, but the statistical model will suggest it) -- or you get a maximum, which is what actually happens.

It could well be that in the data the response of cognitive indicators to working hours is roughly flat, or even increasing, once the number of working hours is beyond some nominal value, but that the unemployed population has somewhat lower indicators.

In this case the model will automatically become a downward-curving parabola with a maximum, suggesting decline with increasing work hours -- even though this is not what the data directly suggest.

This maximum, the fact that there even is a "work hour optimum" on a smooth quadratic curve, is a mirage -- the model is not the data.
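You can see the artifact in a toy simulation (entirely made-up numbers, not the paper's data): cognitive scores are flat for everyone who works, and only the zero-hours group scores a bit lower, yet a least-squares quadratic still produces a downward parabola with an interior "optimum":

```python
import random

def fit_quadratic(xs, ys):
    """Least-squares fit y ~ b0 + b1*x + b2*x^2 via the normal equations."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, y in zip(xs, ys):
        row = [1.0, x, x * x]
        for i in range(3):
            rhs[i] += row[i] * y
            for j in range(3):
                A[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting on the 3x3 system.
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 3):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * 3
    for r in (2, 1, 0):
        coef[r] = (rhs[r] - sum(A[r][c] * coef[c] for c in range(r + 1, 3))) / A[r][r]
    return coef  # (b0, b1, b2)

random.seed(0)
xs, ys = [], []
# Non-workers: 0 hours, somewhat lower scores (~95).
for _ in range(300):
    xs.append(0.0)
    ys.append(95 + random.gauss(0, 5))
# Workers: 20-60 hours, scores flat (~100) regardless of hours.
for _ in range(700):
    xs.append(random.uniform(20, 60))
    ys.append(100 + random.gauss(0, 5))

b0, b1, b2 = fit_quadratic(xs, ys)
peak = -b1 / (2 * b2)
print(f"curvature b2 = {b2:.5f} (negative => downward parabola)")
print(f"implied 'optimal' hours = {peak:.1f}")
```

Even though the simulated response is flat for all workers, the fitted curvature comes out negative and the parabola peaks at an interior number of hours -- a "work hour optimum" manufactured entirely by the choice of a quadratic model plus the depressed zero-hours group.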

A remaining question is why the optimum is less than 40 hours. It is relatively easy to construct a statistical case in which that is a curve-fitting artifact, even when there is no direct data at the suggested optimum.

One could in principle check to see if this is the case. The data may be available.

For now, there are a few graphs on page 20. It really doesn't seem to me that there is a significant distinction between the part-time and full-time groups -- in fact, the biggest difference is that women with a high reading score are more likely not to be unemployed. Men with a higher symbols score are slightly more likely to be employed full-time rather than part-time -- but the converse is true for men with higher reading scores. The difference is not very distinct.

Jul 18, 2016

This is a paper on the work that the article cited:

There is a critical mathematical mistake in their statistical analysis which, as an artifact, essentially bakes this "conclusion" into the "results".

It comes about because they manually add "Working hours-squared/100" into the factor analysis, and then, lo and behold, there is a parabola correlated with the data. It is a downward-facing one, because an upward-facing parabola, or no correlation at all, would not fit as well. But the model is not reality.

There are numerous other conceptual problems with the study, like "what is work?" and the lack of real data on the nature of work beyond what is correlated with socioeconomic indicators such as university education. However, it is remarkable that the work is critically flawed even at the level of the statistical analysis.

I am also disappointed that nobody else on Hacker News seems to have pointed this out.

Hopefully this can serve as yet another example of how to interpret and understand the data that the news, be it Hacker News or otherwise, presents us with. Statistical data analysis is both hard and oftentimes misleading.