Jul 23, 2016

I'm sure many will read into the similarity between the bandwagon for information theory described here and the bandwagon for machine learning / artificial intelligence. While it may be obvious for some, let me decode (one of the few situations Genius would make sense for me) some of the repeating issues. As I've noted elsewhere[1], machine learning / artificial intelligence hold great promise for many fields, but we need to fight hype and unscientific use. To steal from Jack Clark[2]:

"Machine Learning is modern alchemy. People then: iron into gold? Sure! People now: shoddy data into new information? Absolutely!"


"Our fellow scientists in many different fields, attracted by the fanfare and by the new avenues opened to scientific analysis, are using these ideas in their own problems."

"[E]stablishing of such applications is not a trivial matter of translating words to a new domain, but rather the slow tedious process of hypothesis and experimental verification."

This is immensely important. While many of these methods appear general, and can be used with little effort thanks to the wide variety of machine learning toolkits that are available, they should be applied with care. Applying these methods to problems in other domains without careful consideration of the differences and complexities that might arise is asking for trouble.

The availability of advanced toolkits does not make your work impervious to flaws. With machine learning, it's even worse - the flaws in your data, model, or process can be explicitly worked around by the underlying machine learning algorithm. That makes debugging difficult, as your program is, to some loose degree, self-repairing[3].
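To make the "self-repairing" point concrete, here is a toy sketch of my own (not taken from the linked tweet). It fits a one-parameter least-squares model twice: once on clean data, and once after a hypothetical preprocessing bug flips the sign of the feature. The function names and the data are made up for illustration.

```python
def fit_slope(xs, ys):
    """Closed-form least-squares slope for the model y = w * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def mean_squared_error(w, xs, ys):
    """Average squared gap between the predictions w * x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def buggy_preprocess(values):
    """Hypothetical bug: the sign of the feature is flipped by accident."""
    return [-v for v in values]


# Made-up data where the true relationship is y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]

clean_w = fit_slope(xs, ys)
clean_error = mean_squared_error(clean_w, xs, ys)

# The same fit, but fed through the buggy preprocessing step.
buggy_xs = buggy_preprocess(xs)
buggy_w = fit_slope(buggy_xs, ys)
buggy_error = mean_squared_error(buggy_w, buggy_xs, ys)

print("clean:", clean_w, clean_error)
print("buggy:", buggy_w, buggy_error)
```

The learned weight simply flips sign to compensate, so the error metric alone gives no hint that the pipeline is broken. Real bugs are rarely this tidy, but the same compensation can happen in much larger models.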

Using these toolkits without proper analysis and experimental proof that they're working as intended, especially when their predictions are used for an important decision, is negligence.

"Research rather than exposition is the keynote, and our critical thresholds should be raised."

As a field, we don't have a strong grasp on many of the fundamentals. Issues that are obvious in hindsight are hiding in plain view. Just a few days ago, layer normalization popped up. It will likely make training faster and results better for a variety of applications. You can literally explain the idea to a skilled colleague in all of ten seconds. Somehow we were using far more complicated methods (batch normalization, weight normalization, etc.) before trying the "obvious" stuff.
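For reference, the ten-second version looks roughly like this: normalize each example's activations using the mean and variance computed across that example's own features, rather than across the batch. This is a minimal pure-Python sketch and not the paper's implementation. The scalar `gain` and `bias` are a simplification (the paper learns them per unit), and `eps` is an assumed small constant for numerical stability.

```python
import math


def layer_norm(activations, gain=1.0, bias=0.0, eps=1e-5):
    """Normalize one example's activations across its own features.

    gain and bias are scalars here as a simplification; in the paper
    they are learned per-unit parameters.
    """
    n = len(activations)
    mean = sum(activations) / n
    variance = sum((a - mean) ** 2 for a in activations) / n
    scale = 1.0 / math.sqrt(variance + eps)
    return [gain * (a - mean) * scale + bias for a in activations]


# Made-up activations for a single example.
hidden = [1.0, 2.0, 3.0, 4.0]
normalized = layer_norm(hidden)
print(normalized)
```

Because the statistics come from a single example, nothing here depends on the batch size, which is a large part of the appeal.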

We need more work like that, rather than papers and media publications grandstanding about vague potential futures that have little theoretical or experimental basis.

Also, it's worth reading Shannon's "A Mathematical Theory of Communication"[4] from 1948. There's a reason it has 85,278 citations - entire fields started there.

[1]: http://smerity.com/articles/2016/ml_not_magic.html

[2]: https://twitter.com/jackclarkSF/status/755257228429406208

[3]: https://twitter.com/sergulaydore/status/746098734946201600

[4]: http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20...

Jul 15, 2016

I think this is an absolutely fantastic idea. I think every field should do something like this for its "seminal" papers! I wanted to get something like this going for "A Mathematical Theory of Communication"[1] which was just posted [2] a week ago!

[1]: http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20...

[2]: https://news.ycombinator.com/item?id=12079826

Mar 20, 2016

This isn't quite related to your question, but I think Markov chains made their information theory debut in the original paper by Shannon: http://worrydream.com/refs/Shannon%20-%20A%20Mathematical%20...

It doesn't necessarily matter whether they're still in use. All of these tools are worth learning, so that if a problem ever comes up that can be solved with a Markov chain, you'll recognize it right away.

Same for all the other tools, though. And there are a lot of them.
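As a rough idea of how small the tool is, here is a sketch of a first-order, word-level Markov chain text generator, loosely in the spirit of the approximations to English in Shannon's paper. The corpus, the function names, and the fixed random seed are all made up for illustration.

```python
import random
from collections import defaultdict


def build_chain(words):
    """Map each word to the list of words observed directly after it."""
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate(chain, start, length, rng):
    """Walk the chain from start, sampling each next word from its followers."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break  # dead end: this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return output


# Tiny made-up corpus; a real one would be far larger.
corpus = "the cat sat on the mat and the cat ran".split()
chain = build_chain(corpus)
sample = generate(chain, "the", 8, random.Random(0))
print(" ".join(sample))
```

Repeated followers stay in the list, so sampling uniformly from it reproduces the observed transition frequencies without any explicit probability table.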