Only a few HN submissions each year are so complete that it is nearly impossible to gather comments to start a discussion. Well done!
I wonder how close we are to running these "excessive" ensembles in a production environment.
Just as we went from decision trees to random forests, it seems to me a natural progression to move from random forests to a random forest of random forests.
Some Kaggle competitors use over 1,000 RF estimators in their ensemble, but this is not yet possible/pragmatic to put in production for most use cases. An ensemble of 10 complex base estimators, however, is already within reach for applications that demand the highest accuracy.
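As a toy illustration of the "forest of forests" idea, here is a minimal stdlib-only sketch where each inner ensemble takes a majority vote over noisy predictors, and an outer ensemble votes over the inner ones (the predictors are simulated; in practice you'd use real estimators from a library like scikit-learn):

```python
import random
from collections import Counter

def majority_vote(predictions):
    """Combine an iterable of class labels by majority vote."""
    return Counter(predictions).most_common(1)[0][0]

def make_noisy_predictor(true_label, accuracy, rng):
    """A hypothetical weak predictor: right with probability `accuracy`."""
    def predict(sample):
        return true_label if rng.random() < accuracy else 1 - true_label
    return predict

rng = random.Random(0)
true_label = 1

# A "forest of forests": 11 inner ensembles, each aggregating 25 weak predictors.
forests = []
for _ in range(11):
    predictors = [make_noisy_predictor(true_label, 0.6, rng) for _ in range(25)]
    forests.append(lambda sample, p=predictors: majority_vote(f(sample) for f in p))

# The outer ensemble votes over the inner ensembles' votes.
final = majority_vote(forest(None) for forest in forests)
print(final)
```

Each 60%-accurate predictor is weak on its own, but 25 of them voting are right most of the time, and voting again over 11 such forests pushes accuracy higher still; the cost is that every prediction now queries 275 base models, which is exactly the production concern above.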
About the Netflix Prize, the Netflix engineers said:
> This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.
So whether going the route of complex ensembles makes any business sense also depends on the size of the additional gains. But the next 20 years can make a lot of difference.
Anyone have experience putting complex ensemble models in production?
Another development I find really interesting is "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (https://arxiv.org/abs/1701.06538). It seems possible to learn how to selectively prune a giant ensemble, selecting a handful of experts that do well on particular samples. This makes it computationally feasible to get predictions from a giant ensemble. In the paper they only use neural nets, but I see no reason not to try this with other models, like SVMs or gradient-boosted decision trees.
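The core trick can be sketched in a few lines: keep only the top-k experts per sample and renormalize their weights, so most of the ensemble is never evaluated. This is a toy sketch, not the paper's method; the expert functions and gate scores below are made up, and in the paper the scores come from a learned gating network rather than a fixed list:

```python
def top_k_gate(scores, k=2):
    """Keep only the k highest-scoring experts; renormalize their weights."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:k]
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}

def sparse_predict(experts, gate_scores, sample, k=2):
    """Query only the selected experts and blend their outputs."""
    weights = top_k_gate(gate_scores, k)
    return sum(w * experts[i](sample) for i, w in weights.items())

# Four hypothetical experts; only two are ever evaluated per sample.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x ** 2, lambda x: -x]
gate_scores = [0.1, 0.5, 0.3, 0.1]  # stand-in for a learned gating network's output
print(sparse_predict(experts, gate_scores, 3.0))  # blends experts 1 and 2 only
```

Nothing here depends on the experts being neural nets — each one is just a callable, which is why swapping in SVMs or boosted trees seems plausible.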