Several people have asked for background material for the workshop:
(Wallach, 2015, http://arxiv.org/pdf/1510.02855.pdf) AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery
(Duvenaud, 2015, http://papers.nips.cc/paper/5954-convolutional-networks-on-g...) Convolutional Networks on Graphs for Learning Molecular Fingerprints
(Kearnes, 2016, http://arxiv.org/abs/1606.08793v1) Modeling Industrial ADMET Data with Multitask Networks
(Gómez-Bombarelli, 2016, doi:10.1038/nmat4717) Design of efficient molecular organic light-emitting diodes by a high-throughput virtual screening and experimental approach
and of course
(Gómez-Bombarelli, 2016, https://arxiv.org/abs/1610.02415) Automatic chemical design using a data-driven continuous representation of molecules
One of the authors of the original paper https://arxiv.org/abs/1610.02415 here.
From a machine learning point of view, we simply glued together two techniques: text autoencoders and Bayesian optimization. That is, we trained an autoencoder to map a text representation of molecules (SMILES strings) to and from continuous vectors. “Chemical design” then becomes just maximizing a function of a continuous variable, something we already know a lot about.
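For readers new to the optimization half, here is a minimal, stdlib-only sketch of Bayesian optimization in a 2-D latent space: a Gaussian-process surrogate with an RBF kernel, plus expected improvement as the acquisition function. It assumes a trained autoencoder already exists. `score` is a hypothetical toy stand-in for "decode z to SMILES, then predict a property". The latent box, kernel length scale, and random-candidate acquisition search are illustrative choices, not the setup used in the paper.

```python
import math
import random


def rbf(a, b, length=1.0):
    # Squared-exponential (RBF) kernel between two latent vectors.
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-0.5 * d2 / length ** 2)


def cholesky(K):
    # Lower-triangular L with L L^T = K (K must be positive definite).
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward(L, b):
    # Solve L x = b for lower-triangular L.
    x = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def backward(L, b):
    # Solve L^T x = b for lower-triangular L.
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(Z, y, noise=1e-6):
    # Fit a zero-mean GP to standardized targets; return predict(z) -> (mean, std).
    mu = sum(y) / len(y)
    sd = math.sqrt(sum((v - mu) ** 2 for v in y) / len(y)) or 1.0
    ys = [(v - mu) / sd for v in y]
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(Z)]
         for i, a in enumerate(Z)]
    L = cholesky(K)
    alpha = backward(L, forward(L, ys))  # alpha = K^{-1} ys

    def predict(z):
        k = [rbf(a, z) for a in Z]
        v = forward(L, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # rbf(z, z) == 1
        return mu + sd * mean, sd * math.sqrt(var)

    return predict


def expected_improvement(mean, std, best):
    # Closed-form expected improvement for maximization under a Gaussian posterior.
    u = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def score(z):
    # HYPOTHETICAL stand-in for "decode z to a SMILES string, then predict a property".
    # Smooth toy function with its maximum (0.0) at z = (1.0, -0.5).
    return -((z[0] - 1.0) ** 2 + (z[1] + 0.5) ** 2)


def optimize_latent(objective, n_init=5, n_iter=15, n_candidates=200, seed=0):
    rng = random.Random(seed)

    def sample():
        # Uniform draw from an assumed latent box [-3, 3]^2.
        return [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]

    Z = [sample() for _ in range(n_init)]
    y = [objective(z) for z in Z]
    for _ in range(n_iter):
        predict = fit_gp(Z, y)
        best = max(y)
        # Maximize the acquisition over random candidates (a cheap inner optimizer).
        candidates = [sample() for _ in range(n_candidates)]
        z_next = max(candidates, key=lambda c: expected_improvement(*predict(c), best))
        Z.append(z_next)
        y.append(objective(z_next))
    i = max(range(len(y)), key=y.__getitem__)
    return Z[i], y[i]
```

`optimize_latent(score)` returns the best latent point found and its score. In the real setting, each `objective` call would decode to SMILES and score the molecule, and decodings that are not valid molecules would need some handling. That handling is omitted here.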
We also showed off some of the nice things that one can do with continuous latent representations, such as interpolation. This had already been done for images by many people, and for text in https://arxiv.org/abs/1511.06349
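Interpolation is easy to write down once molecules have been encoded. The sketch below assumes you already have two nonzero latent vectors `z0` and `z1`, and that each intermediate point would afterwards be passed through a decoder (not shown). It gives plain linear interpolation and also spherical interpolation, which some people prefer for high-dimensional Gaussian latent spaces. Which variant the paper used is not stated in this thread.

```python
import math


def lerp(z0, z1, t):
    # Straight-line interpolation between two latent vectors.
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def slerp(z0, z1, t):
    # Spherical interpolation along the arc between z0 and z1 (assumes nonzero vectors).
    n0 = math.sqrt(sum(a * a for a in z0))
    n1 = math.sqrt(sum(b * b for b in z1))
    cos_omega = sum(a * b for a, b in zip(z0, z1)) / (n0 * n1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    s = math.sin(omega)
    if s < 1e-8:
        # (Anti)parallel vectors: the arc is undefined, so fall back to a straight line.
        return lerp(z0, z1, t)
    w0 = math.sin((1.0 - t) * omega) / s
    w1 = math.sin(t * omega) / s
    return [w0 * a + w1 * b for a, b in zip(z0, z1)]


def interpolate(z0, z1, steps=5, method=lerp):
    # Evenly spaced points from z0 to z1 inclusive; each would then be decoded.
    return [method(z0, z1, i / (steps - 1)) for i in range(steps)]
```

Decoding each point of `interpolate(z_a, z_b, steps=7)` gives a sequence of molecules that morph from one endpoint to the other, provided the intermediate decodings are valid.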
Of course, our paper is just a proof of concept. For instance, instead of encoding to and from SMILES, it would be much better to encode to and from graphs directly. We know how to encode graphs into vectors, but I don’t know of a good way to decode a vector into a graph.
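For the "graph into vector" direction, here is a minimal message-passing sketch. It is loosely in the spirit of the graph-convolution fingerprints in the Duvenaud et al. paper listed above, but it is not their architecture. Atoms are nodes with one-hot features and bonds are unlabeled edges, so bond order is ignored. The weights are random and untrained, and the `random_weights` helper and all sizes are hypothetical. Summing over nodes at the end makes the vector independent of atom numbering. Going back from such a vector to a graph is the open problem described above, and it has no equally simple counterpart.

```python
import math
import random


def random_weights(in_dim, hidden, n_rounds, seed=0):
    # HYPOTHETICAL helper: untrained, randomly initialized matrices (hidden x input).
    rng = random.Random(seed)
    dims = [in_dim] + [hidden] * n_rounds
    return [[[rng.gauss(0.0, 0.5) for _ in range(dims[r])] for _ in range(hidden)]
            for r in range(n_rounds)]


def encode_graph(node_features, edges, weights):
    # Each round: every node sums its own state and its neighbors' states,
    # then applies a linear map followed by tanh.
    neighbors = [[] for _ in node_features]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)
    h = [list(f) for f in node_features]
    for W in weights:
        agg = [[sum(vals) for vals in zip(h[i], *(h[j] for j in neighbors[i]))]
               for i in range(len(h))]
        h = [[math.tanh(sum(w * a for w, a in zip(row, x))) for row in W] for x in agg]
    # Readout: sum over nodes, so relabeling atoms does not change the vector.
    return [sum(col) for col in zip(*h)]
```

For example, with one-hot `[C, O]` features, the heavy atoms of ethanol (C-C-O) and dimethyl ether (C-O-C) get different vectors. Renumbering ethanol's atoms gives the same vector up to floating-point rounding.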
Another open problem is that it’s hard to know what to optimize. When our initial experiments optimized for specific chemical properties, the optimizer suggested molecules with crazy structures, such as giant rings. Human chemists have great intuition for what is easily synthesizable or stable, and it’s hard to articulate programmatically all the properties we want a molecule to have. Alternatively, we could force the optimizer to look only at molecules similar to ones we’ve already seen, but this is unsatisfying too: after all, the best result of exploration is finding something unlike anything you’ve seen before.
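One way to express "only look at molecules similar to ones we've already seen" is a soft penalty on latent-space distance from known molecules. This minimal sketch assumes that distance in latent space is a usable proxy for chemical similarity. That need not hold, and fingerprint-based similarity is a common alternative. `radius` and `weight` are hypothetical knobs, not values from the paper. It requires Python 3.8+ for `math.dist`.

```python
import math


def nearest_distance(z, known):
    # Euclidean distance from z to the closest latent vector of a known molecule.
    return min(math.dist(z, k) for k in known)


def penalized_objective(z, objective, known, radius=1.0, weight=10.0):
    # The objective is unchanged within `radius` of a known molecule; beyond that,
    # subtract `weight` for every unit of distance past the radius.
    excess = max(0.0, nearest_distance(z, known) - radius)
    return objective(z) - weight * excess
```

Wrapping an objective as `lambda z: penalized_objective(z, objective, known)` before handing it to any latent-space optimizer trades exploration for plausibility. A large `weight` or small `radius` keeps proposals close to known chemistry, which is exactly the limitation described above.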