Apr 23, 2017

One system for extracting information from PDFs is Fonduer [1], which is built on the Snorkel framework from Stanford. It may be worth checking out for your use case. Here's a blog post introducing it [2].

Disclosure: I worked on the project.

[1] https://arxiv.org/abs/1703.05028

[2] https://hazyresearch.github.io/snorkel/blog/fonduer.html