Just to mirror what was said on the thread a month ago when the paper came out, if you're interested in FastText I'd strongly recommend checking out Vowpal Wabbit and BIDMach.
My main issue is that the FastText paper only compares to other intensive deep methods and not to comparable performance-focused systems like Vowpal Wabbit or BIDMach.
Many of the features implemented in FastText have existed in Vowpal Wabbit (VW) for many years. Vowpal Wabbit also serves as a test bed for many other interesting, but all highly performant, ideas, and has reasonably strong documentation. The command line interface is highly intuitive and it will burn through your datasets quickly. You can recreate FastText in VW with a few command line options.
BIDMach is focused on "rooflining", or working out the exact performance characteristics of the hardware and aiming to maximize those. While VW doesn't have word2vec, BIDMach does, and more generally word2vec isn't going to be a major slow point in your systems as word2vec is actually pretty speedy.
To quote from my last comment regarding features:
Behind the speed of both methods [VW and FastText] is use of ngrams^, the feature hashing trick (think Bloom filter except for features) that has been the basis of VW since it began, hierarchical softmax (think finding an item in O(log n) using a balanced binary tree instead of an O(n) array traversal) and using a shallow instead of deep model.
^ Illustrating ngrams: "the cat sat on the mat" => "the cat", "cat sat", "sat on", "on the", "the mat" - you lose complex positional and ordering information but for many text classification tasks that's fine.
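To make the three ideas in that quote concrete, here is a minimal Python sketch. It is not code from VW or FastText: the function names, the bucket count, and the use of CRC32 as the hash are all assumptions for illustration, and CRC32 is only a stand-in rather than the hash function either tool actually uses.

```python
import math
import zlib


def bigrams(text):
    """Split on whitespace and pair each token with its successor."""
    tokens = text.split()
    return [f"{a} {b}" for a, b in zip(tokens, tokens[1:])]


def hash_features(features, num_buckets=2**18):
    """Hashing trick: map each feature string to a bucket index.

    No vocabulary dictionary is stored; two features that collide
    simply share a bucket (and therefore a weight).
    num_buckets is an arbitrary illustrative choice.
    """
    counts = {}
    for feat in features:
        idx = zlib.crc32(feat.encode("utf-8")) % num_buckets
        counts[idx] = counts.get(idx, 0) + 1
    return counts


def balanced_tree_depth(num_classes):
    """Depth of a balanced binary tree with num_classes leaves.

    This is the number of binary decisions a hierarchical softmax
    over such a tree makes per example, versus num_classes scores
    for a flat softmax.
    """
    return math.ceil(math.log2(num_classes)) if num_classes > 1 else 0


grams = bigrams("the cat sat on the mat")
print(grams)
print(hash_features(grams))            # sparse {bucket_index: count}
print(balanced_tree_depth(1_000_000))  # 20
```

The point of the sketch is that every step is a cheap, fixed-cost operation per token or per class decision, which is why a shallow model built this way trains quickly on a CPU.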
Link to the paper: https://arxiv.org/abs/1607.01759
Quotes from the paper:
Both char-CNN and VDCNN are trained on a NVIDIA Tesla K40 GPU, while our models are trained on a CPU using 20 threads.
Table 2 shows that methods using convolutions are several orders of magnitude slower than fastText.
Our speed-up compared to CNN based methods increases with the size of the dataset, going up to at least a 15,000× speed-up.
Table 2 shows the training times behind that speed-up:
ConvNets: 2 to 5 days on GPUs
FastText: 52 seconds on CPU