Cyan4973/Yann is also well known for xxHash, one of the faster hashers out there. Post a new hasher (e.g. MetroHash) and people will probably ask how it compares to xxHash. The guy is an absolute machine. If you search Google for 'zstd' right now, you'll find him, not Facebook, namely: https://github.com/Cyan4973/zstd . Glad his work is being supported by someone now! Immensely well deserved, after so many years of helping make everyone fast.
PS - I didn't know a lot of the terms you used, but the Finite State Entropy (FSE) link you provided does a good job of introducing some of them, and the linked paper "Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding" (ANS) seems interesting.
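To make the ANS idea a bit more concrete, here is a minimal sketch of the range variant (rANS) in Python. It is not the tabled variant (tANS) that FSE implements, it skips renormalization entirely by leaning on Python's arbitrary-precision integers, and the symbol frequencies are made up. It does show the core trick: the whole message becomes a single integer state, and each symbol with frequency f out of M total grows that state by about log2(M/f) bits, the same cost arithmetic coding approaches.

```python
from typing import Dict, List, Tuple


def build_tables(freqs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Give each symbol a contiguous slot range [cum[s], cum[s] + freqs[s]) out of M slots."""
    cum: Dict[str, int] = {}
    total = 0
    for sym in sorted(freqs):
        cum[sym] = total
        total += freqs[sym]
    return cum, total


def rans_encode(msg: str, freqs: Dict[str, int]) -> int:
    """Fold the whole message into one integer state x."""
    cum, m = build_tables(freqs)
    x = 1  # arbitrary initial state
    # rANS decodes in LIFO order, so encode back-to-front to decode front-to-back.
    for sym in reversed(msg):
        f = freqs[sym]
        # Multiplies x by roughly m/f, i.e. adds about log2(m/f) bits.
        x = (x // f) * m + cum[sym] + (x % f)
    return x


def rans_decode(x: int, length: int, freqs: Dict[str, int]) -> str:
    """Invert rans_encode, popping one symbol per step."""
    cum, m = build_tables(freqs)
    out: List[str] = []
    for _ in range(length):
        slot = x % m
        # Find the symbol whose slot range contains `slot` (linear scan for clarity).
        sym = next(s for s in cum if cum[s] <= slot < cum[s] + freqs[s])
        out.append(sym)
        x = freqs[sym] * (x // m) + slot - cum[sym]
    return "".join(out)


freqs = {"a": 6, "b": 1, "c": 1}  # hypothetical symbol counts, M = 8
msg = "aaabaaacaa"
state = rans_encode(msg, freqs)
assert rans_decode(state, len(msg), freqs) == msg
print(f"{len(msg)} symbols -> {state.bit_length()} bits of state")
```

A real coder such as FSE keeps the state in a fixed-width register and streams bits out as it grows, and uses lookup tables instead of the division and search above; that is where the Huffman-like speed comes from.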
It points to an even older paper on ANS: http://arxiv.org/abs/0710.3861
The more recent one is: http://arxiv.org/abs/1311.2540
The first one is about a kind of multidimensional analogue of Fibonacci coding (https://en.wikipedia.org/wiki/Fibonacci_coding) on a lattice, such that there can never be two neighboring '1's. It also uses an interesting concept called Maximal Entropy Random Walk.
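For reference, plain one-dimensional Fibonacci coding (the scheme behind the Wikipedia link, not the paper's lattice construction) can be sketched as follows. Each integer n >= 1 is written as a sum of non-consecutive Fibonacci numbers (its Zeckendorf representation), so the body of the codeword never contains two adjacent '1's, and one extra '1' closes it with '11'.

```python
def fib_encode(n: int) -> str:
    """Fibonacci code of n >= 1: Zeckendorf bits (smallest term first) plus a closing '1'."""
    if n < 1:
        raise ValueError("Fibonacci coding is defined for n >= 1")
    fibs = [1, 2]
    while fibs[-1] <= n:
        fibs.append(fibs[-1] + fibs[-2])
    fibs.pop()  # drop the first Fibonacci number larger than n
    bits = ["0"] * len(fibs)
    # Greedy: always take the largest Fibonacci number that still fits.
    # This never picks two consecutive ones, so no "11" appears in the body.
    for i in range(len(fibs) - 1, -1, -1):
        if fibs[i] <= n:
            bits[i] = "1"
            n -= fibs[i]
    # The top bit is always 1, so appending "1" makes "11" the end marker.
    return "".join(bits) + "1"


def fib_decode(code: str) -> int:
    """Inverse of fib_encode for a single codeword."""
    if not code.endswith("11"):
        raise ValueError("a Fibonacci codeword ends with '11'")
    total, a, b = 0, 1, 2
    for bit in code[:-1]:  # drop the terminating '1'
        if bit == "1":
            total += a
        a, b = b, a + b
    return total


for n in (1, 2, 3, 4, 11):
    print(n, fib_encode(n))
```

Because '11' only ever shows up at the end, codewords are self-delimiting: a decoder reading a concatenated stream left to right can cut after each first '11' it sees.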
LZ4 is a format. The only way to judge how energy efficient a format is, is to look at the average speed of its implementations. Implementations of the same format can differ a lot, though: zlib (a DEFLATE library), for instance, is more energy efficient than zopfli (another DEFLATE library), simply by virtue of taking fewer CPU cycles to crunch the same data.
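A quick way to see the "same format, different CPU cost" point with only Python's standard library: zopfli isn't available there, so this sketch swaps in zlib at compression levels 1, 6 and 9 as a stand-in (different effort settings of one library rather than two libraries). All three emit the same zlib/DEFLATE stream that a single decoder reads, but they spend different amounts of CPU time getting there. time.process_time measures CPU time, which is only a proxy for energy if you assume CPU cycles track energy use; nothing here measures energy directly, and the sample input is invented.

```python
import time
import zlib
from typing import Tuple


def cpu_cost(data: bytes, level: int, repeats: int = 3) -> Tuple[int, float]:
    """Compressed size and average CPU seconds for one zlib.compress call at `level`."""
    start = time.process_time()
    for _ in range(repeats):
        packed = zlib.compress(data, level)
    cpu_seconds = (time.process_time() - start) / repeats
    # Every level produces the same format, readable by the same decoder.
    assert zlib.decompress(packed) == data
    return len(packed), cpu_seconds


# Hypothetical, mildly repetitive sample input (~1 MB of text).
data = b"".join(f"line {i}: value {i % 97}\n".encode() for i in range(50_000))
for level in (1, 6, 9):
    size, secs = cpu_cost(data, level)
    print(f"level {level}: {size} bytes, {secs * 1000:.1f} ms CPU")
```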
There is in fact a very high correlation between CPU cycles and energy consumption, since compression algorithms don't sit idle and use roughly the same instructions. Yann Collet's Zstd also uses the same principles as LZFSE, as both grew out of Jarek Duda's research: http://arxiv.org/abs/1311.2540.
The reference LZ4 implementation is absolutely more energy efficient than LZFSE, and Apple does in fact push for its use by offering it in its compression library. However, it tends not to compress as well as either LZFSE or Zstd. Over 4G or Wi-Fi (or even broadband), the time lost transferring more data is not made up by the time gained decompressing it faster, resulting in much slower downloads than even with zlib. LZ4 is still relevant at higher speeds, such as those offered by magnetic hard drives. (Beyond a certain speed, such as that of SSDs, compression no longer offers a speed benefit, but you might accept the slowdown in exchange for the drive space you save.)
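The download trade-off can be put into a toy model. This sketch assumes transfer and decompression are streamed and overlap, so whichever stage is slower sets the pace; the two codecs and their ratios and speeds are made-up round numbers, not benchmarks of LZ4, Zstd or LZFSE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Codec:
    name: str
    ratio: float        # original size / compressed size
    decode_mb_s: float  # decompression speed, MB of output per second


def fetch_seconds(size_mb: float, link_mb_s: float, codec: Optional[Codec] = None) -> float:
    """Seconds to receive and decompress `size_mb` MB over a `link_mb_s` MB/s link.

    Assumes transfer and decompression are streamed and overlap, so the
    slower of the two stages sets the pace.
    """
    if codec is None:
        return size_mb / link_mb_s  # raw transfer, nothing to decode
    transfer = size_mb / codec.ratio / link_mb_s
    decode = size_mb / codec.decode_mb_s
    return max(transfer, decode)


# Made-up round numbers for illustration only, not benchmarks.
FAST = Codec("fast decoder, lower ratio (LZ4-like)", ratio=2.0, decode_mb_s=4000.0)
DENSE = Codec("slower decoder, higher ratio (Zstd/LZFSE-like)", ratio=3.0, decode_mb_s=1000.0)

for link in (10.0, 2000.0, 5000.0):
    raw = fetch_seconds(100.0, link)
    fast = fetch_seconds(100.0, link, FAST)
    dense = fetch_seconds(100.0, link, DENSE)
    print(f"{link:>6.0f} MB/s link: raw {raw:.3f}s, fast {fast:.3f}s, dense {dense:.3f}s")
```

With these numbers the denser codec wins at 10 MB/s, the faster decoder wins at 2000 MB/s, and at 5000 MB/s sending raw data beats both. More generally, under this model compression only helps while decode_mb_s exceeds link_mb_s; exactly where the crossovers fall depends entirely on the invented inputs.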
There is a separate discussion to be had about the fact that the open-sourced LZFSE reference implementation is not the one Apple actually uses, given that it does not even have ARM-specific code (which also explains how little they have touched it since). LZ4 and Zstd, by contrast, do have optimized code for ARM. Also, LZFSE does not claim to be patent-unencumbered.
All in all, it is not a stretch to assume that Apple benefits from this FUD, which would explain why there is no comparative benchmark anywhere to be found on their GitHub or in their documentation. It really looks like Zstd is better all around.