A system built with the same peak flops using GPUs would actually be more useful than what they built. The Sunway TaihuLight processing elements have a 64KB scratchpad, no data cache, and less memory bandwidth than Nvidia Fermi (which is 7 years old).
For most applications the processing elements will be so starved for data that they are just an idle waste of energy.
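To put a rough number on "starved for data", here is a minimal roofline-style sketch. The per-node figures (about 3.06 Tflop/s peak and about 136.51 GB/s memory bandwidth) are what I recall from the report and should be treated as assumptions, not measurements; the function names are made up for illustration.

```python
# Roofline-style estimate: a kernel is limited either by peak compute
# or by how fast memory can feed it, whichever is lower.

# ASSUMED per-node figures (recalled from the report, not verified here).
NODE_PEAK_FLOPS = 3.06e12      # flop/s
NODE_MEM_BANDWIDTH = 136.51e9  # bytes/s


def machine_balance(peak_flops, mem_bandwidth):
    """Flops the node can execute per byte it can load from memory."""
    return peak_flops / mem_bandwidth


def attainable_flops(peak_flops, mem_bandwidth, intensity):
    """Upper bound on flop/s for a kernel with the given arithmetic
    intensity (flops performed per byte moved to or from memory)."""
    return min(peak_flops, mem_bandwidth * intensity)


balance = machine_balance(NODE_PEAK_FLOPS, NODE_MEM_BANDWIDTH)
print(f"machine balance: {balance:.1f} flops/byte")

# A double-precision STREAM triad (a[i] = b[i] + s * c[i]) does
# 2 flops per 24 bytes moved, so its intensity is 1/12 flops/byte.
triad_intensity = 2 / 24
triad = attainable_flops(NODE_PEAK_FLOPS, NODE_MEM_BANDWIDTH, triad_intensity)
print(f"triad bound: {triad / NODE_PEAK_FLOPS:.2%} of peak")
```

Under those assumed numbers, any kernel doing fewer than roughly 22 flops per byte of memory traffic is bandwidth-bound, which is why dense Linpack looks good and most other codes would not.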
For technical information, read the paper: http://www.netlib.org/utk/people/JackDongarra/PAPERS/sunway-...
Compared to Tianhe-2, the previous top system, it delivers 2.7x more flops and 3.2x more flops per watt.
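The two ratios above can be checked with a few lines of arithmetic. This sketch assumes they refer to Linpack (Rmax) results, and the Rmax and power figures below are the June 2016 TOP500 values as I recall them; treat them as assumptions rather than authoritative data.

```python
# ASSUMED figures: Linpack Rmax in Pflop/s and power draw in MW,
# recalled from the June 2016 TOP500 list.
taihulight = {"rmax_pflops": 93.01, "power_mw": 15.371}
tianhe2 = {"rmax_pflops": 33.86, "power_mw": 17.808}


def flops_per_watt(system):
    """Energy efficiency in Gflop/s per watt."""
    # Pflop/s divided by MW is 1e15 / 1e6 = 1e9 flop/s per watt.
    return system["rmax_pflops"] / system["power_mw"]


speedup = taihulight["rmax_pflops"] / tianhe2["rmax_pflops"]
efficiency_gain = flops_per_watt(taihulight) / flops_per_watt(tianhe2)

print(f"flops ratio:          {speedup:.1f}x")
print(f"flops per watt ratio: {efficiency_gain:.1f}x")
```

Note that these are Linpack numbers, so they say nothing about the bandwidth-bound case discussed earlier.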