Unfortunately, as usual, what Intel says and what actually happens are two quite different things:
> Shortly after we learned about Intel’s Software Guard Extensions (SGX) initiative, we set out to study it in the hope of finding a practical solution to its vulnerability to cache timing attacks. After reading the official SGX manuals, we were left with more questions than when we started. The SGX patents filled some of the gaps in the official documentation, but also revealed Intel’s enclave licensing scheme, which has troubling implications.
> After learning about the SGX implementation and inferring its design constraints, we discarded our draft proposals for defending enclave software against cache timing attacks. We concluded that it would be impossible to claim to provide this kind of guarantee given the design constraints and all the unknowns surrounding the SGX implementation. Instead, we applied the knowledge that we gained to design Sanctum, which is briefly described in § 4.9.
> This paper describes our findings while studying SGX. We hope that it will help fellow researchers understand the breadth of issues that need to be considered before accepting a trusted hardware design as secure. We also hope that our work will prompt the research community to expect more openness from the vendors who ask us to trust their hardware.
According to , on recent Intel CPUs, the hardware decoder translates each instruction into up to four micro-ops. These are either simple micro-ops such as addition, subtraction, or bitwise AND/OR/XOR, or a special "microcode assist" micro-op, which is essentially a function call into the CPU's microcode table. According to the same source, the microcode table is believed to contain roughly 20,000 micro-ops that handle edge cases such as rare instructions, rare prefixes, FPU denormals, traps/exceptions, and the like. The microcode table is also believed to contain full-blown implementations of RSA and SHA-256 in order to support microcode updates.
So, yes, there's a performance gap between instructions that have a hardware fast path and those that require a microcode assist.