Very interesting; I've ignored random forests more than I should. Thank you!
I won't comment on your methods except to say that comparing classification over linear PCA against kernel PCA on linearly inseparable data isn't exactly a fair fight; an SVM run on a matching kernel PCA decomposition, or a kernel SVM itself, would be more illustrative. (Is your code available?)
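To make the objection concrete, here's a minimal sketch of what I mean, on a toy dataset of my own choosing (concentric rings, which are linearly inseparable): a linear SVM on linear PCA features fails, while either kernel PCA followed by a linear SVM, or an RBF-kernel SVM directly, handles it. The dataset, parameters, and pipeline are my illustration, not the parent's setup.

```python
# Sketch: linear PCA + linear SVM vs. kernel PCA + linear SVM vs. kernel SVM
# on linearly inseparable data (two concentric rings). Illustrative only.
import numpy as np
from sklearn.decomposition import PCA, KernelPCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Two concentric rings: a classic linearly inseparable problem.
n = 200
theta = rng.uniform(0, 2 * np.pi, n)
r = np.where(np.arange(n) % 2 == 0, 1.0, 4.0)  # inner vs. outer ring radius
X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])
X += rng.normal(scale=0.1, size=X.shape)       # a little noise
y = (np.arange(n) % 2 == 0).astype(int)        # inner ring = class 1

# Linear PCA is just a rotation here, so the data stays linearly
# inseparable and a linear SVM does barely better than chance.
linear_feats = PCA(n_components=2).fit_transform(X)
linear_acc = cross_val_score(SVC(kernel="linear"), linear_feats, y, cv=5).mean()

# RBF kernel PCA unfolds the rings; a linear SVM on those features works.
kpca_feats = KernelPCA(n_components=2, kernel="rbf", gamma=0.5).fit_transform(X)
kpca_acc = cross_val_score(SVC(kernel="linear"), kpca_feats, y, cv=5).mean()

# An RBF-kernel SVM on the raw data solves the problem directly.
ksvm_acc = cross_val_score(SVC(kernel="rbf", gamma=0.5), X, y, cv=5).mean()

print(linear_acc, kpca_acc, ksvm_acc)
```

The point isn't these particular numbers; it's that the comparison only says something about the classifier once both pipelines get a representation the classifier can actually use.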
I generally think of neural networks as enormous meta-kernels (composites of composites of ... kernels). This generally leads me to think of ways that kernels can be turned into neural network layers.
Great work has been done turning powerful tools like Random Fourier Features/Random Kitchen Sinks into layers in neural networks (e.g., Deep Fried Convnets from Alex Smola and collaborators [https://arxiv.org/abs/1412.7149] and the Structured Adaptive/Random Spinners work with Choromanski [https://arxiv.org/abs/1610.06209]). Deep Forest [https://arxiv.org/abs/1702.08835] is a method that claims to work well, but it's somewhat odd; it's not quite what I would have imagined.
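The core trick behind Random Fourier Features is small enough to sketch: a fixed random projection plus a cosine nonlinearity whose inner products approximate an RBF kernel, which is exactly why it drops into a network as a layer. The function name and shapes below are my own choices, not anything from the papers linked above.

```python
# Sketch of a Random Fourier Features "layer": inner products of the
# features approximate an RBF kernel (Rahimi & Recht's construction).
import numpy as np

def rff_layer(X, D=4096, sigma=1.0, seed=0):
    """Map X of shape (n, d) to features z of shape (n, D) such that
    z(x) @ z(y) ~= exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the RBF kernel's spectral density,
    # plus uniform random phases.
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Check the kernel approximation on two nearby points.
x = np.array([[0.3, -1.2, 0.5]])
y = np.array([[0.1, -0.9, 0.7]])
z = rff_layer(np.vstack([x, y]))
approx = float(z[0] @ z[1])
exact = float(np.exp(-np.sum((x - y) ** 2) / 2.0))
print(approx, exact)  # close for large D
```

Make `W` and `b` trainable instead of fixed and you have an ordinary dense layer with a cosine activation, which is the bridge the "deep fried"/spinner line of work exploits (with structured matrices standing in for the dense random `W` to save time and memory).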
My biggest criticism of random forests is that the more expressive models are more memory-hungry and more expensive at both training and inference time than many comparable methods. But bounded-complexity trees with smart implementations seem to be a lot more useful, and to have broader applications, than I give them credit for.