Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
Dec 27, 2020
Thanks for the reference. Here it is: “We show, however, that deep networks learned by the standard gradient descent algorithm are in fact mathematically approximately equivalent to kernel machines, a learning method that simply memorizes the data and uses it directly for prediction via a similarity function (the kernel).”
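To make "memorizes the data and uses it directly for prediction via a similarity function" concrete, here is a minimal sketch of a generic kernel machine. It is only an illustration: the RBF kernel, the toy training points, the coefficients, and the names `rbf_kernel` and `kernel_predict` are all assumptions chosen for the example, not the kernel or the weights that the paper derives for gradient descent.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Similarity between two points; a stand-in kernel chosen for illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def kernel_predict(x, train_x, coeffs, bias=0.0, kernel=rbf_kernel):
    """Predict as a weighted sum of similarities to the stored training points."""
    return bias + sum(a * kernel(x, xi) for a, xi in zip(coeffs, train_x))


# Hypothetical "memorized" training set: one positive and one negative example.
train_x = [(0.0, 0.0), (1.0, 1.0)]
coeffs = [1.0, -1.0]  # per-example weights (made up for this sketch)

near_positive = kernel_predict((0.1, 0.0), train_x, coeffs)
near_negative = kernel_predict((0.9, 1.0), train_x, coeffs)
midpoint = kernel_predict((0.5, 0.5), train_x, coeffs)
```

A query close to the first stored point gets a positive score, one close to the second gets a negative score, and a query equidistant from both scores zero. The prediction depends on the training data only through the kernel evaluations, which is the sense in which the method "uses it directly."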