Can we use the neural tangent kernel (NTK) [1] to get the benefits of new neural-net architectures while still using budgeted kernel machines? See also the paper on the path kernel [2].
The kernel requires the dot product of the network's parameter gradients evaluated at two different inputs. Perhaps this dot product can be computed efficiently?
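Concretely, the empirical NTK of a network f with parameters theta is k(x, x') = grad_theta f(theta, x) . grad_theta f(theta, x'). A minimal sketch in JAX, assuming a toy two-layer MLP (the architecture, sizes, and names are placeholders, not taken from [1] or [2]):

```python
# Sketch: empirical neural tangent kernel
#   k(x, x') = <grad_theta f(theta, x), grad_theta f(theta, x')>
# The MLP below is a stand-in; any differentiable model would do.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def mlp(params, x):
    """Scalar-output two-layer MLP (placeholder architecture)."""
    h = jnp.tanh(params["W1"] @ x + params["b1"])
    return jnp.dot(params["w2"], h) + params["b2"]

def empirical_ntk(params, x1, x2):
    """Dot product of parameter gradients at two inputs."""
    g1, _ = ravel_pytree(jax.grad(mlp)(params, x1))
    g2, _ = ravel_pytree(jax.grad(mlp)(params, x2))
    return jnp.dot(g1, g2)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
d, h = 8, 16  # arbitrary input and hidden sizes
params = {
    "W1": jax.random.normal(k1, (h, d)) / jnp.sqrt(d),
    "b1": jnp.zeros(h),
    "w2": jax.random.normal(k2, (h,)) / jnp.sqrt(h),
    "b2": jnp.zeros(()),
}
x1 = jax.random.normal(k3, (d,))
x2 = jax.random.normal(k4, (d,))
print(empirical_ntk(params, x1, x2))
```

On the efficiency question: the sketch above materializes the full per-parameter gradient for each input, which is wasteful for large models; one presumably could instead express the same dot product as a composition of forward-mode (JVP) and reverse-mode (VJP) products so the full gradient vectors never need to be stored.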
[1] Jacot, Arthur, Franck Gabriel, and Clément Hongler. "Neural tangent kernel: Convergence and generalization in neural networks." arXiv preprint arXiv:1806.07572 (2018).
[2] Domingos, Pedro. "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine." arXiv preprint arXiv:2012.00152 (2020).