Generalization and Scaling Laws
A comprehensive bibliography: https://www.gwern.net/notes/Scaling
Explaining Neural Scaling Laws
A Neural Scaling Law from the Dimension of the Data Manifold
Scaling Laws for Autoregressive Generative Modeling
Scaling Laws for Neural Language Models
On Power Laws in Deep Ensembles
Generalization
Multiple Descent: Design Your Own Generalization Curve
Rethinking Bias-Variance Trade-off for Generalization of Neural Networks
----
Deep Learning Scaling is Predictable, Empirically
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
Generalization bounds for deep learning
Foundation Models
AMMUS: A Survey of Transformer-based Pretrained Models in Natural Language Processing