
Physical Society Colloquium

High-dimensional Optimization in Machine Learning with Applications to Scaling Limits and Compute-Optimal Neural Scaling Laws

Courtney Paquette

Department of Mathematics and Statistics
McGill University

Given the massive scale of modern ML models, we now get only a single shot to train them effectively. This restricts our ability to test multiple architectures and hyper-parameter configurations. Instead, we need to understand how these models scale, allowing us to experiment with smaller problems and then apply those insights to larger-scale models. In this talk, I will present a framework for analyzing scaling laws in stochastic learning algorithms using a power-law random features model, leveraging high-dimensional probability and random matrix theory. I will then use this scaling law to address the compute-optimal question: How should we choose model size and hyper-parameters to achieve the best possible performance in the most compute-efficient manner? Additionally, I will introduce a scaling limit commonly seen in ML optimization algorithms, which has its origins in statistical physics, and I will highlight several promising research directions in scaling laws that remain underexplored but offer significant potential.
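For a concrete picture of the setup the abstract describes, here is a minimal, illustrative sketch of a power-law random features model trained with one-pass SGD. All specifics (the dimension d, the exponents alpha and beta, the feature counts, step count, and learning rate) are arbitrary assumptions for demonstration, not values from the talk:

```python
# Illustrative sketch: power-law random features model with one-pass SGD.
# The sizes and exponents below are arbitrary assumptions, not from the talk.
import numpy as np

rng = np.random.default_rng(0)

d = 512                                  # ambient data dimension (assumed)
alpha, beta = 1.0, 0.5                   # power-law exponents (assumed)
j = np.arange(1, d + 1)
eigs = j ** (-2 * alpha)                 # power-law covariance spectrum
b = j ** (-beta)                         # power-law target coefficients
b /= np.linalg.norm(b)

def population_loss(theta, W):
    # E[(theta^T W x - b^T x)^2] for x ~ N(0, diag(eigs)):
    # the residual in data space is W^T theta - b.
    r = W.T @ theta - b
    return float(r @ (eigs * r))

def run_sgd(v, steps=20000, lr=0.05):
    """One-pass SGD on fresh samples, with v random features (model size)."""
    W = rng.standard_normal((v, d)) / np.sqrt(d)    # random feature map
    theta = np.zeros(v)
    losses = []
    for t in range(steps):
        x = rng.standard_normal(d) * np.sqrt(eigs)  # power-law covariance sample
        phi = W @ x                                 # random features
        err = theta @ phi - b @ x                   # prediction error on sample
        theta -= lr * err * phi                     # SGD step
        if t % 1000 == 0:
            losses.append(population_loss(theta, W))
    return losses

# Compare loss curves across model sizes: larger models reach lower loss but
# cost more compute per step -- the trade-off behind the compute-optimal question.
for v in (16, 64, 256):
    print(v, run_sgd(v)[-1])
```

Plotting the recorded losses against cumulative compute (roughly, steps times model size v) for each v exposes the kind of trade-off the compute-optimal question formalizes: which model size achieves the lowest loss at a fixed compute budget.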

Friday, March 28th, 2025, 15:30
Ernest Rutherford Physics Building, Keys Auditorium (room 112)