About
Alexander “Sasha” Rakhlin works at the intersection of statistics, machine learning, and optimization. A major thrust of his research is developing theoretical and algorithmic tools for online prediction, a branch of machine learning in which information is processed sequentially. Rakhlin has uncovered critical connections between online prediction, optimization, and probability; these insights have led to a fundamental theoretical understanding of the field as well as to new efficient and accurate prediction methods. His contributions also include advances in statistical inference in structured problems, as well as new tools for optimal model selection. In newer lines of inquiry, Rakhlin has developed a detailed analysis of the statistical complexity of neural networks, offering insight into recent advances in deep learning.
Rakhlin received a BA in computer science and a BA in mathematics from Cornell University in 2000 and a PhD from MIT in 2006 under the direction of Tomaso Poggio. Following a postdoctoral appointment at the University of California, Berkeley, Rakhlin joined the University of Pennsylvania in 2009 as an assistant professor in the Department of Statistics at the Wharton School and was promoted to associate professor with tenure in 2015. After spending a year as a visiting professor in the MIT Statistics and Data Science Center, he joined the Department of Brain and Cognitive Sciences as an associate professor with tenure and the Institute for Data, Systems, and Society as a core faculty member.
Research
My research is at the interface of Machine Learning and Statistics. I am interested in formalizing the process of learning, in analyzing learning models, and in deriving and implementing the resulting learning methods. A significant thrust of my research is in developing theoretical and algorithmic tools for online prediction, a learning framework where data arrives in a sequential fashion. My recent interests include understanding neural networks and, more generally, interpolation methods.
A high-level description of a few research areas:
- Statistical Learning: We study the problem of building a good predictor based on an i.i.d. sample. While much is understood in this classical setting, our current focus is on deep learning models. In particular, we study various measures of complexity of neural networks that govern their out-of-sample performance. We aim to understand the "geometry" (in an appropriate sense) of neural networks and its relation to the prediction ability of trained models. Our recent focus is on statistical and computational aspects of interpolation methods, as well as on the bias-variance tradeoff in over-parametrized models; a small numerical sketch of an interpolation method appears after this list.
- Non-Convex Landscapes: Here we are interested in understanding properties of high-dimensional empirical landscapes that arise when one attempts to fit a model with many parameters (such as a multi-layer neural network or a latent variable model) to data. Some of the questions that arise are: (a) What is the behavior of optimization methods on such landscapes? (b) What salient features of the landscape arise from its random nature? (c) How can one exploit randomness in the optimization method to analyze its convergence?
- High-Dimensional Statistics: This setting is centered on the problem of recovering high-dimensional, structured signals hidden in noise. Since standard statistical methods are often computationally intractable in this regime, the question of the interplay between computation and statistical optimality arises. Examples: estimation of communities in networks, recovery of a few relevant genes in a large set of gene expression data, etc. We are also interested in understanding the optimality of maximum likelihood methods in such rich models.
- Online Learning: We aim to develop robust prediction methods that do not rely on the i.i.d. or stationary nature of data. In contrast to the well-studied setting of Statistical Learning, methods that predict in an online fashion are arguably more complex, and their analysis is nontrivial. Major questions that arise in this setting are: (a) How to model the problem at hand? (b) How many examples are required to achieve a certain level of performance, and what are the computationally efficient methods? (c) How to deal with incomplete feedback and the exploration-exploitation dilemma? Examples: sequentially predicting users' preferences, classifying nodes in a social network, sequentially selecting medical treatment strategies while observing only limited feedback about past decisions, etc. A minimal sketch of a basic online prediction method also appears after this list.
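To make the notion of an interpolation method concrete (see the Statistical Learning item above), here is a minimal sketch of "ridgeless" kernel regression: the minimum-norm interpolant obtained as the zero-regularization limit of kernel ridge regression. The synthetic data, the Gaussian kernel, and the bandwidth are illustrative assumptions, not choices taken from the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.sin(4 * X[:, 0]) + 0.1 * rng.standard_normal(n)   # low-dimensional signal plus noise

def gaussian_kernel(A, B, bandwidth=1.0):
    # pairwise squared distances between rows of A and rows of B
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

K = gaussian_kernel(X, X)
alpha = np.linalg.solve(K, y)   # interpolating coefficients: no ridge penalty at all

X_test = rng.standard_normal((1000, d)) / np.sqrt(d)
y_test = np.sin(4 * X_test[:, 0])

train_mse = np.mean((gaussian_kernel(X, X) @ alpha - y) ** 2)
test_mse = np.mean((gaussian_kernel(X_test, X) @ alpha - y_test) ** 2)
print(f"train MSE: {train_mse:.2e}   (numerically exact fit of the noisy labels)")
print(f"test  MSE: {test_mse:.2e}")
```

Despite fitting the noisy training labels exactly, such an interpolant need not blow up out of sample; quantifying when and why its test error stays controlled in high dimensions is exactly the kind of question studied in the interpolation line of work above.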
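As a concrete instance of online prediction (see the Online Learning item above), below is a minimal sketch of the classical exponential-weights (Hedge) forecaster for prediction with expert advice. The number of experts, the time horizon, the learning-rate tuning, and the uniformly random loss sequence are illustrative assumptions; the point is only that the forecaster's cumulative loss tracks that of the best expert without any i.i.d. assumption on the losses.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_experts = 5000, 10
eta = np.sqrt(8 * np.log(n_experts) / T)   # standard tuning for losses in [0, 1]

# The loss sequence need not be i.i.d.; here it is just some bounded sequence.
losses = rng.uniform(size=(T, n_experts))

weights = np.ones(n_experts)
forecaster_loss = 0.0
for t in range(T):
    p = weights / weights.sum()            # current distribution over experts
    forecaster_loss += p @ losses[t]       # (expected) loss of the forecaster this round
    weights *= np.exp(-eta * losses[t])    # multiplicative weight update

best_expert_loss = losses.sum(axis=0).min()
regret = forecaster_loss - best_expert_loss
print(f"regret = {regret:.1f}, guarantee of order sqrt(T log N) = "
      f"{np.sqrt(T * np.log(n_experts)):.1f}")
```

Over very long horizons one would typically renormalize the weights or track log-weights for numerical stability; the short horizon here keeps the sketch self-contained.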
Teaching
Fall semester: 9.520: Statistical Learning Theory and Applications
Spring semester: 9.521/IDS 160: Mathematical Statistics: A Non-Asymptotic Approach
Publications
Recent publications:
* Consistency of Interpolation with Laplace Kernels is a High-Dimensional Phenomenon (with X. Zhai). COLT 2019.
* Just Interpolate: Kernel "Ridgeless" Regression Can Generalize (with T. Liang). The Annals of Statistics, to appear.
* Online Learning: Sufficient Statistics and the Burkholder Method (with D. Foster and K. Sridharan). COLT 2018.
* Optimality of Maximum Likelihood for Log-Concave Density Estimation and Bounded Convex Regression (with G. Kur and Y. Dagan). In submission.