Details on Plenary and Invited Sessions


MMLS 2023


May 11, 2023

Plenary Sessions

Anna C. Gilbert

Metric representations: Algorithms and Geometry

Abstract: Given a set of distances among points, determining the metric representation most “consistent” with the input distances, or the metric that best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. In this talk, we focus on three specific metric-constrained problems, a class of optimization problems with metric constraints: metric nearness (Brickell et al. (2008)), weighted correlation clustering on general graphs (Bansal et al. (2004)), and metric learning (Bellet et al. (2013); Davis et al. (2007)).

Because of the large number of constraints in these problems, however, these and other researchers have been forced to restrict either the kinds of metrics learned or the size of the problem that can be solved. We provide an algorithm, PROJECT AND FORGET, that uses Bregman projections with cutting planes to solve metric-constrained problems with many (possibly exponentially many) inequality constraints. We also prove that our algorithm converges to the globally optimal solution, and we show that the optimality error decays asymptotically at an exponential rate. Using our method, we solve large instances of all three types of metric-constrained problems, outperforming all state-of-the-art methods with respect to CPU time and problem size.

Finally, we discuss the adaptation of PROJECT AND FORGET to specific types of metric constraints, namely tree and hyperbolic metrics.
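The projection step at the heart of such methods is simple to sketch. Below is a toy cyclic pass over triangle-inequality constraints for metric nearness; this illustrates only the underlying projection idea, not the authors' PROJECT AND FORGET, which adds Bregman correction terms and cutting planes to find the exact nearest metric under exponentially many constraints.

```python
# Toy illustration of projecting onto triangle-inequality constraints for
# metric nearness. Plain cyclic projection finds a feasible (metric) point
# near the input; the actual algorithm adds corrections and cutting planes.
from itertools import permutations

def nearest_metric_sketch(d, n, sweeps=50):
    """Cyclically project dissimilarities d[(i,j)] (i<j) onto the
    halfspaces d_ij <= d_ik + d_kj."""
    d = dict(d)
    key = lambda i, j: (min(i, j), max(i, j))
    for _ in range(sweeps):
        for i, j, k in permutations(range(n), 3):
            if i < j:
                v = d[key(i, j)] - d[key(i, k)] - d[key(k, j)]
                if v > 0:  # violated: Euclidean projection onto the halfspace
                    d[key(i, j)] -= v / 3
                    d[key(i, k)] += v / 3
                    d[key(k, j)] += v / 3
    return d

# Three points whose dissimilarities violate the triangle inequality.
d = nearest_metric_sketch({(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0}, n=3)
```

After the pass, every triangle inequality holds: the long edge shrinks and the two short edges grow until the violation closes.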

Bio: I received an S.B. degree from the University of Chicago and a Ph.D. from Princeton University, both in Mathematics. In 1997, I was a postdoctoral fellow at Yale University and AT&T Labs-Research. From 1998 to 2004, I was a member of technical staff at AT&T Labs-Research in Florham Park, NJ. From 2004 to 2020, I was with the Department of Mathematics (with a secondary appointment in Electrical and Computer Engineering) at the University of Michigan, where I was eventually the Herman H. Goldstine Collegiate Professor. In 2020, I moved to Yale University as the John C. Malone Professor of Mathematics and Professor of Statistics & Data Science. I have received several awards, including a Sloan Research Fellowship (2006), an NSF CAREER award (2006), the National Academy of Sciences Award for Initiatives in Research (2008), the Association of Computing Machinery (ACM) Douglas Engelbart Best Paper award (2008), the EURASIP Signal Processing Best Paper award (2010), and the SIAM Ralph E. Kleinman Prize (2013).

My research interests include analysis, probability, discrete mathematics, and algorithms. I am especially interested in randomized algorithms with applications to harmonic analysis, signal and image processing, and massive datasets.

Sheila McIlraith

(Formal) Languages Help AI Agents Learn and Reason

Abstract: How do we communicate with AI Agents that learn? One obvious answer is via language. Indeed, humans have evolved languages over tens of thousands of years to provide useful abstractions for understanding and interacting with each other and with the physical world. The claim advanced by some is that language influences how we think, what we perceive, how we focus our attention, and what we remember. We use language to capture and share our understanding of the world around us, to communicate high-level goals, intentions and objectives, and to support coordination with others. Importantly, language can provide us with useful and purposeful abstractions that can help us to generalize and transfer knowledge to new situations.

Language comes in many forms. In Computer Science and in the study of AI, we have historically used formal knowledge representation languages and programming languages to capture our understanding of the world and to communicate unambiguously with computers. In this talk I will discuss how formal language can help agents learn and reason with a deep dive on one particular topic – reinforcement learning. I’ll show how we can exploit the syntax and semantics of formal language and automata to aid in the specification of complex reward-worthy behavior, to improve the sample efficiency of learning, and to help agents learn what to remember. In doing so, formal language can help us address some of the challenges to reinforcement learning in the real world.
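One concrete device from this line of work is the reward machine: a finite automaton over high-level events that specifies reward-worthy behavior. The sketch below is a minimal invented example (the event names and task are hypothetical, not from a real benchmark): reward is earned only if a key is collected before the door is reached.

```python
# Minimal reward-machine sketch: a finite automaton over high-level events
# ("key", "door") that pays reward 1 only when the door is reached after
# the key. Task and event names are illustrative only.
class RewardMachine:
    def __init__(self):
        # (state, event) -> (next_state, reward); unmatched events self-loop
        self.delta = {
            ("u0", "key"):  ("u1", 0.0),  # key collected
            ("u1", "door"): ("u2", 1.0),  # door after key: goal achieved
        }
        self.state = "u0"

    def step(self, event):
        self.state, reward = self.delta.get((self.state, event),
                                            (self.state, 0.0))
        return reward

def total_reward(events):
    rm = RewardMachine()
    return sum(rm.step(e) for e in events)

total_reward(["key", "door"])   # 1.0: key first, then door
total_reward(["door", "key"])   # 0.0: wrong order, no reward
```

Because the machine's state summarizes exactly the task-relevant history, it also tells a learning agent what it needs to remember.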

Bio: Sheila McIlraith is a Professor in the Department of Computer Science at the University of Toronto, a Canada CIFAR AI Chair (Vector Institute), and an Associate Director at the Schwartz Reisman Institute for Technology and Society. Prior to joining U of T, McIlraith spent six years as a Research Scientist at Stanford University, and one year at Xerox PARC. McIlraith’s research is in the area of AI knowledge representation and reasoning, and machine learning where she currently studies sequential decision-making, broadly construed, with a focus on human-compatible AI. McIlraith is a Fellow of the ACM and the Association for the Advancement of Artificial Intelligence (AAAI). She and co-authors have been recognized with two test-of-time awards from the International Semantic Web Conference (ISWC) in 2011, and from the International Conference on Automated Planning and Scheduling (ICAPS) in 2022.

Tara Javidi

A (Con)Sequential View of Information for Optimization

Abstract: In most communication systems, sequentially adapting transmission strategies to the (unpredictable) realization of the channel output at the receiver requires an (unrealistic) assumption about the availability of a reliable “feedback” channel. This unfortunate fact, combined with the historical linkage between the teaching of information theory and the digital communication curriculum, has left “feedback information theory,” which provides a sequential view of information, less taught, discussed, appreciated, and understood than other topics in our field.

This talk, in contrast, catalogs important and challenging problems in machine learning, optimization, and statistics where the problem of acquiring information in a sequential manner arises very naturally. Thus, I will argue that an increased emphasis on (teaching) feedback information theory can provide vast and exciting research opportunities at the intersection of information theory and these fields. Drawing on my own research, I will highlight the successful application of this sequential view of information to black-box optimization, also known as Gaussian process bandits. In particular, we provide the first instance-dependent regret analysis of GP bandits that query a noisy zeroth-order oracle to efficiently learn about the optimizer of an unknown function. More specifically, for the first time, we derive a fundamental complexity measure associated with every problem instance and propose the first minimax near-optimal algorithm that also adapts to easier problem instances.

Bio: Prof. Javidi received her BS in electrical engineering at Sharif University of Technology, Tehran, Iran. She received her MS degrees in electrical engineering (systems) and in applied mathematics (stochastic analysis) from the University of Michigan, Ann Arbor. She received her Ph.D. in electrical engineering and computer science from the University of Michigan, Ann Arbor, in 2002. From 2002 to 2004, Prof. Javidi was an assistant professor at the Electrical Engineering Department, University of Washington, Seattle. In 2005, she joined the University of California, San Diego, where she is currently a Jacobs Family Scholar, Halicioglu Data Science Fellow, and Professor of Electrical and Computer Engineering. Prof. Javidi’s research interests are in theory of active learning, information acquisition and statistical inference, information theory with feedback, stochastic control theory, and wireless communications and communication networks.

Noah A. Smith

Mind the Data

Abstract: Today’s mainstream NLP research focuses on general-purpose models that are scaled up to work with extremely large datasets. This direction has had many benefits, evidenced by performance on research benchmarks and by new use cases for AI in general, and language models specifically, imagined by an ever wider community of stakeholders. What I believe is coming next is a strong demand for customization. More people than ever will want to adapt language models to create new applications. To enable them, I believe we need new affordances for working with the most important ingredient for NLP systems: the data. In this talk, I’ll present recent work from my group showing benefits and risks of new methods for data selection, organization, and synthesis. I’ll advocate for a future in which artifacts like language models are developed to support adaptation to unexpected and diverging demands of a wide population of users, who in turn should be empowered to direct models to serve their own interests.

Bio: Noah Smith is a computer scientist working at the junction of natural language processing (NLP), machine learning (ML), and computational social science. His research spans core problems in NLP, general-purpose ML methods for NLP, methodology in NLP, and a wide range of applications. As of 2022, he has graduated 25 Ph.D. students and mentored 14 postdocs; 11 of his undergraduate mentees have gone on to Ph.D. programs.

He is Amazon Professor of Machine Learning in the Paul G. Allen School of Computer Science & Engineering at the University of Washington (also Adjunct in Linguistics, Affiliate of the Center for Statistics and the Social Sciences, and Senior Data Science Fellow at the eScience Institute), as well as Senior Director of NLP Research at the Allen Institute for Artificial Intelligence. Previously, he was an Associate Professor of Language Technologies and Machine Learning in the School of Computer Science at Carnegie Mellon University. He earned his Ph.D. in Computer Science from Johns Hopkins University and his B.S. in Computer Science and B.A. in Linguistics from the University of Maryland.

Invited Sessions

Session 1

Michał Dereziński

Title: Stochastic Variance-Reduced Newton: Accelerating Finite-Sum Minimization with Large Batches

Abstract: Stochastic variance reduction has proven effective at accelerating first-order algorithms for solving convex finite-sum optimization tasks such as empirical risk minimization. Incorporating second-order information has proven helpful in further improving the performance of these first-order methods. Yet, comparatively little is known about the benefits of using variance reduction to accelerate popular stochastic second-order methods such as Subsampled Newton. To address this, we propose Stochastic Variance-Reduced Newton (SVRN), a finite-sum minimization algorithm that provably accelerates existing stochastic Newton methods from \(O(\log(1/\epsilon))\) to \(O(\log(1/\epsilon)/\log(n))\) passes over the data (here, \(n\) is the number of sum components). Crucially, SVRN retains the key advantages of Newton-type methods, such as easily parallelizable large-batch operations and a simple unit step size. We use SVRN to accelerate Subsampled Newton and Iterative Hessian Sketch algorithms, and show that it compares favorably to popular first-order methods with variance reduction.
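The averaging idea can be sketched on a least-squares instance. The snippet below is a simplified illustration, not the paper's exact algorithm: the batch sizes, iteration counts, and schedule are invented. For a quadratic, each variance-reduced inner step collapses to a subsampled Newton step from the snapshot, so averaging the inner iterates averages away the Hessian-subsampling noise.

```python
# Simplified sketch of the variance-reduction idea behind SVRN on least
# squares (illustrative parameters, not the paper's exact algorithm).
import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)
x_star = np.linalg.lstsq(A, b, rcond=None)[0]  # exact minimizer

x = np.zeros(d)
for _ in range(5):                      # outer loop: take a snapshot
    g_full = A.T @ (A @ x - b) / n      # full gradient at the snapshot
    steps = []
    for _ in range(10):                 # inner loop, unit step size
        S = rng.choice(n, size=80, replace=False)
        H = A[S].T @ A[S] / len(S)      # subsampled Hessian
        steps.append(x - np.linalg.solve(H, g_full))
    x = np.mean(steps, axis=0)          # averaging reduces the variance

print(np.linalg.norm(x - x_star))       # error shrinks geometrically
```

The inner solves are embarrassingly parallel large-batch operations, which is the practical appeal the abstract highlights.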

Bryon Aragam

Title: Challenges in causal representation learning and deep generative models

Abstract: Interpretability and causality have been acknowledged as key ingredients to the success and evolution of modern machine learning systems. A crucial step in this pipeline is to identify latent representations from observational data along with their causal structure. In many applications, the causal variables are not directly observed and must be learned from data. Under parametric assumptions, this resembles familiar factor models; in modern applications, however, we aim to use flexible, nonparametric models such as deep neural networks. These settings present new statistical and computational challenges that will be the focus of this talk. We will discuss our recent work on developing methods for identifying and learning causal representations from data with rigorous guarantees. Along the way, we will explore the connections between causal graphical models, deep generative models, and nonparametric mixture models, and how these connections lead to a useful new theory for causal representation learning.

Session 2

Xueru Zhang

Title: Strategic classification with random manipulation outcomes

Abstract: Individuals subject to algorithmic decisions often adapt their behaviors strategically to the decision rule to receive favorable outcomes. Strategic classification tackles this problem by explicitly considering the strategic behavior of human agents during learning, where agents can manipulate their features at a cost and the decision-maker aims to learn a classifier robust to such manipulations. However, existing formulations of strategic classification implicitly assume the agent knows the manipulation outcome precisely, and the manipulation cost is modeled as a distance between features before and after manipulation. We observe that in many real applications, agents’ manipulation outcomes are in fact unforeseeable, and agents can only best respond based on their probabilistic knowledge of the outcome, not on the realized feature values.

In this talk, we study strategic classification under unforeseeable outcomes. We first introduce a novel Stackelberg game to model the interplay between the decision-maker and agents with unforeseeable manipulation outcomes. We analytically characterize the equilibrium strategies and examine how they are affected when the decision-maker lacks the ability to anticipate manipulation. Finally, we consider settings where agents come from multiple social groups and explore how fairness interventions such as demographic parity and equal opportunity can serve as (dis)incentives for strategic manipulation.
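A minimal numeric caricature of the kind of best response described above (the threshold rule, success probability, benefit, and cost are all invented for illustration, not from the paper's model): the agent knows only the probability that manipulation succeeds, and acts only when the expected gain exceeds the cost.

```python
# Toy best response under an unforeseeable manipulation outcome.
# All numbers are illustrative: the agent below a decision threshold knows
# only the probability q that manipulation succeeds, and manipulates iff
# the expected benefit exceeds the manipulation cost.
def best_response(x, threshold, q, benefit, cost):
    """Return (manipulate?, expected utility) for an agent at feature x."""
    if x >= threshold:                 # already accepted: no reason to act
        return False, benefit
    expected_gain = q * benefit        # success only with probability q
    if expected_gain - cost > 0:
        return True, expected_gain - cost
    return False, 0.0

best_response(x=0.4, threshold=0.5, q=0.75, benefit=1.0, cost=0.25)   # (True, 0.5)
best_response(x=0.4, threshold=0.5, q=0.125, benefit=1.0, cost=0.25)  # (False, 0.0)
```

The decision-maker's side of the Stackelberg game then consists of choosing the threshold while anticipating this probabilistic best response.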

Kartik Goyal

Title: Controllable generative models for natural language and related phenomena

Abstract: Today’s large language models have demonstrated impressive capabilities for producing natural-sounding text. My research focuses on generative models of the future, which will additionally provide fine-grained knobs that let users both guide the output of these models and analyze existing human artifacts with them. In this talk, I will present a paradigm of generative models that captures the global properties of data in a holistic manner and readily provides the ability to control and analyze text and related data along desired attributes. Specifically, I will describe two instantiations of this paradigm: energy-based models, and structured probabilistic generative models for analyzing and generating text and paratextual elements in natural language documents.

Fei Xue

Title: Deep Generative Modeling for Nonignorable Missing Data with Nonparametric Identifiability

Abstract: Missing data is a pervasive problem in data analysis. Most existing imputation models assume the missing mechanism is ignorable, meaning that the data are missing at random or missing completely at random. While there are some recent advances in deep generative models that explicitly consider the missing process in the presence of missing-not-at-random data, their nonparametric identifiability has not been addressed properly. In this work, we adopt a latent variable model with a corruption process to generate missing data. The unknown parameters are estimated by deriving a tractable evidence lower bound (ELBO) for variational autoencoders (VAEs). Under the assumption of no self-censoring with latent variables, we first establish the nonparametric identifiability of the full data distribution. We then establish that our estimation procedure recovers the ground-truth data distribution under some regularity conditions. Experiments demonstrate the advantages of our model.
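For readers unfamiliar with the estimation device, the generic form of the bound being maximized is shown below, written for an observed block \(x_{\mathrm{obs}}\) and missingness mask \(m\); the authors' actual bound incorporates the corruption process and the identifiability conditions discussed above.

\[
\log p_\theta(x_{\mathrm{obs}}, m) \;\ge\; \mathbb{E}_{q_\phi(z \mid x_{\mathrm{obs}}, m)}\big[\log p_\theta(x_{\mathrm{obs}}, m \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x_{\mathrm{obs}}, m)\,\|\,p(z)\big).
\]

Modeling the mask \(m\) jointly with the data, rather than discarding it, is what lets the approach target missing-not-at-random mechanisms.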

Session 3

Lu Cheng

Title: Algorithmic Fairness in an Uncertain World

Abstract: Significant progress in the field of fair machine learning (ML) has been made to counteract algorithmic discrimination against marginalized groups. However, fairness remains an active research area that is far from settled. One key bottleneck is the implicit assumption that environments where ML is developed and deployed are certain and reliable. In a world characterized by volatility, uncertainty, complexity, and ambiguity, whether what has been developed in algorithmic fairness can still serve its purpose is far from obvious. In this talk, I will discuss how to improve algorithmic fairness under two kinds of predictive uncertainty: aleatoric uncertainty (randomness and ambiguity in the data) and epistemic uncertainty (a lack of data or knowledge). The former concerns historical bias reflected in the data, and the latter corresponds to bias perpetuated or amplified during model training due to a lack of data or knowledge. In particular, the first work studies pushing the fairness-utility trade-off through aleatoric uncertainty, and the second work investigates fair few-shot learning.

Feng Ruan

Title: The automatic sparsity of kernel feature selection

Abstract: In this talk, we describe a new sparsity-inducing mechanism based on minimizing over a family of kernels, commonly called kernel-based feature selection. Perhaps surprisingly, the solution of this minimization objective is sparse in finite samples without the use of any of the well-known sparsification techniques, such as \(\ell_1\) penalization, early stopping, or post-processing. We rigorously pin down sufficient conditions on the underlying i.i.d. data distribution under which this phenomenon of automatic sparsity occurs in finite samples. As an application of our findings, we show how to use this new sparsity-inducing mechanism to build algorithms consistent for feature selection in nonparametric settings.

This is based on joint work with Michael I. Jordan, and Keli Liu.

Session 4

Qiaomin Xie

Title: Best of three worlds? Bias and extrapolation in constant-stepsize stochastic approximation

Abstract: Stochastic approximation is a class of iterative procedures that include, among others, stochastic gradient descent and temporal difference (TD) learning for reinforcement learning. In this talk, we consider linear stochastic approximation (LSA) with a constant stepsize and Markovian noise. We show that the iterate has a limit distribution and converges to it geometrically in Wasserstein distance. Furthermore, we provide a precise characterization of the asymptotic bias of the limit: it admits a Taylor-like, infinite series expansion with respect to the stepsize. Moreover, the bias is proportional to the mixing time of the Markovian noise and hence vanishes when the noise is in fact i.i.d.

The results above suggest the following algorithmic paradigm for LSA, which combines three ingredients: (1) use a constant stepsize for fast convergence of the optimization error, (2) use Polyak-Ruppert averaging to reduce the variance, and (3) use Richardson-Romberg extrapolation to correct the bias. In particular, our infinite series expansion implies that extrapolation with \(m\) stepsizes eliminates the \(m-1\) leading terms in the bias expansion, resulting in an exponentially smaller bias.
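The extrapolation step is elementary and worth seeing in numbers. If the averaged iterate has bias \(c_1\alpha + c_2\alpha^2 + \dots\), then the combination \(2\bar{\theta}(\alpha/2) - \bar{\theta}(\alpha)\) cancels the first-order term. The snippet below is a purely synthetic illustration with an assumed bias curve, not an actual LSA run:

```python
# Synthetic illustration of Richardson-Romberg extrapolation: assume the
# averaged iterate's bias expands as c1*a + c2*a**2, and check that
# combining two stepsizes cancels the leading (first-order) term.
theta_star, c1, c2 = 1.0, 0.5, -0.25

def averaged_iterate(a):
    """Stand-in for the limit of the Polyak-Ruppert average at stepsize a."""
    return theta_star + c1 * a + c2 * a**2

a = 0.1
plain = averaged_iterate(a)                               # bias ~ c1*a
extrapolated = 2 * averaged_iterate(a / 2) - averaged_iterate(a)

abs(plain - theta_star)          # ~0.0475: dominated by the first-order term
abs(extrapolated - theta_star)   # ~0.00125: only |c2|*a^2/2 remains
```

With \(m\) stepsizes the same cancellation removes the first \(m-1\) terms of the expansion, which is the exponential bias reduction claimed above.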

Ju Sun

Title: Deep Image Prior (and Its Cousin) for Inverse Problems: the Untold Stories

Abstract: Deep image prior (DIP) parametrizes visual objects as outputs of deep neural networks (DNNs); its cousin, neural implicit representation (NIR), directly parametrizes visual objects as DNNs. These stunningly simple ideas, when integrated into natural optimization formulations for visual inverse problems, have matched or even beaten state-of-the-art methods on numerous visual reconstruction tasks, despite not being driven by massive amounts of training data. Despite these remarkable successes, the overparametrized DNNs used are typically powerful enough to fit the noise in addition to the desired visual content (i.e., overfitting), and the fitting process can take tens of minutes on sophisticated GPU cards to converge to a reasonable solution. In this talk, I’ll describe our recent efforts to combat these practicality issues around DIP and NIR, and how careful calibration of DIP models (or variants) can help to solve challenging visual inverse problems, such as blind image deblurring and phase retrieval, in unprecedented regimes.
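The overfitting issue has a clean caricature (the diagonal model below is invented for illustration and is not a DIP experiment): when gradient descent fits signal components faster than noise components, the error to the clean signal first falls and then rises, so the stopping time matters.

```python
# Toy caricature of DIP-style overfitting (invented diagonal model, not a
# DIP experiment): gradient descent on 0.5*||x - y||^2 with per-coordinate
# rates. The signal lives in fast coordinates and the noise in slow ones,
# so the error to the CLEAN signal dips early, then climbs as the noise is
# fitted too.
signal = [1.0] * 5 + [0.0] * 5          # clean target
noise  = [0.0] * 5 + [0.5] * 5          # corruption
y = [s + n for s, n in zip(signal, noise)]
rates = [0.9] * 5 + [0.02] * 5          # fast for signal, slow for noise

def error_to_clean(t):
    # closed form of coordinate-wise gradient descent from x=0 after t steps
    x = [yi * (1 - (1 - r) ** t) for yi, r in zip(y, rates)]
    return sum((xi - si) ** 2 for xi, si in zip(x, signal)) ** 0.5

error_to_clean(0)     # ~2.24: nothing fitted yet
error_to_clean(10)    # ~0.20: signal recovered, noise barely fitted
error_to_clean(500)   # ~1.12: noise fitted too (overfitting)
```

Calibrating DIP models amounts to detecting (or removing) that late rise without access to the clean signal.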

Shenlong Wang

Title: Modeling, Editing and Generating Urban Digital Twins

Abstract: My long-term research goal is to let computers build highly realistic and easily authored 3D digital twins of the physical world and produce new, photorealistic, and physics-informed visual content – which we often refer to as imagination. This talk will summarize our group’s two recent directions toward this goal.

I will begin by discussing the generation of realistic movies of physical phenomena in a real-world scene captured on video; imagine what the local children’s park would look like if a flood or a heavy snowstorm hit our neighborhood. Generative models struggle in this setting because of temporal inconsistency and a lack of physical feasibility. We believe the key is to integrate physics-informed simulation and generative modeling with neural scene representations. I will demonstrate how to integrate the two techniques and simulate coherent, realistic, and physically plausible extreme climate effects for a visual scene.

Then, I will focus on designing generative models that can create a large, realistic, and consistent 3D world. The task is of paramount interest to many applications in vision, graphics, geography, and robotics. Nevertheless, achieving this goal is challenging and goes beyond the capacities of existing 3D generation methods, which fail to scale up to large scenes. I’ll describe some of our recent efforts in 3D scene generation, including work on learning to generate realistic LiDAR point clouds of urban driving scenes and our recent work on perpetual 3D world generation.

Finally, I will give a brief personal outlook on open research topics towards photorealistic 3D modeling and visual content creation.