The Midwest ML Symposium aims to convene regional machine learning researchers for stimulating discussions and debates, to foster cross-institutional collaboration, and to showcase the collective talent of ML researchers at all career stages. [past events]
Where: Graduate Hotels @ Minneapolis
[Google Map]
Parking and public transportation:
Accommodation: Limited free housing is provided for student participants only, on a first-come, first-served basis (determined by request order).
The Midwest ML Symposium offers sponsors various opportunities for exposure. In addition to the satisfaction of supporting the regional machine learning community, you will be gratefully recognized in various media and materials and will have the opportunity to engage more closely with participants.
Contact Information: Sponsors are encouraged to contact the Midwest ML Symposium organizing committee. To discuss special requirements and to ask general questions regarding sponsorship of the Symposium, please contact us by email at: midwest.ml.2024@gmail.com
Ju Sun (Co-chair, UMN), Mingyi Hong (Co-chair, UMN), Qu Qing (UMich), Qiaomin Xie (UW Madison), Soumik Sarkar (ISU), Elena Zheleva (UIC), Jia Liu (OSU), Zhaoran Wang (Northwestern), Gesualdo Scutari (Purdue), Jinrui He (UIUC), Bo Li (UChicago), Sijia Liu (MSU), Xia Ning (OSU), Gaoxiang Luo (Web chair, UMN).
Ju Sun (CS&E, co-chair), Mingyi Hong (ECE, co-chair), Jie Ding (Stats), Yulong Lu (Math), Saad Bedros (MnDRIVE, MnRI, CSE).
Rob Nowak (Chair, UW Madison), Maxim Raginsky (UIUC), Laura Balzano (UMich), Mikhail Belkin (UCSD, formerly OSU), Avrim Blum (TTIC), Rebecca Willett (UChicago), Nati Srebro (TTIC), Po-Ling Loh (Cambridge, formerly UW Madison), Matus Telgarsky (UIUC), Mike Franklin (UChicago).
Amazon & University of Minnesota
University of California San Diego
Carnegie Mellon University
University of Illinois Urbana-Champaign
Michigan State University
Iowa State University
Iowa State University
University of Michigan
University of Michigan
University of Minnesota
Northwestern University
University of Minnesota
University of Wisconsin
University of Wisconsin
Ohio State University
Ohio State University
Purdue University
Purdue University
University of Illinois Chicago
Michigan State University
Northwestern University
University of Illinois Urbana-Champaign
University of Illinois Urbana-Champaign
North Dakota State University
University of Chicago
University of Chicago
Kitware
US Bank
Medtronic
Abstract: Large Language Models (LLMs) may bring unprecedented power to scientific discovery. However, current LLMs still face major challenges in effective scientific exploration due to their lack of in-depth, theme-focused data and knowledge. Retrieval-augmented generation (RAG) has recently become a promising approach for augmenting LLMs with grounded, theme-specific datasets. We discuss the challenges of RAG and propose a retrieval and structuring (RAS) approach, which enhances RAG by improving retrieval quality and mining structures (e.g., extracting entities and relations and building knowledge graphs) to ensure the effective integration of theme-specific data with LLMs. We show the promise of the retrieval and structuring approach for augmenting LLMs and discuss its potential power for future LLM-enabled science exploration.
Abstract: Exponential tilting is a technique commonly used in fields such as statistics, probability, information theory, and optimization to create parametric distribution shifts. In this talk, I introduce its usage in machine learning by exploring the use of tilting in risk minimization. The tilted empirical risk minimization (TERM) framework is a simple extension of ERM, which uses exponential tilting to flexibly tune the impact of individual losses. I describe several useful theoretical properties of TERM including its connections with other non-ERM objectives, and a multitude of applications regarding fairness and robustness. I will conclude the talk by discussing recent advances on leveraging the tilted risk framework to improve non-convex optimization and the statistical properties of TERM.
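As a rough sketch of the objective behind TERM (based on the standard tilted-risk formulation; the helper below is illustrative, not code from the talk), the tilted empirical risk replaces the plain average of losses with a log-sum-exp aggregation whose parameter t tunes the influence of individual losses:

```python
import numpy as np

def tilted_risk(losses, t):
    """Tilted empirical risk: (1/t) * log(mean(exp(t * losses))).

    t -> 0 recovers the standard average loss (ERM); t > 0 magnifies
    large losses (fairness toward worst-case points); t < 0 suppresses
    them (robustness to outliers).
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return losses.mean()
    # log-sum-exp trick for numerical stability
    m = (t * losses).max()
    return (m + np.log(np.mean(np.exp(t * losses - m)))) / t

losses = [0.1, 0.2, 5.0]  # one outlier loss
print(tilted_risk(losses, t=0))     # plain average (ERM)
print(tilted_risk(losses, t=2.0))   # tilts toward the large loss
print(tilted_risk(losses, t=-2.0))  # tilts away from the large loss
```

Sweeping t between these extremes interpolates between average-case, worst-case, and best-case aggregation, which is what lets TERM trade off fairness and robustness within one objective.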
Abstract: The transformer architecture has catalyzed revolutionary advances in language modeling. However, recent convolution-based recipes, such as state-space models and Mamba, have become competitive with transformers. Motivated by this, we examine the shortcomings of purely attention-based or purely convolutional designs and show how augmenting attention with convolution can provably overcome them. We first describe a diverse suite of associative recall (AR) and in-context learning tasks that assess the model's ability to search the context window and retrieve information relevant to the query. We show that equipping a transformer with "short convolutions" empowers the model to solve AR tasks with length generalization and without positional encoding. Second, we show that "long convolutions" provide a mechanism to effectively summarize the long context window into a few summary tokens. Through this, we describe a fundamental tradeoff between the required amounts of attention and convolution: specifically, the model can solve AR tasks by attending to only these summary tokens rather than the entire context window, thereby facilitating computational efficiency. Finally, we describe MambaFormer, a hybrid Attention+Mamba model, and demonstrate its best-of-both-worlds performance for in-context learning. In summary, our findings reveal the fundamental benefits of augmenting transformers with convolution and advocate the use of hybrid architectures for next-generation LLMs and foundation models.
Abstract: Motivated by an emergency response application, we consider smooth stochastic optimization problems over probability measures supported on compact subsets of the Euclidean space. With the \emph{influence function} as the first variational object, we construct a deterministic Frank-Wolfe (dFW) recursion for probability spaces. As in Euclidean spaces, dFW is made possible by a key lemma that expresses the solution to the infinite-dimensional Frank-Wolfe sub-problem as the solution to a finite-dimensional optimization problem. This in turn allows each iterate of the solution sequence to be expressed as a convex combination of the incumbent iterate and a Dirac measure concentrating on the minimum of the influence function at the incumbent iterate. To address common application contexts that have access only to Monte Carlo observations of the objective and influence function, we construct a stochastic Frank-Wolfe (sFW) variation that generates a random sequence of probability measures constructed using minima of increasingly accurate estimates of the influence function. We demonstrate that sFW's optimality gap sequence exhibits $O(1/k)$ complexity almost surely and in expectation for smooth convex objectives, and $O(1/\sqrt{k})$ (in Frank-Wolfe gap) for smooth non-convex objectives, where $k$ is the iteration count. Furthermore, we show that an easy-to-implement fixed-step, fixed-sample version of sFW exhibits exponential convergence to $\varepsilon$-optimality. We end with a central limit theorem on the observed objective values at the sequence of generated random measures. To build intuition, we include several illustrative examples with exact influence function calculations.
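To make the dFW update concrete, here is a toy discretized sketch (an illustrative example with a hand-picked objective, grid, and target; not the speaker's implementation): each iterate mixes the incumbent measure with a Dirac measure located at the minimizer of the influence function.

```python
import numpy as np

# Toy objective over measures on [0, 1]: F(mu) = (E_mu[x] - c)^2.
# Its influence function at mu is h_mu(x) = 2 * (E_mu[x] - c) * x
# (up to an additive constant), so the Frank-Wolfe sub-problem reduces
# to minimizing h_mu over the compact support -- here, a finite grid.
grid = np.linspace(0.0, 1.0, 201)   # discretized compact support
c = 0.3                             # hypothetical target mean

mu = np.full(grid.size, 1.0 / grid.size)  # start from the uniform measure
for k in range(500):
    mean = grid @ mu
    h = 2.0 * (mean - c) * grid           # influence function on the grid
    j = np.argmin(h)                      # Dirac location minimizing h
    gamma = 2.0 / (k + 2)                 # classic Frank-Wolfe step size
    # Convex combination of the incumbent measure and the Dirac at grid[j]
    mu *= (1.0 - gamma)
    mu[j] += gamma

print(grid @ mu)  # mean of the final measure, close to c
```

The same two-step structure (minimize the influence function, then mix toward the resulting Dirac) carries over to the stochastic variant, with the influence function replaced by increasingly accurate Monte Carlo estimates.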
Abstract: Recent strides in artificial intelligence (AI) have set the stage for groundbreaking innovations. However, the journey towards fully harnessing AI's potential, particularly in the context of AI+X across numerous fields, has been filled with challenges. Central among these are the integration of heterogeneous data modalities, the analysis of data characterized by complex spatiotemporal structures, the adept handling of missing values, and the incorporation of domain-specific knowledge. In this talk, I will overview the scientific areas under my investigation, highlighting how each is influenced by these critical challenges. I will introduce the development of methodologies and theories aimed at enhancing knowledge integration to support multimodal learning, manage noisy datasets effectively, and facilitate the integration of domain expertise.
Abstract: The development of a microstructure generation framework tailored to user-specific needs is crucial for understanding materials behavior through distinct processing-structure-property relationships. Recent advancements in generative modeling, particularly with Latent Diffusion Models (LDM), have significantly enhanced our ability to create high-quality images that fulfill specific user requirements. In this talk, we present a scalable framework that employs LDM to sample 3D microstructures (128x128x64) with over a million voxels customized to user specifications. This framework can also predict manufacturing conditions that facilitate the synthesis of sampled microstructures experimentally. Our work focuses on organic photovoltaics (OPV), but the architecture allows for potential extensions into other fields of materials science by adjusting the training dataset.
Abstract: The emergence of high-throughput quantum chemical calculations has accelerated the rate at which we can predict new materials for various applications (batteries, solar cells, catalysts, etc.), but the successful synthesis of these materials has often become the slow step in materials design. Autonomous laboratories hold the potential to systematically explore various synthesis routes to new materials, alleviating the painstaking manual trial-and-error approach. However, for an autonomous laboratory to work for inorganic synthesis, we need methods to initialize, interpret, and optimize synthesis recipes without any human intervention. This talk will focus on the application of machine learning to the initialization (recommending precursors) and interpretation (identifying phases from X-ray diffraction) steps.
Abstract: Due to the popularity of machine learning, many organizations view data as an invaluable resource, likening it to the "new oil/gold". However, unlike many types of resources, data is nonrivalrous: it can be freely replicated and used by many. Hence, data produced by one organization can, in principle, generate limitless value for many others. This will accelerate economic, social, and scientific breakthroughs and benefit society at large. However, considerations of free-riding and competition may prevent such open sharing of data between organizations. An organization may be wary that others are not contributing a sufficient amount of data, or are contributing fabricated/poisoned datasets. Organizations may also wish to monetize the data they have for profit. In some recent work, we leverage ideas from game theory, market design, and robust statistics to design protocols for data sharing. Our methods incentivize organizations to collect and truthfully contribute large amounts of data, so that socially optimal outcomes can be achieved.
In this talk, I will present a high-level view of some of our recent approaches to solving these challenges and focus on a mean estimation problem. Here, a set of strategic agents collect i.i.d. samples from a high-dimensional distribution at a cost, and wish to estimate the mean of this distribution. To facilitate collaboration, we design mechanisms that incentivize agents to collect a sufficient amount of data and share it truthfully, so that they are all better off than working alone. Our approach prevents under-collection and data fabrication via two key techniques: first, when sharing the others’ data with an agent, the mechanism corrupts this dataset in proportion to how much the data reported by the agent differs from the others’; second, we design minimax optimal estimators for the corrupted dataset. Our mechanism, which is Nash incentive compatible and individually rational, achieves a social penalty (the sum of all agents’ estimation errors and data collection costs) that is close to the global minimum.
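The corruption idea can be caricatured in a few lines (a hypothetical toy sketch, not the talk's actual mechanism or its minimax estimators; the scaling rule and parameter `alpha` are invented for illustration): the data passed to an agent is degraded in proportion to how far the agent's report deviates from everyone else's.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_data_for_agent(agent_data, others_data, alpha=1.0):
    """Illustrative corruption rule (not the paper's mechanism): add
    noise to the others' data whose scale grows with the discrepancy
    between the agent's reported mean and the others' mean, so that
    fabricated or under-collected reports buy the agent worse data."""
    discrepancy = np.linalg.norm(agent_data.mean(0) - others_data.mean(0))
    noise = rng.normal(scale=alpha * discrepancy, size=others_data.shape)
    return others_data + noise

honest = rng.normal(0.0, 1.0, size=(100, 5))   # truthful i.i.d. samples
others = rng.normal(0.0, 1.0, size=(500, 5))   # pooled data from other agents
fabricated = honest + 3.0                      # a shifted, fabricated report

# An honest reporter receives nearly clean data; a fabricator receives a
# heavily corrupted copy, so truthful reporting is the better strategy.
clean = shared_data_for_agent(honest, others)
noisy = shared_data_for_agent(fabricated, others)
print(np.std(clean - others), np.std(noisy - others))
```

This captures only the incentive direction; the actual mechanism pairs the corruption with minimax-optimal estimation on the corrupted dataset to achieve the near-optimal social penalty described above.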
Abstract: Given demonstrations of sequential decision making, imitation learning seeks a policy that performs competitively with the demonstrator (when evaluated on the demonstrator's unknown reward function). Prevalent imitation learning methods assume that demonstrations are (near-)optimal. For example, inverse reinforcement learning estimates a reward function that best rationalizes demonstrations, and then imitates using a policy that optimizes the estimated reward function. As imitators become more capable than demonstrators, the (near-)optimality assumption does not hold and these methods can lead to value misalignment. This talk presents subdominance minimization as an alternative imitation learning objective for robustly aligning the imitator with the demonstrator's reward function, even under differences in demonstrator-imitator capabilities.
Abstract: The use of pretrained models marks the major paradigm shift in machine learning workflows this decade, including for decision making. These powerful and typically massive models hold the promise of serving as a base for diverse applications. Unfortunately, it turns out that adapting these models for downstream tasks tends to be difficult and expensive, often requiring collecting and labeling additional data to further train or fine-tune the model. In this talk, I will describe my group's work on addressing this challenge via efficient adaptation. First, when adapting vision-language models to make robust predictions, we show how to self-guide the adaptation process, without any additional data. Second, we show how to integrate relational structures like knowledge graphs into model prediction pipelines, enabling models to adapt to new domains unseen during training, without additional annotated examples. Lastly, in the most challenging scenarios, when the model must be fine-tuned on labeled data, we show how to obtain this data efficiently through techniques called weak supervision.
Abstract: Diffusion models, particularly score-based generative models (SGMs), have emerged as powerful tools in diverse machine learning applications, spanning from computer vision to modern language processing. In this talk, I will discuss the generalization theory of SGMs for learning high-dimensional distributions. Our analysis shows that SGMs achieve a dimension-free generation error bound when applied to a class of sub-Gaussian distributions characterized by certain low-complexity structures.
Abstract: Zero-shot Natural Language-Video Localization (NLVL) has emerged as a promising approach by training NLVL models exclusively on raw video data. Most methods employ dynamic video proposal and pseudo-query annotation generation modules to extract video segments and their corresponding text queries. However, a common challenge encountered is effectively grounding the generated textual annotations within the source video context. This talk will explore the importance of commonsense as a cross-modal grounding mechanism for zero-shot NLVL and highlight possible future directions in cross-modal understanding.
Abstract: Recent research indicates that large language models lack consistent reliability in tasks requiring complex reasoning. While they may impress us with fluently written articles prompted by user input, they can easily disappoint us by displaying shortcomings in basic reasoning skills, such as understanding that "left" is the opposite of "right." To address real-world problems, computational models often need to involve multiple interdependent learners, along with significant levels of composition and reasoning based on additional knowledge beyond available data. In this talk, I will discuss our findings and novel models for compositional reasoning over complex linguistic structures. I will highlight our efforts in neuro-symbolic modeling to integrate explicit symbolic knowledge and enhance the compositional generalization of neural learning models. Additionally, I will introduce DomiKnowS, our library that facilitates neuro-symbolic modeling. The DomiKnowS framework exploits both symbolic and sub-symbolic representations to solve complex, AI-complete problems. It seamlessly integrates domain knowledge in the form of logical constraints into deep models through various underlying algorithms.
Abstract: Recent breakthroughs in foundation models have unlocked exciting possibilities for embodied AI agents that can perceive and interact with the physical world. However, despite their impressive performance on various benchmarks, these models perceive images as bags of words: they use object understanding as a shortcut but lack the ability to perform abstraction and reasoning, such as solving a maze. To acquire knowledge about the physical world, we initially categorize it based on its low-level physical and geometric visual features (from semantic to geometric) and its time horizon (from short/fast thinking to long/slow thinking). My research aims to bring this knowledge view to the multimodal world. Such a transformation poses significant challenges: (1) abstracting multimodal low-level geometric structures by introducing and training a low-level abstraction layer that serves as a mental model; (2) enabling long-horizon reasoning by inducing complex patterns. Subsequently, we will examine the reasons for hallucinations and explore potential methods for ensuring factuality through knowledge-driven approaches, with applications such as meeting summarization, timeline generation, and question answering. I will then lay out how I plan to promote factuality and truthfulness in multimodal information access, through a structured knowledge abstraction that is easily explainable, highly compositional, and capable of long-horizon reasoning.
Abstract: Many sensors produce data that rarely, if ever, is viewed by a human, and yet sensors are often designed to maximize subjective image quality. For sensors whose data is intended for embedded exploitation, maximizing the subjective image quality to a human will generally decrease the performance of downstream exploitation. In recent years, computational imaging researchers have developed end-to-end learning methods that co-optimize the sensing hardware with downstream exploitation via end-to-end machine learning. This talk will describe two such approaches at Kitware. In the first, we use an end-to-end ML approach to design a multispectral sensor that’s optimized for scene segmentation and, in the second, we optimize post-capture super-resolution in order to improve the performance of airplane detection in overhead imagery.
Abstract: Real-world deployments require AI systems that continually interact with their environment, making decisions about what data to collect and what actions to take to continually improve their performance. Humans are a key component of this decision-making environment. In this talk, I will discuss two settings where we leverage human factors in sequential decision making algorithms, specifically bandit optimization algorithms. The first is where human judgement is available in the form of preferences that can be queried in an interactive fashion, in addition to direct rewards. The second is where we include memory effects, so the reward depends on the number of times an action has been recommended to a user. We demonstrate how leveraging these human factors can not only align AI goals with human expectations, but also sometimes simplify the problem setting to enable sample-efficient decision making.
Abstract: Generative AI has emerged as the new wave following discriminative AI, as exemplified by various powerful generative models including visual diffusion models and large language models (LLMs). While these models excel at generating images, text, and videos, mere creation is not the ultimate goal. A grand objective lies in understanding and making decisions in the world through the generation process. In this talk, I discuss our efforts towards bridging generative and discriminative learning, empowering autonomous agents to perceive, interact, and act in the open world.
I begin by elaborating on how we advance generative modeling to be geometry-aware, physics-informed, and multi-modal in the 4D world. Next, I delve into several representative strategies that exploit generative models to improve comprehension of the 4D world. These strategies include repurposing latent representations within generative models, treating them as data engines, and more broadly, formulating generative models, especially LLMs, as agents for problem-solving and decision-making. Finally, I explore how to synergize knowledge from different generative models in the context of modeling human-object interaction. Throughout the talk, I demonstrate the potential of generative AI in scaling up open-world, in-the-wild perception across application domains such as transportation, robotics, and agriculture.
Abstract: Deep generative AI, e.g., diffusion models, achieves state-of-the-art performance in various high-dimensional data modeling tasks. Such empirical successes have been challenging conventional wisdom. In this talk, we will focus on diffusion models to explore their methodology and theory. We will first understand how diffusion models efficiently model complex high-dimensional data, especially when there are low-dimensional structures in them. Then, we leverage our understanding of diffusion models to motivate a next-generation optimization method, termed “generative optimization”. Specifically, we utilize diffusion models as a data-driven solution generator to an unknown objective function. We propose a learning-labeling-generating algorithm incorporating the targeted function value as guidance to the diffusion model. Theoretically, we show that in the offline setting, the generated solutions yield large function values on average. Meanwhile, the generated solutions closely respect the data intrinsic structures in the training set. Empirically, we demonstrate a good synergy of generative optimization with reinforcement learning.
Abstract: Multi-channel imaging data is a prevalent data format in astronomy and biology. The structured information and the high dimensionality of these 3-D tensor data make the analysis an intriguing but challenging topic for statisticians and practitioners. In previous works, the low-rank scalar-on-tensor regression model has been re-formulated as a tensor Gaussian Process (Tensor-GP) model with a multi-linear kernel. We extend the Tensor-GP model by integrating a linear but interpretable dimensionality reduction technique, called tensor contraction, with a Tensor-GP for a scalar-on-tensor regression task. We first estimate a latent, reduced-size tensor for each data tensor and then apply a multi-linear Tensor-GP on the latent tensor data for prediction. We introduce an anisotropic total-variation regularization when conducting the tensor contraction to obtain a sparse and smooth latent tensor. We then propose an alternating proximal gradient descent algorithm for estimation. We validate our approach through extensive simulation studies and apply it to the solar flare forecasting problem.
Abstract: Machine unlearning, a novel paradigm designed to remove data from models without requiring complete retraining, has recently drawn significant attention for its potential to enhance user privacy. Despite its increasing application, the majority of research has concentrated on enhancing its effectiveness and efficiency, while largely overlooking the security risks it introduces. This gap in research is critical, as there exists the potential for malicious users, who may have contributed to the training data, to exploit these vulnerabilities. They could conduct attacks by submitting deceptive unlearning requests, aiming to manipulate the behavior of the unlearned model. In this talk, I will introduce our recent study which investigates these potential malicious attacks facilitated by machine unlearning.
Abstract: The use of machine learning models in high-stake applications (e.g., healthcare, lending, college admission) has raised growing concerns due to potential biases against protected social groups. Various fairness notions and methods have been proposed to mitigate such biases. This talk focuses on Counterfactual Fairness (CF), a fairness notion that relies on an underlying causal graph and requires the outcome an individual perceives in the real world to be the same as it would be in a "counterfactual" world, in which the individual belongs to another social group.
In this talk, I will present a novel method for generating counterfactually fair representations. I will show, both theoretically and empirically, that machine learning models trained on these representations can achieve perfect counterfactual fairness. Our proposed method improves the fairness-accuracy trade-off compared to existing methods, making it a promising solution for training counterfactually fair AI models.
Abstract: Causal knowledge is central to solving complex decision-making problems in many fields from engineering to medicine. Causal inference has also recently been identified as a key capability to remedy some of the issues modern machine learning systems suffer from, from explainability and fairness to generalization. In this talk, we first provide a short introduction to probabilistic causal inference. Next, we discuss some of the recent developments from the CausalML Lab. Specifically, we will discuss how deep learning can be used for answering causal questions in the high-dimensional setting with applications in machine learning.
Abstract: In just a few years, Graph Neural Networks (GNNs) have emerged as the prominent supervised learning approach that brings the power of deep representation learning to graph and relational data. An ever-growing body of research has shown that GNNs achieve state-of-the-art performance for problems such as link prediction, fraud detection, target-ligand binding activity prediction, knowledge-graph completion, and product recommendations. As a result, GNNs are quickly moving from the realm of academic research involving small graphs to powering commercial applications and very large graphs. This talk will provide an overview of some of the research that AWS AI has been doing to facilitate this transition.
Abstract: As machine learning (ML) is increasingly used in social domains to make consequential decisions about humans, it often has the power to reshape individual data and population distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, the ML system also needs frequent updates to ensure high performance on targeted populations. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is to use the ML model itself to annotate unlabeled data samples. Yet, it remains unclear what happens when ML models are retrained with such model-annotated samples, especially when they incorporate human strategic responses. In this talk, I will discuss the societal impacts of this practice. I will first highlight potential risks of retraining ML models using model-annotated samples collected from strategic human agents, and then introduce mitigation solutions.
Abstract: With the recent focus on generative artificial intelligence (GAI), and the significant interest in neural network-based AI techniques that preceded it, many other techniques have received less attention. Pronouncements from AI industry luminaries suggest, though, that GAI - at least in its current form - may be nearing the end of its current 'branch' on the research 'tree'. Because of this, this presentation discusses how hybrids of non-neural network techniques, as well as hybrids with neural network techniques, may be the key to further advancing AI and its successful application to many areas.
Abstract: Human-centered AI advocates a shift from emulating humans to empowering people, so that AI can benefit humanity. In this talk, I discuss two directions for using LLMs to address challenging tasks for humans. First, I show that LLMs can be used to predict physician fatigue from clinical notes and reveal hidden racial biases: physicians appear more fatigued when seeing Hispanic and Black patients than White patients. Second, I present recent work on generating novel hypotheses based on observed data. Our algorithm enables an interpretable hypothesis-based classifier that makes accurate predictions. Moreover, the generated hypotheses not only corroborate human-verified theories but also uncover new insights for the tasks. I will conclude with some exciting future directions.
The Midwest Machine Learning Symposium is a forum for community-building and scholarly exchange, and we hope to foster a welcoming and positive environment for everyone. We will not tolerate any form of harassment, discrimination, or abuse. As a general code of conduct, we will adopt the ACM Policy Against Harassment. Since the event will be held on the grounds of the University of Minnesota Twin Cities, participants should also be aware of the UMN policies.
To report an incident or discuss any concerns, please approach an MMLS 2024 co-chair or email midwest.ml.2024@gmail.com. You may also use the UMN reporting options, including reaching out to the Title IX coordinator.