List of Titles and Abstracts
Beatrice Acciaio: Stochastic optimal transport in finance
In this talk I will introduce the concept of adapted optimal transport, which originates from imposing a causality constraint on couplings on path spaces. This is done in order to account for the flow of information encoded in the filtration. The resulting distances turn out to be suitable for the analysis of sensitivity and model misspecification in finance, as well as for many dynamic stochastic optimization problems. Some applications will be shown to illustrate their scope.
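On a two-period binary tree, the causality constraint reduces adapted optimal transport to a backward induction. A minimal sketch (the example and all numbers are invented for illustration, not taken from the talk), assuming uniform branching weights and squared cost:

```python
import numpy as np

def w2_sq_sorted(xs, ys):
    """Squared W2 between uniform measures on equal-size 1D supports:
    the optimal coupling is the monotone (sorted) matching."""
    return float(np.mean((np.sort(xs) - np.sort(ys)) ** 2))

def adapted_w2_sq(roots_x, kids_x, roots_y, kids_y):
    """Nested evaluation of the adapted squared W2 between two two-period
    binary trees (uniform branching weights): the period-1 cost of matching
    root i to root j includes the W2 cost of matching the conditional laws."""
    c = np.empty((2, 2))
    for i in range(2):
        for j in range(2):
            c[i, j] = (roots_x[i] - roots_y[j]) ** 2 + w2_sq_sorted(kids_x[i], kids_y[j])
    # with uniform two-point marginals, every coupling is a convex combination
    # of the two permutation matrices, so the linear objective is minimized at one
    return min(c[0, 0] + c[1, 1], c[0, 1] + c[1, 0]) / 2

aw2 = adapted_w2_sq([-1.0, 1.0], [[-2.0, 0.0], [0.0, 2.0]],
                    [-1.0, 1.0], [[-2.0, 0.0], [1.0, 3.0]])
print(aw2)  # 0.5: only the second branch's conditional law was shifted by 1
```

The nested cost makes the period-1 matching pay for the distance between the conditional laws it induces, which is exactly the causality constraint at work.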
Yann Brenier: Matrix-valued optimal transport in mathematical physics and mechanics
Optimal transport theory has recently been extended to matrix-valued density fields by many authors and in different ways. Several examples can be found in fluid mechanics, quantum mechanics and general relativity.
Leon Bungert: The Geometry of Adversarial Machine Learning
It is well-known that despite their aptness for complicated tasks like image classification, modern neural networks are vulnerable to imperceptible input perturbations (a.k.a. adversarial attacks) which can lead to severe misclassifications. Adversarial training is a state-of-the-art method to train classifiers which are more robust against these adversarial attacks. The method features minimization of a robust risk and has interpretations as a game-theoretic problem, a distributionally robust optimization problem, the dual of an optimal transport problem, or a nonlocal geometric regularization problem. In this talk I will focus on the last interpretation, which allows for the application of tools from the calculus of variations and geometric measure theory to study existence, regularity, and asymptotic behavior of minimizers. In particular, I will show that adversarial training of binary agnostic classifiers is equivalent to a nonlocal and weighted perimeter regularization of the decision boundary. Furthermore, I will show Gamma-convergence of this perimeter to a local anisotropic perimeter as the strength of the adversary tends to zero, thereby establishing an asymptotic regularization effect of adversarial training.
Charlotte Bunne: Neural Optimal Transport for Predicting Tumor Responses to Cancer Drugs
To accurately predict the responses of a patient’s tumor cells to a cancer drug, it is vital to recover the underlying population dynamics and fate decisions of single cells. However, measuring molecular properties of single cells requires destroying them. As a result, a cell population can only be monitored with sequential snapshots, obtained by sampling a few particles that are sacrificed in exchange for measurements. In order to reconstruct individual cell fate trajectories, as well as the overall dynamics, one needs to re-align these unpaired snapshots and guess, for each cell, what it might have become at the next step. Optimal transport theory can provide such maps, and reconstruct these incremental changes in cell states over time. This celebrated theory provides the mathematical link that unifies the several contributions to modeling cellular dynamics that we present here: inference from data of an energy potential best able to describe the evolution of differentiation processes (Bunne et al., 2022), building on the Jordan-Kinderlehrer-Otto (JKO) flow; recovery of differential equations modeling the stochastic transitions between cell fates in developmental processes (Bunne et al., 2023) through Schrödinger bridges; as well as zero-sum game theory models parameterizing distribution shifts upon interventions, which we employ to model heterogeneous responses of tumor cells to cancer drugs (Bunne et al., 2021).
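The re-alignment of unpaired snapshots can be sketched with entropic OT: compute a Sinkhorn coupling between two synthetic point clouds and read off each "cell's" predicted later state by barycentric projection. All data and parameter choices below are invented toy values, not the models of the papers cited above:

```python
import numpy as np

def sinkhorn_coupling(X, Y, eps_frac=0.1, iters=500):
    """Entropic OT coupling between two unpaired point clouds with uniform
    weights, via Sinkhorn's matrix-scaling iterations; the regularization
    strength is set relative to the mean cost for numerical stability."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / (eps_frac * C.mean()))
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones(len(X))
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

# hypothetical snapshots: cells measured at time t and (unpaired) at time t+1
rng = np.random.default_rng(0)
X = rng.normal(0.0, 0.3, size=(40, 2))
Y = rng.normal(2.0, 0.3, size=(40, 2))
P = sinkhorn_coupling(X, Y)
# barycentric projection: a guess, for each earlier cell, of its later state
T = (P @ Y) / P.sum(axis=1, keepdims=True)
print(T.mean(axis=0))  # close to the mean of the later snapshot
```

The rows of the coupling matrix play the role of soft assignments between the two snapshots; the barycentric projection turns them into a point-to-point map.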
Russel Caflisch: Optimization of the Boltzmann Equation
The kinetics of rarefied gases and plasmas are described by the Boltzmann equation and numerically approximated by the Direct Simulation Monte Carlo (DSMC) method. We present an optimization method for DSMC derived from an augmented Lagrangian. After a forward (in time) solution of DSMC, adjoint variables are found by a backward solver. They are equal to velocity derivatives of an objective function, which can then be optimized. This is joint work with Yunan Yang (ETH) and Denis Silantyev (U Colorado, Colorado Springs).
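The adjoint mechanism can be illustrated on a generic time-stepping scheme rather than DSMC itself: run a forward Euler solve, sweep adjoint variables backward, and recover the derivative of an objective with respect to a parameter. This is a minimal sketch of the adjoint-state idea only; the equation, objective, and all constants are illustrative:

```python
import numpy as np

def forward(theta, x0=1.0, h=0.01, n=100):
    """Explicit-Euler forward solve of x' = -theta * x."""
    xs = [x0]
    for _ in range(n):
        xs.append(xs[-1] + h * (-theta * xs[-1]))
    return np.array(xs)

def adjoint_gradient(theta, x0=1.0, h=0.01, n=100):
    """Backward (adjoint) sweep giving dJ/dtheta for J = x_N**2.
    lam carries dJ/dx_k; each step contributes its theta-sensitivity."""
    xs = forward(theta, x0, h, n)
    lam = 2.0 * xs[-1]
    grad = 0.0
    for k in range(n - 1, -1, -1):
        grad += lam * h * (-xs[k])     # d(step k)/dtheta, weighted by lam
        lam = lam * (1.0 - h * theta)  # d(step k)/dx pulls lam back one step
    return grad

theta = 0.7
g = adjoint_gradient(theta)
fd = (forward(theta + 1e-6)[-1] ** 2 - forward(theta - 1e-6)[-1] ** 2) / 2e-6
print(g, fd)  # the two gradients agree
```

One backward pass yields the derivative with respect to any number of parameters at roughly the cost of one extra forward solve, which is what makes the adjoint approach attractive for optimization.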
José A. Carrillo: Primal dual methods for Wasserstein gradient flows
Combining the classical theory of optimal transport with modern operator splitting techniques, I will present a new numerical method for nonlinear, nonlocal partial differential equations, arising in models of porous media, materials science, and biological swarming. Using the JKO scheme, along with the Benamou-Brenier dynamical characterization of the Wasserstein distance, we reduce computing the solution of these evolutionary PDEs to solving a sequence of fully discrete minimization problems, with strictly convex objective function and linear constraint. We compute the minimizer of these fully discrete problems by applying a recent, provably convergent primal dual splitting scheme for three operators. By leveraging the PDE’s underlying variational structure, our method overcomes traditional stability issues arising from the strong nonlinearity and degeneracy, and it is also naturally positivity preserving and entropy decreasing. Furthermore, by transforming the traditional linear equality constraint, as has appeared in previous work, into a linear inequality constraint, our method converges in fewer iterations without sacrificing any accuracy. Remarkably, our method is also massively parallelizable and thus very efficient in resolving high dimensional problems. We prove that minimizers of the fully discrete problem converge to minimizers of the continuum JKO problem as the discretization is refined, and in the process, we recover convergence results for existing numerical methods for computing Wasserstein geodesics. Finally, we conclude with simulations of nonlinear PDEs and Wasserstein geodesics in one and two dimensions that illustrate the key properties of our numerical method.
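In one dimension the Wasserstein geodesics mentioned at the end have a closed form that any numerical scheme should recover: displacement interpolation linearly interpolates quantile functions (here, sorted samples). A toy reference computation with invented data, not the primal dual method of the talk:

```python
import numpy as np

def geodesic_1d(xs, ys, t):
    """Displacement interpolation between two uniform empirical measures
    in 1D: the W2 geodesic linearly interpolates sorted samples (quantiles)."""
    return (1 - t) * np.sort(xs) + t * np.sort(ys)

rng = np.random.default_rng(1)
mu0 = rng.normal(-2.0, 0.5, 500)
mu1 = rng.normal(3.0, 1.0, 500)
mid = geodesic_1d(mu0, mu1, 0.5)
print(mid.mean(), mid.std())  # midpoint mean ~0.5, std ~0.75
```

For two Gaussians the geodesic stays Gaussian, with mean and standard deviation interpolated linearly, which gives a cheap sanity check for any solver.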
Marta Catalano: Measuring the impact of the prior in Bayesian nonparametrics via optimal transport
The Bayesian approach to inference is based on a coherent probabilistic framework that naturally leads to principled uncertainty quantification and prediction. Via conditional (or posterior) distributions, Bayesian nonparametric models make inference on parameters belonging to infinite-dimensional spaces, such as the space of probability distributions. The development of Bayesian nonparametrics has been triggered by the Dirichlet process, a nonparametric prior that allows one to learn the law of the observations through closed-form expressions. Still, its learning mechanism is often too simplistic and many generalizations have been proposed to increase its flexibility, a popular one being the class of normalized completely random measures. Here we investigate a simple yet fundamental matter: will a different prior actually guarantee a different learning outcome? To this end, we develop a new distance between completely random measures based on optimal transport, which provides an original framework for quantifying the similarity between posterior distributions (or merging of opinions). Our findings provide neat and interpretable insights on the impact of popular Bayesian nonparametric priors, with very mild assumptions on the data-generating process. This is joint work with Hugo Lavenant.
Marco Cuturi: On the Monge gap and the MBO feature-sparse transport estimator
This talk will cover two recent works aimed at estimating Monge maps from samples. In the first part (in collaboration with Théo Uscidda) I will present a novel approach to train neural networks so that they mimic Monge maps for the squared-Euclidean cost. In that field, a popular approach has been to parameterize dual potentials using input convex neural networks, and estimate their parameters using SGD and a convex conjugate approximation. We present in this work a regularizer for that task that is conceptually simpler (as it does not require any assumption on the architecture) and which extends to non-Euclidean costs. In the second part (in collaboration with Michal Klein and Pierre Ablin), I will show that when adding to the squared-Euclidean distance an extra translation-invariant cost, the Brenier theorem translates into the application of the proximal mapping of that extra term to the derivative of the dual potential. Using an entropic map to parameterize that potential, we obtain the Monge-Bregman-Occam (MBO) estimator, which has the defining property that its displacement vectors T(x) - x are sparse, resulting in interpretable OT maps in high dimensions.
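The sparsity mechanism in the second part can be sketched in isolation: for an l1-type extra cost, the proximal mapping is soft-thresholding, and applying it to (hypothetical, invented) dual-potential derivatives zeroes out small coordinates, giving displacement vectors that are exactly sparse:

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1: shrinks every coordinate toward zero
    and sets coordinates with magnitude below tau exactly to zero."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

# hypothetical dual-potential derivatives at two points: most coordinates
# barely move, so applying the prox yields exactly sparse displacements
grads = np.array([[0.05, 1.5, -0.02, -2.0],
                  [1.2, -0.01, 0.03, 0.9]])
disp = soft_threshold(grads, tau=0.1)
print(disp)  # small entries become exactly 0; large ones are shrunk by tau
```

This is only the prox step in isolation; in the MBO estimator it is composed with the derivative of an entropic dual potential, which the sketch does not model.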
Simone Di Marino: First order expansion for regularizations in Optimal Transport: Monge Entropic Optimal Transport and the semi-classical limit of DFT
It is quite natural to consider regularization procedures for the classical optimal transport problem. A common way to do so is to add a penalization term to the optimization problem on the transport plans, which guarantees some kind of regularity of the plan. The main question discussed in this talk is the limit of the values of the penalized optimization problems as the penalization vanishes. The main examples considered will be Entropic Optimal Transport for the Monge cost and the semiclassical limit of Density Functional Theory.
Stephan Eckstein: Optimal transport and Wasserstein distances for causal models
We present a framework for optimal transport aimed at probability measures arising from structural causal models, i.e., models that contain an information structure dictated by a directed graph. The structure of the graph determines the exact specification of the optimal transport problem. The introduced framework recovers different versions of optimal transport for particular graph structures, like the standard OT problem (fully connected graph), the adapted OT problem (linear graph) or problems related to the Gromov-Wasserstein distance (graph without edges). The general goal of the introduced setting is to provide a concept of optimal transport whose topological and geometric properties are well suited for structural causal models. In this regard, we show that the resulting concept of Wasserstein distance can be used to control the difference between average treatment effects under different distributions, and is geometrically suitable to interpolate between different structural causal models.
Björn Engquist: Pre- and post-processing data for optimal transport applications
Applications and theory development in optimal transport have been connected from the very beginning, starting with Monge and Kantorovich. We will remark on the balance between pre- and post-processing of data versus developing new theory in the application of seismic inversion. One example is normalization of input data versus using unbalanced optimal transport.
Franca Hoffmann: Gradient Flows in the covariance-modulated optimal transport geometry
We present a variant of the dynamical optimal transport problem in which the energy to be minimised is modulated by the covariance matrix of the current distribution. Such transport metrics arise naturally in mean-field limits of certain ensemble Kalman methods for solving inverse problems. We show that the transport problem splits, up to degrees of freedom given by rotations, into two coupled minimization problems: one for the evolution of the mean and covariance of the interpolating curve, and one for its shape. On the level of the gradient flows, a similar splitting into the evolution of moments and shapes of the distribution can be observed. These flows show better convergence properties than the classical Wasserstein metric, with exponential convergence rates independent of the Gaussian target.
Christian Klingenberg: Optimization of chemotaxis-type kinetic equations in a multi-scale context
We have a biological experiment that can be modeled both by i.) a kinetic chemotaxis-type equation and ii.) a macroscopic Keller-Segel-type PDE. In i.) the collision kernel and in ii.) the diffusion and drift term need to be determined from the experimental measurements.
For these PDE inverse problems we are able to design the experiment such that
- we prove that one can determine i.) the collision kernel; ii.) the diffusion and drift term,
- the fluid limit for the inverse problem can be understood in a Bayesian sense,
- the corresponding constrained optimization problem for i.) is numerically solvable.
We give an outlook on the optimal experimental design for this problem.
This is joint work with Kathrin Hellmuth (Würzburg), Qin Li (Madison), Min Tang (Shanghai) and Yunan Yang (Zürich).
Anna Korba: Sampling with Mollified Interaction Energy Descent
Sampling from a target measure whose density is only known up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this talk, I will present a new optimization-based method for sampling called mollified interaction energy descent (MIED). MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs). These energies rely on mollifier functions, smooth approximations of the Dirac delta originating from PDE theory. We show that as the mollifier approaches the Dirac delta, the MIE converges to the chi-square divergence with respect to the target measure and the gradient flow of the MIE agrees with that of the chi-square divergence. Optimizing this energy with proper discretization yields a practical first-order particle-based algorithm for sampling in both unconstrained and constrained domains. We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particle-based algorithms like Stein Variational Gradient Descent (SVGD), while for constrained sampling problems our method readily incorporates constrained optimization techniques to handle more flexible constraints with strong performance compared to alternatives.
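A schematic of the descent can be sketched as follows; the Gaussian mollifier, the p^(-1/2) particle weighting (chosen so that the vanishing-mollifier limit has the chi-square form described above), the finite-difference gradient, and the backtracking step are all my simplifications, not the algorithm of the paper:

```python
import numpy as np

def mie(x, eps, logp):
    """Schematic mollified interaction energy for 1D particles x:
    Gaussian mollifier of width eps, target density p known only up to a
    constant and entering through the weights p(x)^(-1/2)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    phi = np.exp(-d2 / (2 * eps ** 2))
    w = np.exp(-0.5 * logp(x))
    return float((phi * w[:, None] * w[None, :]).mean())

def mied(logp, n=30, steps=100, eps=0.3, h=1e-6):
    """Descent on the MIE with a finite-difference gradient and a
    backtracking step so the energy never increases."""
    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 3.0, n)        # particles start far from the target
    E = mie(x, eps, logp)
    for _ in range(steps):
        g = np.array([(mie(x + h * e, eps, logp) - mie(x - h * e, eps, logp)) / (2 * h)
                      for e in np.eye(n)])
        t = 1.0 / (np.abs(g).max() + 1e-12)   # bound the per-particle move
        while mie(x - t * g, eps, logp) >= E and t > 1e-12:
            t *= 0.5                           # backtrack until E decreases
        E_new = mie(x - t * g, eps, logp)
        if E_new < E:
            x, E = x - t * g, E_new
    return x, E

logp = lambda x: -0.5 * (x - 1.0) ** 2   # unnormalized N(1, 1) target
x, E = mied(logp)
```

The energy trades off particle repulsion (through the mollifier) against attraction to high-density regions (through the weights); the sketch only checks that descent reduces it.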
Hugo Lavenant: Measuring dependence with Wasserstein distances (in Bayesian Nonparametrics)
Computing the Wasserstein distance between a joint distribution and an extremal dependence coupling (independence or maximal dependence) can be a natural way to quantify the amount of dependence in the joint distribution. To normalize this quantity properly, one needs to solve an intriguing maximization problem of a Wasserstein distance. We apply this idea to measure dependence between completely random vectors. They are central objects for modelling, in a Bayesian way, the borrowing of information between similar, yet different, sources of observations. To achieve our goal we need to use a Wasserstein distance between Lévy intensities, which typically have infinite mass. This is joint work with Marta Catalano, Antonio Lijoi, and Igor Prünster.
Luca Nenna: Moment-Constrained Approximation of the Lieb functional
The aim of this talk is to present new sparsity results about the so-called Lieb functional, which is a key quantity in Density Functional Theory for electronic structure calculations for molecules. The Lieb functional was actually shown by Lieb to be a convexification of the so-called Lévy-Lieb functional. Given an electronic density for a system of N electrons, which may be seen as a probability density on R^3, the value of the Lieb functional for this density is defined as the solution of a quantum multi-marginal optimal transport problem, which reads as a minimization problem defined on the set of trace-class operators acting on the space of electronic wavefunctions that are antisymmetric L^2 functions of R^{3N}, with partial trace equal to the prescribed electronic density. We introduce a relaxation of this quantum optimal transport where the full partial trace constraint is replaced by a finite number of moment constraints on the partial trace of the set of operators. We show that, under mild assumptions on the electronic density, there exist sparse minimizers of the moment-constrained approximation of the Lieb (MCAL) functional that read as operators with rank at most equal to the number of moment constraints. We also prove, under appropriate assumptions on the set of moment functions, that the value of the MCAL functional converges to the value of the exact Lieb functional as the number of moments goes to infinity. Finally, we show that a semi-classical limit holds, namely that MCAL Gamma-converges to the moment-constrained multi-marginal optimal transport problem. This is joint work with Virginie Ehrlacher.
Felix Otto: A variational regularity theory for OT and its application to matching
A couple of years ago, with M. Goldman we devised a new approach to the regularity theory for OT that mimics De Giorgi's approach to the regularity theory of minimal surfaces in the sense that a harmonic approximation result is at its center: Under a non-dimensional smallness condition, the displacement is close to the gradient of a harmonic function.
Probably the main advantage of this variational regularity theory over the one based on the maximum principle is that it does not require any regularity of the involved measures. Hence it can be applied to the popular matching problem, where it provides regularity on large scales (work with F. Mattesini and M. Huesmann).
Kui Ren: On computational inversion with metrics from optimal transport
Metrics originating from optimal transport theory have recently been used in solving computational inverse problems related to PDEs. Some obvious advantages (such as stability against high-frequency noise) and disadvantages (such as the loss of resolution when iterations are prematurely stopped) over the classical L2-based least-squares method have been reported, especially in applications to full waveform inversion. I will discuss some recent understanding of aspects of computational inversion with optimal transport, especially its advantages for the optimization landscape.
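The landscape point can be illustrated in one dimension: for a pulse compared against shifted copies of itself, the L2 misfit saturates once the supports separate (the source of cycle-skipping), while the W2 misfit, computed from inverse CDFs, keeps growing quadratically in the shift. A toy computation with an invented pulse and grid:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2000)
dx = x[1] - x[0]

def w2_sq(f, g):
    """Squared W2 between 1D densities on a grid via inverse CDFs:
    W2^2 = integral over q in (0,1) of |F^{-1}(q) - G^{-1}(q)|^2."""
    F = np.cumsum(f) / f.sum()
    G = np.cumsum(g) / g.sum()
    q = np.linspace(1e-3, 1 - 1e-3, 400)
    return float(np.mean((np.interp(q, F, x) - np.interp(q, G, x)) ** 2))

pulse = lambda s: np.exp(-0.5 * (x - s) ** 2)   # unit-width pulse centered at s
shifts = np.linspace(0.0, 4.0, 9)
l2 = np.array([((pulse(0) - pulse(s)) ** 2).sum() * dx for s in shifts])
w2 = np.array([w2_sq(pulse(0), pulse(s)) for s in shifts])
# l2 saturates once the pulses no longer overlap, while w2 grows like
# shift**2, i.e. the transport misfit stays convex in the shift
```

For Gaussian pulses of equal width the exact value is W2^2 = shift^2, so the quantile computation can be checked directly.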
Bernhard Schmitzer: Entropic transfer operators for data-driven analysis of dynamical systems
The transfer operator is an elegant way to capture the behaviour of a (stochastic) dynamical system as a linear operator. Spectral analysis can then in principle reveal (almost) invariant measures and cyclical behaviour, as well as a separation of the dynamics into different time scales. In practice this analysis can rarely be done analytically, due to the complexity of the operator or since it may not be known in closed form. A central objective is therefore to numerically approximate this operator (or its adjoint: the Koopman operator) or to estimate it from data. In this talk we introduce a new estimation method based on entropic optimal transport and show convergence to a smoothed version of the original operator as more data becomes available. This involves an interplay between three different length scales: the discretization scale given by the data, the blur scale introduced by entropic transport, and the spatial scale of the eigenfunctions of the operator.
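A minimal sketch of the estimation step, under simplifying assumptions (1D state space, squared cost, uniform weights; not the paper's exact construction): build the entropic OT plan between "before" and "after" samples of a trajectory and renormalize it into a Markov matrix acting on functions of the data points:

```python
import numpy as np

def entropic_transfer_matrix(X, Y, eps=0.05, iters=1000):
    """Estimate a transfer operator from snapshot pairs (x_i, y_i): the
    entropic OT plan between the two empirical measures (Sinkhorn scaling
    to uniform marginals), renormalized into a row-stochastic matrix."""
    C = (X[:, None] - Y[None, :]) ** 2
    K = np.exp(-C / eps)
    n = len(X)
    u = np.ones(n)
    for _ in range(iters):
        v = (1.0 / n) / (K.T @ u)
        u = (1.0 / n) / (K @ v)
    P = u[:, None] * K * v[None, :]
    return P / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
X = rng.uniform(0.0, 1.0, 200)
Y = (X + 0.3) % 1.0                      # a circle rotation as toy dynamic
T = entropic_transfer_matrix(X, Y)
evals = np.sort(np.abs(np.linalg.eigvals(T)))[::-1]
print(evals[0])   # row-stochastic, so the leading eigenvalue is 1
```

The entropic blur scale eps is exactly the second of the three length scales mentioned above; shrinking it with more data is what drives the convergence result.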
Dejan Slepčev: Geometry of sliced optimal transport and projected-transport gradient flows
We will discuss two types of objects that can be approximated in high dimensions. Recent results have established that the sliced-Wasserstein (SW) distance can be approximated accurately in high dimensions based on samples of the measures considered. We will discuss the geometry of the SW distance. In particular, we will characterize the tangent space to the SW space as a certain weighted negative Sobolev space and obtain the local metric. We show that the SW space is not a length space and establish properties of the geodesic distance relevant to gradient flows in the space.
To obtain gradient flows that can be approximated in high dimensions we introduce the projected Wasserstein distance where the space of velocities has been restricted to have low complexity. We will show some of the basic properties of the distance and the corresponding gradient flows. Application towards interacting particle methods for sampling will also be discussed.
The talk is based on joint works with Sangmin Park and Lantian Xu.
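The reason SW distances are computable in high dimensions is that each slice is a closed-form 1D problem. A Monte-Carlo sketch with invented data (a translated Gaussian cloud, for which the exact value of SW2 under a translation by t is |t|/sqrt(d), here 2):

```python
import numpy as np

def sliced_w2(X, Y, n_proj=200, rng=None):
    """Monte-Carlo sliced W2 between equal-size point clouds in R^d:
    average the closed-form 1D squared W2 (sorted projections) over
    random directions, then take the square root."""
    if rng is None:
        rng = np.random.default_rng(0)
    d = X.shape[1]
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)           # uniform direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)
    return np.sqrt(total / n_proj)

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, (500, 10))
Y = X + 2.0                                      # translation by (2, ..., 2)
sw = sliced_w2(X, Y, rng=rng)
print(sw)  # close to 2
```

Each evaluation costs only sorting along random directions, which is what makes sample-based approximation feasible in high dimensions.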
Matthew Thorpe: Linearised Optimal Transport Distances
Optimal transport is a powerful tool for measuring the distances between signals and images. A common choice is to use the Wasserstein distance, where one is required to treat the signal as a probability measure. This places restrictive conditions on the signals, and although ad-hoc renormalisation can be applied to sets of unnormalised measures, this can often dampen features of the signal. The second disadvantage is that despite recent advances, computing optimal transport distances for large sets is still difficult. In this talk I will extend the linearisation of optimal transport distances to the Hellinger--Kantorovich distance, which can be applied between any pair of non-negative measures, and the TLp distance, a version of optimal transport applicable to functions. Linearisation provides an embedding into a Euclidean space where the Euclidean distance in the embedded space is approximately the optimal transport distance in the original space. This method, in particular, allows for the application of off-the-shelf data analysis tools such as principal component analysis, as well as reducing the number of optimal transport calculations from O(n^2) to O(n) in a data set of size n. I will touch on a range of applications such as data generation, classification of particle decays and colour transfer.
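In one dimension the linearisation is exact and reduces to sorting: the optimal map from a fixed reference to a uniform empirical measure is monotone, so each measure embeds as its sorted sample vector, and Euclidean distances between embeddings equal W2 distances. A toy sketch with invented data showing the O(n) embeddings at work:

```python
import numpy as np

def lot_embed(samples):
    """Linearised-OT embedding of a uniform empirical 1D measure against a
    fixed reference: the optimal (monotone) map to the reference, evaluated
    at the reference quantiles, is simply the sorted sample vector."""
    return np.sort(samples)

rng = np.random.default_rng(5)
mus = [rng.normal(m, 1.0, 300) for m in (0.0, 1.0, 4.0)]
E = [lot_embed(s) for s in mus]
d01 = np.sqrt(np.mean((E[0] - E[1]) ** 2))   # equals W2(mu0, mu1) in 1D
d02 = np.sqrt(np.mean((E[0] - E[2]) ** 2))
print(d01, d02)  # close to 1 and 4, the W2 distances between the Gaussians
```

A data set of n measures needs n embeddings rather than n^2 pairwise transport solves, and the embedded vectors feed directly into tools like principal component analysis; in higher dimensions the embedding is only approximately isometric.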
Li Wang: Variational methods for general nonlinear gradient flows
In this talk, I will introduce a general variational framework for nonlinear evolution equations with a gradient flow structure, which arise in material science, animal swarms, chemotaxis, and deep learning, among many others. Building upon this framework, we develop numerical methods that have built-in properties such as positivity preserving and entropy decreasing, and resolve stability issues due to the strong nonlinearity. I will specifically discuss how to leverage ideas from optimization and machine learning to overcome difficulties such as boundedness requirement, slow convergence, and high dimensionality.
Marie-Therese Wolfram: Inverse optimal transport
Discrete optimal transportation problems arise in various contexts in engineering, the sciences and the social sciences. Often the underlying cost criterion is unknown, or only partly known, and the observed optimal solutions are corrupted by noise. In this talk we propose a systematic approach to infer unknown costs from noisy observations of optimal transportation plans. The algorithm requires only the ability to solve the forward optimal transport problem, which is a linear program, and to generate random numbers. It has a Bayesian interpretation, and may also be viewed as a form of stochastic optimization.
We illustrate the developed methodologies using the example of international migration flows. Reported migration flow data captures (noisily) the number of individuals moving from one country to another in a given period of time. It can be interpreted as a noisy observation of an optimal transportation map, with costs related to the geographical position of countries. We use a graph-based formulation of the problem, with countries at the nodes of graphs and non-zero weighted adjacencies only on edges between countries which share a border. We use the proposed algorithm to estimate the weights, which represent cost of transition, and to quantify uncertainty in these weights.
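A schematic of the proposed approach under strong simplifications (a scalar cost parameter, a Gaussian pseudo-likelihood, a tiny invented instance; not the paper's exact algorithm): the forward problem is a linear program, and a random-walk Metropolis sampler over the cost parameter needs only that solver plus random numbers. For uniform marginals the LP optimum sits at a permutation matrix (Birkhoff-von Neumann), so the tiny forward solves below are done by enumeration rather than a general LP solver:

```python
import numpy as np
from itertools import permutations

def ot_plan(C, n):
    """Forward problem: discrete OT with uniform marginals 1/n. The LP
    optimum is attained at a permutation, so tiny instances can be solved
    by enumerating all permutations."""
    best, best_cost = None, np.inf
    for p in permutations(range(n)):
        cost = sum(C[i, p[i]] for i in range(n))
        if cost < best_cost:
            best_cost, best = cost, p
    P = np.zeros((n, n))
    for i, j in enumerate(best):
        P[i, j] = 1.0 / n
    return P

def posterior_theta(obs, C0, C1, n=3, n_iter=200, sigma=0.05):
    """Random-walk Metropolis over a scalar parameter theta of the cost
    C0 + theta * C1: a Gaussian pseudo-likelihood compares the model plan
    with the noisy observed plan; each step needs one forward solve."""
    rng = np.random.default_rng(7)

    def loglik(th):
        return -np.sum((ot_plan(C0 + th * C1, n) - obs) ** 2) / (2 * sigma ** 2)

    theta, ll, out = 1.0, loglik(1.0), []
    for _ in range(n_iter):
        prop = theta + 0.3 * rng.normal()
        if prop > 0:                          # keep the cost parameter positive
            ll_p = loglik(prop)
            if np.log(rng.uniform()) < ll_p - ll:
                theta, ll = prop, ll_p
        out.append(theta)
    return np.array(out)

# tiny invented instance: observe a noisy plan generated with theta = 1.5
rng = np.random.default_rng(8)
C0, C1 = rng.uniform(0, 1, (3, 3)), rng.uniform(0, 1, (3, 3))
obs = ot_plan(C0 + 1.5 * C1, 3) + 0.01 * rng.normal(size=(3, 3))
samples = posterior_theta(obs, C0, C1)
```

Because the optimal plan is piecewise constant in the cost parameter, the posterior identifies the cost only up to an equivalence class of parameters producing the same plan, which is one reason uncertainty quantification matters in this inverse problem.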