Le groupe de travail est ouvert à tous, évidemment aussi aux chercheurs en dehors de l’UGE, mais surtout à ceux qui ont un intérêt de près ou de loin pour le transport optimal et ses applications. Les séances consistent en un exposé, ou un exposé court et un exposé long.
séance 13:
Jeudi 25 juin 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Maciej Buze
(Cliquer pour le résumé)Entropic regularisation of unbalanced optimal transportation problems
We develop a mathematical theory of entropic regularisation of unbalanced optimal transport problems. Focusing on static formulation and relying on the formalism developed for the unregularised case, we show that unbalanced optimal transport problems can be regularised in two qualitatively distinct ways - either on the original space or on the extended space. We derive several reformulations of the two regularised problems and in particular introduce the idea of a regularised induced marginal perspective cost function allowing us to derive an extended space formulation of the original space regularisation. We also prove convergence to the unregularised problem in the case of the extended space regularisation and discuss on-going work on deriving a unified framework based on higher order liftings in which both regularisations can be directly compared.
séance 12:
Jeudi 23 mai 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Borjan Geshkovski
(Cliquer pour le résumé)Agrégation et couplage de mesures grâce aux Transformers
Le « Transformer » désigne une architecture de réseaux de neurones profonds qui est très utilisée dans le traitement automatique des langues. Il se trouve que le Transformer peut être modélisé comme un système de particules en interaction sur la sphère, dans lequel apparaissent certains paramètres multiplicatifs. Nous montrerons d’abord comment des agrégats apparaissent au fil du temps quand les paramètres sont judicieusement choisis. Nous montrerons ensuite que, grâce à ces paramètres, il est possible d’utiliser le flot généré par un transformer comme un couplage (non optimal) de plusieurs mesures.
séance 11:
Jeudi 18 avril 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Oratrice: Kimia Nadjahi (CSD ENS Paris)
(Cliquer pour le résumé)Scalable Unbalanced Optimal Transport by Slicing.
Substantial advances have been made in designing optimal transport (OT) variants which are either computationally and statistically more efficient or robust. Among them, sliced OT distances have been extensively used to mitigate the cubic algorithmic complexity and curse of dimensionality of OT. In parallel, unbalanced OT was designed to allow comparisons of more general positive measures, while being robust to outliers. In this talk, we bridge the gap between those two concepts and develop a general framework for efficiently comparing positive measures. We formulate two different versions of “sliced unbalanced OT” and study the associated topology and statistical properties. We then develop a GPU-friendly Frank-Wolfe algorithm to compute the two corresponding loss functions. The resulting methodology is modular, as it encompasses and extends prior related work, and brings the cubical computational complexity down to almost linear O(n log(n)) when comparing two discrete measures supported on n points. We finally conduct an empirical analysis of our methodology on both synthetic and real datasets, to illustrate its computational efficiency, relevance and applicability to real-world scenarios, including color transfer and barycenters of geophysical data.
séance 10:
Jeudi 21 mars 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Nicolas Schreuder (LIGM)
(Cliquer pour le résumé)Fairness in machine learning: a study of the Demographic Parity constraint.
In various domains, statistical algorithms trained on personal data take pivotal decisions which influence our lives on a daily basis. Recent studies show that a naive use of these algorithms in sensitive domains may lead to unfair and discriminating decisions, often inheriting or even amplifying biases present in data. In the first part of the talk, I will introduce and discuss the question of fairness in machine learning through concrete examples of biases coming from the data and/or from the algorithms. In a second part, I will demonstrate how statistical learning theory can help us better understand and overcome some of those biases. In particular, I will present a selection of recent results from two of my papers on the Demographic Parity constraint. In particular I will present an interesting link between fairness and optimal transport.
References:
- A minimax framework for quantifying risk-fairness trade-off in regression (with E. Chzhen), Ann. Statist. 50(4): 2416-2442 (Aug. 2022). DOI: 10.1214⁄22-AOS2198;
- Fair learning with Wasserstein barycenters for non-decomposable performance measures (with S. Gaucher and E. Chzhen), AISTATS 2023.
séance 9:
Jeudi 22 février 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Pierre Ablin (Apple)
(Cliquer pour le résumé)Transformers, Dynamical Systems and Optimal Transport.
The transformer architecture is one of the cornerstones that enables the large language models revolution. It is a neural network architecture that acts on arbitrary length sequences of vectors. At its core is the attention mechanism, which transforms a sequence into another sequence, where each element of the sequence interacts with each other.
The goal of this talk is to i) give a comprehensive introduction to the transformer architecture, ii) explain its connection with interacting particles systems and optimal transport and iii) present recent results on the Lipschitz constant of attention.
The talk is based on the following papers: - Sander, Michael E., Pierre Ablin, Mathieu Blondel, and Gabriel Peyré. “Sinkformers: Transformers with doubly stochastic attention.” In International Conference on Artificial Intelligence and Statistics, pp. 3515-3530. PMLR, 2022. https://arxiv.org/abs/2110.11773 - Castin, Valérie, Pierre Ablin, and Gabriel Peyré. “Understanding the Regularity of Self-Attention with Optimal Transport.” arXiv preprint arXiv:2312.14820 (2023). https://arxiv.org/abs/2312.14820
séance 8:
Jeudi 25 janvier 2024 - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Marc Lambert (INRIA)
(Cliquer pour le résumé)On the approximation of the Fokker-Planck equation using a constrained JKO scheme.
We want to propagate a Gaussian distribution through a Langevin SDE. This problem has applications in Kalman filtering, where the SDE models a system’s dynamic under uncertainties, but also in density estimation since the SDE converges in distribution to the Gibbs density. We approximate the Fokker-Planck equation by considering a JKO scheme where the solution is constrained to lie in the space of Gaussian distributions. We find a solution as a system of ODE on the mean and covariance; the corresponding approximated SDE is a Mc Kean Vlasov process well known in MFG. We extend these results to a Gaussian mixture where the solution appears to be a system of Gaussian interacting particles. We finally apply these results to approximate the optimal filter Kushner equation in the context of Bayesian filtering with continuous observations. The solution appears to be a variational generalization of the Riccati equations. This presentation will be based on the following two papers: https://arxiv.org/abs/2205.15902 and https://arxiv.org/abs/2310.01859.
séance 7:
Lundi 3 avril - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Gabriele Todeschi (LIGM)
(Cliquer pour le résumé)A relaxed Beckmann’s problem for SPD valued measures and application to Full Waveform Inversion.
Full Waveform Inversion (FWI) is a seismic imaging technique that aims at reconstructing subsurface parameters. It can be formulated as a highly non-convex variational problem, where the main difficulty comes from discerning temporal shifts in the data. For this reason, optimal transport has recently started to be considered as a misfit function between observed and generated data. However, these are made of signed oscillating signals, whence the need to handle them prior to its application. To bypass this issue, we take advantage of multi-component seismograms, lift them into positive semi-definite (SPD) matrices and design an optimal transport problem between SPD valued measures. To this end, we extend in a natural way to this setting the Beckmann’s formulation of the L1 transport. I will present and analyze the model, in particular its calibration for the FWI problem. Importantly, the model admits a primal Kantorovich form, which I will discuss. I will finally provide some good preliminary results of the application of this misfit function, showing the effectiveness of the approach. This is a joint work with Jean-Marie Mirebeau and Ludovic Métivier.
séance 6:
Lundi 13 février - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Vincent Divol (Paris-Dauphine)
(Cliquer pour le résumé)Estimation d’applications de transport optimal dans des espaces fonctionnels généraux.
Nous considérons le problème de l’estimation d’une application de transport optimal entre une loi source P (fixée) et une loi cible inconnue Q, sur la base d’un échantillon de loi Q. Un tel problème a récemment gagné en popularité avec de nouvelles applications en apprentissage automatique, comme les modèles génératifs. Jusqu’à maintenant, des vitesses d’estimations sont connues seulement dans un petit nombre de cas (par exemple, quand P et Q ont des densités majorées et minorées et que l’application de transport appartient à un espace de Hölder), qui sont rarement réalisés en pratique. Nous présentons une méthodologie permettant d’obtenir des vitesses d’estimation de l’application de transport optimal dans des espaces généraux, qui se base sur l’optimisation de la formulation duale du problème de transport empirique. À titre d’exemple, nous pouvons donner des vitesses de convergence dans le cas où P est gaussien et l’application de transport est donnée par un réseau de neurones à deux couches avec un nombre arbitrairement grands de neurones.
Collaboration avec Aram-Alexandre Pooladian et Jonathan Niles-Weed
Séance 5:
Lundi 6 février - 14h00
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Luca Nenna (LMO)
(Cliquer pour le résumé)Grand Canonical Optimal Transport.
In this talk I will briefly review standard Multi-Marginal Optimal Transport (a number N of marginals is fixed) focusing, in particular, on the applications in Quantum Mechanics (in this case the marginals are all the same and represent the electron density). I will then extend the Optimal Transportation problem to the grand canonical setting: only the expected number of marginals/electrons is now given (i.e. we can now define an OT problem with a fractional number of marginals). I will compare these two problems and show how they behave differently despite considering the same cost functions. Existence of minimisers, Monge solution, duality, entropic formulation and numerics will be discussed.
Séance 4:
vendredi 16 décembre - 16h15
Salle de séminaire du LIGM: 4ème étage du bâtiment Copernic.
Orateur: Christophe Denis (LAMA)
(Cliquer pour le résumé)
Dans un premier temps, nous exposerons des résultats récents obtenus en équité algorithmique dans le cadre de la classification multi-classes. Dans un deuxième, nous expliciterons le lien entre la théorie du transport optimal et la problématique d’équité algorithmique et les questions qui en découlent en apprentissage multi-classes.Équité algorithmique et lien avec le transport optimal.
- Je ferai aussi un exposé court sur un travail récent (en commun avec Théo Dumont et Théo Lacombe (LIGM)):
(Cliquer pour le résumé)
For the L2-Gromov-Wasserstein distance, we study the structure of minimizers in Euclidean spaces for two different costs. The first cost is the scalar product for which we prove that it is always possible to find optimizers as Monge maps and we detail the structure of such optimal maps. The second cost is the squared Euclidean distance for which we show that the worst case scenario is the existence of a map-anti-map structure. Both results are direct and indirect consequences of an existence result on costs that are defined by submersions. In dimension one for the squared distance, we show that a monotone map is optimal in some non-symmetric situations, thereby giving insight on why such a map is often found optimal in numerical experiments. In addition, we show numerical evidence for a negative answer to the existence of a Monge map under the conditions of Breniers theorem, suggesting that our result cannot be improved in general. arXiv: 2210.11945.On the existence of Monge maps for the Gromov-Wasserstein distance.
Troisième séance:
lundi 27 juin 2022. UPEM, LAMA, 4B107 (quatrième étage du bâtiment Copernic), 13h - 15h.- Exposé de François-Pierre Paty :
(Cliquer pour le résumé)
Abstract: The theory of weak optimal transport (WOT), introduced by [Gozlan et al., 2017], generalizes the classic Monge-Kantorovich framework by allowing the transport cost between one point and the points it is matched with to be nonlinear. In the so-called barycentric version of WOT, the cost for transporting a point x only depends on x and on the barycenter of the points it is matched with. This aggregation property of WOT is appealing in machine learning, economics and finance. Yet algorithms to compute WOT have only been developed for the special case of quadratic barycentric WOT, or depend on neural networks with no guarantee on the computed value and matching. The main difficulty lies in the transportation constraints which are costly to project onto. In this paper, we propose to use mirror descent algorithms to solve the primal and dual versions of the WOT problem. We also apply our algorithms to the variant of WOT introduced by [Choné et al., 2022] where mass is distributed from one space to another through unnormalized kernels (WOTUK). We empirically compare the solutions of WOT and WOTUK with classical OT. We illustrate our numerical methods to the economic framework of [Choné and Kramarz, 2021], namely the matching between workers and firms on labor markets.Sur des approches numériques pour le transport optimal faible et ses applications.
- Exposé d’Adrien Vacher :
(Cliquer pour le résumé)
Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to be obtained for the estimated maps themselves. In this paper, we propose the first tractable algorithm for which the statistical L2 error on the maps nearly matches the existing minimax lower-bounds for smooth map estimation. Our method is based on solving the semi-dual formulation of optimal transport with an infinite-dimensional sum-of-squares reformulation, and leads to an algorithm which has dimension-free polynomial rates in the number of samples, with potentially exponentially dimension-dependent constants.Estimation statistique et computationnelle des potentiels de transport en grande dimension à partir d’échantillons.
Deuxième séance: lundi 23 mai 2022.
ENPC, salle de séminaire du CERMICS, 13h - 15h.
Exposé (seconde partie) de Paul-Marie Samson sur le problème de transport optimal faible (slides: Coûts de transport optimal faibles et dualité de Kantorovich).
Exposé de Benjamin Jourdain sur le transport optimal martingale et sa stabilité.
1ère séance: mardi 19 avril 2022.
Univ. Gustave Eiffel, Bâtiment Copernic, salle 2B140, 13h - 15h.
Exposé de Thomas Bonis sur la méthode de Stein Variational Gradient Descent (slides: Echantillonnage par descente variationnelle de Stein).
Exposé (première partie) de Paul-Marie Samson sur le problème de transport optimal faible (slides: Coûts de transport optimal faibles et dualité de Kantorovich).