2025 seminars

Europe/Lisbon
Room P3.10, Mathematics Building — Online

João Costa, CAMGSD & ISCTE

Introduction to Deep Learning for mathematicians I

The goal of these lectures is to give a simple and direct introduction to some of the most basic concepts and techniques in Deep Learning. We will start by reviewing the fundamentals of Linear Regression and Linear Classifiers, and from there we will find our way to Deep Dense Neural Networks (a.k.a. multi-layer perceptrons). Then, we will introduce the theoretical and practical minimum needed to train such neural nets to classify the handwritten digits of the MNIST dataset. This requires, in particular, the efficient computation of the gradients of the loss with respect to the parameters of the model, which is achieved by backpropagation. Finally, if time permits, we will briefly describe other neural network architectures, such as Convolutional Networks and Transformers, and other applications of deep learning, including Physics-Informed Neural Networks, which apply neural nets to find approximate solutions of Differential Equations. The lectures will be accompanied by Python code implementing some of these basic techniques.
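As a concrete taste of the material, here is a minimal Python sketch (assuming only numpy; not the lecture code) of a one-hidden-layer dense network trained by gradient descent on a toy classification task, with the gradients of the loss with respect to the parameters computed by backpropagation:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: 200 points in the plane, label 1 inside the unit circle.
    X = rng.normal(size=(200, 2))
    y = (np.sum(X**2, axis=1) < 1.0).astype(float).reshape(-1, 1)

    # Parameters of a 2 -> 16 -> 1 network.
    W1 = rng.normal(scale=0.5, size=(2, 16)); b1 = np.zeros(16)
    W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    lr = 0.5
    for step in range(2000):
        # Forward pass.
        Z1 = X @ W1 + b1           # hidden pre-activations
        A1 = np.tanh(Z1)           # hidden activations
        P = sigmoid(A1 @ W2 + b2)  # predicted probabilities
        loss = -np.mean(y * np.log(P + 1e-9) + (1 - y) * np.log(1 - P + 1e-9))

        # Backward pass (backpropagation): chain rule, layer by layer.
        dZ2 = (P - y) / len(X)     # d loss / d (A1 @ W2 + b2)
        dW2 = A1.T @ dZ2; db2 = dZ2.sum(0)
        dA1 = dZ2 @ W2.T
        dZ1 = dA1 * (1 - A1**2)    # derivative of tanh
        dW1 = X.T @ dZ1; db1 = dZ1.sum(0)

        # Gradient descent step.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(f"final loss {loss:.3f}, accuracy {np.mean((P > 0.5) == y):.2f}")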

Europe/Lisbon
Room P3.10, Mathematics Building — Online

João Costa, CAMGSD & ISCTE

Introduction to Deep Learning for mathematicians II

The goal of these lectures is to give a simple and direct introduction to some of the most basic concepts and techniques in Deep Learning. We will start by reviewing the fundamentals of Linear Regression and Linear Classifiers, and from there we will find our way to Deep Dense Neural Networks (a.k.a. multi-layer perceptrons). Then, we will introduce the theoretical and practical minimum needed to train such neural nets to classify the handwritten digits of the MNIST dataset. This requires, in particular, the efficient computation of the gradients of the loss with respect to the parameters of the model, which is achieved by backpropagation. Finally, if time permits, we will briefly describe other neural network architectures, such as Convolutional Networks and Transformers, and other applications of deep learning, including Physics-Informed Neural Networks, which apply neural nets to find approximate solutions of Differential Equations. The lectures will be accompanied by Python code implementing some of these basic techniques.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Luís Carvalho, CAMGSD & ISCTE

Aspects of approximation, optimization, and generalization in Machine Learning I

This talk offers a leisurely-paced and informal introduction to some classical results at the intersection of mathematics and machine learning theory. We will explore the subject through three central lenses: approximation, optimization, and generalization. Particular attention will be given to universal approximation theorems, which illustrate the expressive power of neural networks. The focus is on foundational ideas and mathematical intuition; I will also highlight some limitations of these classical tools. The goal is not to be exhaustive, but to offer a broad perspective and present a few selected proofs related to expressivity along the way.
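As a small numerical companion to the universal approximation discussion (an illustration only, not one of the talk's proofs), the following Python sketch fits a one-hidden-layer ReLU network to a 1D function by solving a least-squares problem over the outer weights of randomly drawn hidden units:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 400)[:, None]
    f = np.sin(3 * x) + 0.3 * x           # target function

    n_hidden = 50
    W = rng.normal(size=(1, n_hidden))    # random inner weights
    b = rng.uniform(-np.pi, np.pi, size=n_hidden)
    H = np.maximum(x @ W + b, 0.0)        # hidden-layer ReLU features

    # Least-squares fit of the outer weights: f(x) ~ sum_j c_j ReLU(w_j x + b_j).
    c, *_ = np.linalg.lstsq(H, f, rcond=None)
    err = np.max(np.abs(H @ c - f))
    print(f"uniform error with {n_hidden} ReLU units: {err:.3f}")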

References

  1. A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 1999.
  2. J. Berner, P. Grohs, G. Kutyniok and P. Petersen. The Modern Mathematics of Deep Learning, in: Mathematical Aspects of Deep Learning, CUP, 2023.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Luís Carvalho, CAMGSD & ISCTE

Aspects of approximation, optimization, and generalization in Machine Learning II

This talk offers a leisurely-paced and informal introduction to some classical results at the intersection of mathematics and machine learning theory. We will explore the subject through three central lenses: approximation, optimization, and generalization. Particular attention will be given to universal approximation theorems, which illustrate the expressive power of neural networks. The focus is on foundational ideas and mathematical intuition; I will also highlight some limitations of these classical tools. The goal is not to be exhaustive, but to offer a broad perspective and present a few selected proofs related to expressivity along the way.

References

  1. A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 1999.
  2. J. Berner, P. Grohs, G. Kutyniok and P. Petersen. The Modern Mathematics of Deep Learning, in: Mathematical Aspects of Deep Learning, CUP, 2023.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Gonçalo Oliveira, CAMGSD & Instituto Superior Técnico

Infinitely wide Neural Networks I

I will explain how to think of infinitely wide neural networks both at initialization and during training, that is, their initial value and how they evolve over the course of training. At initialization, I will show that such neural networks are equivalent to Gaussian processes. During training, I will show that their evolution is equivalent to an autonomous linear flow in the space of functions. This is related to a phenomenon called (the lack of) feature learning, and I intend to at least mention what that is.
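As an illustration of the claim about initialization, the following Monte Carlo sketch in Python (a sanity check, not the talk's derivation) samples the outputs of many wide, randomly initialized one-hidden-layer networks at a fixed pair of inputs; their joint distribution is approximately a centered bivariate Gaussian:

    import numpy as np

    rng = np.random.default_rng(0)
    x1, x2 = np.array([1.0, 0.0]), np.array([0.5, 0.5])
    width, n_samples = 2000, 2000

    outputs = np.zeros((n_samples, 2))
    for k in range(n_samples):
        W1 = rng.normal(size=(2, width))             # input-to-hidden weights
        a = rng.normal(size=width) / np.sqrt(width)  # 1/sqrt(width) output scaling
        outputs[k, 0] = a @ np.tanh(W1.T @ x1)
        outputs[k, 1] = a @ np.tanh(W1.T @ x2)

    print("empirical mean:", outputs.mean(axis=0))       # close to (0, 0)
    print("empirical covariance:\n", np.cov(outputs.T))  # close to the limiting kernel at (x1, x2)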

Based on:

  1. Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira. Wide neural networks: from non-Gaussian random fields at initialization to the NTK geometry of training, arXiv:2304.03385.
  2. Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira. The positivity of the Neural Tangent Kernel, to appear in SIMODS (SIAM Journal on Mathematics of Data Science), arXiv:2404.12928.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Gonçalo Oliveira, CAMGSD & Instituto Superior Técnico

Infinitely wide Neural Networks II

I will explain how to think of infinitely wide neural networks both at initialization and during training, that is, their initial value and how they evolve over the course of training. At initialization, I will show that such neural networks are equivalent to Gaussian processes. During training, I will show that their evolution is equivalent to an autonomous linear flow in the space of functions. This is related to a phenomenon called (the lack of) feature learning, and I intend to at least mention what that is.

Based on:

  1. Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira. Wide neural networks: from non-Gaussian random fields at initialization to the NTK geometry of training, arXiv:2304.03385.
  2. Luís Carvalho, João Lopes Costa, José Mourão, Gonçalo Oliveira. The positivity of the Neural Tangent Kernel, to appear in SIMODS (SIAM Journal on Mathematics of Data Science), arXiv:2404.12928.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Pedro A. Santos, INESC & Instituto Superior Técnico

Introduction to Reinforcement Learning and Markov Decision Processes I

I will offer an introductory exploration into the field of Reinforcement Learning (RL) with a focus on Markov Decision Processes (MDPs). The first session provides a foundational understanding of RL, covering key concepts such as agents, environments, rewards, and actions. It explains the RL problem framework and introduces MDPs, exploring their role as the mathematical framework underpinning RL.

The second session delves into core algorithms, including Q-learning and policy gradients. The lecture highlights the connection between MDPs and dynamic programming techniques, emphasizing policy iteration and value iteration. Time permitting, I will finish with a brief description of some recent research topics and results.
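For concreteness, here is a minimal Python sketch of value iteration on a tiny, made-up two-state MDP (illustrative only; the sessions use their own examples):

    import numpy as np

    n_states, n_actions, gamma = 2, 2, 0.9
    # P[s, a, s'] = transition probability, R[s, a] = expected immediate reward.
    P = np.array([[[0.8, 0.2], [0.1, 0.9]],
                  [[0.5, 0.5], [0.0, 1.0]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])

    V = np.zeros(n_states)
    for _ in range(200):
        Q = R + gamma * P @ V      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] V[s']
        V_new = Q.max(axis=1)      # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < 1e-8:
            break
        V = V_new

    print("optimal values:", V)
    print("greedy policy:", Q.argmax(axis=1))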

A good introduction to RL is Sutton and Barto's 2018 book Reinforcement Learning: An Introduction. We will cover topics from Chapters 1–6 and 13.

A more rigorous introduction to MDPs, including convergence results, can be found in the book by Puterman:

Martin L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley, 2005.

Additional file

PA Santos M4AI_Presentation_Part_1.pdf

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Pedro A. Santos, INESC & Instituto Superior Técnico

Introduction to Reinforcement Learning and Markov Decision Processes II

I will offer an introductory exploration into the field of Reinforcement Learning (RL) with a focus on Markov Decision Processes (MDPs). The first session provides a foundational understanding of RL, covering key concepts such as agents, environments, rewards, and actions. It explains the RL problem framework and introduces MDPs, exploring their role as the mathematical framework underpinning RL.

The second session delves into core algorithms, including Q-learning and policy gradients. The lecture highlights the connection between MDPs and dynamic programming techniques, emphasizing policy iteration and value iteration. Time permitting, I will finish with a brief description of some recent research topics and results.

Additional file

PA Santos M4AI-Introduction to MDPs and Reinforcement Learning II.pdf

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Damian Kaloni Mayorga Pena, Instituto Superior Técnico

Some applications of supervised and semi-supervised learning

In this talk, I will discuss applications of deep neural networks as approximators. I will demonstrate an implementation of Gaussian processes for predicting baryon operator masses based on the meson spectrum of QCD, inspired by an idea from Witten. I will compare these results with those obtained from neural networks with finite width and depth. The second part of the talk will focus on using Physics-Informed Neural Networks (PINNs) to solve the Monge-Ampère equation on a Calabi-Yau manifold, including a comparison with approaches like Donaldson's algorithm.
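For orientation, here is a generic Gaussian-process regression sketch in Python with an RBF kernel on synthetic 1D data (an illustration of the technique only, not the QCD application):

    import numpy as np

    def rbf(A, B, length=0.5):
        # Squared-exponential kernel between the rows of A and B.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-0.5 * d2 / length**2)

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(20, 1))               # training inputs
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=20)    # noisy observations
    Xs = np.linspace(-3, 3, 100)[:, None]               # test inputs

    noise = 0.1**2
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    alpha = np.linalg.solve(K, y)
    mean = Ks @ alpha                                    # posterior mean
    cov = rbf(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)    # posterior covariance
    print("max posterior std:", np.sqrt(np.clip(np.diag(cov), 0, None)).max())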

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Damian Kaloni Mayorga Pena, Instituto Superior Técnico

Some applications of supervised and semi-supervised learning

In this talk, I will discuss applications of deep neural networks as approximators. I will demonstrate an implementation of Gaussian processes for predicting baryon operator masses based on the meson spectrum of QCD, inspired by an idea from Witten. I will compare these results with those obtained from neural networks with finite width and depth. The second part of the talk will focus on using Physics-Informed Neural Networks (PINNs) to solve the Monge-Ampère equation on a Calabi-Yau manifold, including a comparison with approaches like Donaldson's algorithm.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Miguel Couceiro, INESC & Instituto Superior Técnico

Analogical Reasoning: Theory, Applications and further surprises I

Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have been developed for Boolean domains, where inference is provably sound for affine functions and approximately correct when close to affine. These results enabled the design of analogy-based classifiers. However, they do not extend to regression tasks or continuous domains.

In this series of seminars we will revisit analogical inference from a foundational perspective. After a brief motivation, we will first present a recently proposed formalism for modelling numerical analogies that relies on p-generalized means, and that provides a unifying framework subsuming the classical notions of arithmetic, geometric and harmonic analogies. We will derive several interesting properties, such as transitivity of conformity, as well as present algorithmic approaches to detect and compute the parameter p.
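To fix ideas, here is a small Python sketch of one reading of this formalism, under the assumption that the numerical analogy a : b :: c : d is modelled by the equality of p-generalized means M_p(a, d) = M_p(b, c); the cases p = 1, p tending to 0, and p = -1 then recover the arithmetic, geometric and harmonic analogies:

    import numpy as np

    def gen_mean(x, y, p):
        if abs(p) < 1e-12:                   # limit p -> 0: geometric mean
            return np.sqrt(x * y)
        return ((x**p + y**p) / 2.0) ** (1.0 / p)

    def in_analogy(a, b, c, d, p, tol=1e-9):
        # Assumption: a : b :: c : d holds when M_p(a, d) = M_p(b, c).
        return abs(gen_mean(a, d, p) - gen_mean(b, c, p)) < tol

    print(in_analogy(2, 4, 6, 8, p=1))    # arithmetic: 2 - 4 = 6 - 8
    print(in_analogy(2, 4, 8, 16, p=0))   # geometric: 2/4 = 8/16
    print(in_analogy(2, 3, 4, 12, p=-1))  # harmonic: 1/2 - 1/3 = 1/4 - 1/12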

In the second part of this series, we will leverage this unified formalism and lift analogical reasoning to real-valued domains and various ML&AI downstream tasks. In particular, we will see that it supports analogical inference over continuous functions, and thus both classification and regression tasks. We characterize the class of analogy-preserving functions in this setting and derive both worst-case and average-case error bounds under smoothness assumptions. If time allows, we will also discuss further applications, e.g., on image reconstruction and NLP downstream tasks.

These two seminars are based on several published and recently submitted works by Miguel Couceiro and his collaborators, including Francisco Malaca and Francisco Vincente Cunha, respectively a graduate and an undergraduate student at the DM@IST.

Some very recent references

  1. Francisco Malaca, Yves Lepage, Miguel Couceiro. Numerical analogies through generalized means: notion, properties and algorithmic approaches. Submitted.
  2. Francisco Cunha, Yves Lepage, Zied Bouraoui, Miguel Couceiro. Generalizing Analogical Inference Across Boolean and Continuous Domains. Submitted.
  3. Jakub Pillion, Miguel Couceiro, Yves Lepage. Analogical pooling for image reconstruction. Submitted.
  4. Fadi Badra, Esteban Marquer, Marie-Jeanne Lesot, Miguel Couceiro, David Leake. EnergyCompress: A General Case Base Learning Strategy. To appear in IJCAI 2025.
  5. Yves Lepage, Miguel Couceiro. Any four real numbers are on all fours with analogy. CoRR abs/2407.18770 (2024)
  6. Miguel Couceiro, Erkko Lehtonen. Galois theory for analogical classifiers. Ann. Math. Artif. Intell. 92(1): 29-47 (2024)
  7. Pierre Monnin, Cherif-Hassan Nousradine, Lucas Jarnac, Laurel Zuckerman, Miguel Couceiro. KGPRUNE: A Web Application to Extract Subgraphs of Interest from Wikidata with Analogical Pruning. ECAI 2024: 4495-4498
  8. Yves Lepage, Miguel Couceiro. Analogie et moyenne généralisée. JIAF-JFPDA 2024: 114-124
  9. Lucas Jarnac, Miguel Couceiro, Pierre Monnin. Relevant Entity Selection: Knowledge Graph Bootstrapping via Zero-Shot Analogical Pruning. CIKM 2023: 934-944
  10. N. Kumar and S. Schockaert. Solving hard analogy questions with relation embedding chains. EMNLP 2023, 6224–6236. ACL.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Miguel Couceiro, INESC & Instituto Superior Técnico

Analogical Reasoning: Theory, Applications and further surprises II

Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have been developed for Boolean domains, where inference is provably sound for affine functions and approximately correct when close to affine. These results enabled the design of analogy-based classifiers. However, they do not extend to regression tasks or continuous domains.

In this series of seminars we will revisit analogical inference from a foundational perspective. After a brief motivation, we will first present a recently proposed formalism for modelling numerical analogies that relies on p-generalized means, and that provides a unifying framework subsuming the classical notions of arithmetic, geometric and harmonic analogies. We will derive several interesting properties, such as transitivity of conformity, as well as present algorithmic approaches to detect and compute the parameter p.

In the second part of this series, we will leverage this unified formalism and lift analogical reasoning to real-valued domains and various ML&AI downstream tasks. In particular, we will see that it supports analogical inference over continuous functions, and thus both classification and regression tasks. We characterize the class of analogy-preserving functions in this setting and derive both worst-case and average-case error bounds under smoothness assumptions. If time allows, we will also discuss further applications, e.g., on image reconstruction and NLP downstream tasks.

These two seminars are based on several published and recently submitted works by Miguel Couceiro and his collaborators, including Francisco Malaca and Francisco Vincente Cunha, respectively a graduate and an undergraduate student at the DM@IST.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

João Xavier, ISR & Instituto Superior Técnico

From Monotone Operators and Supermartingales to Distributed Machine Learning I

Distributed machine learning addresses the problem of training a model when the dataset is scattered across spatially distributed agents. The goal is to design algorithms that allow each agent to arrive at the model trained on the whole dataset, but without agents ever disclosing their local data.

This tutorial covers the two main settings in DML, namely, Federated Learning, in which agents communicate with a common server, and Decentralized Learning, in which agents communicate only with a few neighbor agents. For each setting, we illustrate synchronous and asynchronous algorithms.

We start by discussing convex models. Although distributed algorithms can be derived from many perspectives, we show that convex models allow us to generate many interesting synchronous algorithms based on the framework of contractive operators. Furthermore, by stochastically activating such operators by blocks, we obtain their asynchronous versions directly. In both kinds of algorithms, agents interact with their local loss functions via the convex proximity operator.
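The following schematic Python sketch (not one of the tutorial's algorithms) illustrates the operator viewpoint: a fixed-point iteration of a contractive operator run synchronously, and again with random block activation as a stand-in for asynchrony, both converging to the same fixed point:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 6
    A = rng.normal(size=(n, n))
    A = 0.7 * A / np.linalg.norm(A, np.inf)     # max-norm contraction factor 0.7
    b = rng.normal(size=n)
    T = lambda x: A @ x + b                     # contractive affine operator
    x_star = np.linalg.solve(np.eye(n) - A, b)  # its unique fixed point

    # Synchronous iteration: every coordinate is updated at each step.
    x = np.zeros(n)
    for _ in range(100):
        x = T(x)

    # "Asynchronous" version: at each step only one randomly chosen block is updated.
    blocks = [np.arange(0, 3), np.arange(3, 6)]
    y = np.zeros(n)
    for _ in range(600):
        blk = blocks[rng.integers(len(blocks))]
        y[blk] = T(y)[blk]

    print("synchronous error: ", np.linalg.norm(x - x_star))
    print("asynchronous error:", np.linalg.norm(y - x_star))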

We then discuss nonconvex models. Here, agents interact with their local loss functions via the gradient. We discuss the standard mini-batch stochastic gradient (SG) and an improved version, the loopless stochastic variance-reduced gradient (L-SVRG).
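As an illustration of the second ingredient, here is a sketch of the loopless SVRG gradient estimator applied to a simple least-squares problem (one reading of L-SVRG in the single-agent case, not the tutorial's distributed version):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

    grad_i = lambda w, i: (X[i] @ w - y[i]) * X[i]   # gradient of one example's loss
    full_grad = lambda w: X.T @ (X @ w - y) / n

    w = np.zeros(d)
    w_ref = w.copy()             # reference point
    g_ref = full_grad(w_ref)     # full gradient at the reference point
    lr, p = 0.01, 1.0 / n        # step size and reference-refresh probability

    for _ in range(20000):
        i = rng.integers(n)
        # Variance-reduced stochastic gradient.
        g = grad_i(w, i) - grad_i(w_ref, i) + g_ref
        w -= lr * g
        if rng.random() < p:     # "loopless": refresh the reference at random times
            w_ref = w.copy()
            g_ref = full_grad(w_ref)

    print("final loss:", 0.5 * np.mean((X @ w - y) ** 2))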

We end the tutorial by briefly mentioning our recent research on the vertical federated learning setting where the dataset is scattered, not by examples, but by features.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

João Xavier, ISR & Instituto Superior Técnico

From Monotone Operators and Supermartingales to Distributed Machine Learning II

Distributed machine learning addresses the problem of training a model when the dataset is scattered across spatially distributed agents. The goal is to design algorithms that allow each agent to arrive at the model trained on the whole dataset, but without agents ever disclosing their local data.

This tutorial covers the two main settings in DML, namely, Federated Learning, in which agents communicate with a common server, and Decentralized Learning, in which agents communicate only with a few neighbor agents. For each setting, we illustrate synchronous and asynchronous algorithms.

We start by discussing convex models. Although distributed algorithms can be derived from many perspectives, we show that convex models allow us to generate many interesting synchronous algorithms based on the framework of contractive operators. Furthermore, by stochastically activating such operators by blocks, we obtain their asynchronous versions directly. In both kinds of algorithms, agents interact with their local loss functions via the convex proximity operator.

We then discuss nonconvex models. Here, agents interact with their local loss functions via the gradient. We discuss the standard mini-batch stochastic gradient (SG) and an improved version, the loopless stochastic variance-reduced gradient (L-SVRG).

We end the tutorial by briefly mentioning our recent research on the vertical federated learning setting where the dataset is scattered, not by examples, but by features.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Mário Figueiredo, IT & Instituto Superior Técnico

Fenchel-Young Variational Learning I

This lecture first provides an introduction to classical variational inference (VI), a key technique for approximating complex posterior distributions in Bayesian methods, typically by minimizing the Kullback-Leibler (KL) divergence. We'll discuss its principles and common uses.

Building on this, the lecture introduces Fenchel-Young variational inference (FYVI), a novel generalization that enhances flexibility. FYVI replaces the KL divergence with broader Fenchel-Young (FY) regularizers, with a special focus on those derived from Tsallis entropies. This approach enables learning posterior distributions with significantly smaller, or sparser, support than the prior, offering advantages in model interpretability and performance.
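As a concrete glimpse of how such regularizers induce sparsity, the following Python sketch compares softmax with sparsemax, the transformation associated with the 2-Tsallis entropy regularizer, which can assign exactly zero probability to low-scoring outcomes (an illustration only, not the FYVI algorithm of the paper):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    def sparsemax(z):
        # Euclidean projection of z onto the probability simplex.
        z_sorted = np.sort(z)[::-1]
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, len(z) + 1)
        support = 1 + k * z_sorted > cumsum
        k_z = k[support][-1]
        tau = (cumsum[support][-1] - 1) / k_z
        return np.maximum(z - tau, 0.0)

    z = np.array([2.0, 1.2, 0.1, -1.0])
    print("softmax:  ", softmax(z))    # dense: every entry is strictly positive
    print("sparsemax:", sparsemax(z))  # sparse: low-scoring entries get exactly zero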

Reference: S. Sklavidis, S. Agrawal, A. Farinhas, A. Martins and M. Figueiredo, Fenchel-Young Variational Learning,
https://arxiv.org/pdf/2502.10295

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Mário Figueiredo, IT & Instituto Superior Técnico

Fenchel-Young Variational Learning II

This lecture first provides an introduction to classical variational inference (VI), a key technique for approximating complex posterior distributions in Bayesian methods, typically by minimizing the Kullback-Leibler (KL) divergence. We'll discuss its principles and common uses.

Building on this, the lecture introduces Fenchel-Young variational inference (FYVI), a novel generalization that enhances flexibility. FYVI replaces the KL divergence with broader Fenchel-Young (FY) regularizers, with a special focus on those derived from Tsallis entropies. This approach enables learning posterior distributions with significantly smaller, or sparser, support than the prior, offering advantages in model interpretability and performance.

Reference: S. Sklavidis, S. Agrawal, A. Farinhas, A. Martins and M. Figueiredo, Fenchel-Young Variational Learning,
https://arxiv.org/pdf/2502.10295

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Francisco Vasconcelos, ISR & Instituto Superior Técnico

Invariant and Equivariant Functional Neural Networks I

Traditional neural networks prioritize generalization, but this flexibility often leads to geometrically inconsistent transformations of input data. To account for variations in object pose — such as rotations or translations — models are typically trained on large, augmented datasets. This increases computational cost and complicates learning.

We propose an alternative: neural networks that are inherently invariant or equivariant to geometric transformations by design. Such models would produce consistent outputs regardless of an object’s pose, eliminating the need for data augmentation. This approach can potentially extend to a broad range of transformations beyond just rotation and translation.

To realize this, we use geometric algebra, where operations like the geometric product are naturally equivariant under pseudo-orthogonal transformations, represented by the group SO(4,1). By building neural networks on top of this algebra, we can ensure transformation-aware computation.

Additionally, we address permutation invariance in point clouds. Instead of treating them as unordered sets of vectors, we represent them functionally — as sums of Dirac delta functions — analogous to sampled signals. This avoids point ordering issues entirely and offers a more structured geometric representation.
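A minimal Python sketch of this functional view (illustrative, not the talk's architecture): the point cloud is treated as a sum of Dirac deltas and represented by a smoothed sampling of that function on a grid, which is invariant to the order in which the points are listed:

    import numpy as np

    def functional_repr(points, grid, sigma=0.2):
        # Smoothed sum of deltas: f(g) = sum_i exp(-|g - x_i|^2 / (2 sigma^2)).
        d2 = np.sum((grid[:, None, :] - points[None, :, :]) ** 2, axis=-1)
        return np.exp(-0.5 * d2 / sigma**2).sum(axis=1)

    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(50, 2))                 # 50 points in the plane
    gx, gy = np.meshgrid(np.linspace(-3, 3, 20), np.linspace(-3, 3, 20))
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)

    f1 = functional_repr(cloud, grid)
    f2 = functional_repr(cloud[rng.permutation(50)], grid)  # same cloud, reordered
    print("identical under permutation:", np.allclose(f1, f2))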

This leads us to functional neural networks, where the input is a function rather than a vector list, and layers are continuous operators rather than discrete ones like ReLU or linear layers. Constructed within geometric algebra, these networks naturally maintain the desired invariant and equivariant properties.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

Francisco Vasconcelos, ISR & Instituto Superior Técnico

Invariant and Equivariant Functional Neural Networks II

Traditional neural networks prioritize generalization, but this flexibility often leads to geometrically inconsistent transformations of input data. To account for variations in object pose — such as rotations or translations — models are typically trained on large, augmented datasets. This increases computational cost and complicates learning.

We propose an alternative: neural networks that are inherently invariant or equivariant to geometric transformations by design. Such models would produce consistent outputs regardless of an object’s pose, eliminating the need for data augmentation. This approach can potentially extend to a broad range of transformations beyond just rotation and translation.

To realize this, we use geometric algebra, where operations like the geometric product are naturally equivariant under pseudo-orthogonal transformations, represented by the group SO(4,1). By building neural networks on top of this algebra, we can ensure transformation-aware computation.

Additionally, we address permutation invariance in point clouds. Instead of treating them as unordered sets of vectors, we represent them functionally — as sums of Dirac delta functions — analogous to sampled signals. This avoids point ordering issues entirely and offers a more structured geometric representation.

This leads us to functional neural networks, where the input is a function rather than a vector list, and layers are continuous operators rather than discrete ones like ReLU or linear layers. Constructed within geometric algebra, these networks naturally maintain the desired invariant and equivariant properties.

Europe/Lisbon
Room P3.10, Mathematics Building — Online

António Leitão, Scuola Normale Superiore di Pisa

Topological Expressive Power of Neural Networks I

How many different problems can a neural network solve? What makes two machine learning problems different? In this talk, we'll show how Topological Data Analysis (TDA) can be used to partition classification problems into equivalence classes, and how the complexity of decision boundaries can be quantified using persistent homology. Then we will look at a network's learning process from a manifold disentanglement perspective. We'll demonstrate why analyzing decision boundaries from a topological standpoint provides clearer insights than previous approaches. We use the topology of the decision boundaries realized by a neural network as a measure of a neural network's expressive power. We show how such a measure of expressive power depends on the properties of the neural networks' architectures, like depth, width and other related quantities.
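As a crude numerical illustration of the general idea (not the persistent-homology machinery of the talk), the following Python sketch evaluates a hypothetical decision function on a grid and counts the connected components of the positive region, a simple topological descriptor of the decision boundary; it assumes scipy is available:

    import numpy as np
    from scipy.ndimage import label

    # A hypothetical decision function with two disjoint positive "islands".
    def decision(x, y):
        return np.maximum(1 - ((x - 2) ** 2 + y ** 2),
                          1 - ((x + 2) ** 2 + y ** 2))

    xs, ys = np.meshgrid(np.linspace(-4, 4, 200), np.linspace(-4, 4, 200))
    positive_region = decision(xs, ys) > 0
    _, n_components = label(positive_region)
    print("connected components of the positive-class region:", n_components)  # 2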

Zoom: https://tecnico-pt.zoom.us/j/93935874388?pwd=QHxbpTCtH00rY4OUsRaay48CgaglgB.1

Europe/Lisbon
Room P3.10, Mathematics Building — Online

António Leitão, Scuola Normale Superiore di Pisa

Topological Expressive Power of Neural Networks II

How many different problems can a neural network solve? What makes two machine learning problems different? In this talk, we'll show how Topological Data Analysis (TDA) can be used to partition classification problems into equivalence classes, and how the complexity of decision boundaries can be quantified using persistent homology. Then we will look at a network's learning process from a manifold disentanglement perspective. We'll demonstrate why analyzing decision boundaries from a topological standpoint provides clearer insights than previous approaches. We use the topology of the decision boundaries realized by a neural network as a measure of a neural network's expressive power. We show how such a measure of expressive power depends on the properties of the neural networks' architectures, like depth, width and other related quantities.

Zoom: https://tecnico-pt.zoom.us/j/93935874388?pwd=QHxbpTCtH00rY4OUsRaay48CgaglgB.1