Home

Inspiring the development of artificial intelligence for the benefit of all 

A professor talks to his students in a café/lounge.

Located in the heart of Quebec’s AI ecosystem, Mila is a community of more than 1,200 researchers specializing in machine learning and dedicated to scientific excellence and innovation.

About

Featured

Faculty 

Founded in 1993 by Professor Yoshua Bengio, Mila today brings together over 140 professors affiliated with Université de Montréal, McGill University, Polytechnique Montréal and HEC Montréal. Mila also welcomes professors from Université Laval, Université de Sherbrooke, École de technologie supérieure (ÉTS) and Concordia University. 

Browse the online directory

Photo of Yoshua Bengio

Latest Publications

The BrowserGym Ecosystem for Web Agent Research
Thibault Le Sellier de Chezelles
Alexandre Lacoste
Massimo Caccia
Léo Boisvert
Megh Thakkar
Tom Marty
Rim Assouel
Sahar Omidi Shayegan
Lawrence Keunho Jang
Xing Han Lu
Ori Yoran
Dehan Kong
Frank F. Xu
Graham Neubig
Russ Salakhutdinov
The BrowserGym ecosystem addresses the growing need for efficient evaluation and benchmarking of web agents, particularly those leveraging a… (see more)utomation and Large Language Models (LLMs) for web interaction tasks. Many existing benchmarks suffer from fragmentation and inconsistent evaluation methodologies, making it challenging to achieve reliable comparisons and reproducible results. BrowserGym aims to solve this by providing a unified, gym-like environment with well-defined observation and action spaces, facilitating standardized evaluation across diverse benchmarks. Combined with AgentLab, a complementary framework that aids in agent creation, testing, and analysis, BrowserGym offers flexibility for integrating new benchmarks while ensuring consistent evaluation and comprehensive experiment management. This standardized approach seeks to reduce the time and complexity of developing web agents, supporting more reliable comparisons and facilitating in-depth analysis of agent behaviors, and could result in more adaptable, capable agents, ultimately accelerating innovation in LLM-driven automation. As a supporting evidence, we conduct the first large-scale, multi-benchmark web agent experiment and compare the performance of 6 state-of-the-art LLMs across all benchmarks currently available in BrowserGym. Among other findings, our results highlight a large discrepancy between OpenAI and Anthropic's latests models, with Claude-3.5-Sonnet leading the way on almost all benchmarks, except on vision-related tasks where GPT-4o is superior. Despite these advancements, our results emphasize that building robust and efficient web agents remains a significant challenge, due to the inherent complexity of real-world web environments and the limitations of current models.
NNetNav: Unsupervised Learning of Browser Agents Through Environment Interaction in the Wild
Shikhar Murty
Hao Zhu
Christopher D Manning
We introduce NNetNav, a method for unsupervised interaction with websites that generates synthetic demonstrations for training browser agent… (see more)s. Given any website, NNetNav produces these demonstrations by retroactively labeling action sequences from an exploration policy. Most work on training browser agents has relied on expensive human supervision, and the limited prior work on such interaction-based techniques has failed to provide effective search through the exponentially large space of exploration. In contrast, NNetNav exploits the hierarchical structure of language instructions to make this search more tractable: Complex instructions are typically decomposable into simpler sub-tasks, allowing NNetNav to automatically prune interaction episodes when an intermediate trajectory cannot be annotated with a meaningful sub-task. \texttt{LLama-3.1-8b} finetuned on 10k NNetNav self-generated demonstrations obtains over 16\% success rate on WebArena, and 35\% on WebVoyager, an improvement of 15pts and 31pts respectively over zero-shot \texttt{LLama-3.1-8b}, outperforming zero-shot GPT-4 and reaching the state-of-the-art among unsupervised methods, for both benchmarks.
Towards Graph Foundation Models: A Study on the Generalization of Positional and Structural Encodings
Billy Joe Franks
Moshe Eliasof
Semih Cantürk
Carola-Bibiane Schönlieb
Sophie Fellenz
Marius Kloft
Recent advances in integrating positional and structural encodings (PSEs) into graph neural networks (GNNs) have significantly enhanced thei… (see more)r performance across various graph learning tasks. However, the general applicability of these encodings and their potential to serve as foundational representations for graphs remain uncertain. This paper investigates the fine-tuning efficiency, scalability with sample size, and generalization capability of learnable PSEs across diverse graph datasets. Specifically, we evaluate their potential as universal pre-trained models that can be easily adapted to new tasks with minimal fine-tuning and limited data. Furthermore, we assess the expressivity of the learned representations, particularly, when used to augment downstream GNNs. We demonstrate through extensive benchmarking and empirical analysis that PSEs generally enhance downstream models. However, some datasets may require specific PSE-augmentations to achieve optimal performance. Nevertheless, our findings highlight their significant potential to become integral components of future graph foundation models. We provide new insights into the strengths and limitations of PSEs, contributing to the broader discourse on foundation models in graph learning.
A Joint Space-Time Encoder for Geographic Time-Series Data
David Mickisch
Konstantin Klemmer
Mélisande Teng
Many real-world processes are characterized by complex spatio-temporal dependencies, from climate dynamics to disease spread. Here, we intro… (see more)duce a new neural network architecture to model such dynamics at scale: the \emph{Space-Time Encoder}. Building on recent advances in \emph{location encoders}, models that take as inputs geographic coordinates, we develop a method that takes in geographic and temporal information simultaneously and learns smooth, continuous functions in both space and time. The inputs are first transformed using positional encoding functions and then fed into neural networks that allow the learning of complex functions. We implement a prototype of the \emph{Space-Time Encoder}, discuss the design choices of the novel temporal encoding, and demonstrate its utility in climate model emulation. We discuss the potential of the method across use cases, as well as promising avenues for further methodological innovation.

AI for Humanity

Socially responsible and beneficial development of AI is a fundamental component of Mila’s mission. As a leader in the field, we wish to contribute to social dialogue and the development of applications that will benefit society.

Learn more

A person looks up at a starry sky.