Bio

Hi there! I am a Research Scientist at Google DeepMind. I do research on Reinforcement Learning and LLMs, and I contribute to large-scale efforts including Bard, Gemini and Gemma.

Prior to that, I did my PhD at Google Brain and Inria Lille (Scool team, ex-SequeL). I worked on Reinforcement Learning, with a focus on credit assignment and interpretability. My advisors were Olivier Pietquin and Philippe Preux. I also collaborated with Matthieu Geist.

My PhD thesis is available for consultation (manuscript, slides).

Prior to my PhD, I got an engineering degree in Computer Science and Applied Mathematics from Télécom Paris, and an M.Sc. in Machine Learning from École Polytechnique. I then worked for two years as a Research Engineer at DreamQuark.

Outside of work, I make generative art (no GANs involved for now!). I also love music, roguelikes, high-intensity interval training and spicy food.

My CV is available online, and all my publications can be consulted here.

Publications

WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret
ICML 2024
[ paper ]

RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Harrison Lee, Samrat Phatale, Hassan Mansoor, Thomas Mesnard, Johan Ferret, Kellie Lu, Colton Bishop, Ethan Hall, Victor Carbune, Abhinav Rastogi, Sushant Prakash
ICML 2024
[ paper ]

A Survey of Temporal Credit Assignment in Deep Reinforcement Learning
Eduardo Pignatelli, Johan Ferret, Matthieu Geist, Hado van Hasselt, Olivier Pietquin, Laura Toni
TMLR 2024
[ paper ]

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Johan Ferret*, Paul Roit*, Lior Shani*, Roee Aharoni, Geoffrey Cideron, Robert Dadashi, Matthieu Geist, Sertan Girgin, Léonard Hussenot, Orgad Keller, Nikola Momchev, Sabela Ramos, Piotr Stanczyk, Nino Vieillard, Olivier Bachem, Gal Elidan, Avinatan Hassidim, Olivier Pietquin, Idan Szpektor
ACL 2023
[ paper | code soon! ]

On Actions that Matter: Credit Assignment and Interpretability in Reinforcement Learning
Johan Ferret
PhD thesis
[ manuscript | slides ]

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
Johan Ferret*, Alexis Jacq*, Olivier Pietquin, Matthieu Geist
AAMAS 2022
[ paper | code soon! ]

There is no Turning Back: A Self-Supervised Approach to Reversibility-Aware Reinforcement Learning
Johan Ferret*, Nathan Grinsztajn*, Olivier Pietquin, Philippe Preux, Matthieu Geist
NeurIPS 2021
[ paper | blog post | slides | code ]

Self-Imitation Advantage Learning
Johan Ferret, Olivier Pietquin, Matthieu Geist
AAMAS 2021
[ paper | slides | code ]

Adversarially Guided Actor-Critic
Johan Ferret*, Yannis Flet-Berliac*, Olivier Pietquin, Philippe Preux, Matthieu Geist
ICLR 2021
[ paper | slides | video | code ]

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning
Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin
IJCAI 2020
[ paper | slides | video ]

Preprints

WARP: On the Benefits of Weight Averaged Rewarded Policies
Alexandre Ramé, Johan Ferret, Nino Vieillard, Robert Dadashi, Léonard Hussenot, Pierre-Louis Cedoz, Pier Giuseppe Sessa, Sertan Girgin, Arthur Douillard, Olivier Bachem
arXiv preprint
[ paper ]

BOND: Aligning LLMs with Best-of-N Distillation
Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Nino Vieillard, Alexandre Ramé, Bobak Shahriari, Sarah Perrin, Abe Friesen, Geoffrey Cideron, Sertan Girgin, Piotr Stanczyk, Andrea Michi, Danila Sinopalnikov, Sabela Ramos, Amélie Héliou, Aliaksei Severyn, Matt Hoffman, Nikola Momchev, Olivier Bachem
arXiv preprint
[ paper ]

Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent
arXiv preprint
[ paper ]

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Griffin Team, RLHF Team, Gemma Team
Technical report
[ paper ]

Gemma: Open Models Based on Gemini Research and Technology
Gemma Team
Technical report
[ paper ]

Direct Language Model Alignment from Online AI Feedback
Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Ramé, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel
arXiv preprint
[ paper ]

Gemini: A Family of Highly Capable Multimodal Models
Gemini Team
Technical report
[ paper ]

Acme: A Research Framework for Distributed Reinforcement Learning
Acme Team
arXiv preprint
[ paper | colab | code ]

Workshops

Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
Eduardo Pignatelli, Johan Ferret, Davide Paglieri, Samuel Coward, Tim Rocktäschel, Edward Grefenstette, Laura Toni
AutoRL workshop, ICML 2024
[ paper ]

More Efficient Exploration with Symbolic Priors on Action Sequence Equivalence
Nathan Grinsztajn, Toby Johnstone, Johan Ferret, Philippe Preux
Deep Reinforcement Learning workshop, NeurIPS 2022
[ paper ]

Offline Credit Assignment in Deep Reinforcement Learning with Hindsight Discriminator Networks
Johan Ferret, Olivier Pietquin, Matthieu Geist
EWRL 2022
[ paper ]

Credit Assignment as a Proxy for Transfer in Reinforcement Learning
Johan Ferret, Raphaël Marinier, Matthieu Geist, Olivier Pietquin
Learning Transferable Skills workshop, NeurIPS 2019 (oral)
[ paper ]