Jean-Baptiste Alayrac

Research Scientist

DeepMind, London

Bio

I am a Research Scientist at DeepMind. I obtained a Ph.D. from the Sierra and Willow groups, where I was supervised by Simon Lacoste-Julien, Ivan Laptev and Josef Sivic. Before that, I graduated from Ecole polytechnique and Telecom ParisTech in 2015 and got a Master's degree in Mathematics, Machine Learning and Computer Vision (MVA). My work focuses on structured learning from video and natural language. More details can be found in my resume.

Invited talks

Leave Those Nets Alone: Advances in Self-Supervised Learning

Tutorials, CVPR 2021.

[Tutorial page] [Video]

Towards Versatile and Powerful Multimodal networks

The 6th International Challenge on Activity Recognition, CVPR 2021.

[ActivityNet workshop] [Video]

Representation Learning from Unlabeled Narrated Videos

Computer Vision and Deep Learning Summit, Machine Can See 2020.

[Summit website] [Video]

Learning from Narrated Videos

The 3rd Workshop on YouTube-8M Large-Scale Video Understanding, ICCV 2019.

[YT8M workshop] [slides]

Selected publications

🦩Flamingo: a Visual Language Model for Few-Shot Learning

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc, Antoine Miech, Iain Barr, Yana Hasson, Karel Lenc, Arthur Mensch, Katie Millican, Malcolm Reynolds, Roman Ring, Eliza Rutherford, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock, Aida Nematzadeh, Sahand Sharifzadeh, Mikolaj Binkowski, Ricardo Barreira, Oriol Vinyals, Andrew Zisserman, and Karen Simonyan

In arXiv, 2022.

[arXiv] [blog] [bibtex]

@article{alayrac2022flamingo,
   title={Flamingo: a Visual Language Model for Few-Shot Learning},
   author = {Alayrac, Jean-Baptiste and Donahue, Jeff and Luc, Pauline and Miech, Antoine and Barr, Iain and Hasson, Yana and Lenc, Karel and Mensch, Arthur and Millican, Katie and Reynolds, Malcolm and Ring, Roman and Rutherford, Eliza and Cabi, Serkan and Han, Tengda and Gong, Zhitao and Samangooei, Sina and Monteiro, Marianne and Menick, Jacob and Borgeaud, Sebastian and Brock, Andrew and Nematzadeh, Aida and Sharifzadeh, Sahand and Binkowski, Mikolaj and Barreira, Ricardo and Vinyals, Oriol and Zisserman, Andrew and Simonyan, Karen},
   journal={arXiv preprint arXiv:2204.14198},
   year={2022}
}

Perceiver IO: A general architecture for structured inputs & outputs

Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, and João Carreira.

In Proc. ICLR 2022.

[arXiv] [GitHub] [bibtex]

@InProceedings{jaegle2022perceiver,
         author  = {Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and H{\'e}naff, Olivier and Botvinick, Matthew M. and Zisserman, Andrew and Vinyals, Oriol and Carreira, Jo{\~a}o},
         title   = {Perceiver {IO}: A General Architecture for Structured Inputs \& Outputs},
         booktitle = {International Conference on Learning Representations (ICLR)},
         year    = {2022}
         }

Thinking Fast and Slow: Efficient text-to-visual retrieval with transformers

Antoine Miech, Jean-Baptiste Alayrac, Ivan Laptev, Josef Sivic and Andrew Zisserman.

In Proc. CVPR 2021.

[arXiv] [bibtex]

@InProceedings{miech2021thinking,
  title={{Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers}},
  author={Miech, Antoine and Alayrac, Jean-Baptiste and Laptev, Ivan and Sivic, Josef and Zisserman, Andrew},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2021}
}

Self-Supervised MultiModal Versatile Networks

Jean-Baptiste Alayrac, Adrià Recasens, Rosalia Schneider, Relja Arandjelović, Jason Ramapuram, Jeffrey De Fauw, Lucas Smaira, Sander Dieleman, Andrew Zisserman.

In Proc. NeurIPS 2020.

[arXiv] [TFHub] [bibtex]

@article{alayrac2020self,
  title={Self-Supervised MultiModal Versatile Networks},
  author={Alayrac, Jean-Baptiste and Recasens, Adri{\`a} and Schneider, Rosalia and Arandjelovi{\'c}, Relja and Ramapuram, Jason and De Fauw, Jeffrey and Smaira, Lucas and Dieleman, Sander and Zisserman, Andrew},
  journal={arXiv preprint arXiv:2006.16228},
  year={2020}
}

End-to-End Learning of Visual Representations from Uncurated Instructional Videos

Antoine Miech, Jean-Baptiste Alayrac, Lucas Smaira, Ivan Laptev, Josef Sivic and Andrew Zisserman.

In Proc. CVPR 2020. (oral).

[arXiv] [Talk] [TF I3D model] [TF S3D model] [PyTorch S3D model] [webpage] [YouCook2 demo] [bibtex]

@InProceedings{miech2019end2end,
  title={{E}nd-to-{E}nd {L}earning of {V}isual {R}epresentations from {U}ncurated {I}nstructional {V}ideos},
  author={Miech, Antoine and Alayrac, Jean-Baptiste and Smaira, Lucas and Laptev, Ivan and Sivic, Josef and Zisserman, Andrew},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

Visual Grounding in Video for Unsupervised Word Translation

Gunnar Sigurdsson, Jean-Baptiste Alayrac, Aida Nematzadeh, Lucas Smaira, Mateusz Malinowski, João Carreira, Phil Blunsom and Andrew Zisserman.

In Proc. CVPR 2020.

[arXiv] [bibtex]

@InProceedings{sigurdsson2020visual,
  title={Visual Grounding in Video for Unsupervised Word Translation},
  author={Sigurdsson, Gunnar A and Alayrac, Jean-Baptiste and Nematzadeh, Aida and Smaira, Lucas and Malinowski, Mateusz and Carreira, Jo{\~a}o and Blunsom, Phil and Zisserman, Andrew},
  booktitle={Computer Vision and Pattern Recognition (CVPR)},
  year={2020}
}

HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips

Antoine Miech, Dimitri Zhukov, Jean-Baptiste Alayrac, Makarand Tapaswi, Ivan Laptev, Josef Sivic.

In Proc. ICCV 2019.

[arXiv] [webpage] [poster] [GitHub] [bibtex]

@InProceedings{miech2019howto100m,
  title={HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips},
  author={Miech, Antoine and Zhukov, Dimitri and Alayrac, Jean-Baptiste and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef},
  booktitle={International Conference on Computer Vision (ICCV)},
  year={2019}
}

Are Labels Required for Improving Adversarial Robustness?

Jonathan Uesato, Jean-Baptiste Alayrac, Po-Sen Huang, Robert Stanforth, Alhussein Fawzi, Pushmeet Kohli.

In Proc. NeurIPS 2019.

[arXiv] [bibtex]

@InProceedings{ueasato19UAT,
  title={Are Labels Required for Improving Adversarial Robustness?},
  author={Jonathan Uesato and Jean-Baptiste Alayrac and Po-Sen Huang and Robert Stanforth and Alhussein Fawzi and Pushmeet Kohli},
  booktitle={Neural Information Processing Systems (NeurIPS)},
  year={2019}
}

The Visual Centrifuge: Model-Free Layered Video Representations

Jean-Baptiste Alayrac, João Carreira, Andrew Zisserman.

In Proc. CVPR 2019. (oral).

[talk] [arXiv] [poster] [bibtex]

@InProceedings{Alayrac19Centrifuge,
  author  = {Jean-Baptiste Alayrac and Jo{\~a}o Carreira and Andrew Zisserman},
  title   = {The Visual Centrifuge: Model-Free Layered Video Representations},
  booktitle   = {Computer Vision and Pattern Recognition (CVPR)},
  year        = {2019}
}

Cross-task weakly supervised learning from instructional videos

Dimitri Zhukov, Jean-Baptiste Alayrac, Ramazan Gokberk Cinbis, David Fouhey, Ivan Laptev, Josef Sivic.

In Proc. CVPR 2019.

[arXiv] [bibtex]

@InProceedings{Zhukov19CrossTask,
  author  = {Dimitri Zhukov and Jean-Baptiste Alayrac and Ramazan Gokberk Cinbis and David Fouhey and Ivan Laptev and Josef Sivic},
  title   = {Cross-task weakly supervised learning from instructional videos},
  booktitle   = {Computer Vision and Pattern Recognition (CVPR)},
  year        = {2019}
}

SeaRnn: Training RNNs with Global-Local Losses

Rémi Leblond, Jean-Baptiste Alayrac, Anton Osokin and Simon Lacoste-Julien.

Accepted to ICLR 2018.

[paper] [arXiv] [webpage] [GitHub] [workshop version] [bibtex]

@InProceedings{searnn18,
         author  = {Leblond, R\'emi and Alayrac, Jean-Baptiste and Osokin, Anton and Lacoste-Julien, Simon},
         title   = {\textsc{SeaRnn}: Training RNNs with Global-Local Losses},
         booktitle = {International Conference on Learning Representations (ICLR)},
         year    = {2018}
         }

Joint Discovery of Object States and Manipulation Actions

Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev and Simon Lacoste-Julien.

In Proc. ICCV 2017.

[paper] [arXiv] [poster] [webpage] [GitHub] [bibtex]

@InProceedings{alayrac16objectstates,
      author      = {Alayrac, Jean-Baptiste and Sivic, Josef and Laptev, Ivan and Lacoste-Julien, Simon},
      title       = {Joint Discovery of Object States and Manipulation Actions},
      booktitle   = {International Conference on Computer Vision (ICCV)},
      year        = {2017}
              }

Learning from Video and Text via Large-Scale Discriminative Clustering

Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev and Josef Sivic.

In Proc. ICCV 2017 (Spotlight).

[paper] [arXiv] [GitHub] [bibtex]

@InProceedings{miech17learning,
               author      = {Miech, Antoine and Alayrac, Jean-Baptiste and Bojanowski, Piotr
                              and Laptev, Ivan and Sivic, Josef},
               title       = {Learning from Video and Text via Large-Scale Discriminative Clustering},
               booktitle   = {International Conference on Computer Vision (ICCV)},
               year        = {2017}
              }

Unsupervised learning from narrated instruction videos

Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Ivan Laptev, Josef Sivic and Simon Lacoste-Julien.

In Proc. CVPR 2016 (Oral).

[talk] [paper] [poster] [HAL] [arXiv] [bibtex] [webpage] [GitHub]

@InProceedings{alayrac16unsupervised,
      author      = {Alayrac, Jean-Baptiste and Bojanowski, Piotr and Agrawal, Nishant
        and Laptev, Ivan and Sivic, Josef and Lacoste-Julien, Simon},
      title       = {Unsupervised Learning from Narrated Instruction Videos},
      booktitle   = {Computer Vision and Pattern Recognition (CVPR)},
      year        = {2016}
              }

PhD Thesis

Structured Learning from Videos and Language

Jean-Baptiste Alayrac

[thesis]

Teaching

Statistical machine learning - Master M1 - Ecole normale supérieure, TA, 2015-2016.

[webpage]

1M001 : Analyse et algèbre pour les sciences, Université Pierre et Marie Curie, TA, 2014-2016.

1M004 : Calcul matriciel, Université Pierre et Marie Curie, TA, 2014-2016.

2M223 : Algèbre bilinéaire et géométrie, Université Pierre et Marie Curie, TA, 2014-2015.

Other projects

Simple sketch recognizer used in an exhibition at Palais de la Découverte.

[GitHub] [Exhibition]