Jean-Baptiste Alayrac
Research Scientist
DeepMind, London
Bio
I am a Staff Research Scientist at DeepMind. I obtained my Ph.D. from the Sierra and Willow groups, where I was supervised by Simon Lacoste-Julien, Ivan Laptev and Josef Sivic. Before that, I graduated from Ecole polytechnique and Telecom ParisTech in 2015 and obtained a Master's degree in Mathematics, Machine Learning and Computer Vision (MVA). My work focuses on structured learning from video and natural language. More details can be found in my resume.
Invited talks
Leave Those Nets Alone: Advances in Self-Supervised Learning
Tutorials, CVPR 2021.
[Tutorial page]
[Video]
Towards Versatile and Powerful Multimodal Networks
The 6th International Challenge on Activity Recognition, CVPR 2021.
Representation Learning from Unlabeled Narrated Videos
Computer Vision and Deep Learning Summit, Machines Can See 2020.
[Summit website]
[Video]
Learning from Narrated Videos
The 3rd Workshop on YouTube-8M Large-Scale Video Understanding, ICCV 2019.
[YT8M workshop]
[slides]
Selected publications
🦩Flamingo: a Visual Language Model for Few-Shot Learning
arXiv preprint, 2022.
[arXiv]
[blog]
[bibtex]
@article{alayrac2022flamingo, title={Flamingo: a Visual Language Model for Few-Shot Learning}, author = {Alayrac, Jean-Baptiste and Donahue, Jeff and Luc, Pauline and Miech, Antoine and Barr, Iain and Hasson, Yana and Lenc, Karel and Mensch, Arthur and Millican, Katie and Reynolds, Malcolm and Ring, Roman and Rutherford, Eliza and Cabi, Serkan and Han, Tengda and Gong, Zhitao and Samangooei, Sina and Monteiro, Marianne and Menick, Jacob and Borgeaud, Sebastian and Brock, Andrew and Nematzadeh, Aida and Sharifzadeh, Sahand and Binkowski, Mikolaj and Barreira, Ricardo and Vinyals, Oriol and Zisserman, Andrew and Simonyan, Karen}, journal={arXiv preprint arXiv:2204.14198}, year={2022} }
Perceiver IO: A general architecture for structured inputs & outputs
In Proc. ICLR 2022.
[arXiv]
[GitHub]
[bibtex]
@InProceedings{jaegle2022perceiver, author = {Jaegle, Andrew and Borgeaud, Sebastian and Alayrac, Jean-Baptiste and Doersch, Carl and Ionescu, Catalin and Ding, David and Koppula, Skanda and Zoran, Daniel and Brock, Andrew and Shelhamer, Evan and H{\'e}naff, Olivier and Botvinick, Matthew M. and Zisserman, Andrew and Vinyals, Oriol and Carreira, Jo{\~a}o}, title = {Perceiver {IO}: A General Architecture for Structured Inputs \& Outputs}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2022} }
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
In Proc. CVPR 2021.
[arXiv]
[bibtex]
@InProceedings{miech2021thinking, title={{Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers}}, author={Miech, Antoine and Alayrac, Jean-Baptiste and Laptev, Ivan and Sivic, Josef and Zisserman, Andrew}, booktitle={Computer Vision and Pattern Recognition (CVPR)}, year={2021} }
Self-Supervised MultiModal Versatile Networks
In Proc. NeurIPS 2020.
[arXiv]
[TFHub]
[bibtex]
@article{alayrac2020self, title={Self-Supervised MultiModal Versatile Networks}, author={Alayrac, Jean-Baptiste and Recasens, Adri{\`a} and Schneider, Rosalia and Arandjelovi{\'c}, Relja and Ramapuram, Jason and De Fauw, Jeffrey and Smaira, Lucas and Dieleman, Sander and Zisserman, Andrew}, journal={arXiv preprint arXiv:2006.16228}, year={2020} }
End-to-End Learning of Visual Representations from Uncurated Instructional Videos
In Proc. CVPR 2020 (Oral).
[arXiv]
[Talk]
[TF I3D model]
[TF S3D model]
[PyTorch S3D model]
[webpage]
[YouCook2 demo]
[bibtex]
@InProceedings{miech2019end2end, title={{E}nd-to-{E}nd {L}earning of {V}isual {R}epresentations from {U}ncurated {I}nstructional {V}ideos}, author={Miech, Antoine and Alayrac, Jean-Baptiste and Smaira, Lucas and Laptev, Ivan and Sivic, Josef and Zisserman, Andrew}, booktitle={Computer Vision and Pattern Recognition (CVPR)}, year={2020} }
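A minimal usage sketch for the released S3D text-video model linked above, loaded through TensorFlow Hub. The hub handle, signature names and output keys below follow my recollection of the public model card and should be treated as assumptions; check the released checkpoint for the exact interface.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Assumed TF Hub handle for the released S3D text-video checkpoint.
hub_handle = 'https://tfhub.dev/deepmind/mil-nce/s3d/1'
model = hub.load(hub_handle)

def embed(frames, sentences):
  # frames: float array of shape [B, T, H, W, 3], values normalized to [0, 1]
  #         (e.g. 32 frames at 224x224); sentences: batch of strings.
  # Signature and output key names ('video', 'text', '*_embedding') are assumed.
  video_out = model.signatures['video'](tf.constant(frames, dtype=tf.float32))
  text_out = model.signatures['text'](tf.constant(sentences))
  return video_out['video_embedding'], text_out['text_embedding']

# Toy example: score two candidate narrations against one (dummy) clip.
frames = np.zeros([1, 32, 224, 224, 3], dtype=np.float32)
video_emb, text_emb = embed(frames, ['someone is cutting an onion',
                                     'a person is fixing a bike'])
scores = tf.matmul(text_emb, video_emb, transpose_b=True)  # [2, 1] similarity scores
print(scores.numpy())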
Visual Grounding in Video for Unsupervised Word Translation
In Proc. CVPR 2020.
[arXiv]
[bibtex]
@InProceedings{sigurdsson2020visual, title={Visual Grounding in Video for Unsupervised Word Translation}, author={Sigurdsson, Gunnar A and Alayrac, Jean-Baptiste and Nematzadeh, Aida and Smaira, Lucas and Malinowski, Mateusz and Carreira, Jo{\~a}o and Blunsom, Phil and Zisserman, Andrew}, booktitle={Computer Vision and Pattern Recognition (CVPR)}, year={2020} }
HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips
In Proc. ICCV 2019.
[arXiv]
[webpage]
[poster]
[GitHub]
[bibtex]
@InProceedings{miech2019howto100m, title={HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips}, author={Miech, Antoine and Zhukov, Dimitri and Alayrac, Jean-Baptiste and Tapaswi, Makarand and Laptev, Ivan and Sivic, Josef}, booktitle={International Conference on Computer Vision (ICCV)}, year={2019} }
Are Labels Required for Improving Adversarial Robustness?
In Proc. NeurIPS 2019.
[arXiv]
[bibtex]
@InProceedings{uesato19UAT, title={Are Labels Required for Improving Adversarial Robustness?}, author={Uesato, Jonathan and Alayrac, Jean-Baptiste and Huang, Po-Sen and Stanforth, Robert and Fawzi, Alhussein and Kohli, Pushmeet}, booktitle={Neural Information Processing Systems (NeurIPS)}, year={2019} }
The Visual Centrifuge: Model-Free Layered Video Representations
In Proc. CVPR 2019 (Oral).
Cross-task weakly supervised learning from instructional videos
In Proc. CVPR 2019.
[arXiv]
[bibtex]
@InProceedings{Zhukov19CrossTask, author = {Zhukov, Dimitri and Alayrac, Jean-Baptiste and Cinbis, Ramazan Gokberk and Fouhey, David and Laptev, Ivan and Sivic, Josef}, title = {Cross-task weakly supervised learning from instructional videos}, booktitle = {Computer Vision and Pattern Recognition (CVPR)}, year = {2019} }
SEARNN: Training RNNs with Global-Local Losses
In Proc. ICLR 2018.
[paper]
[arXiv]
[webpage]
[GitHub]
[workshop version]
[bibtex]
@InProceedings{searnn18, author = {Leblond, R\'emi and Alayrac, Jean-Baptiste and Osokin, Anton and Lacoste-Julien, Simon}, title = {\textsc{SeaRnn}: Training RNNs with Global-Local Losses}, booktitle = {International Conference on Learning Representations (ICLR)}, year = {2018} }
Joint Discovery of Object States and Manipulation Actions
In Proc. ICCV 2017.
[paper]
[arXiv]
[poster]
[webpage]
[GitHub]
[bibtex]
@InProceedings{alayrac16objectstates, author = {Alayrac, Jean-Baptiste and Sivic, Josef and Laptev, Ivan and Lacoste-Julien, Simon}, title = {Joint Discovery of Object States and Manipulation Actions}, booktitle = {International Conference on Computer Vision (ICCV)}, year = {2017} }
Learning from Video and Text via Large-Scale Discriminative Clustering
In Proc. ICCV 2017 (Spotlight).
[paper]
[arXiv]
[GitHub]
[bibtex]
@InProceedings{miech17learning, author = {Miech, Antoine and Alayrac, Jean-Baptiste and Bojanowski, Piotr and Laptev, Ivan and Sivic, Josef}, title = {Learning from Video and Text via Large-Scale Discriminative Clustering}, booktitle = {International Conference on Computer Vision (ICCV)}, year = {2017} }
Unsupervised learning from narrated instruction videos
In Proc. CVPR 2016 (Oral).
[talk]
[paper]
[poster]
[HAL]
[arXiv]
[webpage]
[GitHub]
[bibtex]
@InProceedings{alayrac16unsupervised, author = {Alayrac, Jean-Baptiste and Bojanowski, Piotr and Agrawal, Nishant and Laptev, Ivan and Sivic, Josef and Lacoste-Julien, Simon}, title = {Unsupervised Learning from Narrated Instruction Videos}, booktitle = {Computer Vision and Pattern Recognition (CVPR)}, year = {2016} }
PhD Thesis
Structured Learning from Videos and Language
[thesis]
Teaching
Statistical machine learning, Master M1, Ecole normale supérieure, TA, 2015-2016.
[webpage]
1M001: Analysis and Algebra for the Sciences, Université Pierre et Marie Curie, TA, 2014-2016.
1M004: Matrix Calculus, Université Pierre et Marie Curie, TA, 2014-2016.
2M223: Bilinear Algebra and Geometry, Université Pierre et Marie Curie, TA, 2014-2015.
Other projects
A simple sketch recognizer used in an exhibition at the Palais de la Découverte.
[GitHub]
[Exhibition]