About Me

I am an assistant professor of computer science at Brown University, where I direct the PALM🌴 research lab, studying computer vision, machine learning, and artificial intelligence. I work part-time as a staff research scientist at Google Research.

Our research focuses on multimodal concept learning and reasoning, temporal dynamics modeling of humans and objects, and the transfer of representations and skills to embodied agents. I believe multimodal learning offers a pathway for computer vision to benefit language understanding, robotics, and the social sciences.

Previously, I received my Ph.D. from the University of Southern California in 2016, advised by Prof. Ram Nevatia. I completed my bachelor's degree in Computer Science at Tsinghua University in 2011.

I have received Brown University's Richard B. Salomon Faculty Research Award and Samsung AIT's Global Research Outreach Award for multimodal concept learning from videos. My research on behavior prediction was a CVPR 2019 best paper finalist. I serve as an area chair for the CVPR, NeurIPS, and ACL conferences. I am also a junior faculty teaching fellow at Brown.

Our lab always welcomes highly motivated student researchers; please see the information for prospective students.

Teaching

Group

PhD Students

Alumni

  • Ce Zhang (master's class of 2023 at Brown, now PhD student at UNC CS)
  • Changcheng Fu (master's class of 2023 at Brown, now PhD student at USC CS)
  • Kunal Handa (class of 2023 at Brown, now master's student at Oxford)
  • Jessica Li (class of 2023 at Brown, now software engineer at Headway)
  • Usha Bhalla (class of 2022 at Brown, now PhD student at Harvard CS)
  • Emily Byun (class of 2021 at Brown, now PhD student at CMU MLD)

Mentorship

Services

  • Area Chair, CVPR 2020 to 2024.
  • Area Chair, ICCV 2023.
  • Area Chair, ECCV 2022 and 2024.
  • Area Chair, ACL 2023.
  • Area Chair, NeurIPS 2023.
  • Senior PC, AAAI 2021 and 2022.
  • Area Chair, WACV 2017 and 2018.

Recent Projects

Self-Correcting Self-Consuming Loops for Generative Model Training
Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, and Chen Sun
preprint
arXiv / Project
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan*, Zitian Tang*, Zhiqiu Yu, and Chen Sun
preprint
arXiv / Project
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
preprint
arXiv / Project
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao*, Shijie Wang*, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
ICLR 2024
arXiv / Project
Emergence of Abstract State Representations in Embodied Sequence Modeling
Tian Yun*, Zilai Zeng*, Kunal Handa, Ashish V Thapliyal, Bo Pang, Ellie Pavlick, and Chen Sun
EMNLP 2023
arXiv / Project
Analyzing Modular Approaches for Visual Question Decomposition
Apoorv Khandelwal, Ellie Pavlick, and Chen Sun
EMNLP 2023
arXiv / Code
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
Zilai Zeng, Ce Zhang, Shijie Wang, and Chen Sun
NeurIPS 2023
arXiv / Project / Code
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, and Cordelia Schmid
NeurIPS 2023
arXiv
Do Vision-Language Pretrained Models Learn Primitive Concepts?
Tian Yun, Usha Bhalla, Ellie Pavlick, and Chen Sun
Transactions on Machine Learning Research (TMLR)
arXiv / Project / Code
How Can Objects Help Action Recognition?
Xingyi Zhou, Anurag Arnab, Chen Sun, and Cordelia Schmid
CVPR 2023
arXiv / Code
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, and Cordelia Schmid
ECCV 2022
arXiv / Dataset
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun
NeurIPS 2021
arXiv / Research Blog / Project / Code
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Project / Code
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Research Blog / Project
What Makes for Good Views for Contrastive Learning?
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola
NeurIPS 2020
arXiv / Research Blog / Project / Code
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid
CVPR 2020
arXiv / Waymo Blog / VentureBeat
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid
ICCV 2019
arXiv / Research Blog / VentureBeat