About Me
I am an assistant professor of computer science at Brown University, where I direct the PALM🌴 research lab, which studies computer vision, machine learning, and artificial intelligence. I also work part-time as a staff research scientist at Google DeepMind.
Our research focuses on multimodal concept learning and reasoning, modeling the temporal dynamics of humans and objects, and transferring learned representations and skills to embodied agents. I believe multimodal learning offers a pathway for computer vision to advance language understanding and robotics.
Previously, I received my Ph.D. from the University of Southern California in 2016, advised by Prof. Ram Nevatia. I completed my bachelor's degree in Computer Science at Tsinghua University in 2011.
I have received Brown University's Richard B. Salomon Faculty Research Award and Samsung's Global Research Outreach Award for multimodal concept learning from videos. My research on behavior prediction was a CVPR 2019 best paper finalist. Our lab's research has been supported by Adobe Research, Honda Research, Meta AI, NASA, and Samsung.
My office hours are 3:30 to 5:00 pm ET on Tuesdays in CIT 379.
Teaching
Group
PhD students
Alumni
- Kevin Zhao (master's class of 2024 at Brown, now researcher at TikTok)
- Zilai Zeng (master's class of 2024 at Brown, now PhD student at Brown CS)
- Yunhao Luo (master's class of 2024 at Brown, now research assistant at Georgia Tech)
- Mandy He (class of 2024 at Brown, now software engineer at Duolingo)
- Minh Quan Do (master's class of 2024 at Brown, now co-founder at Tan Kim Nhat Trading)
- David Heffren (class of 2024 at Brown, now PhD student at JHU Applied Math)
- John Ryan Byers (class of 2024 at Brown, now master's student at Cornell Tech)
- Ce Zhang (master's class of 2023 at Brown, now PhD student at UNC CS)
- Changcheng Fu (master's class of 2023 at Brown, now PhD student at USC CS)
- Kunal Handa (class of 2023 at Brown, now member of technical staff at Anthropic)
- Jessica Li (class of 2023 at Brown, now software engineer at Headway)
- Tian Yun (master's class of 2022 at Brown, now PhD student at Brown CS)
- Usha Bhalla (class of 2022 at Brown, now PhD student at Harvard CS)
- Emily Byun (class of 2021 at Brown, now PhD student at CMU MLD)
Mentorship
Services
- Workshop Chair, CVPR 2025.
- Action Editor, TMLR.
- Area Chair, ICLR 2025.
- Area Chair, CVPR 2020 to 2024.
- Area Chair, ICCV 2023 and 2025.
- Area Chair, ECCV 2022 and 2024.
- Area Chair, ACL 2023.
- Area Chair, NeurIPS 2023 and 2024.
- Senior PC, AAAI 2021 and 2022.
- Area Chair, WACV 2017 and 2018.
Recent Projects
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Nate Gillman*, Daksh Aggarwal*, Michael Freeman, Saurabh Singh, and Chen Sun
Preprint
arXiv / Project / Code
Text-Aware Diffusion for Policy Learning
Calvin Luo, Mandy He*, Zilai Zeng*, and Chen Sun
NeurIPS 2024
(Also appeared at the NeurIPS 2023 Workshop on Diffusion Models)
arXiv / Poster / Project / Code
Motion Prompting: Controlling Video Generation with Motion Trajectories
Daniel Geng, Charles Herrmann et al.
Preprint
arXiv / Project
Self-Correcting Self-Consuming Loops for Generative Model Training
Nate Gillman, Michael Freeman, Daksh Aggarwal, Chia-Hong Hsu, Calvin Luo, Yonglong Tian, and Chen Sun
ICML 2024
arXiv / Poster / Project / Code
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan*, Tianhong Li, Siyang Qin, Yuanzhen Li, Chen Sun, Michael Rubinstein, Deqing Sun, Kaiming He, and Yonglong Tian*
Preprint
arXiv
Do Music Generation Models Encode Music Theory?
Megan Wei*, Michael Freeman*, Chris Donahue, and Chen Sun
ISMIR 2024
(Also presented as an oral at BayLearn 2024)
arXiv / Project / Dataset
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V Nayak, Jack Merullo, Stephen H Bach, Chen Sun, and Ellie Pavlick
Preprint
arXiv / Code
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan*, Zitian Tang*, Zhiqiu Yu, and Chen Sun
Preprint
arXiv / Project / Dataset
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
ECCV 2024
arXiv / Project / Code
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao*, Shijie Wang*, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, and Chen Sun
ICLR 2024
arXiv / Project
EPO: Hierarchical LLM Agents with Environment Preference Optimization
Qi Zhao*, Haotian Fu*, Chen Sun, and George Konidaris
EMNLP 2024
arXiv / Code
Do Pre-trained Vision-Language Models Encode Object States?
Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren, and Chen Sun
ECCV 2024 Workshop on Emergent Visual Abilities and Limits of Foundation Models
arXiv / Data
Pre-trained Vision-Language Models Learn Discoverable Visual Concepts
Yuan Zang, Tian Yun, Hao Tan, Trung Bui, and Chen Sun
Preprint
arXiv / Project / Code
Emergence of Abstract State Representations in Embodied Sequence Modeling
Tian Yun*, Zilai Zeng*, Kunal Handa, Ashish V Thapliyal, Bo Pang, Ellie Pavlick, and Chen Sun
EMNLP 2023
arXiv / Project
Analyzing Modular Approaches for Visual Question Decomposition
Apoorv Khandelwal, Ellie Pavlick, and Chen Sun
EMNLP 2023
arXiv / Code
Goal-Conditioned Predictive Coding for Offline Reinforcement Learning
Zilai Zeng, Ce Zhang, Shijie Wang, and Chen Sun
NeurIPS 2023
arXiv / Project / Code
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, and Cordelia Schmid
NeurIPS 2023
arXiv
Do Vision-Language Pretrained Models Learn Primitive Concepts?
Tian Yun, Usha Bhalla, Ellie Pavlick, and Chen Sun
Transactions on Machine Learning Research (TMLR)
arXiv / Project / Code
How Can Objects Help Action Recognition?
Xingyi Zhou, Anurag Arnab, Chen Sun, and Cordelia Schmid
CVPR 2023
arXiv / Code
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, and Cordelia Schmid
ECCV 2022
arXiv / Dataset
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun
NeurIPS 2021
arXiv / Research Blog / Project / Code
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Project / Code
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Research Blog / Project
What Makes for Good Views for Contrastive Learning?
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola
NeurIPS 2020
arXiv / Research Blog / Project / Code
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid
CVPR 2020
arXiv / Waymo Blog / VentureBeat
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid
ICCV 2019
arXiv / Research Blog / VentureBeat