About Me

I am an assistant professor of computer science at Brown University, where I direct the PALM🌴 research lab, studying computer vision, machine learning, and artificial intelligence. I am also a staff research scientist at Google Research.

Previously, I received my Ph.D. from the University of Southern California in 2016, advised by Prof. Ram Nevatia. I completed my bachelor's degree in Computer Science at Tsinghua University in 2011, and did research internships at Google and Facebook.

My ongoing research projects involve learning multimodal representations and visual commonsense from unlabeled videos, in order to recognize human activities, objects, and their interactions over time, and to transfer these representations to embodied agents. I believe multimodal learning is a pathway for computer vision to help language understanding, robotics, and cognitive science.

Our lab always welcomes highly motivated student researchers; please see the information for prospective students.

Teaching

Group

PhD students

Alumni

  • Usha Bhalla (class of 2022 at Brown, now PhD student at Harvard CS)
  • Emily Byun (class of 2021 at Brown, now PhD student at CMU RI)
  • Jake Sokol (class of 2021 at Brown, now at a startup)
  • Michael Mao (class of 2021 at Brown, now software engineer at Microsoft)
  • Trang Dang (ExploreCSR 2021, undergrad at NJIT)
  • Girish Ganesan (ExploreCSR 2021, undergrad at Rutgers)

Mentorship

Recent Projects

Do Vision-Language Pretrained Models Learn Primitive Concepts?
Tian Yun, Usha Bhalla, Ellie Pavlick, and Chen Sun
Preprint
arXiv
Learning Audio-Video Modalities from Image Captions
Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, and Cordelia Schmid
ECCV 2022
arXiv
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun
NeurIPS 2021
arXiv / Research Blog / Project / Code
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Project / Code
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Research Blog / Project
Composable Augmentation Encoding for Video Representation Learning
Chen Sun, Arsha Nagrani, Yonglong Tian, and Cordelia Schmid
ICCV 2021
arXiv / Project / Code
What Makes for Good Views for Contrastive Learning?
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola
NeurIPS 2020
arXiv / Research Blog / Project / Code
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid
CVPR 2020
arXiv / Waymo Blog / VentureBeat
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid
ICCV 2019
arXiv / Research Blog / VentureBeat

Services

  • Area Chair, CVPR 2020, 2021, and 2022.
  • Area Chair, ECCV 2022.
  • Senior Program Committee, AAAI 2021 and 2022.
  • Area Chair, WACV 2017 and 2018.