About Me

I am an assistant professor of computer science at Brown University, studying computer vision, machine learning, and artificial intelligence. I am also a staff research scientist at Google Research.

Previously, I received my Ph.D. from the University of Southern California in 2016, advised by Prof. Ram Nevatia. I completed my bachelor's degree in Computer Science at Tsinghua University in 2011. I did research internships at Google and Facebook.

My ongoing research projects involve learning multimodal representations and visual commonsense from unlabeled videos in order to recognize human activities, objects, and their interactions over time, and to transfer these representations to embodied agents. I believe multimodal learning is a pathway for computer vision to help language understanding, robotics, and cognitive science.

I am looking for highly motivated students to join my lab at Brown; please see the information for prospective students. Together with my amazing colleagues, I am organizing the exploreCSR program.

Teaching

Services

  • Area Chair, CVPR 2020, 2021, and 2022.
  • Senior PC, AAAI 2021 and 2022.
  • Area Chair, WACV 2017 and 2018.

Group

PhD students

Student researchers

  • Usha Bhalla
  • Tian Yun (Joint with Ellie Pavlick)
  • Jake Sokol (class of 2021 at Brown, now at a startup)
  • Emily Byun (class of 2021 at Brown, now a PhD student at CMU)
  • Michael Mao (class of 2021 at Brown, now a software engineer at Microsoft)

Recent Projects

Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun
NeurIPS 2021
arXiv / Project
Episodic Transformer for Vision-and-Language Navigation
Alexander Pashevich, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Project / Code
Learning Temporal Dynamics from Cycles in Narrated Video
Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun
ICCV 2021
arXiv / Research Blog / Project
Composable Augmentation Encoding for Video Representation Learning
Chen Sun, Arsha Nagrani, Yonglong Tian, and Cordelia Schmid
ICCV 2021
arXiv / Project / Code
What Makes for Good Views for Contrastive Learning? (InfoMin)
Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid, and Phillip Isola
NeurIPS 2020
arXiv / Research Blog / Project / Code
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid
ICCV 2019
arXiv / Research Blog / VentureBeat
VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li, and Cordelia Schmid
CVPR 2020
arXiv / Waymo Blog / VentureBeat