Recent Work

Google Scholar

  • Preprints

    • Alexander Pashevich, Cordelia Schmid, and Chen Sun, Episodic Transformer for Vision-and-Language Navigation. arXiv:2105.06453 [arXiv]
    • Chen Sun, Arsha Nagrani, Yonglong Tian, and Cordelia Schmid, Composable Augmentation Encoding for Video Representation Learning. arXiv:2104.00616 [arXiv]
    • Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun, Learning Temporal Dynamics from Cycles in Narrated Video. arXiv:2101.02337 [arXiv]
    • Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid, ViViT: A Video Vision Transformer. arXiv:2103.15691 [arXiv]
    • Anurag Arnab, Chen Sun, and Cordelia Schmid, Unified Graph Structured Models for Video Understanding. arXiv:2103.15662 [arXiv]
    • Jack Valmadre, Alex Bewley, Jonathan Huang, Chen Sun, Cristian Sminchisescu, and Cordelia Schmid, Local Metrics for Multi-Object Tracking. arXiv:2104.02631 [arXiv]
    • Chen Sun, Fabien Baradel, Kevin Murphy, and Cordelia Schmid, Contrastive Bidirectional Transformer for Temporal Representation Learning. arXiv:1906.05743 [arXiv]
    • Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar and Cordelia Schmid, Learning Video Representations from Textual Web Supervision. arXiv:2007.14937 [arXiv]
  • 2020

    • Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, Congcong Li and Dragomir Anguelov, TNT: Target-driveN Trajectory Prediction. Conference on Robot Learning (CoRL) 2020 [arXiv]
    • Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid and Phillip Isola, What Makes for Good Views for Contrastive Learning? Conference on Neural Information Processing Systems (NeurIPS) 2020 [arXiv] [Project] [Code]
    • Anurag Arnab, Chen Sun, Arsha Nagrani and Cordelia Schmid, Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos. European Conference on Computer Vision (ECCV) 2020 [arXiv]
    • Valentin Gabeur, Chen Sun, Karteek Alahari and Cordelia Schmid, Multi-modal Transformer for Video Retrieval. European Conference on Computer Vision (ECCV) 2020 [arXiv] [Project] [Code (in progress)]
    • Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid and Andrew Zisserman, Speech2Action: Cross-modal Supervision for Action Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 [arXiv] [Project] [Data]
    • Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li and Cordelia Schmid, VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 [arXiv] [Blog] [VentureBeat]
    • Jonathan C Stroud, David A Ross, Chen Sun, Jia Deng, and Rahul Sukthankar, D3D: Distilled 3D Networks for Video Action Recognition. Winter Conference on Applications of Computer Vision (WACV) 2020 [arXiv] [Project] [Code] [Checkpoints]
  • 2019

    • Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, and Honglak Lee, Unsupervised Learning of Object Structure and Dynamics from Videos. Conference on Neural Information Processing Systems (NeurIPS) 2019 [arXiv] [Project] [Code]
    • Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid, VideoBERT: A Joint Model for Video and Language Representation Learning. International Conference on Computer Vision (ICCV) 2019 [arXiv] [Blog] [VentureBeat]
    • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, and Cordelia Schmid, Relational Action Forecasting. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019 (best paper finalist) [arXiv]
    • Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays, Composing Text and Image for Image Retrieval - An Empirical Odyssey. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019 [arXiv] [Code]
    • Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, and Kevin Murphy, Stochastic Prediction of Multi-Agent Interactions from Partial Observations. International Conference on Learning Representations (ICLR) 2019 [arXiv] [Videos] [Annotations]
    • Zhenjia Xu*, Zhijian Liu*, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu, Unsupervised Discovery of Parts, Structure, and Dynamics. International Conference on Learning Representations (ICLR) 2019 [arXiv] [Project] [Code]
  • 2018

    • Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar and Cordelia Schmid, Actor-centric Relation Network. European Conference on Computer Vision (ECCV) 2018 [arXiv]
    • Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu and Kevin Murphy, Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. European Conference on Computer Vision (ECCV) 2018 [arXiv] [Code] [Kinetics checkpoint] [HowTo100M checkpoint]
    • Chunhui Gu et al., AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [arXiv] [Data] [Code]
    • Yin Cui, Yang Song, Chen Sun, Andrew Howard and Serge Belongie, Large Scale Fine-Grained Categorization and the Effectiveness of Domain-Specific Transfer Learning. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [pdf] [Code]
    • Grant van Horn et al., The iNaturalist Species Classification and Detection Dataset. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [pdf] [Data] [Detection baseline]
  • 2017

    • Chen Sun, Abhinav Shrivastava, Saurabh Singh and Abhinav Gupta, Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Blog] [WIRED]
    • Jonathan Huang et al., Speed/accuracy Trade-offs for Modern Convolutional Object Detectors. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 [arXiv] [Code]

Earlier Work

  • 2017
    • Jiyang Gao, Chen Sun, Zhenheng Yang and Ram Nevatia, TALL: Temporal Activity Localization via Language Query. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Code]
    • Jiyang Gao*, Zhenheng Yang*, Chen Sun, Kan Chen and Ram Nevatia, TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Code]
    • Chuang Gan et al., VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation. International Conference on Computer Vision (ICCV) 2017 [pdf] [Data and code]
    • Chuang Gan*, Chen Sun* and Ram Nevatia, DECK: Discovering Event Composition Knowledge from Web Images for Zero-Shot Event Detection and Recounting in Videos. AAAI Conference on Artificial Intelligence (AAAI) 2017 [pdf]
  • 2016

    • Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia and Lubomir Bourdev, ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016 [arXiv]
    • Chuang Gan, Chen Sun, Lixin Duan and Boqing Gong, Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames. European Conference on Computer Vision (ECCV) 2016 [pdf]
  • 2015

    • Chen Sun, Chuang Gan and Ram Nevatia, Automatic Concept Discovery from Parallel Text and Visual Corpora. International Conference on Computer Vision (ICCV) 2015 [arXiv]
    • Chen Sun, Sanketh Shetty, Rahul Sukthankar and Ram Nevatia, Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images. ACM Multimedia 2015 [arXiv]
  • 2014
    • Chen Sun and Ram Nevatia, Semantic Aware Video Transcription Using Random Forest Classifiers. European Conference on Computer Vision (ECCV) 2014 [pdf]
    • Chen Sun and Ram Nevatia, DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014 [pdf]
    • Chen Sun, Brian Burns, Ram Nevatia, Cees Snoek, Bob Bolles, Greg Myers, Wen Wang and Eric Yeh, ISOMER: Informative Segment Observations for Multimedia Event Recounting. International Conference on Multimedia Retrieval (ICMR) 2014 [pdf]
    • Julien van Hout et al., Late Fusion and Calibration for Multimedia Event Detection Using Few Examples. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2014 [pdf]
  • 2013
    • Chen Sun and Ram Nevatia, ACTIVE: Activity Concept Transitions in Video Event Classification. International Conference on Computer Vision (ICCV) 2013 [pdf]
    • Chen Sun and Ram Nevatia, Large-scale Web Video Event Classification by use of Fisher Vectors. Workshop on Applications of Computer Vision (WACV) 2013 [pdf] [Code]