Recent Work
Preprints
2024
2023
- Tian Yun*, Zilai Zeng*, Kunal Handa, Ashish V Thapliyal, Bo Pang, Ellie Pavlick, and Chen Sun, Emergence of Abstract State Representations in Embodied Sequence Modeling. Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023 [arXiv]
- Apoorv Khandelwal, Ellie Pavlick, and Chen Sun, Analyzing Modular Approaches for Visual Question Decomposition. Conference on Empirical Methods in Natural Language Processing (EMNLP) 2023 [arXiv]
- Zilai Zeng, Ce Zhang, Shijie Wang, and Chen Sun, Goal-Conditioned Predictive Coding for Offline Reinforcement Learning. Conference on Neural Information Processing Systems (NeurIPS) 2023 [arXiv]
- Chen Sun, Calvin Luo, Xingyi Zhou, Anurag Arnab, and Cordelia Schmid, Does Visual Pretraining Help End-to-End Reasoning? Conference on Neural Information Processing Systems (NeurIPS) 2023 [arXiv]
- Ziniu Hu, Ahmet Iscen, Chen Sun, Kai-Wei Chang, Yizhou Sun, David A Ross, Cordelia Schmid, and Alireza Fathi, AVIS: Autonomous Visual Information Seeking with Large Language Models. Conference on Neural Information Processing Systems (NeurIPS) 2023 [arXiv]
- Tian Yun, Usha Bhalla, Ellie Pavlick, and Chen Sun, Do Vision-Language Pretrained Models Learn Primitive Concepts? Transactions on Machine Learning Research (TMLR) [arXiv] [Project]
- Xingyi Zhou, Anurag Arnab, Chen Sun, and Cordelia Schmid, How Can Objects Help Action Recognition? IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023 [arXiv]
- Ziniu Hu, Ahmet Iscen, Chen Sun, Zirui Wang, Kai-Wei Chang, Yizhou Sun, Cordelia Schmid, David A Ross, and Alireza Fathi, REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2023 [arXiv]
2022
- Arsha Nagrani, Paul Hongsuck Seo, Bryan Seybold, Anja Hauth, Santiago Manen, Chen Sun, and Cordelia Schmid, Learning Audio-Video Modalities from Image Captions. European Conference on Computer Vision (ECCV) 2022 [arXiv]
- Medhini Narasimhan, Arsha Nagrani, Chen Sun, Michael Rubinstein, Trevor Darrell, Anna Rohrbach, and Cordelia Schmid, Summarizing Instructional Videos with Task Relevance and Cross-Modal Saliency. European Conference on Computer Vision (ECCV) 2022
- Dylan Ebert, Chen Sun, and Ellie Pavlick, Do Trajectories Encode Verb Meaning? Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL) 2022 [arXiv] [data]
- Valentin Gabeur, Paul Hongsuck Seo, Arsha Nagrani, Chen Sun, Karteek Alahari, and Cordelia Schmid, AVATAR: Unconstrained Audiovisual Speech Recognition. Conference of the International Speech Communication Association (INTERSPEECH) 2022 [arXiv]
- Shen Yan, Xuehan Xiong, Anurag Arnab, Zhichao Lu, Mi Zhang, Chen Sun, and Cordelia Schmid, Multiview Transformers for Video Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2022 [pdf]
- Valentin Gabeur, Arsha Nagrani, Chen Sun, Karteek Alahari, and Cordelia Schmid, Masking Modalities for Cross-modal Video Retrieval. Winter Conference on Applications of Computer Vision (WACV) 2022 [arXiv]
2021
- Arsha Nagrani, Shan Yang, Anurag Arnab, Aren Jansen, Cordelia Schmid, and Chen Sun, Attention Bottlenecks for Multimodal Fusion. Conference on Neural Information Processing Systems (NeurIPS) 2021 [arXiv]
- Tian Yun, Chen Sun, and Ellie Pavlick, Does Vision-and-Language Pretraining Improve Lexical Grounding? Findings of EMNLP 2021 [arXiv]
- Alexander Pashevich, Cordelia Schmid, and Chen Sun, Episodic Transformer for Vision-and-Language Navigation. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Dave Epstein, Jiajun Wu, Cordelia Schmid, and Chen Sun, Learning Temporal Dynamics from Cycles in Narrated Video. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Chen Sun, Arsha Nagrani, Yonglong Tian, and Cordelia Schmid, Composable Augmentation Encoding for Video Representation Learning. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid, ViViT: A Video Vision Transformer. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Anurag Arnab, Chen Sun, and Cordelia Schmid, Unified Graph Structured Models for Video Understanding. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Junru Gu, Chen Sun, and Hang Zhao, DenseTNT: End-to-end Trajectory Prediction from Dense Goal Sets. International Conference on Computer Vision (ICCV) 2021 [arXiv]
- Lu Mi, et al., HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021 [pdf]
- Jack Valmadre, Alex Bewley, Jonathan Huang, Chen Sun, Cristian Sminchisescu, and Cordelia Schmid, Local Metrics for Multi-Object Tracking. arXiv:2104.02631 [arXiv]
2020
- Hang Zhao, Jiyang Gao, Tian Lan, Chen Sun, Benjamin Sapp, Balakrishnan Varadarajan, Yue Shen, Yi Shen, Yuning Chai, Cordelia Schmid, Congcong Li and Dragomir Anguelov, TNT: Target-driveN Trajectory Prediction. Conference on Robot Learning (CoRL) 2020 [arXiv]
- Yonglong Tian, Chen Sun, Ben Poole, Dilip Krishnan, Cordelia Schmid and Phillip Isola, What Makes for Good Views for Contrastive Learning? Conference on Neural Information Processing Systems (NeurIPS) 2020 [arXiv] [Project] [Code]
- Anurag Arnab, Chen Sun, Arsha Nagrani and Cordelia Schmid, Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos. European Conference on Computer Vision (ECCV) 2020 [arXiv]
- Valentin Gabeur, Chen Sun, Karteek Alahari and Cordelia Schmid, Multi-modal Transformer for Video Retrieval. European Conference on Computer Vision (ECCV) 2020 [arXiv] [Project] [Code]
- Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid and Andrew Zisserman, Speech2Action: Cross-modal Supervision for Action Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 [arXiv] [Project] [Data]
- Jiyang Gao*, Chen Sun*, Hang Zhao, Yi Shen, Dragomir Anguelov, Congcong Li and Cordelia Schmid, VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020 [arXiv] [Blog] [VentureBeat]
- Jonathan C Stroud, David A Ross, Chen Sun, Jia Deng, and Rahul Sukthankar, D3D: Distilled 3D Networks for Video Action Recognition. Winter Conference on Applications of Computer Vision (WACV) 2020 [arXiv] [Project] [Code] [Checkpoints]
- Jonathan C. Stroud, David A. Ross, Chen Sun, Jia Deng, Rahul Sukthankar and Cordelia Schmid, Learning Video Representations from Textual Web Supervision. arXiv:2007.14937 [arXiv]
2019
- Matthias Minderer, Chen Sun, Ruben Villegas, Forrester Cole, Kevin Murphy, and Honglak Lee, Unsupervised Learning of Object Structure and Dynamics from Videos. Conference on Neural Information Processing Systems (NeurIPS) 2019 [arXiv] [Project] [Code]
- Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, and Cordelia Schmid, VideoBERT: A Joint Model for Video and Language Representation Learning. International Conference on Computer Vision (ICCV) 2019 [arXiv] [Blog] [VentureBeat]
- Chen Sun, Abhinav Shrivastava, Carl Vondrick, Rahul Sukthankar, Kevin Murphy, and Cordelia Schmid, Relational Action Forecasting. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019 (best paper finalist) [arXiv]
- Nam Vo, Lu Jiang, Chen Sun, Kevin Murphy, Li-Jia Li, Li Fei-Fei, and James Hays, Composing Text and Image for Image Retrieval - An Empirical Odyssey. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019 [arXiv] [Code]
- Chen Sun, Per Karlsson, Jiajun Wu, Joshua B. Tenenbaum, and Kevin Murphy, Stochastic Prediction of Multi-Agent Interactions from Partial Observations. International Conference on Learning Representations (ICLR) 2019 [arXiv] [Videos] [Annotations]
- Zhenjia Xu*, Zhijian Liu*, Chen Sun, Kevin Murphy, William T. Freeman, Joshua B. Tenenbaum, and Jiajun Wu, Unsupervised Discovery of Parts, Structure, and Dynamics. International Conference on Learning Representations (ICLR) 2019 [arXiv] [Project] [Code]
- Chen Sun, Fabien Baradel, Kevin Murphy, and Cordelia Schmid, Contrastive Bidirectional Transformer for Temporal Representation Learning. arXiv:1906.05743 [arXiv]
2018
- Chen Sun, Abhinav Shrivastava, Carl Vondrick, Kevin Murphy, Rahul Sukthankar and Cordelia Schmid, Actor-centric Relation Network. European Conference on Computer Vision (ECCV) 2018 [arXiv]
- Saining Xie, Chen Sun, Jonathan Huang, Zhuowen Tu and Kevin Murphy, Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification. European Conference on Computer Vision (ECCV) 2018 [arXiv] [Code] [Kinetics checkpoint] [HowTo100M checkpoint]
- Chunhui Gu et al., AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [arXiv] [Data] [Code]
- Yin Cui, Yang Song, Chen Sun, Andrew Howard and Serge Belongie, Large Scale Fine-Grained Categorization and the Effectiveness of Domain-Specific Transfer Learning. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [pdf] [Code]
- Grant van Horn et al., The iNaturalist Species Classification and Detection Dataset. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2018 [pdf] [Data] [Detection baseline]
2017
- Chen Sun, Abhinav Shrivastava, Saurabh Singh and Abhinav Gupta, Revisiting Unreasonable Effectiveness of Data in Deep Learning Era. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Blog] [WIRED]
- Jonathan Huang et al., Speed/accuracy Trade-offs for Modern Convolutional Object Detectors. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017 [arXiv] [Code]
Earlier Work
2017
- Jiyang Gao, Chen Sun, Zhenheng Yang and Ram Nevatia, TALL: Temporal Activity Localization via Language Query. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Code]
- Jiyang Gao*, Zhenheng Yang*, Chen Sun, Kan Chen and Ram Nevatia, TURN TAP: Temporal Unit Regression Network for Temporal Action Proposals. International Conference on Computer Vision (ICCV) 2017 [arXiv] [Code]
- Chuang Gan et al., VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation. International Conference on Computer Vision (ICCV) 2017 [pdf] [Data and code]
- Chuang Gan*, Chen Sun* and Ram Nevatia, DECK: Discovering Event Composition Knowledge from Web Images for Zero-Shot Event Detection and Recounting in Videos. AAAI Conference on Artificial Intelligence (AAAI) 2017 [pdf]
2016
- Chen Sun, Manohar Paluri, Ronan Collobert, Ram Nevatia and Lubomir Bourdev, ProNet: Learning to Propose Object-specific Boxes for Cascaded Neural Networks. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016 [arXiv]
- Chuang Gan, Chen Sun, Lixin Duan and Boqing Gong, Webly-Supervised Video Recognition by Mutually Voting for Relevant Web Images and Web Video Frames. European Conference on Computer Vision (ECCV) 2016 [pdf]
2015
- Chen Sun, Chuang Gan and Ram Nevatia, Automatic Concept Discovery from Parallel Text and Visual Corpora. International Conference on Computer Vision (ICCV) 2015 [arXiv]
- Chen Sun, Sanketh Shetty, Rahul Sukthankar and Ram Nevatia, Temporal Localization of Fine-Grained Actions in Videos by Domain Transfer from Web Images. ACM Multimedia 2015 [arXiv]
2014
- Chen Sun and Ram Nevatia, Semantic Aware Video Transcription Using Random Forest Classifiers. European Conference on Computer Vision (ECCV) 2014 [pdf]
- Chen Sun and Ram Nevatia, DISCOVER: Discovering Important Segments for Classification of Video Events and Recounting. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2014 [pdf]
- Chen Sun, Brian Burns, Ram Nevatia, Cees Snoek, Bob Bolles, Greg Myers, Wen Wang, Eric Yeh, ISOMER: Informative Segment Observations for Multimedia Event Recounting. International Conference on Multimedia Retrieval (ICMR) 2014 [pdf]
- Julien van Hout et al., Late Fusion and Calibration for Multimedia Event Detection Using Few Examples. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2014 [pdf]
2013
- Chen Sun and Ram Nevatia, ACTIVE: Activity Concept Transitions in Video Event Classification. International Conference on Computer Vision (ICCV) 2013 [pdf]
- Chen Sun and Ram Nevatia, Large-scale Web Video Event Classification by use of Fisher Vectors. Workshop on Applications of Computer Vision (WACV) 2013 [pdf] [Code]