| Zihao Yue (岳子豪), PhD Candidate @ AIM3 Lab, School of Information, Renmin University of China. Email: yzihao [at] ruc.edu.cn |   | 
| 
            Partial Vocabulary Learning for Neural Text Generation | 
| 
            Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation | 
| 
            Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective  | 
| 
            MiMo-VL Technical Report | 
| 
            R1-V: Reinforcing Super Generalization Ability in Vision Language Models | 
| 
            Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding | 
| 
            Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | 
| 
            VideoOrion: Tokenizing Object Dynamics in Videos | 
| 
            Unified Multimodal Understanding via Byte-Pair Visual Encoding | 
| 
            Movie101: A New Movie Understanding Benchmark | 
| 
            Movie101v2: Improved Movie Narration Benchmark | 
| 
            Exploring Attention Attractors in Large Language Models | 
| 
            Unveiling Visual Biases in Audio-Visual Localization Benchmarks | 
| 
            ChartM3: Benchmarking Chart Editing with Multimodal Instructions | 
| 
            Video to Text Description @ TRECVID 2024, 1st Place  
            Video to Text Description @ TRECVID 2023, 1st Place  
            Video to Text Description @ TRECVID 2022, 1st Place  |