Zihao Yue (岳子豪)

PhD Student @ AIM3 Lab
School of Information
Renmin University of China

Email: yzihao [at] ruc.edu.cn



Bio


I am currently a third-year PhD student at Renmin University of China (RUC), advised by Prof. Qin Jin. I received my B.E. degree in Computer Science from the University of Electronic Science and Technology of China (UESTC) in 2022. My research interests include language modeling and video understanding.



Research


Partial Vocabulary Learning

Partial Vocabulary Learning for Neural Text Generation
Zihao Yue
Invited Talk at CCAI (Chinese Congress on Artificial Intelligence) 2025
[Slides]

Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Zihao Yue, Anwen Hu, Liang Zhang, Qin Jin
NeurIPS 2023
[Paper] [Github] [Demo]

Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
Zihao Yue, Liang Zhang, Qin Jin
ACL 2024
[Paper] [Github]

Multimodal Large Language Models

MiMo-VL Technical Report
Zihao Yue, Zhenru Lin, Yifan Song, Weikun Wang, Shuhuai Ren, Shuhao Gu, Shicheng Li, Peidian Li, Liang Zhao, Lei Li, et al.
2025
[Report] [Github] [Huggingface]

R1-V: Reinforcing Super Generalization Ability in Vision Language Models
Liang Chen, Lei Li, Haozhe Zhao, Yifan Song, Vinci, Zihao Yue
2025
[Report] [Github]

Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Ye Wang, Ziheng Wang, Boshen Xu, Yang Du, Kejun Lin, Zihan Xiao, Zihao Yue, ..., Qin Jin
2025
[Paper] [Github]

Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
Yuqi Liu, Bohao Peng, Zhisheng Zhong, Zihao Yue, Fanbin Lu, Bei Yu, Jiaya Jia
2025
[Paper] [Github]

VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang, Hao Luo, Zihao Yue, Sipeng Zheng, Zongqing Lu
ICCV 2025
[Paper]

Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu
ICCV 2025
[Paper]

Movie Understanding

Movie101: A New Movie Understanding Benchmark
Zihao Yue, Qi Zhang, Anwen Hu, Liang Zhang, Ziheng Wang, Qin Jin
ACL 2023
[Paper] [Website] [Github] [Huggingface]

Movie101v2: Improved Movie Narration Benchmark
Zihao Yue, Yepeng Zhang, Ziheng Wang, Qin Jin
ACL 2025
[Paper] [Website] [Github] [Huggingface]

Other Papers (with Junior Collaborators)

Exploring Attention Attractors in Large Language Models
Ziheng Wang, Zihao Yue, Qin Jin
ACL 2025 Findings

Unveiling Visual Biases in Audio-Visual Localization Benchmarks
Liangyu Chen, Zihao Yue, Boshen Xu, Qin Jin
ECCV 2024 Workshop
[Paper]

ChartM3: Benchmarking Chart Editing with Multimodal Instructions
Donglu Yang, Liang Zhang, Zihao Yue, Liangyu Chen, Yichen Xu, Wenxuan Wang, Qin Jin
ACM Multimedia 2025
[Paper]

Competitions

Video to Text Description @ TRECVID 2024, 1st Place
2024

Video to Text Description @ TRECVID 2023, 1st Place
2023

Video to Text Description @ TRECVID 2022, 1st Place
2022