Zihao Yue (岳子豪)
PhD Candidate @ AIM3 Lab, School of Information, Renmin University of China
Email: yzihao [at] ruc.edu.cn

Partial Vocabulary Learning for Neural Text Generation
Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective
MiMo-VL Technical Report
R1-V: Reinforcing Super Generalization Ability in Vision Language Models
Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding
Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement
VideoOrion: Tokenizing Object Dynamics in Videos
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Movie101: A New Movie Understanding Benchmark
Movie101v2: Improved Movie Narration Benchmark
Exploring Attention Attractors in Large Language Models
Unveiling Visual Biases in Audio-Visual Localization Benchmarks
ChartM3: Benchmarking Chart Editing with Multimodal Instructions

Video to Text Description @ TRECVID 2024, 1st Place
Video to Text Description @ TRECVID 2023, 1st Place
Video to Text Description @ TRECVID 2022, 1st Place