Contributions are most welcome. If you have any suggestions or improvements, feel free to open an issue or submit a pull request.
- Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
- InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
- MM-RLHF: The Next Step Forward in Multimodal LLM Alignment
- Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
- video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model