# DCVLR: Data Curation for Vision Language Reasoning

[NeurIPS 2025](https://neurips.cc/Conferences/2025) •
[dcvlr.org](https://dcvlr.org)

---

<div align="center">

  <h3>
    🌐 <a href="https://dcvlr-neurips.github.io">Official webpage</a> •
    🚀 <a href="https://oumi-ai.typeform.com/to/LnYoisi5">Sign up for updates</a> •
    🎯 <a href="https://oumi-ai.typeform.com/to/OGPuRt6U">Apply for GPU credits (sponsored by Lambda Labs)</a>
  </h3>
</div>

---

DCVLR is the first open-data, open-models, open-source competition for data curation in vision-language reasoning, hosted at NeurIPS 2025.

## 🎯 Challenge

Participants may use any source datasets to curate a high-quality instruction-tuning dataset (1K or 10K examples), and are encouraged to explore diverse curation strategies, from synthetic data generation to subset selection. Submissions will be evaluated by fine-tuning an undisclosed open-source vision-language model on the curated data and measuring performance across a wide variety of benchmarks.
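As a miniature sketch of one curation strategy, subset selection can be framed as scoring candidate examples and keeping the top-K. The scoring heuristic below is purely illustrative, not an official baseline or part of the competition materials:

```python
# Illustrative subset-selection sketch. The scoring heuristic is a
# placeholder assumption, not a recommended curation method.

def score(example: dict) -> int:
    # Toy heuristic: prefer longer answers as a rough proxy for reasoning depth.
    return len(example.get("answer", ""))

def select_subset(examples: list[dict], k: int) -> list[dict]:
    # Keep the k highest-scoring examples.
    return sorted(examples, key=score, reverse=True)[:k]

pool = [
    {"question": "What is 2+2?", "answer": "4"},
    {"question": "Describe the image.", "answer": "A cat sitting on a red sofa."},
    {"question": "Why is the sky blue?", "answer": "Rayleigh scattering of sunlight."},
]
subset = select_subset(pool, k=2)
```

In practice a scoring function might use model-based signals (e.g., difficulty or diversity estimates) rather than answer length; the structure of score-then-select stays the same.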

## 🚀 Quick Start

Get started with training in minutes:

```bash
# Install oumi
uv pip install "oumi[gpu]"

# Train with Molmo-7B-O
oumi train -c molmo-o --dataset dataset.jsonl

# Train with Qwen2.5-VL-7B-Instruct
oumi train -c qwen2.5-vl-7b-instruct --dataset dataset.jsonl
```
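`dataset.jsonl` holds one JSON object per line, one per training example. The field names below are only an assumed illustration; consult the starter kit for the exact schema `oumi train` expects:

```python
import json

# Write a one-record dataset.jsonl. The field names here are assumed for
# illustration only -- check the DCVLR starter kit for the actual schema.
record = {
    "messages": [
        {"role": "user", "content": "What object is shown in the image?"},
        {"role": "assistant", "content": "A hot-air balloon drifting over a valley."},
    ],
    "images": ["balloon.jpg"],  # hypothetical image-reference field
}

with open("dataset.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")  # one JSON object per line
```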

## 📅 Key Dates

| Date | Milestone |
|------|-----------|
| **June 11, 2025** | Release of Competition Materials |
| **July 1, 2025** | Submission Portal Opens |
| **October 1, 2025** | Final Submission Deadline |
| **November 1, 2025** | Results Announced |
| **December 2025** | NeurIPS 2025 Presentation |

## 📚 Competition Resources

| Resource | Description | Link |
|----------|-------------|------|
| 📊 **Starter Kit** | Comprehensive starter kit with example datasets, training scripts, and best practices | [Access Starter Kit](https://huggingface.co/datasets/oumi-ai/dcvlr-starter-kit) |
| 💻 **Training Scripts** | Starter scripts for fine-tuning multiple vision-language models | [View Scripts](https://github.com/oumi-ai/oumi/tree/main/configs/projects/dcvlr) |
| 🧪 **Evaluation Code** | Scripts to evaluate model outputs on diverse benchmark development sets | [Get Code](https://github.com/oumi-ai/oumi/tree/main/configs/projects/dcvlr) |
| ☁️ **Compute Resources** | GPU credits from Lambda Labs for participants | [Apply for Credits](https://oumi-ai.typeform.com/to/OGPuRt6U) |
| 📚 **Documentation** | Complete guides and tutorials | [View Documentation](https://oumi.ai/docs) |

## 🤝 Sponsors

- **Lambda Labs** - Compute Resources
- **Oumi.ai** - Competition Support

## 📞 Contact

Have questions? Get in touch with the DCVLR team:

- **Website**: [dcvlr.org](https://dcvlr.org)
- **Email**: [Contact Form](https://dcvlr.org/contact)