feat(swe): improve dataset generation pipeline with validation and progress monitoring by echobt · Pull Request #13 · CortexLM/swe-forge

echobt · 2026-02-17T15:46:54Z

Summary

Overhauls the SWE synthetic dataset generation pipeline to improve quality, speed, and reliability. Switches the default LLM model to moonshotai/kimi-k2.5:nitro, adds real-time progress monitoring, introduces test script validation, and cleans up stale test artifacts.

Changes

Pipeline & Performance

Refactor pipeline.rs to streamline dataset generation flow (~250 lines removed)
Add progress.rs module with async progress monitoring and atomic counters for real-time generation tracking
Add filters.rs module for PR/issue filtering logic extracted from pipeline
Add test_generator.rs with validation for generated test scripts (checks for missing files, broken redirects, script coherence)
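The atomic-counter approach behind progress.rs can be sketched with std alone. This is a minimal sketch, not the module's actual code: the four stage names (filtered/extracted/scored/accepted) come from the commit message, while the `snapshot`, `percent_complete`, and `report` helpers and the `target` parameter are hypothetical. The real ProgressMonitor is described as a background tokio task that logs every 30s; here the report is simply built on demand.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Shared pipeline counters (hypothetical layout mirroring the stages
/// named in the commit message). AtomicUsize implements Default.
#[derive(Default)]
pub struct ProgressCounters {
    pub filtered: AtomicUsize,
    pub extracted: AtomicUsize,
    pub scored: AtomicUsize,
    pub accepted: AtomicUsize,
}

impl ProgressCounters {
    /// Wrap the counters in an Arc so worker tasks can share them.
    pub fn new_shared() -> Arc<Self> {
        Arc::new(Self::default())
    }

    /// Read every counter. Relaxed ordering suffices because each value is an
    /// independent statistic and no other memory is synchronized through it.
    pub fn snapshot(&self) -> (usize, usize, usize, usize) {
        (
            self.filtered.load(Ordering::Relaxed),
            self.extracted.load(Ordering::Relaxed),
            self.scored.load(Ordering::Relaxed),
            self.accepted.load(Ordering::Relaxed),
        )
    }

    /// Share of a hypothetical `target` number of accepted tasks, capped at
    /// 100. Returns 0.0 when no target is set.
    pub fn percent_complete(&self, target: usize) -> f64 {
        if target == 0 {
            return 0.0;
        }
        let accepted = self.accepted.load(Ordering::Relaxed) as f64;
        (accepted / target as f64 * 100.0).min(100.0)
    }

    /// One log line of the kind a periodic monitor might emit.
    pub fn report(&self, target: usize) -> String {
        let (f, e, s, a) = self.snapshot();
        format!(
            "filtered={f} extracted={e} scored={s} accepted={a} ({:.1}%)",
            self.percent_complete(target)
        )
    }
}
```

Workers would clone the `Arc` and call `fetch_add` on the relevant counter; a monitor task could then call `report` on a fixed interval and stop when the orchestrator's run completes.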

Model Configuration

Replace openai/gpt-5.2-codex:nitro with moonshotai/kimi-k2.5:nitro as default model across OpenRouter provider and CLI defaults

Prompt & Harness Improvements

Enhance prompt_rewriter.rs to strip PR numbers and project identifiers from generated prompts
Update harness.rs with improved orchestration hooks
Add orchestrator-level validation entry point in orchestrator.rs
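For the prompt_rewriter change, a minimal std-only sketch of identifier stripping might look like the following. The function names `strip_identifiers` and `is_pr_reference` and the "the project" placeholder are hypothetical; the actual post-processing may well use regexes and handle punctuation differently.

```rust
/// Hypothetical sanitizer: drops GitHub URLs and `#123`-style references and
/// swaps known repository names for a neutral placeholder.
pub fn strip_identifiers(prompt: &str, repo_names: &[&str]) -> String {
    let mut out: Vec<&str> = Vec::new();
    for word in prompt.split_whitespace() {
        // Any token pointing at GitHub leaks the project; drop it.
        if word.contains("github.com") {
            continue;
        }
        // Strip surrounding punctuation such as "(#13)," but keep '#'.
        let core = word.trim_matches(|c: char| !c.is_alphanumeric() && c != '#');
        // `#123` PR/issue reference: drop the whole token.
        if is_pr_reference(core) {
            continue;
        }
        // Known repository name (case-insensitive): replace it.
        if repo_names.iter().any(|name| core.eq_ignore_ascii_case(name)) {
            out.push("the project");
            continue;
        }
        out.push(word);
    }
    out.join(" ")
}

/// True for tokens like `#13`: a '#' followed by one or more ASCII digits.
fn is_pr_reference(token: &str) -> bool {
    match token.strip_prefix('#') {
        Some(digits) => !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()),
        None => false,
    }
}
```

Working token by token keeps the sketch dependency-free, at the cost of collapsing repeated whitespace and dropping punctuation attached to a removed reference.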

Test Artifacts & Cleanup

Remove stale agent-tests/ directory with outdated benchmark results
Remove baseagent-echo artifact
Update test-run workspace configurations and fix test script paths
Replace Python-based test suites with JavaScript equivalents where appropriate

Documentation

Update benchmark_validation_report.md and validation_summary.json with latest results
Update AGENTS.md files for cli and llm modules


Overhaul the synthetic dataset generator to produce higher-quality benchmarks faster and with better validation.

Quality improvements:
- Add a min_description_length filter (30 chars) that rejects PRs with empty or very short descriptions, preventing blank or useless benchmark tasks
- Strip repository names, PR numbers, and GitHub URLs from generated prompts via post-processing in prompt_rewriter, so benchmark tasks do not leak the project's identity
- Add test script validation (validate_test_scripts) that checks that shell scripts have a shebang line and are non-empty, and that referenced test files actually exist in the submission set, with retry loop support

Observability:
- Add a new progress module with ProgressCounters (shared atomics) and ProgressMonitor (a background tokio task) that logs pipeline stats (filtered/extracted/scored/accepted) every 30s with an ETA percentage
- Wire the progress monitor into the SweOrchestrator::run lifecycle

Build config:
- Simplify the .cargo/config.toml linker setting to use cc instead of clang+mold for broader compatibility
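The three checks listed for validate_test_scripts, plus a retry loop, could look roughly like this. It is a sketch under stated assumptions: `ScriptIssue`, `referenced_files`, `validate_script`, and `generate_with_retry` are hypothetical names, and treating any token ending in `.py`, `.js`, or `.rs` as a test-file reference is an illustrative heuristic, not how the project necessarily detects references.

```rust
use std::collections::HashSet;

/// Hypothetical problems a validator might report for one test script.
#[derive(Debug, PartialEq)]
pub enum ScriptIssue {
    Empty,
    MissingShebang,
    MissingTestFile(String),
}

/// Naive, hypothetical heuristic: tokens ending in a test-file extension
/// are treated as file references.
fn referenced_files(script: &str) -> Vec<&str> {
    script
        .split_whitespace()
        .filter(|t| [".py", ".js", ".rs"].iter().any(|ext| t.ends_with(*ext)))
        .collect()
}

/// Non-empty, starts with a shebang, and every referenced file is part of
/// the submitted file set.
pub fn validate_script(script: &str, submitted: &HashSet<&str>) -> Vec<ScriptIssue> {
    let mut issues = Vec::new();
    if script.trim().is_empty() {
        issues.push(ScriptIssue::Empty);
        return issues; // No further checks make sense for an empty script.
    }
    if !script.starts_with("#!") {
        issues.push(ScriptIssue::MissingShebang);
    }
    for path in referenced_files(script) {
        if !submitted.contains(path) {
            issues.push(ScriptIssue::MissingTestFile(path.to_string()));
        }
    }
    issues
}

/// Hypothetical retry loop: ask `generate` for a script up to `max_attempts`
/// times and return the first one that passes validation.
pub fn generate_with_retry<F>(
    mut generate: F,
    submitted: &HashSet<&str>,
    max_attempts: usize,
) -> Option<String>
where
    F: FnMut(usize) -> String,
{
    for attempt in 0..max_attempts {
        let script = generate(attempt);
        if validate_script(&script, submitted).is_empty() {
            return Some(script);
        }
    }
    None
}
```

Returning a list of issues instead of a bare boolean means a retry could feed the specific failures back into the next generation prompt.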


…ogress monitoring (#13)

* refactor(model): replace openai/gpt-5.2-codex:nitro with moonshotai/kimi-k2.5:nitro
* feat(swe): improve dataset quality, speed, and validation pipeline

echobt added 3 commits February 17, 2026 15:38

refactor(model): replace openai/gpt-5.2-codex:nitro with moonshotai/kimi-k2.5:nitro (2684371)

Merge remote-tracking branch 'origin/main' into feat/swe-dataset-quality-speed-validation (7d60c99)

echobt merged commit 990a765 into main Feb 17, 2026
9 checks passed

echobt deleted the feat/swe-dataset-quality-speed-validation branch February 17, 2026 15:49


