Closed
143 commits
56414ac
Initial vLLM test piece done
BabyChouSr Oct 20, 2025
3c56d97
vLLM load weight
BabyChouSr Oct 20, 2025
d30183b
Initial rollout worker done
BabyChouSr Oct 21, 2025
e260780
Add code
BabyChouSr Oct 21, 2025
1bc975e
Add new
BabyChouSr Oct 22, 2025
d8cd5a0
Add whole curriculum
BabyChouSr Oct 23, 2025
53e92b6
Separate parsing logic and allow passing in sampling params
BabyChouSr Oct 27, 2025
cf4de94
Add fix to load the fetch params weights to CPU
BabyChouSr Oct 28, 2025
cb217bb
Address comments so far
BabyChouSr Oct 28, 2025
6bf98dc
maybe_edit probably not needed
BabyChouSr Oct 28, 2025
3f34704
Clean up a bit more and add integration
BabyChouSr Oct 28, 2025
5a6b2a6
Remove extra run
BabyChouSr Oct 28, 2025
5714633
Add config
BabyChouSr Oct 28, 2025
d40410e
Merge
BabyChouSr Oct 28, 2025
8c9d397
Revert some cluster config changes
BabyChouSr Oct 28, 2025
16986b2
Revert dockerfile changes and use vLLM dockerfile
BabyChouSr Oct 28, 2025
ef5d2b8
Delete unused files
BabyChouSr Oct 28, 2025
a316bb3
Remove subdir
BabyChouSr Oct 28, 2025
c81dd6c
Remove some files
BabyChouSr Oct 28, 2025
addc014
Remove extra print
BabyChouSr Oct 28, 2025
e20ea55
Add Rollout worker abstraction fixes
BabyChouSr Oct 28, 2025
2b4411d
Remove commented out code
BabyChouSr Oct 28, 2025
00cbaf8
Some fixes
BabyChouSr Oct 28, 2025
13deae3
Add changes
BabyChouSr Oct 29, 2025
6c008f0
Add changes
BabyChouSr Oct 29, 2025
e491606
Merge
BabyChouSr Oct 29, 2025
6798839
Address comments
BabyChouSr Oct 30, 2025
9592aec
Revert pyproject
BabyChouSr Oct 30, 2025
5b1881f
Add back the original uv lock
BabyChouSr Oct 30, 2025
d1db1c4
Guard vllm import
BabyChouSr Oct 30, 2025
29955f5
Add some unittest fixes
BabyChouSr Oct 30, 2025
e7e4fed
Fun changes
BabyChouSr Oct 30, 2025
2b1a5aa
Save
BabyChouSr Nov 4, 2025
6a12f65
Experiments in RL
BabyChouSr Nov 4, 2025
64439df
Add change
BabyChouSr Nov 6, 2025
966bf7b
add
BabyChouSr Nov 8, 2025
1baeaa8
Save
BabyChouSr Nov 14, 2025
88fbab5
Add
BabyChouSr Nov 15, 2025
6eb4e4b
Add overlong filtering
BabyChouSr Nov 17, 2025
18ae618
Save
BabyChouSr Nov 18, 2025
19dac6b
Save
BabyChouSr Nov 18, 2025
b39cf51
Save progress
BabyChouSr Nov 19, 2025
59fee2b
Save
BabyChouSr Nov 19, 2025
94997cd
Save
BabyChouSr Nov 20, 2025
cce1bad
Save work so far
BabyChouSr Nov 21, 2025
b84dcee
Add async LLM
BabyChouSr Nov 28, 2025
25b7031
Save
BabyChouSr Nov 28, 2025
123a105
Merge
BabyChouSr Nov 28, 2025
059fe75
Fix some linting
BabyChouSr Nov 28, 2025
56a26b4
Fix all lint
BabyChouSr Nov 28, 2025
32ef577
Remove some tests
BabyChouSr Nov 28, 2025
ffd708c
Fix
BabyChouSr Nov 28, 2025
885f0f6
Fix some more
BabyChouSr Nov 28, 2025
70b4954
Fix some more
BabyChouSr Nov 28, 2025
2bc27f7
Fix some more
BabyChouSr Nov 28, 2025
7eba87c
Fix some more
BabyChouSr Nov 28, 2025
ef2bc81
Fix some more
BabyChouSr Nov 28, 2025
b2a1b7d
Fix some more
BabyChouSr Nov 29, 2025
ea4db13
Fix some more
BabyChouSr Nov 29, 2025
ca8321c
Fix some more
BabyChouSr Nov 29, 2025
02bc4a0
Fix some more
BabyChouSr Nov 29, 2025
4841367
push everything
BabyChouSr Dec 1, 2025
12734a0
Fix state dict weight update interface
BabyChouSr Dec 3, 2025
c4aacd6
Lint
BabyChouSr Dec 3, 2025
fc78081
Delete some unused files
BabyChouSr Dec 6, 2025
932600b
Rename some metrics
BabyChouSr Dec 6, 2025
d0bf051
Merge remote-tracking branch 'origin/main' into chris/exp-rl
AlienKevin Dec 7, 2025
8735879
Small fixes so it runs
AlienKevin Dec 8, 2025
19c4a4e
Merge branch 'chris/exp-rl-tinker' into chris/exp-rl
AlienKevin Dec 8, 2025
0f176e8
Merge remote-tracking branch 'origin/main' into chris/exp-rl
AlienKevin Dec 8, 2025
6d954bd
Added us-east5-a-vllm
AlienKevin Dec 8, 2025
c0fdaa3
Update Dockerfile.vllm to include lib/
AlienKevin Dec 9, 2025
8d2b0cf
Update vllm docker tag on marin-us-east5-a-vllm and fix gcloud paths
AlienKevin Dec 9, 2025
cc288cf
Reduced min_workers to 1 and bumped min v5p-8 to 2 for us-east5-a-vllm
AlienKevin Dec 9, 2025
87efc4f
Added us-central1-vllm
AlienKevin Dec 9, 2025
c6d003b
Speeds up testing by skipping vllm precompile
AlienKevin Dec 9, 2025
49c45b9
Increase max_input_tokens to 4096 to prevent prompt overflow
AlienKevin Dec 9, 2025
77a0fc9
Handle integer prng_key coming from python random number generator in…
AlienKevin Dec 9, 2025
237fb4f
Reformat to satisfy linter
AlienKevin Dec 9, 2025
6e36fdd
Aligned MATH prompts with Tinker
AlienKevin Dec 11, 2025
889d685
Run full eval every step
AlienKevin Dec 11, 2025
a1dbf0c
Log up to 1000 samples and show special tokens in sample_table
AlienKevin Dec 11, 2025
8b1b6ee
Removed extra initial eval (should already be done in while loop that…
AlienKevin Dec 11, 2025
bad025e
Make training worker wait for first rollouts from the inference worke…
AlienKevin Dec 11, 2025
8f9c23f
Use same temperature for sampling and eval
AlienKevin Dec 11, 2025
4ab54ce
Support Qwen 3 in async RL
AlienKevin Dec 12, 2025
b564697
Test vLLM inference in isolation
AlienKevin Dec 13, 2025
44704f9
Pass vLLM env vars properly through env dict to ray job
AlienKevin Dec 13, 2025
766c454
Append a timestamp suffix to experiment id to prevent loading old che…
AlienKevin Dec 14, 2025
b94b543
Switch back to Llama 3.1 8B Instruct for RL experiment
AlienKevin Dec 14, 2025
924043f
Log policy_entropy_true following VeRL's true entropy formula
AlienKevin Dec 15, 2025
d646620
Fix max_tokens truncation issue which caused trainer_sampler_prob_di…
AlienKevin Dec 15, 2025
561a5ef
Double max_output_tokens to 1024 and enable do_overlong_filtering
AlienKevin Dec 15, 2025
98fc267
Add additional logging to identify training forward/backward as the m…
AlienKevin Dec 16, 2025
39c5f82
Track perf metrics in trainer and actor
AlienKevin Dec 19, 2025
eaeea73
Support vocab tiling and remove support for compute_entropy
AlienKevin Dec 19, 2025
0a9b1e8
Reduce max_input_tokens by 4x down to 1024 as empirically max sample …
AlienKevin Dec 19, 2025
76b3e54
Support sequence packing in RL trainer
AlienKevin Dec 19, 2025
5af7121
Revert "Support sequence packing in RL trainer"
AlienKevin Dec 19, 2025
f4eec71
Support dynamic max_seq_len for trainer
AlienKevin Dec 20, 2025
39d3c03
Added notes on ray TPU worker disk cleanup to AGENTS.md
AlienKevin Dec 20, 2025
01f4e1c
Fix pad_to_multiple to be SPLASH block size
AlienKevin Dec 20, 2025
94eec64
Fix model_config's seq_len to include max_input_tokens
AlienKevin Dec 20, 2025
011e1bb
Increase flash_attention_block_size from default 512 to 2048
AlienKevin Dec 20, 2025
bf93f8a
Turn on DATA_PARALLEL_OVERLAP and CF_FOR_ALL_GATHER; reset SPLASH blo…
AlienKevin Dec 20, 2025
106a447
Remove DATA_PARALLEL_OVERLAP and CF_FOR_ALL_GATHER XLA flags as they …
AlienKevin Dec 20, 2025
31f73b7
Free up memory before weight sync to prevent OOM
AlienKevin Dec 21, 2025
c519cc1
Refactor exp_vllm_inference.py to run on all 500 MATH questions
AlienKevin Jan 1, 2026
213d5ee
Double max_output_tokens from 512 to 1024
AlienKevin Jan 2, 2026
bd869a6
Added exp_vllm_inference_modal.py to run the same model with vLLM on …
AlienKevin Jan 2, 2026
e4d73e4
Set top_k to workaround sampling bug
AlienKevin Jan 2, 2026
9f9b257
Merge remote-tracking branch 'origin/main' into chris/exp-rl
AlienKevin Jan 3, 2026
5581aae
Workaround for https://github.com/marin-community/marin/issues/2279
AlienKevin Jan 3, 2026
aa68b52
Add back vllm dependency group
AlienKevin Jan 3, 2026
019ff14
Fixed max_seq_len name error due to recent merge
AlienKevin Jan 3, 2026
5054678
Fixed RLJob.run() call and wandb project names
AlienKevin Jan 3, 2026
1bc39e5
Specify top_k, log training samples, and more dependency fixes
AlienKevin Jan 5, 2026
e0ad6ad
Fixed WorkerExtension MRO issue
AlienKevin Jan 5, 2026
4540958
Fixed race condition on actor creation
AlienKevin Jan 5, 2026
479c81e
Fix "No available node types can fulfill resource requests" by settin…
AlienKevin Jan 5, 2026
a0a8312
Implement missing .call() method for fray actor handles
AlienKevin Jan 5, 2026
50db833
Pass in context_length to flops_per_token to satisfy updated signatur…
AlienKevin Jan 5, 2026
c777cd1
Merge remote-tracking branch 'origin/main' into chris/exp-rl
AlienKevin Jan 9, 2026
4b1ca97
Remove unused RL and vLLM inference experiments
AlienKevin Jan 9, 2026
7a970cb
test(rl): Fix broken unit tests and update API usage
AlienKevin Jan 9, 2026
3da0367
chore: Resolve uv.lock conflict
AlienKevin Jan 9, 2026
8303527
Fixed linter issues in math_env.py
AlienKevin Jan 9, 2026
0b21d63
Removed unused reward calculation script
AlienKevin Jan 10, 2026
1a9cec3
Removed more unused scripts
AlienKevin Jan 10, 2026
74dbaaf
Removed tinker-cookbook
AlienKevin Jan 10, 2026
5f33bd6
Removed unrelated infra notes in AGENTS.md
AlienKevin Jan 10, 2026
ba37e7d
Removed try except around vllm imports
AlienKevin Jan 10, 2026
06125fa
Removed commented debug logs in inference_ctx/base.py
AlienKevin Jan 10, 2026
9694b9e
Removed unused comments in exp1782_vllm_rl.py to focus on MATH-500 on…
AlienKevin Jan 10, 2026
a1bd567
Renamed exp1782_vllm_rl.py to exp2039_rl_math500.py
AlienKevin Jan 10, 2026
0087241
Refactor RL experiment config for reusability
AlienKevin Jan 10, 2026
98104d9
Merge branch 'main' into chris/exp-rl
AlienKevin Jan 10, 2026
b87c4ed
Set torchvision upper bound to be <0.24.1 to ensure compatibility wit…
AlienKevin Jan 10, 2026
8faeb3e
Lock torchvision to 0.24.0 to be compatible with vllm-tpu and workaro…
AlienKevin Jan 10, 2026
3e34e48
Made vllm imports optional in vllm.py to allow tests like test_gsm8k_…
AlienKevin Jan 10, 2026
e261bff
Remove unnecessary pinned triton dependency
AlienKevin Jan 11, 2026
2e5dee3
Merge branch 'main' into chris/exp-rl
AlienKevin Jan 12, 2026
ec2bd39
Refactor: Revert RayActorMethod.call and explicit future handling
AlienKevin Jan 13, 2026
3 changes: 1 addition & 2 deletions docs/recipes/architecture.md
@@ -27,7 +27,6 @@ marin/
│ ├── classifiers/ # Train quality classifiers (fasttext/, bert/, hf/, custom/)
│ ├── training/ # Model training (training.py, scaling_laws.py)
│ ├── rl/ # Async PPO (rollout_worker.py, train_worker.py, replay_buffer.py, curriculum.py, environments/, weight_transfer/)
-│ ├── post_training/ # Deprecated, see rl/
│ ├── evaluation/ # Evaluation (evaluators/, visualize.py)
│ ├── datashop/ # LLM-based filtering (pipeline.py, dataset_processor.py, templates.py)
│ ├── generation/ # LLM inference (inference.py, llm_generation.py, pipeline.py)
@@ -48,7 +47,7 @@ marin/
├── tests/ # Test suite
│ ├── integration_test.py # Full pipeline smoke test (<10min, no GPU)
-│ ├── processing/, rl/, post_training/, evals/, deduplication/, data-curation/, snapshots/, tpu/, vllm/
+│ ├── processing/, rl/, evals/, deduplication/, data-curation/, snapshots/, tpu/, vllm/
│ └── quickstart-data/
├── docs/ # Documentation (tutorials/, explanations/, references/, recipes/, reports/, design/, dev-guide/, model-cards/)
260 changes: 0 additions & 260 deletions experiments/exp1247_rl_async.py

This file was deleted.