Pull requests: mlfoundations/dclm

Pull requests list

Semantic scores
#106 by Juan-escobar94 was closed Feb 10, 2025
DCLM-RW subsets and other documentation additions
#102 by jeffreywpli was merged Dec 11, 2024
Update README.md
#98 by revbucket was merged Nov 26, 2024
Improved Documentation, Rust Tokenize Shuffle
#96 by afang-story was merged Dec 3, 2024
more details to the documentation of data preprocessing
#94 by Mivg was merged Dec 9, 2024
Fix typos
#92 by Muennighoff was merged Dec 6, 2024
add eval heavy results for MATES in the 1B-1x setting
#89 by yuzc19 was merged Oct 23, 2024
Update README.md -- naive-both -> old-both
#84 by revbucket was merged Oct 17, 2024
Documentation updates
#82 by GeorgiosSmyrnis was merged Oct 10, 2024
tokenize and shuffle test
#81 by dhgottesman was closed Sep 25, 2024
added missing train_fasttext_classifier.py file
#73 by Mivg was merged Sep 5, 2024
Add instructions for setup.py
#60 by GeorgiosSmyrnis was merged Sep 5, 2024
Add architecture study CSVs
#58 by GeorgiosSmyrnis was merged Sep 5, 2024
Update README.md
#57 by RyanMarten was merged Aug 19, 2024
README and eval updates
#56 by Mivg was merged Aug 16, 2024
Added instructions for finetuning
#51 by GeorgiosSmyrnis was closed Oct 1, 2024
migrate bff deduplication documentation (v1)
#43 by jeffreywpli was merged Aug 1, 2024
missing fasttext_filter yaml
#42 by jeffreywpli was merged Aug 1, 2024
ported fasttext code and fixes
#39 by Mivg was merged Jul 30, 2024
sanitized s3 paths
#38 by Mivg was merged Jul 30, 2024
Feature/readme updates
#36 by Mivg was merged Jul 29, 2024
added training configs
#35 by Mivg was merged Jul 26, 2024
point curated banlist download to HF instead of s3
#28 by jeffreywpli was closed Jul 29, 2024
Skip hyperparams for nonexisting config dirs.
#25 by dwadden was merged Jul 18, 2024