chore: merge set of changes for v2.3.0 #428
Merged
Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Will Johnson <[email protected]>
Code to perform dataset sampling via sampling probabilities in data Signed-off-by: Dushyant Behl <[email protected]>
* Expose additional data handlers as an argument to the train function. Signed-off-by: Dushyant Behl <[email protected]>
#399)
* fix: set legacy behavior to false, enable new behavior
  Signed-off-by: Will Johnson <[email protected]>
* fix: Resolve push_to_hub_token warning
  Signed-off-by: Will Johnson <[email protected]>
* fix: Remove max_seq_length and dataset_text_field from SFTTrainer
  Signed-off-by: Will Johnson <[email protected]>
* fmt
  Signed-off-by: Will Johnson <[email protected]>
* fix: Resolve tokenizer.padding_side warning
  Signed-off-by: Will Johnson <[email protected]>
* nit: restructure warning fixes
  Signed-off-by: Will Johnson <[email protected]>
* fix: Add packing directly to SFTConfig
  Signed-off-by: Will Johnson <[email protected]>
* fmt
  Signed-off-by: Will Johnson <[email protected]>
* Removed dataset_kwargs from SFTTrainer
  Removed the argument dataset_kwargs from the invocation of SFTTrainer() because it will be deprecated in V1.0.0. Instead, dataset_kwargs has been added as a key to the training_args variable, following the example provided by HF found here: https://huggingface.co/docs/trl/en/sft_trainer#training-the-vision-language-model
  Signed-off-by: Luka Dojcinovic <[email protected]>
* fix: Added max_seq_length back to SFTConfig()
  Signed-off-by: Luka Dojcinovic <[email protected]>
* Removed legacy and padding_side args
  Removed these args as they were based on changes from @willmj that haven't been approved yet
  Signed-off-by: Luka Dojcinovic <[email protected]>
* Moved all args to additional_args
  Following @kmehant's suggestion.
  Signed-off-by: Luka Dojcinovic <[email protected]>
* Removed packing and max_seq_length
  Removed packing and max_seq_length variables from additional_args
  Signed-off-by: Luka Dojcinovic <[email protected]>
* Removed check is_pretokenized_dataset
  Co-authored-by: Mehant Kammakomati <[email protected]>
  Signed-off-by: Luka-D <[email protected]>
* Removed max_seq_length from additional_args
  Signed-off-by: Luka Dojcinovic <[email protected]>
* Removed error.log
  Signed-off-by: Luka Dojcinovic <[email protected]>
* fix: move packing to SFTConfig as well
  Co-authored-by: Luka-D <[email protected]>
  Signed-off-by: Mehant Kammakomati <[email protected]>
---------
Signed-off-by: Will Johnson <[email protected]>
Signed-off-by: Luka Dojcinovic <[email protected]>
Signed-off-by: Luka-D <[email protected]>
Signed-off-by: Mehant Kammakomati <[email protected]>
Co-authored-by: Will Johnson <[email protected]>
Co-authored-by: Mehant Kammakomati <[email protected]>
Co-authored-by: Mehant Kammakomati <[email protected]>
…les (#418) Signed-off-by: Mehant Kammakomati <[email protected]>
…ts (#412)
* test: Add unit tests to test multiple files in single/multiple datasets
  Signed-off-by: Abhishek <[email protected]>
* e2e testing unit test for multiple datasets with multiple files
  Signed-off-by: Abhishek <[email protected]>
* test: multiple datasets with multiple datafiles column names
  Signed-off-by: Will Johnson <[email protected]>
* PR changes
  Signed-off-by: Abhishek <[email protected]>
* PR Changes
  Signed-off-by: Abhishek <[email protected]>
* fix: fmt
  Signed-off-by: Abhishek <[email protected]>
* Merge test_process_dataconfig_multiple_files_varied_data_formats
  Signed-off-by: Abhishek <[email protected]>
---------
Signed-off-by: Abhishek <[email protected]>
Signed-off-by: Will Johnson <[email protected]>
Co-authored-by: Will Johnson <[email protected]>
Signed-off-by: Dushyant Behl <[email protected]>
Also add mlflow docs and add mlflow to docker file and as optional requirement Signed-off-by: Dushyant Behl <[email protected]>
feat: Integrate MLflow tracker
…atterns, HF Dataset and combination (#424) Signed-off-by: Abhishek <[email protected]>
aluu317 requested review from anhuong, Ssukriti, fabianlim and kmehant as code owners on December 23, 2024 14:54
Thanks for making a pull request! 😃
aluu317 changed the title from "release: merge set of changes for v2.3.0" to "chore: merge set of changes for v2.3.0" on Dec 23, 2024
The commits look good to me. Once this one additional PR is added, this looks good to merge.
Signed-off-by: Dushyant Behl <[email protected]> Signed-off-by: Will Johnson <[email protected]> Signed-off-by: Abhishek <[email protected]> Co-authored-by: Will Johnson <[email protected]> Co-authored-by: Abhishek <[email protected]>
Description of the change
Related issue number
How to verify the PR
Was the PR tested