Issues: modelscope/data-juicer
After setting up the environment, running python tools/process_data.py --config configs/demo/process.yaml reports an error
#580
opened Feb 18, 2025 by
ctgushiwei
process_data.py pre-start is too slow (the data processing script takes too long to start)
question
Further information is requested
#578
opened Feb 18, 2025 by
hhhhsc701
3 tasks done
Can Data-Juicer be understood as providing multimodal data processing capabilities for Ray Data?
question
Further information is requested
#577
opened Feb 16, 2025 by
nihaoqingtuan
3 tasks done
Installation progress could be optimized. (Cmake error during installation)
enhancement
New feature or request
environment
related to third-party dependency, DJ-pypi, DJ-docker, etc.
#576
opened Feb 14, 2025 by
zhenqincn
2 tasks done
[Bug]: HumanVbench test error: ERROR opening: HumanVBench/Emotion_Intensity_Compare/Emotion_Intensity_Compare_1.mp4, No such file or directory
bug
Something isn't working
#575
opened Feb 12, 2025 by
Reneea1
3 tasks done
When started in Ray mode, does it spill to disk when memory is insufficient?
question
Further information is requested
#574
opened Feb 11, 2025 by
javapythonphp
3 tasks done
Some operators cause the process to hang during processing
question
Further information is requested
#560
opened Jan 22, 2025 by
SkyAndFly
3 tasks done
Is there a specific download link for the data classifiers?
question
Further information is requested
#558
opened Jan 21, 2025 by
obj12
2 of 3 tasks
How to do sentence_dedup
enhancement
New feature or request
#556
opened Jan 20, 2025 by
ftgreat
1 of 2 tasks
When will version 2.0 be released
question
Further information is requested
#548
opened Jan 14, 2025 by
javapythonphp
3 tasks done
[Bug]: Fail to run ray_bts_minhash_deduplicator
bug
Something isn't working
#547
opened Jan 14, 2025 by
javapythonphp
3 tasks done
Hash configuration information for the dedup performance test of DataJuicer 2.0
question
Further information is requested
#546
opened Jan 14, 2025 by
cist
3 tasks done
[Bug]: ds.JSONDatasource
bug
Something isn't working
#539
opened Jan 10, 2025 by
ariexBear
3 tasks done
Support other LLMs & APIs for the OP generate_qa_from_text_mapper
enhancement
New feature or request
dj:op
issues/PRs about some specific OPs
#535
opened Jan 9, 2025 by
yxdyc
2 tasks done
[BUG]: inappropriate arguments for map_batches in ray mode
bug
Something isn't working
dj:dist
issues/PRs about distributed data processing
#533
opened Jan 8, 2025 by
HYLcool
Can the cleaning statistics be viewed after creating the config file and performing the cleaning?
question
Further information is requested
#499
opened Nov 27, 2024 by
Tendo33
3 tasks done
Guidance on Monitoring Task Execution with Ray Executor in Data Juicer
dj:dist
issues/PRs about distributed data processing
question
Further information is requested
#496
opened Nov 24, 2024 by
Fatima-0SA
3 tasks done
Anyone tried DJ on multimodal datasets of more than 20M samples?
question
Further information is requested
#482
opened Nov 11, 2024 by
serser
3 tasks done
Update of Jupyter Notebooks
bug
Something isn't working
documentation
Improvements or additions to documentation
#476
opened Nov 6, 2024 by
HYLcool
[Bug]: perplexity_filter operator runs out of memory (OOM)
bug
Something isn't working
#474
opened Nov 5, 2024 by
weiaicunzai
3 tasks done
[Feat]: Unified LLM Calling Management
enhancement
New feature or request
#451
opened Oct 16, 2024 by
drcege
2 tasks done
[Feat]: Automatic Version Matching During Installation
enhancement
New feature or request
#450
opened Oct 16, 2024 by
drcege
2 tasks done
[Feat]: Enhance Unit Test Coverage for Python and CUDA Compatibility
enhancement
New feature or request
#449
opened Oct 16, 2024 by
drcege
2 tasks done