35 commits
3cf4c86
Add S3 download and upload mappers for distributed processing
kyo-tom Dec 4, 2025
8d6f3d9
feat: test s3 download mapper
Dec 24, 2025
8f42978
feat: resume file download
Dec 24, 2025
f16e036
feat: s3 download test
Dludora Dec 25, 2025
cd9cbb0
Merge branch 'datajuicer:main' into pr-839
Dludora Dec 25, 2025
4f6983b
fix: pre-commit
Dludora Dec 25, 2025
4041ed8
feat: test upload
Dludora Dec 25, 2025
2a3b25c
Merge branch 'pr-839' of github.com:Dludora/data-juicer into pr-839
Dludora Dec 25, 2025
82a513d
feat: unittest 4 s3 upload
Dludora Dec 25, 2025
2fc184e
fix: s3 demo dataset config
Dludora Dec 25, 2025
7a7b26a
feat: timeout config 4 s3
Dludora Dec 25, 2025
6ba0c8b
style: pre-commit
Dludora Dec 25, 2025
b54eb3a
feat: support hdfs and iceberg
Dludora Dec 28, 2025
06f034a
feat: add load data from hdfs source
Dec 29, 2025
c4c7f0b
feat: demo for hdfs load
Dec 29, 2025
37db9bf
feat: read iceberg file
Dludora Dec 29, 2025
1368eaf
feat: iceberg read
Dludora Dec 30, 2025
f4856b6
feat: process iceberg and hdfs
Dludora Dec 30, 2025
9f8a551
refractor: secret
Dludora Dec 30, 2025
2b3b620
feat: export iceberg and others
Dec 30, 2025
4a24682
feat: write iceberg
Dludora Dec 31, 2025
ebd23d2
refractor: move fs
Dludora Dec 31, 2025
12a4524
refractor: move fs
Dludora Dec 31, 2025
2744914
feat: delta and hudi
Dludora Dec 31, 2025
79b3e04
refractor: restore file_utils
Dludora Jan 5, 2026
cc7b07b
refractor: add Any type
Dludora Jan 5, 2026
b4c698c
feat: fallback ray_expoeter
Dludora Jan 5, 2026
9d78a7a
Merge remote-tracking branch 'upstream/main'
Dludora Jan 5, 2026
087941c
fix: pyproject
Dludora Jan 5, 2026
9d18059
style: code style check
Dludora Jan 5, 2026
2b00595
restore: ray_executor s3
Dludora Jan 6, 2026
75457da
Merge remote-tracking branch 'origin/main' into pr-839
Dludora Jan 6, 2026
185be51
fix: tests/config/test_config.py SpecifiedFieldFilter OP python 3.11+…
Dludora Jan 6, 2026
aea124c
Merge remote-tracking branch 'origin/main' into pr-839
Dludora Jan 6, 2026
c261c29
Merge remote-tracking branch 'upstream/main' into pr-839
Dludora Jan 6, 2026
4 changes: 4 additions & 0 deletions data_juicer/ops/mapper/__init__.py
@@ -74,6 +74,8 @@
RemoveWordsWithIncorrectSubstringsMapper,
)
from .replace_content_mapper import ReplaceContentMapper
from .s3_download_file_mapper import S3DownloadFileMapper
from .s3_upload_file_mapper import S3UploadFileMapper
from .sdxl_prompt2prompt_mapper import SDXLPrompt2PromptMapper
from .sentence_augmentation_mapper import SentenceAugmentationMapper
from .sentence_split_mapper import SentenceSplitMapper
@@ -173,6 +175,8 @@
"RemoveTableTextMapper",
"RemoveWordsWithIncorrectSubstringsMapper",
"ReplaceContentMapper",
"S3DownloadFileMapper",
"S3UploadFileMapper",
"SDXLPrompt2PromptMapper",
"SentenceAugmentationMapper",
"SentenceSplitMapper",
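The two hunks in data_juicer/ops/mapper/__init__.py add each new mapper twice: once as an import line and once as a string in what is presumably the module's `__all__` list. The sketch below shows one way to check that the two stay in sync. It runs against a synthetic module, not data_juicer itself. The names `missing_exports` and `demo_mapper_pkg` are hypothetical, and the stand-in classes are empty placeholders.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module does not define."""
    return [
        name
        for name in getattr(module, "__all__", [])
        if not hasattr(module, name)
    ]


# Hypothetical stand-in for a package __init__ (not the real data_juicer module).
demo = types.ModuleType("demo_mapper_pkg")
# Placeholder classes playing the role of the two imported mappers.
demo.S3DownloadFileMapper = type("S3DownloadFileMapper", (), {})
demo.S3UploadFileMapper = type("S3UploadFileMapper", (), {})
# The export list that mirrors the imports above.
demo.__all__ = ["S3DownloadFileMapper", "S3UploadFileMapper"]

print(missing_exports(demo))
```

An empty result means every exported name resolves. A name that is listed in `__all__` but never imported would appear in the returned list.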