DataX is a data transfer framework for data collection and synchronization. It moves data between heterogeneous sources and targets through pluggable readers and writers.
Directory layout:

runtime/datax/
├── core/           # DataX core components
├── transformer/    # Data transformers
├── readers/        # Data readers
│   ├── mysqlreader/
│   ├── postgresqlreader/
│   ├── oraclereader/
│   ├── mongodbreader/
│   ├── hdfsreader/
│   ├── s3reader/
│   ├── nfsreader/
│   ├── glusterfsreader/
│   └── apireader/
└── writers/        # Data writers
    ├── mysqlwriter/
    ├── postgresqlwriter/
    ├── oraclewriter/
    ├── mongodbwriter/
    ├── hdfswriter/
    ├── s3writer/
    ├── nfswriter/
    ├── glusterfswriter/
    └── txtfilewriter/
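Conceptually, the layout mirrors the data path of a job: a reader task produces records, transformers adjust or filter them, and a writer task consumes them, with the core passing records between the two through a bounded in-memory channel. The Python sketch below models that flow for illustration only. The real core is Java, and `run_pipeline`, `Record`, and the `_END` marker are hypothetical names, not DataX APIs.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical record type: one row as column name -> value.
Record = Dict[str, object]
Transformer = Callable[[Record], Optional[Record]]

_END = object()  # end-of-stream marker sent after the last record


def run_pipeline(
    read: Callable[[], Iterable[Record]],
    transformers: List[Transformer],
    write: Callable[[Record], None],
    capacity: int = 128,
) -> None:
    """Move records from one reader task to one writer task via a bounded channel."""
    channel: queue.Queue = queue.Queue(maxsize=capacity)

    def reader_task() -> None:
        try:
            for record in read():
                out: Optional[Record] = record
                # Transformers run in order on the reader side; None filters the record.
                for transform in transformers:
                    out = transform(out)
                    if out is None:
                        break
                if out is not None:
                    channel.put(out)  # blocks while the channel is full
        finally:
            channel.put(_END)  # always signal the writer, even on error

    def writer_task() -> None:
        while True:
            item = channel.get()
            if item is _END:
                break
            write(item)

    threads = [threading.Thread(target=reader_task), threading.Thread(target=writer_task)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    rows = [{"id": 1, "email": "A@X.COM"}, {"id": 2, "email": None}]
    sink: List[Record] = []
    run_pipeline(
        lambda: rows,
        [
            lambda r: r if r["email"] else None,                # drop rows without an email
            lambda r: {**r, "email": str(r["email"]).lower()},  # normalise case
        ],
        sink.append,
    )
    print(sink)  # [{'id': 1, 'email': 'a@x.com'}]
```

The bounded queue provides back-pressure: once the writer falls behind, the reader blocks on `put()` instead of buffering the whole table in memory.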
Supported data sources:

- MySQL
- PostgreSQL
- Oracle
- SQL Server
- DB2
- KingbaseES
- GaussDB
- MongoDB
- Elasticsearch
- Cassandra
- HBase
- Redis
- HDFS
- S3 (AWS S3, MinIO, Alibaba Cloud OSS)
- NFS
- GlusterFS
- Local file system
- API interfaces
- Kafka
- Pulsar
- DataHub
- LogHub
Example job configuration that copies the users table from MySQL into text files under /output:

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "password",
            "column": ["id", "name", "email"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/database"],
                "table": ["users"]
              }
            ]
          }
        },
        "writer": {
          "name": "txtfilewriter",
          "parameter": {
            "path": "/output",
            "fileName": "users",
            "writeMode": "truncate"
          }
        }
      }
    ]
  }
}

Build and run:

# Build DataX
cd runtime/datax
mvn clean package
# Run
python datax.py -j job.json

Requirements:

- JDK 8+
- Maven 3.8+
- Python 3.6+
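The example job configuration can also be generated instead of hand-edited. The sketch below builds the same mysqlreader to txtfilewriter job as a Python dict, checks that each reader and writer has a `name` and a `parameter` block, and saves it as JSON. It assumes the upstream DataX parameter layout (reader `jdbcUrl` as a list, txtfilewriter `path` as an output directory, a job-level `setting.speed.channel`); `mysql_to_text_job` and `write_job` are hypothetical helpers, and this runtime's plugins may expect different keys.

```python
import json
from pathlib import Path
from typing import List


def mysql_to_text_job(jdbc_url: str, table: str, columns: List[str], username: str,
                      password: str, out_dir: str, file_name: str, channel: int = 1) -> dict:
    """Build a mysqlreader -> txtfilewriter job in the DataX job JSON layout (assumed)."""
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},  # number of concurrent channels
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": username,
                        "password": password,
                        "column": columns,
                        "connection": [{"jdbcUrl": [jdbc_url], "table": [table]}],
                    },
                },
                "writer": {
                    "name": "txtfilewriter",
                    "parameter": {"path": out_dir, "fileName": file_name, "writeMode": "truncate"},
                },
            }],
        }
    }


def write_job(job: dict, path: str) -> Path:
    """Check the reader/writer skeleton, then save the job as pretty-printed JSON."""
    for entry in job["job"]["content"]:
        for role in ("reader", "writer"):
            if "name" not in entry[role] or "parameter" not in entry[role]:
                raise ValueError(f"{role} needs 'name' and 'parameter'")
    target = Path(path)
    target.write_text(json.dumps(job, indent=2), encoding="utf-8")
    return target
```

The resulting file is run exactly like a hand-written one, with `python datax.py -j job.json`. In practice the password would come from an environment variable or secret store rather than a literal.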
Building from source:

cd runtime/datax
mvn clean package

Running the bundled example:

python datax.py -j examples/mysql2text.json

Adding a new reader:

- Create a new module in readers/
- Implement the Reader interface
- Configure the reader parameters
- Add the module to package.xml
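In upstream DataX, a reader has a job-level part that splits the work into task configurations (`Reader.Job.split(int adviceNumber)` returns one configuration per task) and a task-level part that reads its share. The Python sketch below models only that split step, for a table with an integer primary key. `split_by_pk` and the `where` key are hypothetical, rows whose key is NULL are not covered, and identifiers are assumed to be trusted.

```python
from typing import Dict, List


def split_by_pk(job_conf: Dict[str, object], pk: str, min_id: int, max_id: int,
                advice_number: int) -> List[Dict[str, object]]:
    """Split the key range [min_id, max_id] into at most advice_number
    contiguous, non-overlapping ranges and return one task config per range."""
    if advice_number < 1 or max_id < min_id:
        raise ValueError("need advice_number >= 1 and max_id >= min_id")
    total = max_id - min_id + 1
    parts = min(advice_number, total)  # never create empty tasks
    size, extra = divmod(total, parts)
    tasks = []
    start = min_id
    for i in range(parts):
        # The first `extra` ranges take one extra key so sizes differ by at most 1.
        end = start + size + (1 if i < extra else 0)  # exclusive upper bound
        tasks.append({**job_conf, "where": f"{pk} >= {start} AND {pk} < {end}"})
        start = end
    return tasks
```

For example, keys 1 to 10 split three ways give the ranges `id >= 1 AND id < 5`, `id >= 5 AND id < 8`, and `id >= 8 AND id < 11`, so every key is read by exactly one task.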
Adding a new writer:

- Create a new module in writers/
- Implement the Writer interface
- Configure the writer parameters
- Add the module to package.xml
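Skipping the package.xml step likely leaves a plugin out of the packaged output even though its module builds. The sketch below flags modules under readers/ and writers/ whose directory name never appears as a path component in package.xml. It assumes package.xml sits in runtime/datax/ and references each plugin by a path such as readers/mysqlreader/...; `unregistered_modules` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import List


def unregistered_modules(datax_root: str) -> List[str]:
    """Return 'readers/<name>' or 'writers/<name>' entries missing from package.xml."""
    root = Path(datax_root)
    # Collect every path component mentioned in the descriptor's text content.
    text = " ".join(ET.parse(root / "package.xml").getroot().itertext())
    tokens = set(re.split(r"[/\\\s]+", text))
    missing = []
    for group in ("readers", "writers"):
        group_dir = root / group
        if not group_dir.is_dir():
            continue
        for module in sorted(p.name for p in group_dir.iterdir() if p.is_dir()):
            if module not in tokens:
                missing.append(f"{group}/{module}")
    return missing
```

Matching whole path components rather than raw substrings avoids treating a module as registered just because its name is a prefix of another module's name.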