This project demonstrates a serverless ETL process using AWS services. It automatically processes CSV files uploaded to an S3 bucket by converting them to Parquet format.
- AWS S3 buckets for raw (`sample-raw`) and processed (`sample-processed`) data
- AWS Lambda function for data processing
- Amazon EventBridge for event-driven processing
- AWS SAM for infrastructure as code
- Docker container with Python 3.9 and pyarrow
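Because the pipeline is event-driven, the Lambda function is invoked with an S3 "Object Created" event delivered through EventBridge. The exact wiring lives in `template.yaml`; as an illustration only, assuming S3 EventBridge notifications are enabled on the raw bucket, the payload the handler receives looks roughly like the sketch below (all field values are placeholders):

```python
# Illustrative shape of an S3 "Object Created" event delivered via EventBridge.
# Values are placeholders, not real output from this project.
sample_event = {
    "detail-type": "Object Created",
    "source": "aws.s3",
    "detail": {
        "bucket": {"name": "sample-raw"},
        "object": {"key": "uploads/data.csv", "size": 1024},
    },
}

# The handler can pull the source bucket and object key out of the event detail.
bucket = sample_event["detail"]["bucket"]["name"]
key = sample_event["detail"]["object"]["key"]
```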
- AWS SAM CLI
- Docker
- AWS CLI configured with appropriate credentials
- GitHub repository with appropriate AWS credentials configured as secrets:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
- Build the application: `sam build`
- Deploy the application: `sam deploy --guided`

The project includes a GitHub Actions workflow that automatically builds and deploys the application when changes are pushed to the main branch. The workflow:
- Sets up Python and AWS SAM
- Configures AWS credentials
- Builds the application using SAM
- Deploys to AWS using SAM
To use the automated deployment:
- Fork this repository
- Configure AWS credentials as GitHub repository secrets:
  - `AWS_ACCESS_KEY_ID`
  - `AWS_SECRET_ACCESS_KEY`
- Push changes to the main branch to trigger the deployment
- Upload a CSV file to the `sample-raw` bucket
- The Lambda function will automatically (see the handler sketch below):
  - Process the file
  - Print the row count to CloudWatch logs
  - Save the file as Parquet in the `sample-processed` bucket
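The processing logic lives in `src/app.py`. The repository's actual implementation may differ, but a minimal sketch of such a handler, assuming `boto3` and `pyarrow` are available in the container image and using a hypothetical `PROCESSED_BUCKET` environment variable for the destination bucket, could look like this:

```python
import io
import os

import boto3
import pyarrow.csv as pa_csv
import pyarrow.parquet as pq

s3 = boto3.client("s3")

# Hypothetical env var for the destination bucket; the real template may wire
# this up differently.
PROCESSED_BUCKET = os.environ.get("PROCESSED_BUCKET", "sample-processed")


def handler(event, context):
    # Source bucket and key of the uploaded CSV, taken from the EventBridge detail.
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]

    # Download the CSV into memory and parse it with pyarrow.
    obj = s3.get_object(Bucket=bucket, Key=key)
    table = pa_csv.read_csv(io.BytesIO(obj["Body"].read()))

    # Row count goes to stdout, which Lambda forwards to CloudWatch Logs.
    print(f"Processed {key}: {table.num_rows} rows")

    # Serialize the table as Parquet and write it to the processed bucket.
    buffer = io.BytesIO()
    pq.write_table(table, buffer)
    parquet_key = key.rsplit(".", 1)[0] + ".parquet"
    s3.put_object(Bucket=PROCESSED_BUCKET, Key=parquet_key, Body=buffer.getvalue())

    return {"rows": table.num_rows, "output_key": parquet_key}
```

The function's runtime settings, permissions, and trigger are declared in `template.yaml`.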
- `template.yaml`: SAM template defining AWS resources
- `src/app.py`: Lambda function code
- `Dockerfile`: Container configuration for Lambda
- `requirements.txt`: Python dependencies