A Python CLI tool for analyzing email data in mbox format.
- 📧 Process mbox format email archives
- 🔧 Unix-style pipeline architecture for flexible processing
- 📊 Extensible framework for building analysis pipelines
- Coming soon: More analysis processors...
Install with pip:
pip install swecc-email-scraper
For development, clone the repository and install it in editable mode:
git clone https://github.com/swecc-uw/swecc-email-scraper.git
cd swecc-email-scraper
pip install -e ".[dev]" # Install with development dependencies
# Run tests
pytest
The tool uses Unix pipes to compose commands. Each command does one thing and can be combined with others:
- Basic usage: get email stats with the example processor:
swecc-email-scraper read mailbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format -f json > results.json
- List available processors:
swecc-email-scraper list-processors
- List available output formats:
swecc-email-scraper list-formats
The read command reads an mbox file and outputs the email data as JSON:
swecc-email-scraper read input.mbox > emails.json
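As a rough illustration of what this kind of conversion involves, the sketch below uses Python's standard mailbox module to turn an mbox file into a list of JSON-serializable records. It is not the tool's actual implementation: the function name mbox_to_records is hypothetical, the sender field mirrors the jq example in this README, and the subject and date fields are assumptions about the output schema.

```python
import json
import mailbox


def mbox_to_records(path):
    """Return one dict per message in the mbox file at `path` (hypothetical helper)."""
    # create=False avoids silently creating an empty file when the path is wrong.
    box = mailbox.mbox(path, create=False)
    try:
        records = []
        for message in box:
            records.append({
                # "sender" mirrors the field used in the jq example; the
                # other two field names are assumptions for illustration.
                "sender": str(message.get("From", "")),
                "subject": str(message.get("Subject", "")),
                "date": str(message.get("Date", "")),
            })
        return records
    finally:
        box.close()


def records_to_json(records):
    """Serialize the records the way a pipeline stage would write them to stdout."""
    return json.dumps(records, indent=2)
```

Calling records_to_json(mbox_to_records("input.mbox")) and printing the result would produce output comparable in shape to emails.json, under the schema assumptions above.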
The stats command processes email data from stdin and outputs statistics:
cat emails.json | swecc-email-scraper stats > stats.json
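To make the idea of a processor concrete, here is a minimal sketch of a stage that reads a JSON list of emails from an input stream and writes summary statistics to an output stream. The statistics chosen (total_emails, unique_senders, top_senders) and the function names are assumptions for illustration; the real stats processor may compute different values.

```python
import json
from collections import Counter


def compute_stats(records):
    """Summarize a list of email records (illustrative statistics only)."""
    # Records without a "sender" field are counted under the empty string.
    senders = Counter(record.get("sender") or "" for record in records)
    return {
        "total_emails": len(records),
        "unique_senders": len(senders),
        "top_senders": senders.most_common(3),
    }


def run_stats(stdin, stdout):
    """Read JSON records from `stdin` and write JSON statistics to `stdout`."""
    records = json.load(stdin)
    json.dump(compute_stats(records), stdout, indent=2)
    stdout.write("\n")
```

Passing sys.stdin and sys.stdout to run_stats would let a script built on this sketch sit in the middle of a shell pipeline.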
The format command formats JSON data from stdin using the formatter selected with -f:
cat stats.json \
| swecc-email-scraper format -f json \
> formatted.json
- Basic email statistics to terminal:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format
- Save analysis to a file:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
> analysis.json
- Process with custom formatting:
swecc-email-scraper read inbox.mbox \
| swecc-email-scraper stats \
| swecc-email-scraper format -f json \
> analysis.json
- Use with Unix tools:
# Filter emails before analysis
swecc-email-scraper read inbox.mbox \
| jq 'map(select(.sender | contains("important")))' \
| swecc-email-scraper stats
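Where jq is not available, the same filtering step can be approximated with a few lines of Python, since the data passed between commands is plain JSON. This sketch assumes, as the jq example does, that the read output is a JSON array whose items have a sender field; the function names are hypothetical.

```python
import json


def filter_by_sender(records, needle):
    """Keep records whose "sender" contains `needle` (substring match, like jq's contains)."""
    # A missing or null sender is treated as an empty string instead of raising.
    return [r for r in records if needle in (r.get("sender") or "")]


def run_filter(stdin, stdout, needle):
    """Read JSON records from `stdin`, filter them, and write JSON to `stdout`."""
    json.dump(filter_by_sender(json.load(stdin), needle), stdout)
    stdout.write("\n")
```

A small script that calls run_filter(sys.stdin, sys.stdout, "important") could then take the place of the jq line in the pipeline.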
The tool is designed to be easily extensible. See CONTRIBUTING.md for detailed information on:
- Creating custom processors
- Adding new output formats
- Contributing to the project
- Development setup and guidelines
The tool uses a Unix pipeline architecture where:
- The read command converts mbox files to JSON email data
- Processor commands (like stats) transform or analyze the data
- The format command handles output formatting
- Standard Unix pipes (|) connect the components
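The same architecture can be sketched in-process: each stage takes JSON text and returns JSON text, and a small helper chains the stages the way a shell chains commands with pipes. Everything here is a stand-in for illustration. The stage functions, the pipe helper, and the sample data are hypothetical and do not reflect the project's internal API.

```python
import json
from functools import reduce


def read_stage(_):
    # Stand-in for read: a real stage would parse an mbox file instead.
    return json.dumps([{"sender": "a@example.com"}, {"sender": "b@example.com"}])


def stats_stage(text):
    # Stand-in for a processor: consumes email JSON, emits statistics JSON.
    records = json.loads(text)
    return json.dumps({"total_emails": len(records)})


def format_stage(text):
    # Stand-in for format: re-serializes the data for display.
    return json.dumps(json.loads(text), indent=2, sort_keys=True)


def pipe(*stages):
    """Compose stages left to right, like `a | b | c` in a shell."""
    return lambda text="": reduce(lambda data, stage: stage(data), stages, text)


run = pipe(read_stage, stats_stage, format_stage)
output = run()
```

Because every stage shares the same text-in, text-out contract, a new processor can be added to the chain without changing the stages around it.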
MIT License - See LICENSE file for details.
Developed as part of SWECC Labs at the University of Washington.