env-agents: Environmental Data Integration Framework

Semantics-centered framework for discovering, fetching, and harmonizing public environmental data via uniform adapters

🌍 Overview

env-agents provides a unified API for accessing diverse environmental data sources through standardized adapters. It returns analysis-ready datasets with rich, machine-readable metadata using ontology-aware semantic integration.

Production Scale: Successfully integrates 10+ environmental services delivering 100K+ observations per query across soil, air, water, weather, biodiversity, and satellite data.

✨ Key Features

🔌 Unified API: Single interface for 10+ heterogeneous environmental data services
🌐 Production Ready: Handles enterprise-scale workloads (1M+ observations)
📊 Analysis Ready: Returns standardized pandas DataFrames with consistent schema
🔗 Semantic Integration: Ontology-aware variable harmonization across services
🛰️ Multi-Modal Data: Satellite imagery, sensors, surveys, and model outputs
⚡ Optimized Performance: Service-specific configurations and intelligent caching

🚀 Quick Start

Installation

# Install from source
git clone https://github.com/aparkin/env-agents
cd env-agents
pip install -e .

Basic Usage

from env_agents.core.models import RequestSpec, Geometry
from env_agents.adapters import CANONICAL_SERVICES

# Define your area of interest
geometry = Geometry(type='bbox', coordinates=[-122.5, 37.6, -122.3, 37.8])
time_range = ("2021-06-01T00:00:00Z", "2021-08-31T23:59:59Z")

# Get water quality data
wqp_adapter = CANONICAL_SERVICES['WQP']()
spec = RequestSpec(geometry=geometry, time_range=time_range)
water_data = wqp_adapter.fetch(spec)

# Get satellite data
ee_adapter = CANONICAL_SERVICES['EARTH_ENGINE'](asset_id="MODIS/061/MOD13Q1")
satellite_data = ee_adapter.fetch(spec)

print(f"Water quality: {len(water_data)} observations")
print(f"Satellite data: {len(satellite_data)} observations")
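
The bounding box above appears to follow [west, south, east, north] order in decimal degrees. That ordering is inferred from the coordinates and is not confirmed here, so check the Geometry documentation before relying on it. Under that assumption, a small helper can catch swapped corners before a request is sent. The name validate_bbox is hypothetical and is not part of env-agents:

```python
def validate_bbox(coords):
    """Check a bounding box, assuming [west, south, east, north] in decimal degrees.

    Hypothetical helper for illustration; not part of env-agents.
    """
    if len(coords) != 4:
        raise ValueError("bbox needs exactly four numbers")
    west, south, east, north = coords
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitude out of range")
    # Boxes that cross the antimeridian are rejected by this simple check.
    if west >= east or south >= north:
        raise ValueError("first corner must be south-west of second corner")
    return tuple(float(c) for c in coords)


bbox = validate_bbox([-122.5, 37.6, -122.3, 37.8])
print(bbox)
```

The validated tuple can then be passed as the coordinates argument when building the Geometry.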

📊 Supported Data Sources

| Service      | Domain            | Data Type           | Coverage |
|--------------|-------------------|---------------------|----------|
| WQP          | Water Quality     | Measurements        | Global   |
| OpenAQ       | Air Quality       | Sensor data         | Global   |
| EARTH_ENGINE | Satellite/Climate | Multi-modal         | Global   |
| SoilGrids    | Soil Properties   | Model predictions   | Global   |
| GBIF         | Biodiversity      | Species occurrences | Global   |
| NASA_POWER   | Weather/Climate   | Model reanalysis    | Global   |
| EPA_AQS      | Air Quality       | EPA monitoring      | US       |
| USGS_NWIS    | Hydrology         | Stream/groundwater  | US       |
| OSM_Overpass | Infrastructure    | Geographic features | Global   |
| SSURGO       | Soil Survey       | Detailed soil maps  | US       |

🔬 Production Example

Multi-service environmental data fusion returning nearly 1M observations:

from env_agents.adapters import CANONICAL_SERVICES
from env_agents.core.models import RequestSpec, Geometry
import pandas as pd

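# Production-scale data collection
geometry = Geometry(type='bbox', coordinates=[-122.8, 37.2, -121.8, 38.2])
# The request is identical for every service, so build it once
spec = RequestSpec(geometry=geometry, time_range=("2021-01-01", "2021-12-31"))
fusion_results = []

for service_name, adapter_class in CANONICAL_SERVICES.items():
    # Some adapters may need constructor arguments
    # (Basic Usage passes asset_id to EARTH_ENGINE)
    adapter = adapter_class()

    # _fetch_rows is an internal method; it is used here to get
    # individual rows that can be tagged with their source service
    result = adapter._fetch_rows(spec)
    if result:
        for row in result:
            row['service'] = service_name
        fusion_results.extend(result)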

# Create unified dataset
fusion_df = pd.DataFrame(fusion_results)
print(f"Unified dataset: {fusion_df.shape}")
print(f"Services: {fusion_df['service'].nunique()}")
print(f"Variables: {fusion_df['variable'].nunique()}")

Sample Output:

Unified dataset: (999674, 26)
Services: 15
Variables: 190
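
A per-service summary is a quick sanity check on a fused dataset like the one above. The following is a minimal sketch using hand-made stand-in rows rather than real adapter output. Only the service and variable column names come from the example; the value column and the variable names are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical stand-in rows; real rows would come from the adapters.
rows = [
    {"service": "WQP", "variable": "water:temperature", "value": 14.2},
    {"service": "WQP", "variable": "water:ph", "value": 7.8},
    {"service": "OpenAQ", "variable": "air:pm25", "value": 9.1},
    {"service": "OpenAQ", "variable": "air:pm25", "value": 11.4},
]
fusion_df = pd.DataFrame(rows)

# Count observations and distinct variables contributed by each service.
summary = fusion_df.groupby("service").agg(
    observations=("variable", "size"),
    variables=("variable", "nunique"),
)
print(summary)
```

A service that returns zero rows never appears in this summary, so comparing its index against the keys of CANONICAL_SERVICES would show which services contributed nothing.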

📚 Documentation

Complete documentation: docs/README.md

Quick Links

Installation Guide - Get env-agents installed and running
Quick Start - Your first query in 5 minutes
API Reference - Complete API documentation
Services Guide - All 16+ data sources and capabilities
Credentials Setup - Configure API keys

For Developers

Architecture - System design and components
Adding New Services - Create custom adapters
Local Development - Development environment

Production Operations

Pangenome Pipeline - Production data acquisition
Database Management - Managing data storage
Earth Engine Operations - EE-specific guidance

🧪 Testing

Run the production test suite:

# Quick test of all services
python run_tests.py

# Full validation suite
python tests/run_validation_suite.py

# Contract tests
python tests/test_contract.py

🏗️ Architecture

env-agents uses a unified adapter pattern with semantic harmonization:

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Data Sources  │    │   env-agents     │    │   Applications  │
│                 │    │                  │    │                 │
│ • WQP           │────│ • Adapters       │────│ • Research      │
│ • Earth Engine  │    │ • Semantics      │    │ • Monitoring    │
│ • SoilGrids     │    │ • Harmonization  │    │ • Analysis      │
│ • OpenAQ        │    │ • Caching        │    │ • Visualization │
│ • ...           │    │ • Validation     │    │ • ML/AI         │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Key Components

BaseAdapter: Abstract interface for all data sources
RequestSpec: Unified request specification (geometry, time, variables)
Semantic Engine: Variable harmonization and metadata enrichment
Registry System: Ontology-aware variable mapping
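
The adapter pattern described above can be sketched in a few lines. This is an illustration under stated assumptions, not the env-agents source: SketchRequest, SketchAdapter, and StaticAdapter are hypothetical stand-ins for RequestSpec, BaseAdapter, and a concrete service adapter, and the real classes may differ in fields and behavior:

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical stand-in for RequestSpec: geometry, time, variables."""
    bbox: tuple
    time_range: tuple
    variables: Optional[tuple] = None


class SketchAdapter(ABC):
    """Hypothetical stand-in for BaseAdapter: one fetch() contract per source."""
    service_name = "UNSET"

    @abstractmethod
    def _fetch_rows(self, spec):
        """Return a list of row dicts for the request (source-specific)."""

    def fetch(self, spec):
        # Shared post-processing: tag every row with the service it came from.
        return [dict(row, service=self.service_name) for row in self._fetch_rows(spec)]


class StaticAdapter(SketchAdapter):
    """Toy adapter that serves fixed rows instead of calling a remote API."""
    service_name = "STATIC_DEMO"

    def _fetch_rows(self, spec):
        rows = [
            {"variable": "air:pm25", "value": 9.1},
            {"variable": "water:ph", "value": 7.8},
        ]
        if spec.variables is None:
            return rows
        return [r for r in rows if r["variable"] in spec.variables]


spec = SketchRequest(
    bbox=(-122.5, 37.6, -122.3, 37.8),
    time_range=("2021-06-01", "2021-08-31"),
    variables=("water:ph",),
)
print(StaticAdapter().fetch(spec))
```

Keeping the source-specific work in one overridable method and the shared tagging in fetch() is what lets callers treat every service the same way.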

🌟 Key Advantages

Unified Interface: One API for 10+ heterogeneous services
Production Scale: Handles millions of observations efficiently
Semantic Integration: Harmonized variables across data sources
Analysis Ready: Clean, standardized output format
Extensible: Easy to add new data sources
Robust: Production-tested with comprehensive error handling
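
To make "harmonized variables" concrete, here is a minimal sketch of registry-based harmonization. It assumes a lookup keyed by service and native variable name. The REGISTRY table, the harmonize function, the canonical names, and the DEMO service are all hypothetical; the actual env-agents registry and semantic engine may work differently:

```python
# Hypothetical registry: (service, native name) -> (canonical name, unit, scale factor)
REGISTRY = {
    ("WQP", "Temperature, water"): ("water:temperature", "degC", 1.0),
    ("USGS_NWIS", "00010"): ("water:temperature", "degC", 1.0),
    # Made-up entry showing a unit conversion from mg/m3 to ug/m3.
    ("DEMO", "pm25_mg_m3"): ("air:pm25", "ug/m3", 1000.0),
}


def harmonize(row):
    """Map one native row onto a canonical variable name and unit.

    Rows with no registry entry are returned with variable=None so that
    unmapped data stays visible instead of being dropped silently.
    """
    key = (row["service"], row["native_name"])
    if key not in REGISTRY:
        return {**row, "variable": None}
    canonical, unit, factor = REGISTRY[key]
    return {**row, "variable": canonical, "unit": unit, "value": row["value"] * factor}


native_rows = [
    {"service": "WQP", "native_name": "Temperature, water", "value": 14.2},
    {"service": "USGS_NWIS", "native_name": "00010", "value": 13.9},
    {"service": "DEMO", "native_name": "pm25_mg_m3", "value": 0.5},
]
harmonized = [harmonize(r) for r in native_rows]
print(sorted({r["variable"] for r in harmonized}))
```

After this step, rows from different services that measure the same quantity share one variable name, which is what makes a cross-service groupby meaningful.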

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick Development Setup

git clone https://github.com/aparkin/env-agents
cd env-agents
pip install -e ".[dev]"
pytest tests/

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built for environmental research and monitoring applications
Integrates data from NASA, NOAA, EPA, USGS, and other public agencies
Designed for the ENIGMA project and broader environmental science community

env-agents - Unifying environmental data for science and society
