Semantics-centered framework for discovering, fetching, and harmonizing public environmental data via uniform adapters
env-agents provides a unified API for accessing diverse environmental data sources through standardized adapters. It returns analysis-ready datasets with rich, machine-readable metadata using ontology-aware semantic integration.
Production Scale: Successfully integrates 10+ environmental services delivering 100K+ observations per query across soil, air, water, weather, biodiversity, and satellite data.
- Unified API: Single interface for 10+ heterogeneous environmental data services
- Production Ready: Handles enterprise-scale workloads (1M+ observations)
- Analysis Ready: Returns standardized pandas DataFrames with consistent schema
- Semantic Integration: Ontology-aware variable harmonization across services
- Multi-Modal Data: Satellite imagery, sensors, surveys, and model outputs
- Optimized Performance: Service-specific configurations and intelligent caching
# Install from source
git clone https://github.com/aparkin/env-agents
cd env-agents
pip install -e .
from env_agents.core.models import RequestSpec, Geometry
from env_agents.adapters import CANONICAL_SERVICES
# Define your area of interest
geometry = Geometry(type='bbox', coordinates=[-122.5, 37.6, -122.3, 37.8])
time_range = ("2021-06-01T00:00:00Z", "2021-08-31T23:59:59Z")
# Get water quality data
wqp_adapter = CANONICAL_SERVICES['WQP']()
spec = RequestSpec(geometry=geometry, time_range=time_range)
water_data = wqp_adapter.fetch(spec)
# Get satellite data
ee_adapter = CANONICAL_SERVICES['EARTH_ENGINE'](asset_id="MODIS/061/MOD13Q1")
satellite_data = ee_adapter.fetch(spec)
print(f"Water quality: {len(water_data)} observations")
print(f"Satellite data: {len(satellite_data)} observations")
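The bounding box and time range in the quick start are plain lists and strings, so they are easy to get subtly wrong. The sketch below builds both from a center point and two datetimes. It assumes the bbox order is `[min_lon, min_lat, max_lon, max_lat]`, which is consistent with the coordinates shown above but is not stated by the source. `bbox_around` and `iso_range` are hypothetical helper names and are not part of env-agents.

```python
from datetime import datetime, timezone


def bbox_around(lon, lat, half_width_deg):
    """Return [min_lon, min_lat, max_lon, max_lat] around a point (assumed order)."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("lon/lat out of range")
    # Clamp to valid ranges and round away floating-point noise.
    return [
        round(max(lon - half_width_deg, -180.0), 6),
        round(max(lat - half_width_deg, -90.0), 6),
        round(min(lon + half_width_deg, 180.0), 6),
        round(min(lat + half_width_deg, 90.0), 6),
    ]


def iso_range(start, end):
    """Format two timezone-aware datetimes as UTC ISO 8601 strings."""
    if end < start:
        raise ValueError("end must not precede start")
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return (
        start.astimezone(timezone.utc).strftime(fmt),
        end.astimezone(timezone.utc).strftime(fmt),
    )


coords = bbox_around(-122.4, 37.7, 0.1)
time_range = iso_range(
    datetime(2021, 6, 1, tzinfo=timezone.utc),
    datetime(2021, 8, 31, 23, 59, 59, tzinfo=timezone.utc),
)
print(coords)
print(time_range)
```

The resulting values can be passed to `Geometry(type='bbox', coordinates=coords)` and `RequestSpec(..., time_range=time_range)` exactly as in the quick start.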
Service | Domain | Data Type | Coverage |
---|---|---|---|
WQP | Water Quality | Measurements | Global |
OpenAQ | Air Quality | Sensor data | Global |
EARTH_ENGINE | Satellite/Climate | Multi-modal | Global |
SoilGrids | Soil Properties | Model predictions | Global |
GBIF | Biodiversity | Species occurrences | Global |
NASA_POWER | Weather/Climate | Model reanalysis | Global |
EPA_AQS | Air Quality | EPA monitoring | US |
USGS_NWIS | Hydrology | Stream/groundwater | US |
OSM_Overpass | Infrastructure | Geographic features | Global |
SSURGO | Soil Survey | Detailed soil maps | US |
Multi-service environmental data fusion returning nearly 1M observations:
from env_agents.adapters import CANONICAL_SERVICES
from env_agents.core.models import RequestSpec, Geometry
import pandas as pd
# Production-scale data collection
geometry = Geometry(type='bbox', coordinates=[-122.8, 37.2, -121.8, 38.2])
fusion_results = []
for service_name, adapter_class in CANONICAL_SERVICES.items():
    adapter = adapter_class()
    spec = RequestSpec(geometry=geometry, time_range=("2021-01-01", "2021-12-31"))
    result = adapter._fetch_rows(spec)
    if result:
        # Tag each row with the service it came from
        for row in result:
            row['service'] = service_name
        fusion_results.extend(result)
# Create unified dataset
fusion_df = pd.DataFrame(fusion_results)
print(f"Unified dataset: {fusion_df.shape}")
print(f"Services: {fusion_df['service'].nunique()}")
print(f"Variables: {fusion_df['variable'].nunique()}")
Sample Output:
Unified dataset: (999674, 26)
Services: 15
Variables: 190
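When looping over many remote services, a single failing service would abort the whole collection loop shown above. A minimal sketch of one way to keep going and report failures is shown here. It uses a stand-in `StubAdapter` class rather than real env-agents adapters, and the variable names such as `water:ph` are invented for illustration; only the idea of tagging rows with a `service` key comes from the example above.

```python
from collections import Counter


class StubAdapter:
    """Stand-in for an adapter; it only mimics returning a list of row dicts."""

    def __init__(self, rows=None, error=None):
        self._rows = rows or []
        self._error = error

    def fetch_rows(self, spec):
        if self._error:
            raise self._error
        return [dict(r) for r in self._rows]  # copy so callers can mutate rows


def collect(services, spec):
    """Query each service, tag rows with the service name, and record failures."""
    rows, failures = [], {}
    for name, adapter in services.items():
        try:
            result = adapter.fetch_rows(spec)
        except Exception as exc:  # one failing service should not stop the rest
            failures[name] = str(exc)
            continue
        for row in result:
            row["service"] = name
        rows.extend(result)
    return rows, failures


services = {
    "WQP": StubAdapter(rows=[{"variable": "water:ph"}, {"variable": "water:temp"}]),
    "OpenAQ": StubAdapter(error=TimeoutError("upstream timeout")),
    "SoilGrids": StubAdapter(rows=[{"variable": "soil:clay"}]),
}

rows, failures = collect(services, spec=None)
per_service = Counter(row["service"] for row in rows)
print(dict(per_service))
print(failures)
```

Counting rows per service before building the DataFrame gives a quick check that every expected service actually contributed data.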
Complete documentation: docs/README.md
- Installation Guide - Get env-agents installed and running
- Quick Start - Your first query in 5 minutes
- API Reference - Complete API documentation
- Services Guide - All 16+ data sources and capabilities
- Credentials Setup - Configure API keys
- Architecture - System design and components
- Adding New Services - Create custom adapters
- Local Development - Development environment
- Pangenome Pipeline - Production data acquisition
- Database Management - Managing data storage
- Earth Engine Operations - EE-specific guidance
Run the production test suite:
# Quick test of all services
python run_tests.py
# Full validation suite
python tests/run_validation_suite.py
# Contract tests
python tests/test_contract.py
env-agents uses a unified adapter pattern with semantic harmonization:
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│  Data Sources   │    │   env-agents     │    │  Applications   │
│                 │    │                  │    │                 │
│ • WQP           │────│ • Adapters       │────│ • Research      │
│ • Earth Engine  │    │ • Semantics      │    │ • Monitoring    │
│ • SoilGrids     │    │ • Harmonization  │    │ • Analysis      │
│ • OpenAQ        │    │ • Caching        │    │ • Visualization │
│ • ...           │    │ • Validation     │    │ • ML/AI         │
└─────────────────┘    └──────────────────┘    └─────────────────┘
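The adapter pattern in the diagram can be illustrated with a short, self-contained sketch. The classes here (`SketchRequest`, `SketchAdapter`, and the two fake adapters) are hypothetical stand-ins written for this example. They are not the actual `RequestSpec` or `BaseAdapter` classes, whose real signatures may differ.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical request object: where and when to query."""
    bbox: tuple
    time_range: Optional[tuple] = None


class SketchAdapter(ABC):
    """Hypothetical base class: subclasses supply only service-specific fetching."""
    service_name = "UNKNOWN"

    @abstractmethod
    def _fetch_rows(self, spec):
        """Return a list of row dicts for the given request."""

    def fetch(self, spec):
        # Shared post-processing: tag every row with its source service.
        return [{**row, "service": self.service_name} for row in self._fetch_rows(spec)]


class FakeWaterAdapter(SketchAdapter):
    service_name = "FAKE_WATER"

    def _fetch_rows(self, spec):
        return [{"variable": "ph", "value": 7.2}]


class FakeSoilAdapter(SketchAdapter):
    service_name = "FAKE_SOIL"

    def _fetch_rows(self, spec):
        return [{"variable": "clay_fraction", "value": 0.31}]


spec = SketchRequest(bbox=(-122.5, 37.6, -122.3, 37.8))
results = [
    row
    for adapter in (FakeWaterAdapter(), FakeSoilAdapter())
    for row in adapter.fetch(spec)
]
print(results)
```

The point of the pattern is that calling code handles every source through the same `fetch(spec)` call, while each subclass hides its own service-specific details.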
- BaseAdapter: Abstract interface for all data sources
- RequestSpec: Unified request specification (geometry, time, variables)
- Semantic Engine: Variable harmonization and metadata enrichment
- Registry System: Ontology-aware variable mapping
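Variable harmonization through a registry can be sketched as a lookup from a service's native variable name to a canonical name, with a unit conversion applied along the way. Everything in this example is an assumption made for illustration: the `REGISTRY` structure, the service and variable names, and the `harmonize` function are hypothetical and do not reflect the actual env-agents semantic engine.

```python
# Hypothetical registry:
# (service, native variable) -> (canonical variable, canonical unit, converter)
REGISTRY = {
    ("SERVICE_A", "water_temp_c"): ("water:temperature", "degC", lambda v: v),
    ("SERVICE_B", "water_temp_f"): ("water:temperature", "degC", lambda v: (v - 32) * 5 / 9),
}


def harmonize(row):
    """Map a row to its canonical variable and unit; flag rows with no mapping."""
    key = (row["service"], row["variable"])
    if key not in REGISTRY:
        # Leave unmapped rows untouched but mark them for later review.
        return {**row, "canonical_variable": None}
    canonical, unit, convert = REGISTRY[key]
    return {
        **row,
        "canonical_variable": canonical,
        "value": convert(row["value"]),
        "unit": unit,
    }


raw_rows = [
    {"service": "SERVICE_A", "variable": "water_temp_c", "value": 20.0},
    {"service": "SERVICE_B", "variable": "water_temp_f", "value": 68.0},
    {"service": "SERVICE_C", "variable": "mystery", "value": 1.0},
]
harmonized = [harmonize(row) for row in raw_rows]
for row in harmonized:
    print(row["service"], row["canonical_variable"], row["value"])
```

After harmonization, the first two rows share one canonical variable and unit, so they can be compared directly even though the services reported them differently.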
- Unified Interface: One API for 10+ heterogeneous services
- Production Scale: Handles millions of observations efficiently
- Semantic Integration: Harmonized variables across data sources
- Analysis Ready: Clean, standardized output format
- Extensible: Easy to add new data sources
- Robust: Production-tested with comprehensive error handling
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
git clone https://github.com/aparkin/env-agents
cd env-agents
pip install -e ".[dev]"
pytest tests/
This project is licensed under the MIT License - see the LICENSE file for details.
- Built for environmental research and monitoring applications
- Integrates data from NASA, NOAA, EPA, USGS, and other public agencies
- Designed for the ENIGMA project and broader environmental science community
env-agents - Unifying environmental data for science and society