Gaia is a platform for research and development of geospatial machine learning models. Read more about the long-term vision in our whitepaper.
Note
BETA VERSION = 2.0.0
The beta version of Gaia launches with limited functionality. Many of the planned features are not yet available; however, we are still accepting miners and validators for the initial tasks.
```bash
git clone https://github.com/Nickel5-Inc/Gaia.git
cd Gaia
pip install -e .
```

Miners develop models to predict future events. These events currently include soil moisture and geomagnetic readings at the equator. Miners receive data from validators for the models we have in place, and they are also free to gather their own data from other resources. The tasks are consistent in design and in timing; this predictability gives miners the flexibility to retrieve any data their models require.
Miners can choose between these two tasks or perform both. Incentive distribution: 40% of emissions are allocated to Geomagnetic Dst Index Prediction, and the remaining 60% are allocated through a sigmoid-weighted scoring mechanism. The split was previously 50:50 but has been adjusted to favor higher-quality predictions.
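As a rough illustration of how a sigmoid weighting can turn a normalized prediction score into a reward weight, here is a minimal sketch. The function name, midpoint, and steepness values are illustrative assumptions, not Gaia's actual scoring parameters:

```python
import math

def sigmoid_weight(score: float, midpoint: float = 0.5, steepness: float = 10.0) -> float:
    """Map a normalized prediction score in [0, 1] to a weight in (0, 1).

    Scores well above the midpoint earn disproportionately more weight than
    scores just below it, which is the general idea behind favoring
    higher-quality predictions. Parameters here are illustrative only.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (score - midpoint)))

# Example: normalize sigmoid weights across a set of miner scores
scores = {"miner_a": 0.92, "miner_b": 0.55, "miner_c": 0.30}
raw = {uid: sigmoid_weight(s) for uid, s in scores.items()}
total = sum(raw.values())
print({uid: w / total for uid, w in raw.items()})
```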
Validators connect to a few APIs to provide miners with the data they need to run models.
Gaia is built on Fiber - special thanks to namoray and the Rayon labs team.
```bash
python ./scripts/setup.py
```

- This will create a virtual environment and install all dependencies
- The virtual environment (.gaia) will be located above the gaia directory.
- Activate it after running the setup script using:

```bash
source ../.gaia/bin/activate
```

Install Fiber:

```bash
pip install "git+https://github.com/rayonlabs/fiber.git@production#egg=fiber[full]"
```

If you are running the validator and PostgreSQL on the same machine and intend to use the default database user (postgres), you might encounter a "Peer authentication failed" error. This typically happens when the validator application runs as an OS user other than postgres (e.g., as root via pm2).
To resolve this and enable password authentication for the postgres user via local Unix domain sockets (recommended for security over peer when OS users don't match):
- Locate your `pg_hba.conf` file. This file is critical for PostgreSQL's client authentication. Common locations include:

  - `/etc/postgresql/<YOUR_PG_VERSION>/main/pg_hba.conf`
  - `/var/lib/pgsql/data/pg_hba.conf`

  You can find its exact location by connecting to `psql` and running `SHOW hba_file;`.

- Edit `pg_hba.conf` with `sudo` privileges. Open the file using a text editor, for example:

  ```bash
  sudo nano /path/to/your/pg_hba.conf
  ```

  (Replace `/path/to/your/pg_hba.conf` with the actual path.)

- Modify the `local` connection rule for the `postgres` user. Look for a line similar to:

  ```
  # TYPE  DATABASE  USER      ADDRESS  METHOD
  local   all       postgres           peer
  ```

  Change `peer` to `md5`. The line should now look like:

  ```
  # TYPE  DATABASE  USER      ADDRESS  METHOD
  local   all       postgres           md5
  ```

  If you have a more general rule like `local all all peer`, you can either change that to `md5` (which will require passwords for all local users) or add the more specific line for `postgres` before the general `peer` rule.

- Save the `pg_hba.conf` file.

- Reload the PostgreSQL configuration. For the changes to take effect, PostgreSQL needs to reload its configuration:

  ```bash
  sudo systemctl reload postgresql
  # Or, for older systems:
  # sudo service postgresql reload
  ```

- Ensure environment variables are set. Make sure your `.env` file (or your environment configuration method for `pm2`) correctly sets:

  - `DB_USER=postgres`
  - `DB_PASSWORD=your_actual_postgres_password`
  - `DB_HOST=/var/run/postgresql` (or your correct Unix socket directory)
  - `DB_NAME=your_database_name` (e.g., `validator_db`)
  - `DB_PORT` (ignored if `DB_HOST` is a Unix socket path, but good to have)
After these steps, your validator should be able to connect to the local PostgreSQL server using the postgres user and its password. For enhanced security in production, consider creating a dedicated, less-privileged PostgreSQL user for the validator application.
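To sanity-check these settings before starting the validator, you can run a quick connection test like the one below. It assumes `psycopg2` is installed (the validator itself may use a different driver) and reads the same environment variables described above:

```python
import os

import psycopg2  # assumption: psycopg2 (or psycopg2-binary) is available

# Read the same variables the validator expects from the .env file
conn = psycopg2.connect(
    host=os.environ.get("DB_HOST", "/var/run/postgresql"),  # socket directory or hostname
    user=os.environ.get("DB_USER", "postgres"),
    password=os.environ["DB_PASSWORD"],
    dbname=os.environ.get("DB_NAME", "validator_db"),
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print("Connected:", cur.fetchone()[0])
conn.close()
```

If this prints the server version without a "Peer authentication failed" error, the pg_hba.conf change and credentials are working.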
```bash
btcli subnets register --subtensor.network <NETWORK> --netuid <NETUID> --wallet.name <COLDKEY> --wallet.hotkey <HOTKEY>
```

Gaia uses a proxy server to handle connections to Validators. You can set up your own proxy server, or use our script for an nginx server as follows:
```bash
./setup_proxy_server.sh --ip <YOUR IP> --port <PORT> --forwarding_port <PORT_FOR_MINER_OR_VALIDATOR> --server_name <NAME>
```

- This will run as a background process, but it must be running for proper communication between Miners and Validators
- IMPORTANT: the port argument is your external facing port, and the forwarding port is the INTERNAL port that the miner or validator will be using to communicate with the proxy server
- The server name argument is optional and could be set to whatever you'd like
```bash
fiber-post-ip --netuid <NETUID> --external_ip <YOUR_IP> --external_port <YOUR_PORT> --subtensor.network <NETWORK> --wallet.name <COLDKEY> --wallet.hotkey <HOTKEY>
```

- You only need to do this once per key, but if you get deregistered or your IP changes, you'll need to re-post your IP. We recommend using a static IP for all nodes.
- Make sure that you use the EXTERNAL port that you configured in the proxy server script above
- This will be automated in a future version, but for now you'll need to post your IP manually
- Further instructions are linked below, but ensure that you start the miner with the `--port` argument pointing to the FORWARDING/INTERNAL port configured in the proxy server script above.
NEW: Automated Database Sync System
Gaia now includes a streamlined database synchronization system that eliminates manual pgBackRest configuration and provides application-controlled backup scheduling. This system automatically sets up and manages database backups to Cloudflare R2 storage.
- One-command setup - No more manual pgbackrest configuration
- Application-controlled scheduling - No cron jobs needed, easily configurable
- Automated recovery - Self-healing backup system
- Simplified replica setup - No manual IP coordination required
- Real-time monitoring - Health checks and status reporting
Add these to your .env file:
```bash
# Required for R2 backup storage
PGBACKREST_R2_BUCKET=your-backup-bucket-name
PGBACKREST_R2_ENDPOINT=https://your-account-id.r2.cloudflarestorage.com
PGBACKREST_R2_ACCESS_KEY_ID=your-r2-access-key
PGBACKREST_R2_SECRET_ACCESS_KEY=your-r2-secret-key

# Optional - defaults shown
PGBACKREST_STANZA_NAME=gaia
PGBACKREST_R2_REGION=auto
PGBACKREST_PGDATA=/var/lib/postgresql/data
PGBACKREST_PGPORT=5432
PGBACKREST_PGUSER=postgres

# Set to True for primary node, False for replica
IS_SOURCE_VALIDATOR_FOR_DB_SYNC=True
```

Primary Node (creates backups):

```bash
sudo python gaia/validator/sync/setup_auto_sync.py --primary
```

Replica Node (restores from backups):

```bash
sudo python gaia/validator/sync/setup_auto_sync.py --replica
```

Test Mode (faster setup, shorter retention):

```bash
sudo python gaia/validator/sync/setup_auto_sync.py --primary --test
```

- Primary nodes automatically create backups (full daily at 2 AM, differential every 4 hours)
- Replica nodes can restore with a simple command
- All scheduling happens in the application - no cron jobs to manage (a minimal sketch of this pattern follows below)
- Health monitoring runs continuously with automatic recovery
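The application-controlled scheduling can be pictured as a long-running asyncio loop inside the validator process rather than a crontab entry. The sketch below is illustrative only; the backup hooks (`run_full_backup`, `run_diff_backup`) are hypothetical stand-ins, not Gaia's actual API:

```python
import asyncio
from datetime import datetime, timedelta

# Hypothetical hooks; the real sync manager exposes its own methods.
async def run_full_backup(): ...
async def run_diff_backup(): ...

async def backup_scheduler(full_backup_time: str = "02:00", diff_interval_hours: int = 4):
    """Application-controlled scheduling: no cron, just an asyncio loop."""
    last_diff = datetime.now()
    while True:
        now = datetime.now()
        if now.strftime("%H:%M") == full_backup_time:
            await run_full_backup()
            await asyncio.sleep(60)  # avoid re-triggering within the same minute
        elif now - last_diff >= timedelta(hours=diff_interval_hours):
            await run_diff_backup()
            last_diff = now
        await asyncio.sleep(30)  # poll every 30 seconds
```

Because the schedule lives in application state, changing it is just a method call (see the `update_schedule` example below) instead of editing a crontab.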
Check backup status:
```python
from gaia.validator.sync.auto_sync_manager import *
import asyncio

asyncio.run(check_status())
```

Restore from latest backup (replicas):

```python
from gaia.validator.sync.auto_sync_manager import *
import asyncio

asyncio.run(restore_latest())
```

Change backup schedule dynamically:

```python
# No need to edit cron - just update in the application
manager.update_schedule({
    'full_backup_time': '03:00',
    'diff_backup_interval': 6  # Every 6 hours instead of 4
})
```

If you're currently using the manual pgBackRest setup, the new system will:
- Detect existing configuration and work alongside it
- Gradually replace cron-based scheduling with application control
- Provide the same functionality with better monitoring and error handling
The legacy scripts in gaia/validator/sync/pgbackrest/ remain available but are no longer recommended for new installations.
- Setup fails: Check environment variables and ensure running as root
- Backup failures: The system includes automatic recovery and will retry failed operations
- Connection issues: Verify R2 credentials and network connectivity
- Status monitoring: All operations are logged and monitored automatically
For detailed logs, check /var/log/pgbackrest/ and the validator application logs.
Problem: Validator crashes with SyntaxError: invalid syntax related to async keyword in multiprocessing workers.
Cause: An old asyncio package (v3.4.3) was installed that uses async as a function name, but async became a reserved keyword in Python 3.7+.
Solution: The validator automatically detects and fixes this issue on startup. However, to prevent it from happening:
- Manual fix (immediate):

  ```bash
  # Remove the problematic package
  pip uninstall asyncio -y

  # Remove any remaining files
  find ~/.local/lib/python*/site-packages -name "*asyncio*" -exec rm -rf {} +
  find /usr/local/lib/python*/site-packages -name "*asyncio*" -exec rm -rf {} +
  ```
- Automatic prevention: The validator now includes automatic detection and removal of problematic asyncio packages during startup.
- Prevention in new installs: Always use Python's built-in `asyncio` module. Never install the separate `asyncio` package on Python 3.7+ (a quick diagnostic check is sketched below).
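One quick way to check whether the obsolete PyPI `asyncio` backport is installed alongside the standard library. This is only a diagnostic sketch; the validator's built-in startup check may work differently:

```python
import asyncio
from importlib.metadata import distribution, PackageNotFoundError

# The standard-library asyncio lives under .../lib/pythonX.Y/asyncio
print("asyncio loaded from:", asyncio.__file__)

try:
    dist = distribution("asyncio")  # only the old PyPI backport registers package metadata
    print(f"WARNING: PyPI 'asyncio' {dist.version} is installed; run: pip uninstall asyncio -y")
except PackageNotFoundError:
    print("OK: only the built-in asyncio module is present")
```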
Verification: After fixing, test multiprocessing compatibility:
python -c "import asyncio, multiprocessing; print('β
Asyncio import successful')"Problem: Validator logs are flooded with hundreds of "Logging mode is DEBUG" messages from fiber modules.
Cause: Every module logs its logging mode during import, and multiprocessing workers import modules independently.
Solution: The validator automatically suppresses these verbose DEBUG messages by default (an illustrative filter sketch follows the list below).
Control options:
- Default behavior: DEBUG spam is suppressed for cleaner logs
- Enable all DEBUG messages: Set `GAIA_ENABLE_DEBUG_SPAM=true` in your environment
- Only important messages are shown: INFO level and above are still logged normally
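For context, this kind of suppression can be implemented with a standard logging filter. The snippet below is an illustrative sketch of the pattern, not Gaia's actual implementation; the `"fiber"` logger name is an assumption:

```python
import logging

class DropDebugSpam(logging.Filter):
    """Drop the repetitive 'Logging mode is DEBUG' records emitted at import time."""

    def filter(self, record: logging.LogRecord) -> bool:
        return "Logging mode is DEBUG" not in record.getMessage()

# Attach the filter to the noisy logger (logger name assumed for illustration)
logging.getLogger("fiber").addFilter(DropDebugSpam())
```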
```bash
# To see all DEBUG spam (not recommended in production)
export GAIA_ENABLE_DEBUG_SPAM=true

# Default behavior (recommended) - suppress spam
unset GAIA_ENABLE_DEBUG_SPAM
```

Problem: Small weather tasks may run slower with multiprocessing due to overhead.
Cause: Process creation and module imports (~30-40 seconds) can exceed computation time for small workloads.
Solution: Smart multiprocessing with configurable thresholds (a minimal dispatch sketch follows the lists below).
Configuration options:
```bash
# Set minimum computations required for multiprocessing (default: 10)
export GAIA_MP_THRESHOLD=20

# Completely disable multiprocessing (forces sequential)
export GAIA_DISABLE_MP=true

# Default behavior (recommended) - auto-detect based on workload size
unset GAIA_MP_THRESHOLD
unset GAIA_DISABLE_MP
```

When multiprocessing helps:
- Large weather scoring runs (10+ computations)
- Production validation with many miners
- Complex climatology computations
When to use sequential:
- Small test runs (< 10 computations)
- Development/debugging
- Memory-constrained systems
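A minimal sketch of what threshold-based dispatch can look like, using the environment variables documented above. The `score_one` work function is a purely illustrative stand-in for a single scoring computation, not Gaia's actual code:

```python
import os
from multiprocessing import Pool

def score_one(item):
    # Illustrative stand-in for one scoring computation
    return item * item

def score_all(items):
    """Run sequentially for small workloads, fan out to processes for large ones."""
    threshold = int(os.environ.get("GAIA_MP_THRESHOLD", "10"))
    disable_mp = os.environ.get("GAIA_DISABLE_MP", "").lower() == "true"

    if disable_mp or len(items) < threshold:
        # Small jobs: process startup and module imports would dominate runtime
        return [score_one(i) for i in items]

    with Pool() as pool:
        return pool.map(score_one, items)

if __name__ == "__main__":
    print(score_all(list(range(25))))
```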
Custom Miner Models: HERE
- CPU: 6-core processor
- RAM: 8 GB
- Network: Reliable connection with at least 80 Mbps upload and download speeds. 1 TB monthly bandwidth.
- CPU: 8-core processor
- RAM: 16 GB
- Network: Reliable connection with at least 80 Mbps upload and download speeds. 1 TB monthly bandwidth.
Copyright © [2024] European Centre for Medium-Range Weather Forecasts (ECMWF).
Copyright 2024 Nickel5 Inc.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
A Foundation Model for the Earth System.
Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Anna Allen, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan A. Weyn, Haiyu Dong, Jayesh K. Gupta, Kit Thambiratnam, Alexander T. Archibald, Chun-Chieh Wu, Elizabeth Heider, Max Welling, Richard E. Turner, and Paris Perdikaris.
Nature, 2025.
DOI: 10.1038/s41586-025-09005-y
ECMWF does not accept any liability whatsoever for any error or omission in the data, their availability, or for any loss or damage arising from their use.
Material has been modified by the following: resolution transformations, spatial transformations, and evapotranspiration calculations on existing bands of ECMWF data.
Masek, J., Ju, J., Roger, J., Skakun, S., Vermote, E., Claverie, M., Dungan, J., Yin, Z., Freitag, B., Justice, C. (2021). HLS Sentinel-2 Multi-spectral Instrument Surface Reflectance Daily Global 30m v2.0 [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. Accessed 2024-11-27 from https://doi.org/10.5067/HLS/HLSS30.002
Reichle, R., De Lannoy, G., Koster, R. D., Crow, W. T., Kimball, J. S., Liu, Q. & Bechtold, M. (2022). SMAP L4 Global 3-hourly 9 km EASE-Grid Surface and Root Zone Soil Moisture Geophysical Data. (SPL4SMGP, Version 7). [Data Set]. Boulder, Colorado USA. NASA National Snow and Ice Data Center Distributed Active Archive Center. https://doi.org/10.5067/EVKPQZ4AFC4D. Date Accessed 11-27-2024.
NASA JPL (2013). NASA Shuttle Radar Topography Mission Global 1 arc second [Data set]. NASA EOSDIS Land Processes Distributed Active Archive Center. Accessed 2024-11-27 from https://doi.org/10.5067/MEaSUREs/SRTM/SRTMGL1.003
World Data Center for Geomagnetism, Kyoto. Kyoto University. Accessed December 2, 2024.
