diff --git a/.gitignore b/.gitignore
index 25eb168..a9c67b2 100644
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,6 @@ build/
 .pytest_cache/
 venv/
 .env
+# Rust build outputs
+target/
+target_local/
\ No newline at end of file
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..1f0309e
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,558 @@
+# Contributing to AstroML
+
+Thank you for your interest in contributing to AstroML! This document provides guidelines and instructions for contributing code, documentation, and research to the project.
+
+## Table of Contents
+
+- [Code of Conduct](#code-of-conduct)
+- [Getting Started](#getting-started)
+- [Research to Production Workflow](#research-to-production-workflow)
+- [Development Setup](#development-setup)
+- [Code Standards](#code-standards)
+- [Testing Requirements](#testing-requirements)
+- [PR Process](#pr-process)
+- [Documentation](#documentation)
+- [Questions & Support](#questions--support)
+
+---
+
+## Code of Conduct
+
+AstroML is committed to providing a welcoming and inclusive environment. All contributors are expected to:
+
+- Be respectful and constructive in all interactions
+- Welcome feedback and criticism gracefully
+- Focus on what is best for the community
+- Show empathy towards other community members
+
+---
+
+## Getting Started
+
+### 1. Fork and Clone
+
+```bash
+# Fork the repository on GitHub, then:
+git clone https://github.com/<your-username>/astroml.git
+cd astroml
+git remote add upstream https://github.com/Traqora/astroml.git
+```
+
+### 2. Create a Feature Branch
+
+```bash
+# Sync with latest upstream
+git fetch upstream
+git checkout -b feature/your-feature-name upstream/main
+
+# Or for bug fixes:
+git checkout -b fix/bug-description upstream/main
+```
+
+### 3. Set Up Development Environment
+
+See [Development Setup](#development-setup) section below.
+
+---
+
+## Research to Production Workflow
+
+AstroML follows a clear data pipeline model that moves research from exploration to production. Understanding this workflow is essential for contributing effectively.
+
+### The Data Pipeline
+
+```
+Ledger Data
+    ↓
+Ingestion & Normalization
+    ↓
+Graph Construction
+    ↓
+Feature Engineering
+    ↓
+Model Training & Evaluation
+    ↓
+Experimentation & Deployment
+```
+
+### Component Breakdown
+
+| Stage | Module | Purpose | Examples |
+|-------|--------|---------|----------|
+| **Ingestion** | `astroml.ingestion` | Fetch ledgers from Stellar Horizon | `backfill`, `enhanced_stream` |
+| **Normalization** | `astroml.ingestion` | Validate & deduplicate data | Duplicate removal, type conversion |
+| **Graph Building** | `astroml.graph` | Construct transaction graphs | `build_snapshot`, windowing logic |
+| **Features** | `astroml.features` | Extract node/edge features | Asset diversity, temporal decay, node importance |
+| **Models** | `astroml.models` | GNN architectures & embeddings | GCN, GAT, GraphSAGE |
+| **Training** | `astroml.training` | Model training pipelines | Config-driven experiments, checkpoints |
+
+### Contributing to Each Stage
+
+**When adding ingestion logic:**
+- Ensure idempotency (re-runs are safe)
+- Handle database constraints gracefully
+- Test with small ledger ranges first
+- Document config requirements in `config/database.yaml`
+
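+One way to make a write idempotent is to let a uniqueness constraint absorb re-inserts. The sketch below is illustrative only: it uses SQLite's `INSERT OR IGNORE` and a hypothetical two-column `ledgers` table rather than AstroML's actual schema or ingestion code (on PostgreSQL the analogous clause is `INSERT ... ON CONFLICT DO NOTHING`).
+
+```python
+import sqlite3
+
+
+def insert_ledgers(conn: sqlite3.Connection, rows: list[tuple[int, str]]) -> int:
+    """Insert (sequence, hash) rows and return how many were new.
+
+    Hypothetical helper: the PRIMARY KEY on ``sequence`` turns a repeated
+    ledger into a no-op, so re-running the same range is safe.
+    """
+    before = conn.total_changes
+    conn.executemany(
+        "INSERT OR IGNORE INTO ledgers (sequence, hash) VALUES (?, ?)", rows
+    )
+    conn.commit()
+    # Ignored (duplicate) rows do not count toward total_changes.
+    return conn.total_changes - before
+
+
+conn = sqlite3.connect(":memory:")
+conn.execute(
+    "CREATE TABLE ledgers (sequence INTEGER PRIMARY KEY, hash TEXT NOT NULL)"
+)
+rows = [(1, "aa"), (2, "bb")]
+print(insert_ledgers(conn, rows))  # 2
+print(insert_ledgers(conn, rows))  # 0: the re-run changes nothing
+```
+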
+**When building graph features:**
+- Test windowing logic thoroughly
+- Ensure reproducibility (random seeds, checksums)
+- Validate against edge cases (empty graphs, single nodes)
+- Add unit tests before integration
+
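+A minimal sketch of both reproducibility habits, assuming nothing about AstroML's own utilities: `seed_everything` and `edge_checksum` are hypothetical helper names. The checksum sorts edges first so the digest does not depend on insertion order.
+
+```python
+import hashlib
+import random
+
+import numpy as np
+
+
+def seed_everything(seed: int) -> None:
+    """Seed the Python, NumPy and (if installed) PyTorch RNGs."""
+    random.seed(seed)
+    np.random.seed(seed)
+    try:
+        import torch
+    except ImportError:  # graph/feature code may run without PyTorch
+        return
+    torch.manual_seed(seed)
+
+
+def edge_checksum(edges: list[tuple[str, str]]) -> str:
+    """Order-independent SHA-256 digest of a directed edge list."""
+    digest = hashlib.sha256()
+    for source, destination in sorted(edges):
+        digest.update(f"{source}->{destination}\n".encode())
+    return digest.hexdigest()
+
+
+seed_everything(42)
+first = random.random()
+seed_everything(42)
+assert random.random() == first  # same seed, same draw
+
+forward = edge_checksum([("A", "B"), ("B", "C")])
+shuffled = edge_checksum([("B", "C"), ("A", "B")])
+assert forward == shuffled  # edge order does not change the checksum
+```
+
+Checking the digest of a built snapshot into a test makes silent changes to windowing logic show up as a failing assertion.
+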
+**When creating models:**
+- Use config files for hyperparameters (see `configs/`)
+- Store checkpoints with metadata
+- Log metrics consistently
+- Provide examples in `examples/`
+
+---
+
+## Development Setup
+
+### Prerequisites
+
+- **Python 3.10+**
+- **PostgreSQL 12+** (for ingestion tests; SQLite for unit tests)
+- **Git**
+
+### Installation
+
+```bash
+# 1. Create virtual environment
+python -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+# 2. Install dependencies
+pip install -r requirements.txt
+
+# 3. (Optional) CPU-only PyTorch
+pip install -r requirements-cpu.txt
+
+# 4. Configure database
+cp config/database.yaml.example config/database.yaml
+# Edit config/database.yaml with your PostgreSQL credentials
+
+# 5. Install package in editable mode
+pip install -e .
+
+# 6. Run tests to verify setup
+pytest tests/ -v
+```
+
+### Database Setup (for integration tests)
+
+```bash
+# Create a test database
+createdb astroml_test
+
+# Update config/database.yaml to point to test database
+# Then run migrations:
+alembic upgrade head
+```
+
+---
+
+## Code Standards
+
+### Python Style
+
+AstroML follows **PEP 8** with these conventions:
+
+- **Line length**: 88 characters (Black formatter)
+- **Imports**: Group in blocks separated by a blank line, in this order: standard library, third-party, then local
+- **Docstrings**: Use Google-style docstrings for all public functions/classes
+
+#### Example:
+
+```python
+from collections.abc import Callable, Hashable
+
+import networkx as nx
+
+# Map each supported measure to its NetworkX centrality function.
+_CENTRALITY: dict[str, Callable[[nx.DiGraph], dict[Hashable, float]]] = {
+    "betweenness": nx.betweenness_centrality,
+    "degree": nx.degree_centrality,
+    "closeness": nx.closeness_centrality,
+}
+
+
+def calculate_node_importance(
+    graph: nx.DiGraph,
+    measure: str = "betweenness",
+) -> dict[Hashable, float]:
+    """Calculate node importance metrics for a transaction graph.
+
+    Args:
+        graph: NetworkX directed graph of transactions
+        measure: One of "betweenness", "degree", "closeness"
+
+    Returns:
+        Dictionary mapping node IDs to importance scores
+
+    Raises:
+        ValueError: If measure is not recognized
+    """
+    if measure not in _CENTRALITY:
+        raise ValueError(f"Unknown measure: {measure}")
+
+    return _CENTRALITY[measure](graph)
+```
+
+### Type Hints
+
+- Use type hints for all function parameters and return types
+- Prefer built-in generics (`list[str]`, `dict[str, int]`, `tuple[int, str]`) over `typing.List`/`typing.Dict`/`typing.Tuple`; import from `typing` for other constructs such as `Optional`
+
+```python
+from typing import Optional
+
+
+def process_accounts(
+    accounts: list[str],
+    filters: Optional[dict[str, int]] = None,
+) -> tuple[int, list[str]]:
+    """Process a list of account IDs."""
+    ...
+```
+
+### Naming Conventions
+
+- **Functions/variables**: `snake_case`
+- **Classes**: `PascalCase`
+- **Constants**: `UPPER_SNAKE_CASE`
+- **Private members**: Prefix with `_`
+
+```python
+class TransactionGraph:
+    DEFAULT_WINDOW_SIZE = 30  # days
+    
+    def __init__(self):
+        self._cache = {}
+    
+    def get_node_count(self) -> int:
+        """Return number of nodes."""
+        pass
+```
+
+### Comments & Documentation
+
+- Write comments that explain **why**, not **what**
+- Use docstrings for all public APIs
+- Keep comments concise and up-to-date
+
+```python
+# Good: explains reasoning
+# Use cached result if available to avoid re-querying Stellar Horizon
+if node_id in self._cache:
+    return self._cache[node_id]
+
+# Avoid: obvious from code
+# increment counter
+count += 1
+```
+
+---
+
+## Testing Requirements
+
+### Running Tests
+
+```bash
+# Run all tests
+pytest tests/ -v
+
+# Run specific test file
+pytest tests/test_schema.py -v
+
+# Run with coverage (requires the pytest-cov plugin)
+pytest tests/ --cov=astroml --cov-report=html
+
+# Run the async stream tests (requires the pytest-asyncio plugin)
+pytest tests/test_stream.py -v
+```
+
+### Writing Tests
+
+**Test file naming**: `test_<module_name>.py`
+
+```python
+import networkx as nx
+import pytest
+
+# Illustrative imports: substitute the feature functions under test.
+from astroml.features import calculate_asset_diversity, extract_features_async
+
+
+@pytest.fixture
+def sample_graph():
+    """Fixture providing a sample transaction graph."""
+    G = nx.DiGraph()
+    G.add_edges_from([("A", "B"), ("B", "C")])
+    return G
+
+
+class TestAssetDiversity:
+    """Tests for asset diversity feature calculation."""
+
+    def test_single_asset(self):
+        """Single asset should have diversity = 1."""
+        result = calculate_asset_diversity(["USD"])
+        assert result == 1.0
+
+    def test_empty_assets(self):
+        """Empty list should raise ValueError."""
+        with pytest.raises(ValueError):
+            calculate_asset_diversity([])
+
+    @pytest.mark.asyncio
+    async def test_async_feature_extraction(self, sample_graph):
+        """Test async feature pipeline (requires pytest-asyncio)."""
+        result = await extract_features_async(sample_graph)
+        assert len(result) > 0
+```
+
+### Test Checklist
+
+Before submitting a PR:
+
+- [ ] All tests pass: `pytest tests/ -v`
+- [ ] New tests added for new functionality
+- [ ] Edge cases covered (empty inputs, None values, etc.)
+- [ ] Async functions tested with `@pytest.mark.asyncio`
+- [ ] Integration tests verify database interactions
+- [ ] No hardcoded test data paths (use fixtures)
+
+### Testing Different Stages
+
+| Stage | Test Type | Command |
+|-------|-----------|---------|
+| Ingestion | Unit + Integration | `pytest tests/test_*stream*.py` |
+| Graph Building | Unit + Snapshot | `pytest tests/test_snapshot.py` |
+| Features | Unit + Functional | `pytest tests/test_*features*.py` |
+| Models | Unit + Training | `pytest tests/test_*.py -k model` |
+
+---
+
+## PR Process
+
+### Before Opening a PR
+
+1. **Sync with upstream:**
+   ```bash
+   git fetch upstream
+   git rebase upstream/main
+   ```
+
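+2. **Run linting & tests locally:**
+   ```bash
+   # Check formatting (Black, 88-character lines)
+   black --check astroml tests
+   
+   # Catch syntax errors in every module, including nested packages
+   python -m compileall -q astroml
+   
+   # Run full test suite
+   pytest tests/ -v
+   ```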
+
+3. **Ensure commits are clean:**
+   - Meaningful commit messages (see [Commit Convention](#commit-convention))
+   - Logical, separated changes
+   - No secrets or credentials
+
+### Commit Convention
+
+```
+<type>(<scope>): <subject>
+
+<body>
+
+<footer>
+```
+
+**Types**: `feat`, `fix`, `docs`, `test`, `refactor`, `chore`, `perf`
+
+**Scope**: `ingestion`, `graph`, `features`, `models`, `training`, `db`
+
+**Examples:**
+
+```
+feat(features): add temporal decay feature extractor
+
+- Implements exponential decay based on transaction age
+- Configured via decay_rate parameter
+- Tested with synthetic graphs
+
+Closes #123
+```
+
+```
+fix(ingestion): skip duplicate transactions on backfill re-runs
+
+Fixes an idempotency issue when re-running backfill on the same ledger range.
+
+Fixes #456
+```
+
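+To check a header before pushing, a regular expression over the first line is enough. This is a hypothetical sketch, not a hook the repository ships, and it assumes the scope is required, as the convention above shows:
+
+```python
+import re
+
+TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "perf")
+SCOPES = ("ingestion", "graph", "features", "models", "training", "db")
+
+# <type>(<scope>): <subject>, where the subject must not start with a space.
+HEADER_RE = re.compile(rf"^({'|'.join(TYPES)})\(({'|'.join(SCOPES)})\): \S")
+
+
+def is_valid_header(message: str) -> bool:
+    """Return True if the first line of a commit message follows the convention."""
+    lines = message.splitlines()
+    return bool(lines) and HEADER_RE.match(lines[0]) is not None
+
+
+print(is_valid_header("fix(db): add missing index"))  # True
+print(is_valid_header("Fixed stuff"))  # False
+```
+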
+### PR Template
+
+When opening a PR, fill out:
+
+```markdown
+## Description
+Brief description of what this PR does.
+
+## Type of Change
+- [ ] Bug fix
+- [ ] New feature
+- [ ] Breaking change
+- [ ] Documentation update
+
+## Related Issue
+Closes #<issue_number>
+
+## Testing
+- [ ] Unit tests added/updated
+- [ ] Integration tests pass
+- [ ] Tested against sample data
+
+## Checklist
+- [ ] Code follows style guidelines
+- [ ] Self-reviewed the code
+- [ ] Updated documentation
+- [ ] No new warnings generated
+```
+
+### Review Process
+
+**Expectations:**
+
+- Reviewers will provide feedback constructively
+- Critical feedback focuses on the code, not the person
+- Contributors should respond to all feedback (even if just acknowledging)
+- Approval requires at least one maintainer sign-off
+
+**What reviewers check:**
+
+- ✅ Code correctness and logic
+- ✅ Test coverage (especially for pipeline stages)
+- ✅ Reproducibility (configs, seeds, checksums)
+- ✅ Documentation completeness
+- ✅ Alignment with "Research to Production" workflow
+- ✅ Database integrity (for ingestion changes)
+
+---
+
+## Documentation
+
+### Docstring Requirements
+
+All public functions, classes, and modules must have docstrings:
+
+```python
+"""Module for extracting temporal features from transaction graphs.
+
+This module implements exponential decay and recency weighting
+for node features based on transaction timestamps.
+"""
+
+from typing import List
+
+import pandas as pd
+
+from astroml.db.schema import Transaction
+
+
+def calculate_temporal_decay(
+    transactions: List[Transaction],
+    decay_rate: float = 0.1,
+) -> pd.DataFrame:
+    """Calculate temporal decay weights for accounts.
+    
+    Uses exponential decay: weight = exp(-decay_rate * age_in_days)
+    
+    Args:
+        transactions: List of Transaction objects (sorted by time)
+        decay_rate: Decay coefficient (higher = faster decay)
+        
+    Returns:
+        DataFrame with columns: [account_id, decay_weight, timestamp]
+        
+    Raises:
+        ValueError: If decay_rate is negative or transactions list is empty
+        
+    Examples:
+        >>> df = calculate_temporal_decay(transactions, decay_rate=0.1)
+        >>> df.shape
+        (1000, 3)
+    """
+```
+
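+For reviewers checking the formula in the docstring above, here is a minimal, self-contained sketch of the weighting itself. It operates on a plain timestamp Series rather than `Transaction` objects, and `decay_weights` and its `now` parameter are hypothetical names, not the signature of `calculate_temporal_decay`.
+
+```python
+import numpy as np
+import pandas as pd
+
+
+def decay_weights(
+    timestamps: pd.Series,
+    now: pd.Timestamp,
+    decay_rate: float = 0.1,
+) -> pd.Series:
+    """Return exp(-decay_rate * age_in_days) for each timestamp."""
+    if decay_rate < 0:
+        raise ValueError("decay_rate must be non-negative")
+    # Fractional age in days, measured back from the reference time `now`.
+    age_in_days = (now - timestamps).dt.total_seconds() / 86_400
+    return np.exp(-decay_rate * age_in_days)
+
+
+ts = pd.Series(pd.to_datetime(["2024-01-10", "2024-01-01"]))
+print(decay_weights(ts, now=pd.Timestamp("2024-01-10")).round(4).tolist())
+# [1.0, 0.4066]
+```
+
+A transaction nine days old with `decay_rate=0.1` keeps about 41% of its weight, since exp(-0.9) ≈ 0.4066.
+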
+### README Updates
+
+When adding new features, update [README.md](README.md):
+
+- Add to feature list if it's major functionality
+- Update architecture diagram if pipeline changes
+- Link to new example scripts or documentation
+
+### Example Scripts
+
+For new features, add an example in `examples/`:
+
+```python
+# examples/temporal_decay_example.py
+"""Example: Extract temporal decay features."""
+
+from astroml.db.schema import Transaction
+from astroml.db.session import get_session
+from astroml.features.temporal_decay import calculate_temporal_decay
+
+session = get_session()
+try:
+    # calculate_temporal_decay expects transactions sorted by time; add an
+    # order_by() on the timestamp column if the query does not guarantee it.
+    transactions = session.query(Transaction).all()
+    decay_df = calculate_temporal_decay(transactions, decay_rate=0.1)
+finally:
+    session.close()
+
+print(f"Extracted temporal features for {len(decay_df)} accounts")
+print(decay_df.head())
+```
+
+### Configuration Documentation
+
+Document YAML config fields in docstrings:
+
+```python
+"""
+Expected config (config/database.yaml):
+    
+    database:
+      host: localhost
+      port: 5432
+      user: postgres
+      password: ${DB_PASSWORD}  # From environment
+      database: astroml
+"""
+```
+
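+The `${DB_PASSWORD}` placeholder only works if the loader expands it; `yaml.safe_load` on its own leaves it as a literal string. A minimal sketch, assuming PyYAML and the standard library's `os.path.expandvars` (whether AstroML's own config loader works this way is an assumption, and `load_config` is a hypothetical name):
+
+```python
+import os
+
+import yaml
+
+
+def load_config(text: str) -> dict:
+    """Parse YAML after substituting ${VAR} references from the environment."""
+    # expandvars leaves unknown variables untouched, so a missing secret
+    # stays visible as "${NAME}" instead of silently becoming "".
+    return yaml.safe_load(os.path.expandvars(text))
+
+
+os.environ["DB_PASSWORD"] = "s3cret"
+config = load_config("database:\n  port: 5432\n  password: ${DB_PASSWORD}\n")
+print(config["database"]["password"])  # s3cret
+```
+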
+---
+
+## Questions & Support
+
+- **Bug reports**: Open an issue on GitHub with a minimal reproducible example
+- **Feature requests**: Use GitHub Discussions or open an issue with a `[FEATURE]` title prefix
+- **Questions**: Post in GitHub Discussions or open an issue with a `[QUESTION]` title prefix
+- **Security issues**: Email the maintainers privately (do not open a public issue)
+
+### Getting Help
+
+1. **Check existing issues/discussions** for similar questions
+2. **Search the documentation** in `docs/` and README
+3. **Review example scripts** in `examples/`
+4. **Run the discovery checklist** from [copilot-instructions.md](.github/copilot-instructions.md)
+
+---
+
+## Additional Resources
+
+- [README.md](README.md) - Project overview and quick start
+- [docs/](docs/) - Full documentation
+- [examples/](examples/) - Example scripts for common tasks
+- [alembic/versions/](alembic/versions/) - Database migration history
+- [configs/](configs/) - Example configuration files
+
+---
+
+## Thank You! 🙏
+
+Your contributions make AstroML better for the entire research community. Whether you're fixing bugs, adding features, or improving documentation, every contribution matters.
+
+**Happy coding!**
diff --git a/astroml/db/schema.py b/astroml/db/schema.py
index c638bd5..1de0971 100644
--- a/astroml/db/schema.py
+++ b/astroml/db/schema.py
@@ -1,6 +1,11 @@
-"""SQLAlchemy ORM models for raw Stellar blockchain data.
+"""SQLAlchemy ORM models for AstroML storage.
 
-Five tables model the core Stellar data needed for graph ML:
+The schema has two layers:
+
+- Raw Stellar blockchain storage used by the current ingestion pipeline.
+- A normalized graph-mirror layer for account-centric time-series retrieval.
+
+Five raw tables model the core Stellar data needed for graph ML:
 
 - **ledgers** — temporal anchor; one row per closed ledger (~5-6 s apart).
 - **transactions** — one row per transaction, linked to a ledger.
@@ -19,7 +24,9 @@
 from sqlalchemy import (
     BigInteger,
     Boolean,
+    CheckConstraint,
     ForeignKey,
+    ForeignKeyConstraint,
     Index,
     Integer,
     JSON,
@@ -27,6 +34,7 @@
     SmallInteger,
     String,
     Text,
+    UniqueConstraint,
     func,
 )
 from sqlalchemy.dialects.postgresql import JSONB
@@ -220,6 +228,253 @@ class Asset(Base):
     )
 
 
+# ---------------------------------------------------------------------------
+# Graph mirror accounts
+# ---------------------------------------------------------------------------
+
+GRAPH_EDGE_TYPES = ("transaction", "claim", "payment")
+GRAPH_ID_TYPE = BigInteger().with_variant(Integer(), "sqlite")
+
+
+class GraphAccount(Base):
+    """Canonical graph node table.
+
+    This table is intentionally separate from ``accounts`` because the raw
+    ``accounts`` table stores the latest Stellar account snapshot, while the
+    graph mirror needs stable surrogate keys and observation timestamps for
+    long-lived node/edge analytics.
+    """
+
+    __tablename__ = "graph_accounts"
+
+    id: Mapped[int] = mapped_column(GRAPH_ID_TYPE, primary_key=True, autoincrement=True)
+    account_address: Mapped[str] = mapped_column(String(56), nullable=False, unique=True)
+    account_type: Mapped[Optional[str]] = mapped_column(String(32))
+    first_seen_at: Mapped[datetime] = mapped_column(nullable=False)
+    last_seen_at: Mapped[datetime] = mapped_column(nullable=False)
+    created_at: Mapped[datetime] = mapped_column(nullable=False, server_default=func.now())
+    updated_at: Mapped[datetime] = mapped_column(
+        nullable=False, server_default=func.now(), onupdate=func.now()
+    )
+
+    outgoing_edges: Mapped[list[GraphEdge]] = relationship(
+        foreign_keys="GraphEdge.source_account_id",
+        back_populates="source_account",
+    )
+    incoming_edges: Mapped[list[GraphEdge]] = relationship(
+        foreign_keys="GraphEdge.destination_account_id",
+        back_populates="destination_account",
+    )
+
+    __table_args__ = (
+        Index("ix_graph_accounts_last_seen_at", "last_seen_at"),
+        Index("ix_graph_accounts_account_type", "account_type"),
+    )
+
+
+# ---------------------------------------------------------------------------
+# Graph mirror edges
+# ---------------------------------------------------------------------------
+
+class GraphEdge(Base):
+    """Canonical directed edge table for the PostgreSQL graph mirror.
+
+    Shared edge attributes live here so the table stays narrow and indexable for
+    account timelines. Type-specific attributes move to dedicated detail tables
+    to avoid a single sparse event table.
+    """
+
+    __tablename__ = "graph_edges"
+
+    id: Mapped[int] = mapped_column(GRAPH_ID_TYPE, primary_key=True, autoincrement=True)
+    edge_type: Mapped[str] = mapped_column(String(16), nullable=False)
+    source_account_id: Mapped[int] = mapped_column(
+        GRAPH_ID_TYPE, ForeignKey("graph_accounts.id"), nullable=False
+    )
+    destination_account_id: Mapped[Optional[int]] = mapped_column(
+        GRAPH_ID_TYPE, ForeignKey("graph_accounts.id")
+    )
+    asset_id: Mapped[Optional[int]] = mapped_column(Integer, ForeignKey("assets.id"))
+    occurred_at: Mapped[datetime] = mapped_column(nullable=False)
+    ledger_sequence: Mapped[Optional[int]] = mapped_column(Integer)
+    event_index: Mapped[Optional[int]] = mapped_column(Integer)
+    transaction_hash: Mapped[Optional[str]] = mapped_column(String(64))
+    external_event_id: Mapped[str] = mapped_column(String(128), nullable=False)
+    amount: Mapped[Optional[float]] = mapped_column(Numeric)
+    status: Mapped[Optional[str]] = mapped_column(String(32))
+    created_at: Mapped[datetime] = mapped_column(nullable=False, server_default=func.now())
+
+    source_account: Mapped[GraphAccount] = relationship(
+        foreign_keys=[source_account_id],
+        back_populates="outgoing_edges",
+    )
+    destination_account: Mapped[Optional[GraphAccount]] = relationship(
+        foreign_keys=[destination_account_id],
+        back_populates="incoming_edges",
+    )
+    asset: Mapped[Optional[Asset]] = relationship()
+    transaction_detail: Mapped[Optional[GraphTransactionDetail]] = relationship(
+        back_populates="edge",
+        cascade="all, delete-orphan",
+        uselist=False,
+    )
+    claim_detail: Mapped[Optional[GraphClaimDetail]] = relationship(
+        back_populates="edge",
+        cascade="all, delete-orphan",
+        uselist=False,
+    )
+    payment_detail: Mapped[Optional[GraphPaymentDetail]] = relationship(
+        back_populates="edge",
+        cascade="all, delete-orphan",
+        uselist=False,
+    )
+
+    __table_args__ = (
+        CheckConstraint(
+            "edge_type IN ('transaction', 'claim', 'payment')",
+            name="ck_graph_edges_edge_type",
+        ),
+        CheckConstraint(
+            "source_account_id <> destination_account_id OR destination_account_id IS NULL",
+            name="ck_graph_edges_distinct_accounts",
+        ),
+        UniqueConstraint(
+            "edge_type",
+            "external_event_id",
+            name="uq_graph_edges_type_external_event_id",
+        ),
+        UniqueConstraint("id", "edge_type", name="uq_graph_edges_id_edge_type"),
+        Index("ix_graph_edges_occurred_at", "occurred_at"),
+        Index(
+            "ix_graph_edges_source_occurred_at",
+            "source_account_id",
+            "occurred_at",
+        ),
+        Index(
+            "ix_graph_edges_destination_occurred_at",
+            "destination_account_id",
+            "occurred_at",
+        ),
+        Index("ix_graph_edges_type_occurred_at", "edge_type", "occurred_at"),
+        Index("ix_graph_edges_asset_occurred_at", "asset_id", "occurred_at"),
+        Index(
+            "ix_graph_edges_status_occurred_at",
+            "status",
+            "occurred_at",
+        ),
+        Index(
+            "ix_graph_edges_tx_hash",
+            "transaction_hash",
+            postgresql_where=(transaction_hash.isnot(None)),
+        ),
+        Index(
+            "ix_graph_edges_ledger_event",
+            "ledger_sequence",
+            "event_index",
+        ),
+    )
+
+
+class GraphTransactionDetail(Base):
+    """Subtype table for transaction-specific edge attributes."""
+
+    __tablename__ = "graph_transaction_details"
+
+    edge_id: Mapped[int] = mapped_column(GRAPH_ID_TYPE, primary_key=True)
+    edge_type: Mapped[str] = mapped_column(
+        String(16), nullable=False, server_default="transaction"
+    )
+    successful: Mapped[Optional[bool]] = mapped_column(Boolean)
+    operation_count: Mapped[Optional[int]] = mapped_column(SmallInteger)
+    fee: Mapped[Optional[int]] = mapped_column(BigInteger)
+    memo_type: Mapped[Optional[str]] = mapped_column(String(16))
+    memo: Mapped[Optional[str]] = mapped_column(Text)
+    details: Mapped[Optional[dict]] = mapped_column(
+        JSON().with_variant(JSONB(), "postgresql")
+    )
+
+    edge: Mapped[GraphEdge] = relationship(back_populates="transaction_detail")
+
+    __table_args__ = (
+        CheckConstraint(
+            "edge_type = 'transaction'",
+            name="ck_graph_transaction_details_edge_type",
+        ),
+        ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+    )
+
+
+class GraphClaimDetail(Base):
+    """Subtype table for claim-specific edge attributes."""
+
+    __tablename__ = "graph_claim_details"
+
+    edge_id: Mapped[int] = mapped_column(GRAPH_ID_TYPE, primary_key=True)
+    edge_type: Mapped[str] = mapped_column(String(16), nullable=False, server_default="claim")
+    claim_reference: Mapped[Optional[str]] = mapped_column(String(128))
+    claim_status: Mapped[Optional[str]] = mapped_column(String(32))
+    expires_at: Mapped[Optional[datetime]] = mapped_column()
+    details: Mapped[Optional[dict]] = mapped_column(
+        JSON().with_variant(JSONB(), "postgresql")
+    )
+
+    edge: Mapped[GraphEdge] = relationship(back_populates="claim_detail")
+
+    __table_args__ = (
+        CheckConstraint(
+            "edge_type = 'claim'",
+            name="ck_graph_claim_details_edge_type",
+        ),
+        ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+        Index("ix_graph_claim_details_claim_status", "claim_status"),
+    )
+
+
+class GraphPaymentDetail(Base):
+    """Subtype table for payment-specific edge attributes."""
+
+    __tablename__ = "graph_payment_details"
+
+    edge_id: Mapped[int] = mapped_column(GRAPH_ID_TYPE, primary_key=True)
+    edge_type: Mapped[str] = mapped_column(
+        String(16), nullable=False, server_default="payment"
+    )
+    payment_reference: Mapped[Optional[str]] = mapped_column(String(128))
+    payment_status: Mapped[Optional[str]] = mapped_column(String(32))
+    fee_amount: Mapped[Optional[float]] = mapped_column(Numeric)
+    settled_at: Mapped[Optional[datetime]] = mapped_column()
+    details: Mapped[Optional[dict]] = mapped_column(
+        JSON().with_variant(JSONB(), "postgresql")
+    )
+
+    edge: Mapped[GraphEdge] = relationship(back_populates="payment_detail")
+
+    __table_args__ = (
+        CheckConstraint(
+            "edge_type = 'payment'",
+            name="ck_graph_payment_details_edge_type",
+        ),
+        CheckConstraint(
+            "fee_amount >= 0 OR fee_amount IS NULL",
+            name="ck_graph_payment_details_fee_amount_non_negative",
+        ),
+        ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+        Index("ix_graph_payment_details_payment_status", "payment_status"),
+    )
+
+
 # ---------------------------------------------------------------------------
 # Effects
 # ---------------------------------------------------------------------------
diff --git a/astroml/features/asset_typing.py b/astroml/features/asset_typing.py
new file mode 100644
index 0000000..2e61535
--- /dev/null
+++ b/astroml/features/asset_typing.py
@@ -0,0 +1,46 @@
+"""Multi-asset edge typing for the Stellar transaction graph.
+
+Classifies asset strings (as produced by the normalizer) into three
+canonical edge types used during graph construction:
+
+- XLM        — native Stellar asset
+- STABLECOIN — known fiat-pegged assets (USDC, USDT, EURC, …)
+- CUSTOM     — any other issued asset
+"""
+from __future__ import annotations
+
+from enum import IntEnum
+
+# Known stablecoin asset codes on Stellar (code only, issuer-agnostic).
+# Extend this set as new stablecoins are listed.
+_STABLECOIN_CODES: frozenset[str] = frozenset({
+    "USDC", "USDT", "EURC", "EURT", "BRLT", "NGNT", "IDRT",
+    "ARST", "MXNT", "NGNC",
+})
+
+
+class AssetType(IntEnum):
+    """Canonical edge type for a Stellar asset."""
+    XLM = 0
+    STABLECOIN = 1
+    CUSTOM = 2
+
+
+def classify_asset(asset: str) -> AssetType:
+    """Return the :class:`AssetType` for a normalized asset string.
+
+    Args:
+        asset: Asset string in the form ``'XLM'``, ``'USDC:G...'``, or
+               ``'CODE:ISSUER'`` as produced by the ingestion normalizer.
+
+    Returns:
+        :class:`AssetType` enum value.
+    """
+    if asset == "XLM":
+        return AssetType.XLM
+
+    code = asset.split(":")[0].upper()
+    if code in _STABLECOIN_CODES:
+        return AssetType.STABLECOIN
+
+    return AssetType.CUSTOM
diff --git a/astroml/features/frequency.py b/astroml/features/frequency.py
index 7c26423..8a99892 100644
--- a/astroml/features/frequency.py
+++ b/astroml/features/frequency.py
@@ -4,7 +4,7 @@
 transaction data, including daily activity counts and burstiness metrics.
 Inputs are pandas DataFrames with configurable timestamp and account columns.
 """
-from typing import Union
+from typing import Hashable, Union
 
 import numpy as np
 import pandas as pd
@@ -49,7 +49,6 @@ def _validate_dataframe(
             numeric_timestamps = pd.to_numeric(df[timestamp_col], errors="raise")
             max_abs_value = numeric_timestamps.abs().max()
 
-            # Infer UNIX timestamp unit by magnitude.
             if max_abs_value < 1e11:
                 unit = "s"
             elif max_abs_value < 1e14:
@@ -71,7 +70,7 @@ def _validate_dataframe(
             f"Column '{timestamp_col}' must contain datetime values or parseable timestamps"
         ) from exc
 
-    df.loc[:, timestamp_col] = converted
+    df[timestamp_col] = converted
 
 
 def _extract_daily_counts(
@@ -105,31 +104,19 @@ def _extract_daily_counts(
         >>> counts.tolist()
         [2, 0, 1]
     """
-    # Handle empty timestamps
     if len(timestamps) == 0:
         return np.array([])
 
-    # Convert timestamps to dates (day resolution)
     dates = timestamps.dt.date
 
-    # Handle single timestamp
     if len(timestamps) == 1:
         return np.array([1])
 
-    # Determine first and last transaction dates
     min_date = dates.min()
     max_date = dates.max()
-
-    # Create complete date range from first to last
     date_range = pd.date_range(start=min_date, end=max_date, freq="D")
-
-    # Count transactions per day using value_counts
     daily_counts = dates.value_counts()
-
-    # Fill missing days with 0
     daily_counts = daily_counts.reindex(date_range.date, fill_value=0)
-
-    # Return as numpy array
     return daily_counts.values
 
 
@@ -165,66 +152,24 @@ def _compute_burstiness(mean: float, std: float) -> float:
         >>> _compute_burstiness(5.0, 0.0)
         -1.0
     """
-    # Handle edge case: when mean + std == 0, return 0.0
     if mean + std == 0.0:
         return 0.0
 
-    # Calculate burstiness: (std - mean) / (std + mean)
     return (std - mean) / (std + mean)
 
 
-def compute_account_frequency(
+def _compute_frequency_metrics_for_timestamps(
     timestamps: pd.Series,
-) -> Dict[str, float]:
-    """Compute frequency metrics for a single account's transaction timestamps.
-
-    A per-account convenience function that wraps the internal helpers to
-    produce all three frequency metrics for one set of timestamps at a time.
-    For batch processing of a full DataFrame with multiple accounts, see
-    :func:`compute_frequency_metrics` (available after merging #47).
+) -> dict[str, float]:
+    """Compute frequency metrics for one account's validated timestamp series.
 
     Args:
-        timestamps: Transaction timestamps for a single account. Accepts a
-            ``datetime64`` Series or a numeric (Unix epoch seconds) Series.
-            An empty Series is valid and returns all-zero metrics.
+        timestamps: Validated timestamp series for a single account.
 
     Returns:
-        Dictionary with three keys:
-
-        - ``"mean_tx_per_day"``  – mean number of transactions per calendar
-          day over the account's active window (float)
-        - ``"std_tx_per_day"``   – sample standard deviation (ddof=1) of
-          daily counts; 0.0 for a single-day window (float)
-        - ``"burstiness"``       – normalised clustering metric in ``[-1, 1]``
-          (float)
-
-    Notes:
-        - Uses ``ddof=1`` for standard deviation. Returns ``std=0.0`` for
-          accounts whose entire history falls within a single calendar day
-          (only one data point, so sample std is undefined).
-        - Numeric timestamps are treated as Unix epoch **seconds** and
-          converted via ``pd.to_datetime(..., unit="s")``.
-        - An empty Series returns all-zero metrics by convention.
-
-    Examples:
-        >>> import pandas as pd
-        >>> ts = pd.Series(pd.to_datetime(['2024-01-01', '2024-01-01', '2024-01-03']))
-        >>> result = compute_account_frequency(ts)
-        >>> result['mean_tx_per_day']
-        1.0
-        >>> result['std_tx_per_day']
-        1.0
-        >>> result['burstiness']
-        0.0
-
-        Empty timestamps return all-zero metrics:
-
-        >>> compute_account_frequency(pd.Series([], dtype='datetime64[ns]'))
-        {'mean_tx_per_day': 0.0, 'std_tx_per_day': 0.0, 'burstiness': 0.0}
+        Dictionary containing the mean daily transaction count, the sample
+        standard deviation of daily transaction counts, and burstiness.
     """
-    if pd.api.types.is_numeric_dtype(timestamps):
-        timestamps = pd.to_datetime(timestamps, unit="s")
-
     if len(timestamps) == 0:
         return {"mean_tx_per_day": 0.0, "std_tx_per_day": 0.0, "burstiness": 0.0}
 
@@ -238,3 +183,111 @@ def compute_account_frequency(
         "std_tx_per_day": std,
         "burstiness": burstiness,
     }
+
+
+def compute_frequency_metrics(
+    df: pd.DataFrame,
+    timestamp_col: str = "timestamp",
+    account_col: str = "account",
+) -> pd.DataFrame:
+    """Compute frequency metrics for each account in a transaction DataFrame.
+
+    Args:
+        df: Transaction DataFrame containing account and timestamp columns.
+        timestamp_col: Name of the timestamp column.
+        account_col: Name of the account identifier column.
+
+    Returns:
+        DataFrame with one row per account and these columns:
+        ``account_col``, ``mean_tx_per_day``, ``std_tx_per_day``, and
+        ``burstiness``.
+
+    Notes:
+        Validation and timestamp normalization are delegated to
+        :func:`_validate_dataframe`. Metric formulas are delegated to
+        :func:`_compute_frequency_metrics_for_timestamps` so the batch and
+        single-account paths stay consistent.
+    """
+    working_df = df.copy()
+    _validate_dataframe(working_df, timestamp_col=timestamp_col, account_col=account_col)
+
+    metric_rows = []
+    for account_value, account_df in working_df.groupby(account_col, sort=False):
+        metric_rows.append(
+            {
+                account_col: account_value,
+                **_compute_frequency_metrics_for_timestamps(account_df[timestamp_col]),
+            }
+        )
+
+    return pd.DataFrame(metric_rows)
+
+
+def compute_account_frequency(
+    df: pd.DataFrame,
+    account_id: Hashable,
+    timestamp_col: str = "timestamp",
+    account_col: str = "account",
+) -> dict[str, float]:
+    """Compute transaction-frequency metrics for one specified account.
+
+    Args:
+        df: Transaction DataFrame containing at least the account and timestamp
+            columns expected by the batch computation path.
+        account_id: Account identifier whose frequency metrics should be
+            returned.
+        timestamp_col: Name of the timestamp column. Defaults to
+            ``"timestamp"``.
+        account_col: Name of the account identifier column. Defaults to
+            ``"account"``.
+
+    Returns:
+        Dictionary with exactly these keys:
+        ``"mean_tx_per_day"``, ``"std_tx_per_day"``, and ``"burstiness"``.
+
+    Raises:
+        ValueError: If the requested account does not exist in ``account_col``
+            or if the DataFrame fails batch validation.
+
+    Notes:
+        This is a thin wrapper around :func:`compute_frequency_metrics`. It
+        validates the DataFrame using the same path as the batch function,
+        filters to the requested account, and extracts that account's row from
+        the batch result. Single-day behavior, custom column handling, and any
+        edge-case ``NaN`` values are therefore inherited directly from the
+        batch implementation.
+
+    Examples:
+        >>> import pandas as pd
+        >>> df = pd.DataFrame({
+        ...     "account": ["acct-1", "acct-1", "acct-2"],
+        ...     "timestamp": ["2024-01-01", "2024-01-03", "2024-01-02"],
+        ... })
+        >>> compute_account_frequency(df, "acct-1")
+        {'mean_tx_per_day': 0.6666666666666666, 'std_tx_per_day': 0.5773502691896258, 'burstiness': -0.07179676972449088}
+
+        Custom column names are supported when they match the batch API:
+
+        >>> renamed = df.rename(columns={"account": "acct", "timestamp": "ts"})
+        >>> compute_account_frequency(renamed, "acct-2", account_col="acct", timestamp_col="ts")
+        {'mean_tx_per_day': 1.0, 'std_tx_per_day': 0.0, 'burstiness': -1.0}
+    """
+    working_df = df.copy()
+    _validate_dataframe(working_df, timestamp_col=timestamp_col, account_col=account_col)
+
+    account_df = working_df.loc[working_df[account_col] == account_id]
+    if account_df.empty:
+        raise ValueError(f"Account {account_id!r} not found in column '{account_col}'")
+
+    batch_metrics = compute_frequency_metrics(
+        account_df,
+        timestamp_col=timestamp_col,
+        account_col=account_col,
+    )
+    metric_row = batch_metrics.iloc[0]
+
+    return {
+        "mean_tx_per_day": float(metric_row["mean_tx_per_day"]),
+        "std_tx_per_day": float(metric_row["std_tx_per_day"]),
+        "burstiness": float(metric_row["burstiness"]),
+    }
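The doctest values in `compute_account_frequency` fit three rules. Daily counts cover every calendar day from an account's first to its last transaction, including empty days. The spread is a ddof=1 standard deviation. Burstiness is B = (sigma - mu) / (sigma + mu). The stdlib-only sketch below re-derives those numbers under that assumption. `frequency_metrics` is a hypothetical name, not the module's `_compute_frequency_metrics_for_timestamps`, whose body is not shown in this diff.

```python
from datetime import datetime
from statistics import mean, stdev
from typing import Dict, List


def frequency_metrics(timestamps: List[datetime]) -> Dict[str, float]:
    """Hypothetical stdlib re-statement of the per-account frequency metrics."""
    if not timestamps:
        # Same all-zero convention as the empty-Series branch above.
        return {"mean_tx_per_day": 0.0, "std_tx_per_day": 0.0, "burstiness": 0.0}

    # Count transactions on every calendar day from the first to the last
    # active day, including days with zero transactions.
    days = [ts.date() for ts in timestamps]
    first = min(days)
    counts = [0] * ((max(days) - first).days + 1)
    for day in days:
        counts[(day - first).days] += 1

    mu = float(mean(counts))
    # Sample std (ddof=1) is undefined for a single day, so report 0.0 there.
    sigma = float(stdev(counts)) if len(counts) > 1 else 0.0
    # Burstiness in [-1, 1]: -1 is perfectly regular, +1 is maximally bursty.
    burstiness = (sigma - mu) / (sigma + mu) if (sigma + mu) > 0 else 0.0
    return {"mean_tx_per_day": mu, "std_tx_per_day": sigma, "burstiness": burstiness}
```

For `acct-1` in the doctest (transactions on Jan 1 and Jan 3), the daily counts are `[1, 0, 1]`. That gives mu = 2/3, sigma = sqrt(1/3) and B of about -0.0718, matching the expected output.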
diff --git a/astroml/features/graph/snapshot.py b/astroml/features/graph/snapshot.py
index 45ec817..bc115bf 100644
--- a/astroml/features/graph/snapshot.py
+++ b/astroml/features/graph/snapshot.py
@@ -1,7 +1,8 @@
 from __future__ import annotations
 
 from dataclasses import dataclass
-from typing import Iterable, List, Sequence, Set, Tuple
+from datetime import datetime, timedelta, timezone
+from typing import Generator, Iterable, List, Optional, Sequence, Set, Tuple
 import bisect
 
 
@@ -86,3 +87,119 @@ def snapshot_last_n_days(
     if start_ts < 0:
         start_ts = 0
     return window_snapshot(edges, start_ts, now_ts, presorted=presorted)
+
+
+# ---------------------------------------------------------------------------
+# DB-backed time-windowed snapshot slicer
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class SnapshotWindow:
+    """A discrete time window slice ready for training."""
+    index: int          # 0-based window index (t_0, t_1, …, t_now)
+    start: datetime
+    end: datetime
+    edges: List[Edge]
+    nodes: Set[str]
+
+
+def _parse_window_size(window: str) -> timedelta:
+    """Parse a positive window size string like '7d', '24h', '3600s' into a timedelta."""
+    units = {"d": "days", "h": "hours", "s": "seconds"}
+    unit = window[-1:].lower()
+    if unit not in units:
+        raise ValueError(f"Unknown window unit in {window!r}. Use 'd', 'h', or 's'.")
+    value = int(window[:-1])
+    if value <= 0:
+        # A zero or negative step would make iter_db_snapshots loop forever.
+        raise ValueError(f"Window size must be positive, got {window!r}.")
+    return timedelta(**{units[unit]: value})
+
+
+def iter_db_snapshots(
+    window: str = "7d",
+    t0: Optional[datetime] = None,
+    t_now: Optional[datetime] = None,
+    step: Optional[str] = None,
+    session=None,
+) -> Generator[SnapshotWindow, None, None]:
+    """Yield discrete time-windowed graph snapshots from the database.
+
+    Slices ``normalized_transactions`` into non-overlapping (or rolling)
+    windows from ``t0`` to ``t_now``, each of size ``window``.
+
+    Args:
+        window: Window size string, e.g. ``'7d'``, ``'24h'``, ``'3600s'``.
+        t0: Start of the first window. Defaults to the earliest timestamp in DB.
+        t_now: End of the last window. Defaults to ``datetime.now(UTC)``.
+        step: Stride between consecutive window starts. Defaults to ``window``
+              (non-overlapping); use a smaller value for rolling windows.
+        session: SQLAlchemy session. If None, one is created via ``get_session()``.
+
+    Yields:
+        :class:`SnapshotWindow` instances in chronological order.
+    """
+    from astroml.db.schema import NormalizedTransaction
+    from sqlalchemy import select, func as sqlfunc
+
+    if session is None:
+        from astroml.db.session import get_session
+        session = get_session()
+
+    win_delta = _parse_window_size(window)
+    step_delta = _parse_window_size(step) if step else win_delta
+
+    if t_now is None:
+        t_now = datetime.now(timezone.utc)
+
+    if t0 is None:
+        result = session.execute(
+            select(sqlfunc.min(NormalizedTransaction.timestamp))
+        ).scalar()
+        if result is None:
+            return  # empty DB
+        t0 = result if result.tzinfo else result.replace(tzinfo=timezone.utc)
+
+    if t_now.tzinfo is None:
+        t_now = t_now.replace(tzinfo=timezone.utc)
+    if t0.tzinfo is None:
+        t0 = t0.replace(tzinfo=timezone.utc)
+
+    window_start = t0
+    index = 0
+
+    while window_start < t_now:
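+        window_end = min(window_start + win_delta, t_now)
+        is_last = window_end >= t_now  # windows are [start, end); the last also includes t_now
+        rows = session.execute(
+            select(
+                NormalizedTransaction.sender,
+                NormalizedTransaction.receiver,
+                NormalizedTransaction.timestamp,
+            ).where(
+                NormalizedTransaction.timestamp >= window_start,
+                (NormalizedTransaction.timestamp <= window_end) if is_last else (NormalizedTransaction.timestamp < window_end),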
+                NormalizedTransaction.receiver.isnot(None),
+                NormalizedTransaction.sender != NormalizedTransaction.receiver,
+            ).order_by(NormalizedTransaction.timestamp)
+        ).all()
+
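+        edges: List[Edge] = []
+        nodes: Set[str] = set()
+        for r in rows:
+            # Naive DB timestamps are treated as UTC, matching the t0 handling above.
+            ts = r.timestamp if r.timestamp.tzinfo else r.timestamp.replace(tzinfo=timezone.utc)
+            edges.append(Edge(src=r.sender, dst=r.receiver, timestamp=int(ts.timestamp())))
+            nodes.add(r.sender)
+            nodes.add(r.receiver)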
+
+        yield SnapshotWindow(
+            index=index,
+            start=window_start,
+            end=window_end,
+            edges=edges,
+            nodes=nodes,
+        )
+
+        window_start += step_delta
+        index += 1
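The window arithmetic in `iter_db_snapshots` can be checked without a database. The sketch below replays its `while` loop over plain datetimes. `parse_window_size` and `window_bounds` are hypothetical helpers, and the positive-size check is an added assumption that guards against a zero step looping forever.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

_UNITS = {"d": "days", "h": "hours", "s": "seconds"}


def parse_window_size(window: str) -> timedelta:
    """Hypothetical stand-in for _parse_window_size that rejects non-positive sizes."""
    unit = window[-1:].lower()
    if unit not in _UNITS:
        raise ValueError(f"Unknown window unit in {window!r}. Use 'd', 'h', or 's'.")
    value = int(window[:-1])
    if value <= 0:
        raise ValueError(f"Window size must be positive, got {window!r}.")
    return timedelta(**{_UNITS[unit]: value})


def window_bounds(
    t0: datetime, t_now: datetime, window: str = "7d", step: Optional[str] = None
) -> List[Tuple[datetime, datetime]]:
    """Return the (start, end) pairs the snapshot loop walks through."""
    win_delta = parse_window_size(window)
    step_delta = parse_window_size(step) if step else win_delta
    bounds = []
    start = t0
    while start < t_now:
        # The last window is clipped to t_now, as in iter_db_snapshots.
        bounds.append((start, min(start + win_delta, t_now)))
        start += step_delta
    return bounds
```

With `window="7d"` from Jan 1 to Jan 20 this yields `(1, 8)`, `(8, 15)`, `(15, 20)`. Each window's end equals the next window's start. Whether the SQL upper bound uses `<` or `<=` therefore decides if a boundary transaction lands in one window or two.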
diff --git a/astroml/features/transaction_graph.py b/astroml/features/transaction_graph.py
index b0c4f42..fa12913 100644
--- a/astroml/features/transaction_graph.py
+++ b/astroml/features/transaction_graph.py
@@ -4,9 +4,11 @@
 transactions between them. Supports weighted edges, multi-asset transactions,
 and export to NetworkX format.
 """
-from typing import Dict, List, Optional, Tuple, Any
+from typing import Dict, List, Optional, Any
 from collections import defaultdict
 
+from astroml.features.asset_typing import AssetType, classify_asset
+
 
 class TransactionGraph:
     """Directed graph representation of account transactions.
@@ -30,7 +32,10 @@ def add_transaction(
         metadata: Optional[Dict[str, Any]] = None
     ) -> None:
         """Add a transaction edge to the graph.
-        
+
+        Self-loop edges (from_account == to_account) are silently dropped to
+        prevent infinite loops during graph traversal.
+
         Args:
             from_account: Source account identifier
             to_account: Destination account identifier
@@ -38,12 +43,18 @@ def add_transaction(
             asset: Asset type (e.g., 'USD', 'BTC', 'ETH')
             metadata: Optional transaction metadata
         """
+        if from_account == to_account:
+            return
+
+        edge_type = classify_asset(asset)
+
         self.nodes.add(from_account)
         self.nodes.add(to_account)
-        
+
         transaction = {
             "amount": amount,
             "asset": asset,
+            "edge_type": int(edge_type),   # 0=XLM, 1=STABLECOIN, 2=CUSTOM
             "metadata": metadata or {}
         }
         
@@ -176,11 +190,17 @@ def to_networkx(
             for dst in edge_dict[src]:
                 weight = self.get_edge_weight(src, dst, asset, aggregation)
                 edge_attrs = {"weight": weight}
-                
+
                 if include_metadata:
                     transactions = edge_dict[src][dst]
                     edge_attrs["transaction_count"] = len(transactions)
                     edge_attrs["transactions"] = transactions
+
+                # Attach edge_type from the first transaction on this edge
+                # (all transactions on the same src→dst+asset share the same type)
+                txns = edge_dict[src][dst]
+                if txns:
+                    edge_attrs["edge_type"] = txns[0].get("edge_type", int(classify_asset(txns[0]["asset"])))
                 
                 G.add_edge(src, dst, **edge_attrs)
         
diff --git a/astroml/ingestion/normalizer.py b/astroml/ingestion/normalizer.py
index e5dbda9..34b808c 100644
--- a/astroml/ingestion/normalizer.py
+++ b/astroml/ingestion/normalizer.py
@@ -5,33 +5,38 @@
 
 from astroml.db.schema import NormalizedTransaction
 from astroml.ingestion.parsers import (
+    _PATH_PAYMENT_TYPES,
     _extract_amount,
     _extract_asset,
     _extract_destination,
     _parse_datetime,
+    extract_path_payment_hops,
 )
 
 
 def normalize_operation(data: dict) -> NormalizedTransaction:
-    """Transform raw horizon operation data into a NormalizedTransaction."""
+    """Transform raw Horizon operation data into a NormalizedTransaction.
+
+    For path payments, use :func:`normalize_path_payment_hops` instead to
+    get one record per hop.
+    """
     op_type = data["type"]
     sender = data["source_account"]
     receiver = _extract_destination(data, op_type)
-    
+
     amount_str = _extract_amount(data)
     amount = float(amount_str) if amount_str is not None else None
-    
+
     asset_code, asset_issuer = _extract_asset(data)
-    
+
     if asset_code == "XLM" and asset_issuer is None:
         normalized_asset = "XLM"
     else:
-        # Default fallback to "UNKNOWN" if no explicit asset info is found
         normalized_asset = f"{asset_code}:{asset_issuer}" if asset_code and asset_issuer else "UNKNOWN"
 
     timestamp = _parse_datetime(data["created_at"])
     transaction_hash = data["transaction_hash"]
-    
+
     return NormalizedTransaction(
         transaction_hash=transaction_hash,
         sender=sender,
@@ -40,3 +45,32 @@ def normalize_operation(data: dict) -> NormalizedTransaction:
         amount=amount,
         timestamp=timestamp,
     )
+
+
+def normalize_path_payment_hops(data: dict) -> list[NormalizedTransaction]:
+    """Return one NormalizedTransaction per hop for a path payment operation.
+
+    Falls back to a single record (via :func:`normalize_operation`) for
+    non-path-payment types so callers can use this function uniformly.
+    """
+    if data.get("type") not in _PATH_PAYMENT_TYPES:
+        return [normalize_operation(data)]
+
+    hops = extract_path_payment_hops(data)
+    if not hops:
+        return [normalize_operation(data)]
+
+    timestamp = _parse_datetime(data["created_at"])
+    transaction_hash = data["transaction_hash"]
+
+    return [
+        NormalizedTransaction(
+            transaction_hash=f"{transaction_hash}_hop{hop['hop_index']}",
+            sender=hop["from_account"],
+            receiver=hop["to_account"],
+            asset=hop["asset"],
+            amount=hop["amount"],
+            timestamp=timestamp,
+        )
+        for hop in hops
+    ]
diff --git a/astroml/ingestion/parsers.py b/astroml/ingestion/parsers.py
index 49b6d47..e8a89c5 100644
--- a/astroml/ingestion/parsers.py
+++ b/astroml/ingestion/parsers.py
@@ -145,6 +145,11 @@ def _extract_amount(data: dict) -> Optional[str]:
         return data["amount"]
     if "starting_balance" in data:
         return data["starting_balance"]
+    # For path payments: prefer destination_amount (what receiver gets)
+    if "destination_amount" in data:
+        return data["destination_amount"]
+    if "source_amount" in data:
+        return data["source_amount"]
     return None
 
 
@@ -154,3 +159,70 @@ def _extract_asset(data: dict) -> tuple[Optional[str], Optional[str]]:
     if asset_type == "native":
         return ("XLM", None)
     return (data.get("asset_code"), data.get("asset_issuer"))
+
+
+def extract_path_payment_hops(data: dict) -> list[dict]:
+    """Decompose a path payment into ordered per-hop dicts.
+
+    Each hop dict has keys: from_account, to_account, asset, amount,
+    hop_index, is_first_hop, is_last_hop.
+
+    Returns an empty list for non-path-payment operations.
+    """
+    if data.get("type") not in _PATH_PAYMENT_TYPES:
+        return []
+
+    sender = data["source_account"]
+    receiver = _extract_destination(data, data["type"])
+    path = data.get("path", [])  # intermediate assets
+
+    # Build asset chain: [source_asset, ...path_assets..., dest_asset]
+    def _asset_str(asset_dict: dict) -> str:
+        if asset_dict.get("asset_type") == "native":
+            return "XLM"
+        code = asset_dict.get("asset_code", "UNKNOWN")
+        issuer = asset_dict.get("asset_issuer", "")
+        return f"{code}:{issuer}" if issuer else code
+
+    src_asset_type = data.get("source_asset_type", data.get("asset_type", ""))
+    if src_asset_type == "native":
+        src_asset = "XLM"
+    else:
+        src_code = data.get("source_asset_code", data.get("asset_code", "UNKNOWN"))
+        src_issuer = data.get("source_asset_issuer", data.get("asset_issuer", ""))
+        src_asset = f"{src_code}:{src_issuer}" if src_issuer else src_code
+
+    dst_asset_type = data.get("asset_type", "")
+    if dst_asset_type == "native":
+        dst_asset = "XLM"
+    else:
+        dst_code = data.get("asset_code", "UNKNOWN")
+        dst_issuer = data.get("asset_issuer", "")
+        dst_asset = f"{dst_code}:{dst_issuer}" if dst_issuer else dst_code
+
+    path_assets = [_asset_str(p) for p in path]
+    asset_chain = [src_asset] + path_assets + [dst_asset]
+
+    # Amounts: source_amount on first hop, destination_amount on last hop,
+    # None for intermediate hops (not exposed by Horizon).
+    src_amount = data.get("source_amount")
+    dst_amount = data.get("destination_amount", data.get("amount"))
+
+    # Intermediate accounts are not exposed by Horizon; use sentinel "__path__"
+    # so the graph builder can distinguish them from real accounts.
+    n_hops = len(asset_chain) - 1
+    hops = []
+    for i in range(n_hops):
+        from_acc = sender if i == 0 else f"__path__{data['transaction_hash']}_{i}"
+        to_acc = receiver if i == n_hops - 1 else f"__path__{data['transaction_hash']}_{i + 1}"
+        amount = src_amount if i == 0 else (dst_amount if i == n_hops - 1 else None)
+        hops.append({
+            "from_account": from_acc,
+            "to_account": to_acc,
+            "asset": asset_chain[i],
+            "amount": float(amount) if amount is not None else None,
+            "hop_index": i,
+            "is_first_hop": i == 0,
+            "is_last_hop": i == n_hops - 1,
+        })
+    return hops
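The account and asset labeling done by `extract_path_payment_hops` is easiest to see on a small example. The sketch below restates only the hop loop, without amounts, as a standalone function. `hop_endpoints` is a hypothetical helper, and the account IDs and issuers used with it are made-up placeholders.

```python
from typing import List, Tuple


def hop_endpoints(
    sender: str, receiver: str, tx_hash: str, asset_chain: List[str]
) -> List[Tuple[str, str, str]]:
    """Condensed restatement of the hop loop: (from_account, to_account, asset) per hop."""
    n_hops = len(asset_chain) - 1
    hops = []
    for i in range(n_hops):
        # Only the real sender and receiver are known; intermediate endpoints
        # get per-transaction sentinel accounts.
        src = sender if i == 0 else f"__path__{tx_hash}_{i}"
        dst = receiver if i == n_hops - 1 else f"__path__{tx_hash}_{i + 1}"
        hops.append((src, dst, asset_chain[i]))
    return hops


print(hop_endpoints("GSENDER", "GRECEIVER", "abc", ["XLM", "USDC:GISSUER1", "EURT:GISSUER2"]))
```

One intermediate asset gives a three-entry chain and two hops: `GSENDER -> __path__abc_1` in `XLM`, then `__path__abc_1 -> GRECEIVER` in `USDC:GISSUER1`. Each hop is labeled with the asset at its own chain position. So when the path is non-empty, the destination asset never appears on a hop, even though the last hop carries `destination_amount`. That pairing may be worth confirming.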
diff --git a/astroml/models/__init__.py b/astroml/models/__init__.py
index 7f2adfb..f901fe7 100644
--- a/astroml/models/__init__.py
+++ b/astroml/models/__init__.py
@@ -3,11 +3,14 @@
 from .deep_svdd import DeepSVDD, DeepSVDDNetwork
 from .deep_svdd_trainer import DeepSVDDTrainer, FraudDetectionDeepSVDD
 from .gcn import GCN
+from .link_prediction import LinkPredictor, GCNEncoder
 
 __all__ = [
     'DeepSVDD',
-    'DeepSVDDNetwork', 
+    'DeepSVDDNetwork',
     'DeepSVDDTrainer',
     'FraudDetectionDeepSVDD',
-    'GCN'
+    'GCN',
+    'GCNEncoder',
+    'LinkPredictor',
 ]
\ No newline at end of file
diff --git a/astroml/models/deep_svdd_trainer.py b/astroml/models/deep_svdd_trainer.py
index 31f7801..1713129 100644
--- a/astroml/models/deep_svdd_trainer.py
+++ b/astroml/models/deep_svdd_trainer.py
@@ -16,6 +16,7 @@
 import seaborn as sns
 
 from .deep_svdd import DeepSVDD, DeepSVDDNetwork
+from astroml.tracking import MLflowTracker
 
 
 class DeepSVDDTrainer:
@@ -26,13 +27,15 @@ def __init__(
         model: DeepSVDD,
         device: str = 'cpu',
         patience: int = 10,
-        min_delta: float = 1e-4
+        min_delta: float = 1e-4,
+        tracker: Optional[MLflowTracker] = None,
     ):
         self.model = model
         self.device = device
         self.patience = patience
         self.min_delta = min_delta
-        
+        self.tracker = tracker  # None → no MLflow logging
+
         self.training_history = {
             'train_loss': [],
             'val_loss': [],
@@ -101,15 +104,25 @@ def train(
             if val_loss is not None:
                 self.training_history['val_loss'].append(val_loss)
             self.training_history['radius'].append(radius)
-            
-            # Logging
+
+            # MLflow per-epoch metrics
+            if self.tracker is not None:
+                step_metrics: Dict[str, float] = {
+                    "train_loss": train_loss,
+                    "svdd_radius": radius,
+                }
+                if val_loss is not None:
+                    step_metrics["val_loss"] = val_loss
+                self.tracker.log_metrics(step_metrics, step=epoch)
+
+            # Console logging
             if epoch % 10 == 0:
                 log_msg = f"Epoch {epoch}: Train Loss = {train_loss:.4f}"
                 if val_loss is not None:
                     log_msg += f", Val Loss = {val_loss:.4f}"
                 log_msg += f", Radius = {radius:.4f}"
                 print(log_msg)
-        
+
         return self.training_history
     
     def _train_epoch(
@@ -254,7 +267,7 @@ def _get_scheduler(
             raise ValueError(f"Unknown scheduler type: {scheduler_type}")
     
     def _save_checkpoint(self):
-        """Save best model checkpoint."""
+        """Save best model checkpoint and log it to MLflow."""
         checkpoint = {
             'model_state_dict': self.model.state_dict(),
             'center': self.model.center,
@@ -262,6 +275,12 @@ def _save_checkpoint(self):
             'training_history': self.training_history
         }
         torch.save(checkpoint, 'best_deep_svdd.pth')
+        if self.tracker is not None:
+            self.tracker.log_model_artifact(
+                self.model,
+                artifact_path="model",
+                checkpoint_path="best_deep_svdd.pth",
+            )
     
     def load_checkpoint(self, checkpoint_path: str):
         """Load model from checkpoint."""
@@ -280,46 +299,57 @@ def evaluate(
         self,
         X: np.ndarray,
         y: np.ndarray,
-        threshold_percentile: float = 95.0
+        threshold_percentile: float = 95.0,
     ) -> Dict[str, float]:
-        """Evaluate model performance."""
-        
+        """Evaluate model performance and log results to MLflow."""
+
         # Get anomaly scores
         scores = self.model.predict(X)
-        
+
         # Determine threshold
         threshold = np.percentile(scores, threshold_percentile)
         predictions = (scores > threshold).astype(int)
-        
+
         # Calculate metrics
-        metrics = {}
-        
+        metrics: Dict[str, float] = {}
+
         # AUC-ROC
         try:
             metrics['auc_roc'] = roc_auc_score(y, scores)
         except ValueError:
             metrics['auc_roc'] = 0.0
-        
+
         # AUC-PR
         try:
             precision, recall, _ = precision_recall_curve(y, scores)
             metrics['auc_pr'] = auc(recall, precision)
         except ValueError:
             metrics['auc_pr'] = 0.0
-        
+
         # Basic classification metrics
         tp = np.sum((predictions == 1) & (y == 1))
         fp = np.sum((predictions == 1) & (y == 0))
         fn = np.sum((predictions == 0) & (y == 1))
         tn = np.sum((predictions == 0) & (y == 0))
-        
+
         metrics['precision'] = tp / (tp + fp) if (tp + fp) > 0 else 0
         metrics['recall'] = tp / (tp + fn) if (tp + fn) > 0 else 0
         metrics['f1'] = 2 * metrics['precision'] * metrics['recall'] / (
             metrics['precision'] + metrics['recall']
         ) if (metrics['precision'] + metrics['recall']) > 0 else 0
         metrics['accuracy'] = (tp + tn) / (tp + fp + fn + tn)
-        
+
+        # Log evaluation metrics (ROC-AUC, PR-AUC, precision, recall, F1, accuracy)
+        if self.tracker is not None:
+            self.tracker.log_metrics({
+                "eval_roc_auc": metrics['auc_roc'],
+                "eval_auc_pr": metrics['auc_pr'],
+                "eval_precision": metrics['precision'],
+                "eval_recall": metrics['recall'],
+                "eval_f1": metrics['f1'],
+                "eval_accuracy": metrics['accuracy'],
+            })
+
         return metrics
 
 
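The trainer touches the tracker only through two calls: `log_metrics(metrics, step=...)` and `log_model_artifact(model, artifact_path=..., checkpoint_path=...)`. A duck-typed recorder can therefore stand in for `MLflowTracker` in unit tests, passed as `DeepSVDDTrainer(model, tracker=RecordingTracker())`. This is a sketch that assumes those two methods are all the trainer needs, inferred from the calls in this diff rather than from `MLflowTracker`'s definition. `RecordingTracker` is a hypothetical name.

```python
from typing import Any, Dict, List, Optional, Tuple


class RecordingTracker:
    """Hypothetical test double recording the tracker calls DeepSVDDTrainer makes."""

    def __init__(self) -> None:
        self.metric_calls: List[Tuple[Dict[str, float], Optional[int]]] = []
        self.artifact_calls: List[Dict[str, Any]] = []

    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
        # Copy so later mutation by the caller cannot change what was recorded.
        self.metric_calls.append((dict(metrics), step))

    def log_model_artifact(self, model: Any, artifact_path: str, checkpoint_path: str) -> None:
        self.artifact_calls.append(
            {"model": model, "artifact_path": artifact_path, "checkpoint_path": checkpoint_path}
        )

    def steps_logged(self, key: str) -> List[Optional[int]]:
        """Return the step of every logged batch that contained ``key``."""
        return [step for metrics, step in self.metric_calls if key in metrics]
```

After training, `steps_logged("train_loss")` should list every epoch. `steps_logged("val_loss")` should be empty unless validation data was supplied. The `eval_*` keys appear once, with `step=None`.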
diff --git a/astroml/models/gcn.py b/astroml/models/gcn.py
index a9cba66..3575c52 100644
--- a/astroml/models/gcn.py
+++ b/astroml/models/gcn.py
@@ -1,35 +1,22 @@
-import torch
 import torch.nn as nn
 import torch.nn.functional as F
 from torch_geometric.nn import GCNConv
 
 
 class GCN(nn.Module):
-    """
-    Configurable Graph Convolutional Network for node classification.
+    """Standard 2-layer Graph Convolutional Network for node classification.
+
+    Architecture: GCNConv -> ReLU -> Dropout -> GCNConv -> log_softmax
     """
 
-    def __init__(self, input_dim, hidden_dims, output_dim, dropout=0.5):
+    def __init__(self, input_dim: int, hidden_dim: int, output_dim: int, dropout: float = 0.5):
         super().__init__()
-
-        self.convs = nn.ModuleList()
+        self.conv1 = GCNConv(input_dim, hidden_dim)
+        self.conv2 = GCNConv(hidden_dim, output_dim)
         self.dropout = dropout
 
-        # Input layer
-        self.convs.append(GCNConv(input_dim, hidden_dims[0]))
-
-        # Hidden layers
-        for i in range(len(hidden_dims) - 1):
-            self.convs.append(GCNConv(hidden_dims[i], hidden_dims[i + 1]))
-
-        # Output layer
-        self.convs.append(GCNConv(hidden_dims[-1], output_dim))
-
     def forward(self, x, edge_index):
-        for conv in self.convs[:-1]:
-            x = conv(x, edge_index)
-            x = F.relu(x)
-            x = F.dropout(x, p=self.dropout, training=self.training)
-
-        x = self.convs[-1](x, edge_index)
-        return F.log_softmax(x, dim=1)
\ No newline at end of file
+        x = F.relu(self.conv1(x, edge_index))
+        x = F.dropout(x, p=self.dropout, training=self.training)
+        x = self.conv2(x, edge_index)
+        return F.log_softmax(x, dim=1)
diff --git a/astroml/models/link_prediction.py b/astroml/models/link_prediction.py
new file mode 100644
index 0000000..ecd125a
--- /dev/null
+++ b/astroml/models/link_prediction.py
@@ -0,0 +1,196 @@
+"""Self-supervised link prediction model for AstroML.
+
+Predicts whether two accounts will transact in the next N ledgers.
+
+Architecture
+------------
+* **Encoder**: a configurable stack of ``GCNConv`` layers (:class:`GCNEncoder`)
+  that produces one embedding vector per node.
+* **Decoder**: scores a candidate edge (u, v) using either a dot product
+  between the two node embeddings or a small MLP over their concatenation.
+
+The model is trained with a binary cross-entropy objective on positive
+(observed future) edges and randomly sampled negative (non-)edges — a
+standard self-supervised link prediction setup.
+"""
+from __future__ import annotations
+
+from typing import Literal, Optional, Tuple
+
+import torch
+import torch.nn as nn
+import torch.nn.functional as F
+from torch_geometric.nn import GCNConv
+
+
+class GCNEncoder(nn.Module):
+    """GCN encoder that produces node embeddings.
+
+    Args:
+        input_dim: Dimension of input node features.
+        hidden_dims: Sizes of intermediate GCN layers.
+        embedding_dim: Size of the final node embedding.
+        dropout: Dropout probability applied between layers.
+    """
+
+    def __init__(
+        self,
+        input_dim: int,
+        hidden_dims: list[int],
+        embedding_dim: int,
+        dropout: float = 0.5,
+    ) -> None:
+        super().__init__()
+        self.dropout = dropout
+
+        dims = [input_dim] + hidden_dims + [embedding_dim]
+        self.convs = nn.ModuleList(
+            [GCNConv(dims[i], dims[i + 1]) for i in range(len(dims) - 1)]
+        )
+
+    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
+        """Return node embeddings of shape ``[N, embedding_dim]``."""
+        for conv in self.convs[:-1]:
+            x = conv(x, edge_index)
+            x = F.relu(x)
+            x = F.dropout(x, p=self.dropout, training=self.training)
+        x = self.convs[-1](x, edge_index)
+        return x
+
+
+class LinkPredictor(nn.Module):
+    """Link prediction model for self-supervised training.
+
+    Combines a GCN encoder with a scoring decoder to predict whether
+    two accounts will transact within the next N ledgers.
+
+    Args:
+        input_dim: Dimension of input node features.
+        hidden_dims: Sizes of intermediate GCN encoder layers.
+        embedding_dim: Size of the final node embedding.
+        dropout: Dropout applied inside the encoder.
+        decoder: ``"dot"`` uses a dot product; ``"mlp"`` uses a two-layer
+            MLP over the concatenated pair embeddings.
+
+    Example::
+
+        model = LinkPredictor(input_dim=128, hidden_dims=[64], embedding_dim=32)
+        z = model.encode(x, edge_index)           # [N, 32]
+        scores = model.decode(z, pos_edge_index)  # [E] logits
+        loss = model.loss(z, pos_edge_index, neg_edge_index)
+    """
+
+    def __init__(
+        self,
+        input_dim: int,
+        hidden_dims: list[int],
+        embedding_dim: int,
+        dropout: float = 0.5,
+        decoder: Literal["dot", "mlp"] = "dot",
+    ) -> None:
+        super().__init__()
+        self.decoder_type = decoder
+        self.encoder = GCNEncoder(input_dim, hidden_dims, embedding_dim, dropout)
+
+        if decoder == "mlp":
+            self.mlp = nn.Sequential(
+                nn.Linear(2 * embedding_dim, embedding_dim),
+                nn.ReLU(),
+                nn.Linear(embedding_dim, 1),
+            )
+
+    # ------------------------------------------------------------------
+    # Forward helpers
+    # ------------------------------------------------------------------
+
+    def encode(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
+        """Run the GCN encoder and return node embeddings ``[N, embedding_dim]``."""
+        return self.encoder(x, edge_index)
+
+    def decode(self, z: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
+        """Score candidate edges.
+
+        Args:
+            z: Node embeddings ``[N, embedding_dim]``.
+            edge_index: Candidate edges ``[2, E]``.
+
+        Returns:
+            Raw logits ``[E]`` (apply sigmoid for probabilities).
+        """
+        src, dst = edge_index[0], edge_index[1]
+        if self.decoder_type == "dot":
+            return (z[src] * z[dst]).sum(dim=-1)
+        else:
+            pair = torch.cat([z[src], z[dst]], dim=-1)
+            return self.mlp(pair).squeeze(-1)
+
+    def decode_all(self, z: torch.Tensor) -> torch.Tensor:
+        """Score every possible node pair (dense, O(N²)).
+
+        Only suitable for small graphs / evaluation.  Returns a ``[N, N]``
+        matrix of raw logits.
+        """
+        if self.decoder_type == "dot":
+            return z @ z.t()
+        else:
+            N = z.size(0)
+            src = torch.arange(N, device=z.device).repeat_interleave(N)
+            dst = torch.arange(N, device=z.device).repeat(N)
+            pair = torch.cat([z[src], z[dst]], dim=-1)
+            return self.mlp(pair).squeeze(-1).view(N, N)
+
+    # ------------------------------------------------------------------
+    # Loss
+    # ------------------------------------------------------------------
+
+    def loss(
+        self,
+        z: torch.Tensor,
+        pos_edge_index: torch.Tensor,
+        neg_edge_index: torch.Tensor,
+    ) -> torch.Tensor:
+        """Binary cross-entropy loss over positive and negative edges.
+
+        Args:
+            z: Node embeddings produced by :meth:`encode`.
+            pos_edge_index: Positive (observed future) edges ``[2, E_pos]``.
+            neg_edge_index: Negative (sampled non-)edges ``[2, E_neg]``.
+
+        Returns:
+            Scalar BCE loss.
+        """
+        pos_scores = self.decode(z, pos_edge_index)
+        neg_scores = self.decode(z, neg_edge_index)
+
+        scores = torch.cat([pos_scores, neg_scores], dim=0)
+        labels = torch.cat(
+            [
+                torch.ones(pos_scores.size(0), device=z.device),
+                torch.zeros(neg_scores.size(0), device=z.device),
+            ],
+            dim=0,
+        )
+        return F.binary_cross_entropy_with_logits(scores, labels)
+
+    # ------------------------------------------------------------------
+    # Convenience forward
+    # ------------------------------------------------------------------
+
+    def forward(
+        self,
+        x: torch.Tensor,
+        edge_index: torch.Tensor,
+        query_edge_index: torch.Tensor,
+    ) -> torch.Tensor:
+        """Encode then decode query edges.
+
+        Args:
+            x: Node features ``[N, input_dim]``.
+            edge_index: Graph connectivity used for message passing ``[2, E]``.
+            query_edge_index: Edges to score ``[2, Q]``.
+
+        Returns:
+            Raw logits ``[Q]``.
+        """
+        z = self.encode(x, edge_index)
+        return self.decode(z, query_edge_index)
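For readers checking the maths, here is a framework-free sketch of what `decode` computes with `decoder_type="dot"` and what `loss` computes on top of it. It assumes the default `reduction="mean"` of `F.binary_cross_entropy_with_logits` and uses the standard numerically stable form of BCE on logits. The function names are illustrative only and are not part of AstroML.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def dot_decode(z: Sequence[Sequence[float]], edges: Sequence[Pair]) -> List[float]:
    """Raw logit for each (src, dst) pair: the dot product of their embeddings."""
    return [sum(a * b for a, b in zip(z[s], z[d])) for s, d in edges]


def bce_with_logits(logits: Sequence[float], labels: Sequence[float]) -> float:
    """Mean BCE on raw logits, using the stable form
    max(x, 0) - x * y + log(1 + exp(-|x|)), which never overflows exp()."""
    terms = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, labels)
    ]
    return sum(terms) / len(terms)


def link_loss(z, pos_edges: Sequence[Pair], neg_edges: Sequence[Pair]) -> float:
    """Label positives 1 and negatives 0, then score them together (as in ``loss``)."""
    scores = dot_decode(z, pos_edges) + dot_decode(z, neg_edges)
    labels = [1.0] * len(pos_edges) + [0.0] * len(neg_edges)
    return bce_with_logits(scores, labels)


z = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(link_loss(z, pos_edges=[(0, 1)], neg_edges=[(0, 2)]))
```

This also gives a quick sanity check. A decoder that outputs logit 0 for every pair scores exactly log 2 (about 0.693) under this loss.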
diff --git a/astroml/tasks/__init__.py b/astroml/tasks/__init__.py
new file mode 100644
index 0000000..684fc29
--- /dev/null
+++ b/astroml/tasks/__init__.py
@@ -0,0 +1,3 @@
+from .link_prediction_task import LinkPredictionTask, LedgerSplit
+
+__all__ = ["LinkPredictionTask", "LedgerSplit"]
diff --git a/astroml/tasks/link_prediction_task.py b/astroml/tasks/link_prediction_task.py
new file mode 100644
index 0000000..7a1bca1
--- /dev/null
+++ b/astroml/tasks/link_prediction_task.py
@@ -0,0 +1,359 @@
+"""Self-supervised link prediction training task.
+
+Training objective
+------------------
+Given a stream of timestamped transactions (edges) between accounts (nodes):
+
+1. Take a **context window** of edges strictly before ledger ``L``.
+2. The **positive set** = edges in the ``N`` ledgers ``L .. L + N - 1``
+   (the "future" window).
+3. The **negative set** = ``neg_sampling_ratio`` times as many (u, v) pairs
+   (1:1 by default), sampled uniformly from pairs absent from the future window.
+4. Train :class:`~astroml.models.link_prediction.LinkPredictor` to
+   distinguish positives from negatives using binary cross-entropy.
+
+This is a **self-supervised** objective: no manual labels are needed —
+the future transaction graph itself provides supervision.
+
+Key classes
+-----------
+* :class:`LedgerSplit` — dataclass holding one context/future pair of edge
+  sets, plus the node index mapping needed to build ``edge_index`` tensors.
+* :class:`LinkPredictionTask` — orchestrates splitting, negative sampling,
+  and per-step training.
+"""
+from __future__ import annotations
+
+import logging
+import random
+from dataclasses import dataclass
+from typing import Dict, List, Optional, Sequence, Set, Tuple
+
+import torch
+import torch.nn as nn
+
+from astroml.features.graph.snapshot import Edge
+
+logger = logging.getLogger(__name__)
+
+
+# ---------------------------------------------------------------------------
+# Data structures
+# ---------------------------------------------------------------------------
+
+@dataclass
+class LedgerSplit:
+    """One context / future pair used for a single training step.
+
+    Attributes:
+        context_edges: Edges used as the input graph (message-passing graph).
+        future_edges: Edges observed in the next N ledgers (positive labels).
+        node_index: Mapping from account-id string → contiguous integer index.
+    """
+    context_edges: List[Edge]
+    future_edges: List[Edge]
+    node_index: Dict[str, int]
+
+    @property
+    def num_nodes(self) -> int:
+        return len(self.node_index)
+
+    def to_edge_index(self, edges: List[Edge], device: torch.device = torch.device("cpu")) -> torch.Tensor:
+        """Convert an edge list to a PyG-style ``[2, E]`` LongTensor.
+
+        Unknown accounts (not in ``node_index``) are silently skipped.
+        """
+        pairs = [
+            (self.node_index[e.src], self.node_index[e.dst])
+            for e in edges
+            if e.src in self.node_index and e.dst in self.node_index
+        ]
+        if not pairs:
+            return torch.zeros(2, 0, dtype=torch.long, device=device)
+        src, dst = zip(*pairs)
+        return torch.tensor([list(src), list(dst)], dtype=torch.long, device=device)
+
+
+# ---------------------------------------------------------------------------
+# Negative edge sampling
+# ---------------------------------------------------------------------------
+
+def sample_negative_edges(
+    num_nodes: int,
+    positive_set: Set[Tuple[int, int]],
+    num_samples: int,
+    max_attempts: int = 10,
+    rng: Optional[random.Random] = None,
+) -> List[Tuple[int, int]]:
+    """Sample (u, v) pairs that are not in *positive_set*.
+
+    Draws up to *num_samples* unique directed pairs ``(u, v)`` with ``u != v``.
+    Gives up after ``num_samples * max_attempts`` draws to avoid infinite
+    loops on very dense graphs, returning however many non-edges were found.
+
+    Args:
+        num_nodes: Total number of nodes.
+        positive_set: Set of ``(src, dst)`` integer pairs to exclude.
+        num_samples: Desired number of negative samples.
+        max_attempts: Multiplier for the draw budget.
+        rng: Optional seeded :class:`random.Random` instance for
+            reproducibility.
+
+    Returns:
+        List of ``(src, dst)`` integer pairs.
+    """
+    if rng is None:
+        rng = random.Random()
+
+    negatives: List[Tuple[int, int]] = []
+    seen: Set[Tuple[int, int]] = set(positive_set)
+    budget = num_samples * max_attempts
+
+    while len(negatives) < num_samples and budget > 0:
+        u = rng.randrange(num_nodes)
+        v = rng.randrange(num_nodes)
+        budget -= 1
+        if u == v or (u, v) in seen:
+            continue
+        negatives.append((u, v))
+        seen.add((u, v))
+
+    return negatives
+
+
+# ---------------------------------------------------------------------------
+# Main task
+# ---------------------------------------------------------------------------
+
+class LinkPredictionTask:
+    """Self-supervised link prediction over a Stellar ledger stream.
+
+    Splits a sorted edge sequence into overlapping (context, future) windows
+    keyed by ledger sequence number.  For each window a training step is run
+    that optimises :class:`~astroml.models.link_prediction.LinkPredictor`
+    to predict whether two accounts will transact in the next ``n_future``
+    ledgers.
+
+    Args:
+        edges: Full edge sequence; sorted internally by ``timestamp`` (ledger).
+        n_future: Number of ledgers ahead to use as the positive label window.
+        context_ledgers: Maximum number of distinct ledgers in each context
+            window.  If ``None``, all edges before the future window are used.
+        neg_sampling_ratio: Ratio of negative to positive edges per step.
+        device: PyTorch device for tensor operations.
+        seed: Random seed for reproducible negative sampling.
+
+    Example::
+
+        task = LinkPredictionTask(edges, n_future=10, context_ledgers=100)
+        for split in task.build_splits():
+            x = torch.ones(split.num_nodes, in_dim)  # one row per node of *this* split
+            loss = task.train_step(model, optimizer, split, x)
+    """
+
+    def __init__(
+        self,
+        edges: Sequence[Edge],
+        n_future: int = 10,
+        context_ledgers: Optional[int] = None,
+        neg_sampling_ratio: float = 1.0,
+        device: torch.device = torch.device("cpu"),
+        seed: Optional[int] = None,
+    ) -> None:
+        if n_future < 1:
+            raise ValueError(f"n_future must be >= 1, got {n_future}")
+        if neg_sampling_ratio <= 0:
+            raise ValueError(f"neg_sampling_ratio must be > 0, got {neg_sampling_ratio}")
+
+        self.edges = sorted(edges, key=lambda e: e.timestamp)
+        self.n_future = n_future
+        self.context_ledgers = context_ledgers
+        self.neg_sampling_ratio = neg_sampling_ratio
+        self.device = device
+        self._rng = random.Random(seed)
+
+    # ------------------------------------------------------------------
+    # Build splits
+    # ------------------------------------------------------------------
+
+    def build_splits(self) -> List[LedgerSplit]:
+        """Enumerate (context, future) window pairs, one per boundary ledger.
+
+        Each unique ledger timestamp serves as a boundary.  Within a pair the
+        windows are disjoint; successive pairs overlap in time:
+
+        * **context** = edges with ``timestamp < boundary``  (optionally
+          capped to the last ``context_ledgers`` distinct ledgers)
+        * **future** = edges with ``boundary <= timestamp < boundary + n_future``
+
+        Pairs where either partition is empty are skipped.
+
+        Returns:
+            List of :class:`LedgerSplit` objects, one per boundary ledger.
+        """
+        if not self.edges:
+            return []
+
+        ledger_seqs = sorted({e.timestamp for e in self.edges})
+        splits: List[LedgerSplit] = []
+
+        for boundary in ledger_seqs:
+            future_end = boundary + self.n_future
+
+            context_edges = [e for e in self.edges if e.timestamp < boundary]
+            future_edges = [e for e in self.edges if boundary <= e.timestamp < future_end]
+
+            if not context_edges or not future_edges:
+                continue
+
+            # Optionally restrict context to last N ledgers.
+            if self.context_ledgers is not None:
+                context_seqs = sorted({e.timestamp for e in context_edges})
+                if len(context_seqs) > self.context_ledgers:
+                    cutoff_seq = context_seqs[-self.context_ledgers]
+                    context_edges = [e for e in context_edges if e.timestamp >= cutoff_seq]
+
+            # Build a shared node index across both windows.
+            accounts: Set[str] = set()
+            for e in context_edges + future_edges:
+                accounts.add(e.src)
+                accounts.add(e.dst)
+            node_index = {acc: idx for idx, acc in enumerate(sorted(accounts))}
+
+            splits.append(LedgerSplit(
+                context_edges=context_edges,
+                future_edges=future_edges,
+                node_index=node_index,
+            ))
+
+        logger.info("Built %d link-prediction splits (n_future=%d)", len(splits), self.n_future)
+        return splits
+
+    # ------------------------------------------------------------------
+    # Negative sampling
+    # ------------------------------------------------------------------
+
+    def sample_negatives(self, split: LedgerSplit) -> torch.Tensor:
+        """Sample negative edges for *split*.
+
+        Returns a ``[2, E_neg]`` LongTensor of (src, dst) pairs that do not
+        appear in ``split.future_edges``.
+        """
+        n_pos = max(1, len(split.future_edges))
+        n_neg = max(1, int(n_pos * self.neg_sampling_ratio))
+
+        pos_set: Set[Tuple[int, int]] = set()
+        for e in split.future_edges:
+            if e.src in split.node_index and e.dst in split.node_index:
+                pos_set.add((split.node_index[e.src], split.node_index[e.dst]))
+
+        neg_pairs = sample_negative_edges(
+            num_nodes=split.num_nodes,
+            positive_set=pos_set,
+            num_samples=n_neg,
+            rng=self._rng,
+        )
+
+        if not neg_pairs:
+            return torch.zeros(2, 0, dtype=torch.long, device=self.device)
+
+        src, dst = zip(*neg_pairs)
+        return torch.tensor([list(src), list(dst)], dtype=torch.long, device=self.device)
+
+    # ------------------------------------------------------------------
+    # Training step
+    # ------------------------------------------------------------------
+
+    def train_step(
+        self,
+        model: nn.Module,
+        optimizer: torch.optim.Optimizer,
+        split: LedgerSplit,
+        node_features: torch.Tensor,
+    ) -> float:
+        """Run one gradient update step on *split*.
+
+        Args:
+            model: :class:`~astroml.models.link_prediction.LinkPredictor`
+                instance.
+            optimizer: Torch optimizer (e.g. Adam).
+            split: One :class:`LedgerSplit` produced by :meth:`build_splits`.
+            node_features: Node feature matrix ``[split.num_nodes, F]`` on
+                the correct device.
+
+        Returns:
+            Scalar loss value for this step.
+        """
+        model.train()
+        optimizer.zero_grad()
+
+        context_edge_index = split.to_edge_index(split.context_edges, device=self.device)
+        pos_edge_index = split.to_edge_index(split.future_edges, device=self.device)
+        neg_edge_index = self.sample_negatives(split)
+
+        if pos_edge_index.size(1) == 0:
+            return 0.0
+
+        z = model.encode(node_features, context_edge_index)
+        loss = model.loss(z, pos_edge_index, neg_edge_index)
+        loss.backward()
+        optimizer.step()
+
+        return loss.item()
+
+    # ------------------------------------------------------------------
+    # Evaluation
+    # ------------------------------------------------------------------
+
+    def evaluate(
+        self,
+        model: nn.Module,
+        split: LedgerSplit,
+        node_features: torch.Tensor,
+    ) -> Dict[str, float]:
+        """Evaluate link prediction on *split*.
+
+        Computes **auc** (area under the ROC curve) and **avg_precision**
+        (average precision, summarising the precision-recall curve), scoring
+        the future edges against freshly sampled negatives.
+
+        Args:
+            model: Trained :class:`~astroml.models.link_prediction.LinkPredictor`.
+            split: A held-out :class:`LedgerSplit`.
+            node_features: Node feature matrix ``[split.num_nodes, F]``.
+
+        Returns:
+            Dict with ``"auc"`` and ``"avg_precision"`` keys.
+        """
+        import numpy as np
+        from sklearn.metrics import average_precision_score, roc_auc_score
+
+        model.eval()
+        with torch.no_grad():
+            context_edge_index = split.to_edge_index(split.context_edges, device=self.device)
+            pos_edge_index = split.to_edge_index(split.future_edges, device=self.device)
+            neg_edge_index = self.sample_negatives(split)
+
+            z = model.encode(node_features, context_edge_index)
+
+            pos_scores = torch.sigmoid(model.decode(z, pos_edge_index)).cpu().numpy()
+            neg_scores = torch.sigmoid(model.decode(z, neg_edge_index)).cpu().numpy()
+
+        scores = np.concatenate([pos_scores, neg_scores])
+        labels = np.concatenate([
+            np.ones(len(pos_scores)),
+            np.zeros(len(neg_scores)),
+        ])
+
+        metrics: Dict[str, float] = {}
+        try:
+            metrics["auc"] = float(roc_auc_score(labels, scores))
+        except ValueError:
+            metrics["auc"] = 0.5
+
+        try:
+            metrics["avg_precision"] = float(average_precision_score(labels, scores))
+        except ValueError:
+            metrics["avg_precision"] = 0.0
+
+        return metrics
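`build_splits` rescans the full edge list for every boundary ledger. For E edges and L distinct ledgers, that costs roughly O(E * L) comparisons. `self.edges` is already sorted by `timestamp`, so each window is a contiguous slice and can be located by binary search in O(log E). Building the slices still copies the edges they contain. Below is a minimal sketch under some assumptions. `Edge` here is a stand-in tuple with `src`, `dst` and `timestamp` fields. `window_bounds` is a hypothetical helper that reproduces the same context/future boundaries, including the `context_ledgers` cap.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Optional, Tuple


class Edge(NamedTuple):
    """Stand-in for astroml.features.graph.snapshot.Edge (assumed fields)."""
    src: str
    dst: str
    timestamp: int


def window_bounds(
    edges: List[Edge], n_future: int, context_ledgers: Optional[int] = None
) -> List[Tuple[int, int, int]]:
    """Return ``(start, lo, hi)`` per usable boundary ledger, where
    ``edges[start:lo]`` is the context and ``edges[lo:hi]`` the future window.

    *edges* must already be sorted by ``timestamp``.
    """
    ts = [e.timestamp for e in edges]
    ledgers = sorted(set(ts))
    bounds = []
    for k, boundary in enumerate(ledgers):
        lo = bisect_left(ts, boundary)             # first edge at the boundary
        hi = bisect_left(ts, boundary + n_future)  # first edge after the future window
        if lo == 0 or lo == hi:                    # empty context or empty future
            continue
        start = 0
        # ledgers[:k] are the k distinct ledgers before the boundary;
        # keep only the last `context_ledgers` of them.
        if context_ledgers is not None and k > context_ledgers:
            start = bisect_left(ts, ledgers[k - context_ledgers])
        bounds.append((start, lo, hi))
    return bounds


edges = [Edge("a", "b", 1), Edge("b", "c", 1), Edge("c", "a", 2),
         Edge("a", "c", 3), Edge("b", "a", 5)]
for start, lo, hi in window_bounds(edges, n_future=2, context_ledgers=1):
    context, future = edges[start:lo], edges[lo:hi]
    print([e.timestamp for e in context], [e.timestamp for e in future])
```

The per-split node index can then be built from the accounts in `edges[start:hi]`, exactly as `build_splits` does today.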
diff --git a/astroml/tracking/__init__.py b/astroml/tracking/__init__.py
new file mode 100644
index 0000000..e2b03d5
--- /dev/null
+++ b/astroml/tracking/__init__.py
@@ -0,0 +1,3 @@
+from .mlflow_tracker import MLflowTracker
+
+__all__ = ["MLflowTracker"]
diff --git a/astroml/tracking/mlflow_tracker.py b/astroml/tracking/mlflow_tracker.py
new file mode 100644
index 0000000..e4bcc5d
--- /dev/null
+++ b/astroml/tracking/mlflow_tracker.py
@@ -0,0 +1,129 @@
+"""MLflow experiment tracking integration for AstroML."""
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+from typing import Any, Dict, Optional
+
+import numpy as np
+import torch
+import torch.nn as nn
+
+logger = logging.getLogger(__name__)
+
+
+class MLflowTracker:
+    """Thin MLflow wrapper used by training scripts.
+
+    Gracefully degrades to a no-op when MLflow is not installed or
+    when ``enabled=False`` so training still works without the dependency.
+    """
+
+    def __init__(
+        self,
+        enabled: bool = True,
+        tracking_uri: str = "mlruns",
+        experiment_name: str = "astroml_experiment",
+        run_name: Optional[str] = None,
+        log_model_weights: bool = True,
+    ):
+        self.enabled = enabled
+        self.log_model_weights = log_model_weights
+        self._run = None
+
+        if not self.enabled:
+            return
+
+        try:
+            import mlflow
+
+            self._mlflow = mlflow
+            mlflow.set_tracking_uri(tracking_uri)
+            mlflow.set_experiment(experiment_name)
+            self._run = mlflow.start_run(run_name=run_name)
+            logger.info(
+                "MLflow run started | experiment=%s run_id=%s",
+                experiment_name,
+                self._run.info.run_id,
+            )
+        except ImportError:
+            logger.warning(
+                "mlflow package not found — tracking disabled. "
+                "Install it with: pip install mlflow"
+            )
+            self.enabled = False
+
+    # ------------------------------------------------------------------
+    # Public helpers
+    # ------------------------------------------------------------------
+
+    def log_params(self, params: Dict[str, Any]) -> None:
+        """Log a flat dictionary of hyper-parameters."""
+        if not self.enabled or self._run is None:
+            return
+        self._mlflow.log_params(params)
+
+    def log_metric(self, key: str, value: float, step: Optional[int] = None) -> None:
+        """Log a single scalar metric."""
+        if not self.enabled or self._run is None:
+            return
+        self._mlflow.log_metric(key, value, step=step)
+
+    def log_metrics(self, metrics: Dict[str, float], step: Optional[int] = None) -> None:
+        """Log multiple scalar metrics at once."""
+        if not self.enabled or self._run is None:
+            return
+        self._mlflow.log_metrics(metrics, step=step)
+
+    def log_model_artifact(
+        self,
+        model: nn.Module,
+        artifact_path: str = "model",
+        checkpoint_path: Optional[str] = None,
+    ) -> None:
+        """Log model weights as an MLflow artifact.
+
+        Saves ``model.state_dict()`` to a temporary ``.pth`` file and
+        uploads it.  If *checkpoint_path* already exists on disk it is
+        uploaded directly (avoids a redundant save).
+        """
+        if not self.enabled or self._run is None or not self.log_model_weights:
+            return
+
+        import tempfile
+
+        if checkpoint_path and Path(checkpoint_path).exists():
+            self._mlflow.log_artifact(checkpoint_path, artifact_path=artifact_path)
+        else:
+            with tempfile.TemporaryDirectory() as tmp_dir:  # removed even on error
+                tmp_path = str(Path(tmp_dir) / "model.pth")
+                torch.save(model.state_dict(), tmp_path)
+                self._mlflow.log_artifact(tmp_path, artifact_path=artifact_path)
+
+    def log_roc_auc(self, y_true: np.ndarray, y_score: np.ndarray, step: Optional[int] = None) -> None:
+        """Compute and log ROC-AUC."""
+        if not self.enabled or self._run is None:
+            return
+        try:
+            from sklearn.metrics import roc_auc_score
+
+            auc = roc_auc_score(y_true, y_score)
+            self.log_metric("roc_auc", auc, step=step)
+        except Exception as exc:
+            logger.warning("Could not compute ROC-AUC: %s", exc)
+
+    def end(self) -> None:
+        """End the active MLflow run."""
+        if self.enabled and self._run is not None:
+            self._mlflow.end_run()
+            logger.info("MLflow run ended.")
+
+    # ------------------------------------------------------------------
+    # Context-manager support
+    # ------------------------------------------------------------------
+
+    def __enter__(self) -> "MLflowTracker":
+        return self
+
+    def __exit__(self, *_: Any) -> None:
+        self.end()
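Writing into a `TemporaryDirectory` keeps the temporary-file step portable. The alternative, reopening a `NamedTemporaryFile` by name while its handle is still open, fails on Windows. The directory is also removed even if the upload raises, and the artifact gets a stable file name. The stdlib-only sketch below illustrates the pattern under two substitutions. A plain callback, the hypothetical `upload`, stands in for `mlflow.log_artifact`. Raw bytes stand in for `torch.save`.

```python
import tempfile
from pathlib import Path
from typing import Callable


def save_and_upload(
    payload: bytes, upload: Callable[[str], None], filename: str = "model.pth"
) -> str:
    """Write *payload* to a temporary file, pass its path to *upload*, and
    return that path.  The temporary directory is removed when the ``with``
    block exits, including when *upload* raises."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / filename
        path.write_bytes(payload)  # handle is closed before upload() reads it
        upload(str(path))
    return str(path)


uploaded = []
path = save_and_upload(b"weights", lambda p: uploaded.append(Path(p).read_bytes()))
print(uploaded, Path(path).exists())
```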
diff --git a/astroml/training/__init__.py b/astroml/training/__init__.py
index e69de29..7b20318 100644
--- a/astroml/training/__init__.py
+++ b/astroml/training/__init__.py
@@ -0,0 +1,9 @@
+from . import temporal_split
+from .temporal_split import TemporalSplitter, temporal_graph_split, validate_graph_split
+
+__all__ = [
+    "temporal_split",
+    "TemporalSplitter",
+    "temporal_graph_split",
+    "validate_graph_split",
+]
diff --git a/astroml/training/temporal_split.py b/astroml/training/temporal_split.py
new file mode 100644
index 0000000..589fd95
--- /dev/null
+++ b/astroml/training/temporal_split.py
@@ -0,0 +1,289 @@
+"""Temporal train/test split utilities for AstroML.
+
+Ensures strict "train on the past, test on the future" ordering with no data leakage for
+both flat tabular data (pandas DataFrames) and graph edge data.
+
+Three public entry-points:
+
+* :func:`temporal_train_test_split` — DataFrame splitter (re-exported from
+  :mod:`astroml.validation.leakage` for convenience).
+* :func:`temporal_graph_split` — splits a sequence of
+  :class:`~astroml.features.graph.snapshot.Edge` objects into train/test
+  edge sets with strict temporal ordering.
+* :class:`TemporalSplitter` — thin config-driven wrapper that dispatches
+  to the correct function and validates the result.
+"""
+from __future__ import annotations
+
+import warnings
+from dataclasses import dataclass
+from typing import Any, List, Optional, Sequence, Tuple
+
+import numpy as np
+
+
+# ---------------------------------------------------------------------------
+# Re-export DataFrame splitter so callers can import from one place
+# ---------------------------------------------------------------------------
+from astroml.validation.leakage import (  # noqa: F401
+    temporal_train_test_split,
+    validate_temporal_split,
+    LeakageError,
+)
+
+try:
+    from astroml.features.graph.snapshot import Edge
+except ImportError:  # allow import without torch-geometric
+    Edge = Any  # type: ignore[misc,assignment]
+
+
+# ---------------------------------------------------------------------------
+# Graph temporal split
+# ---------------------------------------------------------------------------
+
+@dataclass
+class GraphSplitResult:
+    """Holds the output of :func:`temporal_graph_split`.
+
+    Attributes:
+        train_edges: Edges whose timestamp is strictly before the cutoff.
+        test_edges: Edges whose timestamp is >= the cutoff (or the last
+            ``1 - train_ratio`` fraction when no explicit cutoff is given).
+        cutoff: The timestamp value used as the boundary.
+    """
+    train_edges: List[Any]
+    test_edges: List[Any]
+    cutoff: Any
+
+
+def temporal_graph_split(
+    edges: Sequence[Any],
+    *,
+    cutoff: Optional[Any] = None,
+    train_ratio: float = 0.8,
+    time_attr: str = "timestamp",
+) -> GraphSplitResult:
+    """Split graph edges into temporal train/test partitions.
+
+    Edges are split so that **all training edges precede all test edges**
+    in time — no future information leaks into training.
+
+    Two modes:
+
+    * **Cutoff mode** (``cutoff`` is provided): edges with
+      ``edge.{time_attr} < cutoff`` → train; the rest → test.
+    * **Ratio mode** (default): edges are sorted by *time_attr* and split
+      at ``int(len(edges) * train_ratio)``.
+
+    Args:
+        edges: Sequence of objects with a numeric or comparable
+            *time_attr* attribute (e.g.
+            :class:`~astroml.features.graph.snapshot.Edge` instances).
+        cutoff: Explicit temporal boundary.  When provided, *train_ratio*
+            is ignored.
+        train_ratio: Fraction of (sorted) edges assigned to training when
+            no *cutoff* is given.  Must be in ``(0, 1)``.
+        time_attr: Attribute name on each edge object used as the
+            timestamp.  Defaults to ``"timestamp"``.
+
+    Returns:
+        :class:`GraphSplitResult` with ``train_edges``, ``test_edges``,
+        and the resolved ``cutoff``.
+
+    Raises:
+        ValueError: If *train_ratio* is outside ``(0, 1)`` in ratio mode, or
+            any edge lacks *time_attr*.  (Empty *edges* yields an empty result.)
+        LeakageError: If the partitions overlap temporally.  In ratio mode this
+            happens when edges sharing a timestamp straddle the split index;
+            in cutoff mode it cannot happen for totally ordered timestamps.
+    """
+    edges = list(edges)
+    if not edges:
+        return GraphSplitResult(train_edges=[], test_edges=[], cutoff=cutoff)
+
+    # Validate that every edge has the expected attribute.
+    for e in edges:
+        if not hasattr(e, time_attr):
+            raise ValueError(
+                f"Edge object {e!r} has no attribute '{time_attr}'"
+            )
+
+    if cutoff is not None:
+        train_edges = [e for e in edges if getattr(e, time_attr) < cutoff]
+        test_edges = [e for e in edges if getattr(e, time_attr) >= cutoff]
+        resolved_cutoff = cutoff
+    else:
+        if not (0 < train_ratio < 1):
+            raise ValueError(
+                f"train_ratio must be in (0, 1), got {train_ratio}"
+            )
+        sorted_edges = sorted(edges, key=lambda e: getattr(e, time_attr))
+        split_idx = int(len(sorted_edges) * train_ratio)
+        train_edges = sorted_edges[:split_idx]
+        test_edges = sorted_edges[split_idx:]
+        # Resolved cutoff = first timestamp in the test set (or None if empty).
+        resolved_cutoff = (
+            getattr(test_edges[0], time_attr) if test_edges else None
+        )
+
+    # Warn on empty partitions.
+    if not train_edges:
+        warnings.warn(
+            "train_edges is empty — cutoff may be before all edge timestamps",
+            UserWarning,
+            stacklevel=2,
+        )
+    if not test_edges:
+        warnings.warn(
+            "test_edges is empty — cutoff may be after all edge timestamps",
+            UserWarning,
+            stacklevel=2,
+        )
+
+    # Hard leakage check.
+    if train_edges and test_edges:
+        train_max = max(getattr(e, time_attr) for e in train_edges)
+        test_min = min(getattr(e, time_attr) for e in test_edges)
+        if train_max >= test_min:
+            raise LeakageError(
+                f"Temporal overlap in graph split: train max ({train_max}) "
+                f">= test min ({test_min})"
+            )
+
+    return GraphSplitResult(
+        train_edges=train_edges,
+        test_edges=test_edges,
+        cutoff=resolved_cutoff,
+    )
+
+
+def validate_graph_split(result: GraphSplitResult, time_attr: str = "timestamp") -> bool:
+    """Assert that a :class:`GraphSplitResult` has no temporal overlap.
+
+    Args:
+        result: Output of :func:`temporal_graph_split`.
+        time_attr: Attribute name used as the timestamp.
+
+    Returns:
+        ``True`` if the split is clean.
+
+    Raises:
+        LeakageError: If overlap is detected.
+    """
+    if not result.train_edges or not result.test_edges:
+        return True
+
+    train_max = max(getattr(e, time_attr) for e in result.train_edges)
+    test_min = min(getattr(e, time_attr) for e in result.test_edges)
+
+    if train_max >= test_min:
+        raise LeakageError(
+            f"Temporal overlap in graph split: train max ({train_max}) "
+            f">= test min ({test_min})"
+        )
+    return True
+
+
+# ---------------------------------------------------------------------------
+# High-level config-driven splitter
+# ---------------------------------------------------------------------------
+
+class TemporalSplitter:
+    """Config-driven temporal train/test splitter.
+
+    Supports both DataFrame and graph-edge inputs.  Validates the result
+    automatically and raises :exc:`LeakageError` on any detected overlap.
+
+    Args:
+        train_ratio: Default fraction for train set when no cutoff is given.
+        cutoff: Optional explicit temporal boundary.  Overrides *train_ratio*
+            when set.
+        time_col: Column/attribute name used as timestamp.
+
+    Example — DataFrame usage::
+
+        splitter = TemporalSplitter(train_ratio=0.8, time_col="closed_at")
+        train_df, test_df = splitter.split_dataframe(transactions_df)
+
+    Example — Graph edge usage::
+
+        splitter = TemporalSplitter(train_ratio=0.8)
+        result = splitter.split_edges(edges)
+        # result.train_edges, result.test_edges
+    """
+
+    def __init__(
+        self,
+        train_ratio: float = 0.8,
+        cutoff: Optional[Any] = None,
+        time_col: str = "timestamp",
+    ):
+        if not (0 < train_ratio < 1):
+            raise ValueError(
+                f"train_ratio must be in (0, 1), got {train_ratio}"
+            )
+        self.train_ratio = train_ratio
+        self.cutoff = cutoff
+        self.time_col = time_col
+
+    # ------------------------------------------------------------------
+    # DataFrame split
+    # ------------------------------------------------------------------
+
+    def split_dataframe(
+        self,
+        df: Any,  # pd.DataFrame
+        time_col: Optional[str] = None,
+    ) -> Tuple[Any, Any]:
+        """Split a DataFrame temporally and validate the result.
+
+        Args:
+            df: Input ``pd.DataFrame``.
+            time_col: Override the instance-level *time_col*.
+
+        Returns:
+            ``(train_df, test_df)`` tuple.
+
+        Raises:
+            LeakageError: If the resulting split has temporal overlap.
+        """
+        col = time_col or self.time_col
+        train_df, test_df = temporal_train_test_split(
+            df,
+            col,
+            cutoff=self.cutoff,
+            train_ratio=self.train_ratio,
+        )
+        validate_temporal_split(train_df, test_df, col)
+        return train_df, test_df
+
+    # ------------------------------------------------------------------
+    # Graph edge split
+    # ------------------------------------------------------------------
+
+    def split_edges(
+        self,
+        edges: Sequence[Any],
+        time_attr: Optional[str] = None,
+    ) -> GraphSplitResult:
+        """Split graph edges temporally and validate the result.
+
+        Args:
+            edges: Sequence of edge objects.
+            time_attr: Override the instance-level *time_col*.
+
+        Returns:
+            :class:`GraphSplitResult`.
+
+        Raises:
+            LeakageError: If the resulting split has temporal overlap.
+        """
+        attr = time_attr or self.time_col
+        result = temporal_graph_split(
+            edges,
+            cutoff=self.cutoff,
+            train_ratio=self.train_ratio,
+            time_attr=attr,
+        )
+        validate_graph_split(result, time_attr=attr)
+        return result
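In ratio mode, `temporal_graph_split` cuts the sorted edges at `int(len(edges) * train_ratio)`. If edges sharing one timestamp fall on both sides of that index, the hard leakage check raises `LeakageError`. Ledger data hits this often, because many transactions share a single ledger sequence. Below is a sketch of one possible tie-safe variant. It assumes that sending every tied edge to the test set is acceptable, and it is not the module's current behaviour. `Edge` is again a minimal stand-in with a `timestamp` attribute.

```python
from bisect import bisect_left
from typing import List, NamedTuple, Sequence, Tuple


class Edge(NamedTuple):
    """Minimal stand-in for an edge with a ``timestamp`` attribute."""
    src: str
    dst: str
    timestamp: int


def tie_safe_ratio_split(
    edges: Sequence[Edge], train_ratio: float = 0.8
) -> Tuple[List[Edge], List[Edge]]:
    """Ratio-mode split that never separates edges sharing a timestamp.

    The nominal split index ``int(n * train_ratio)`` is moved back to the
    first edge carrying the boundary timestamp, so all tied edges go to test
    and the train fraction becomes approximate (at most *train_ratio*).
    """
    if not 0 < train_ratio < 1:
        raise ValueError(f"train_ratio must be in (0, 1), got {train_ratio}")
    ordered = sorted(edges, key=lambda e: e.timestamp)
    ts = [e.timestamp for e in ordered]
    idx = int(len(ordered) * train_ratio)
    if idx < len(ordered):
        idx = bisect_left(ts, ts[idx])  # first index with the boundary timestamp
    return ordered[:idx], ordered[idx:]


edges = [Edge("a", "b", 1), Edge("b", "c", 2), Edge("c", "a", 2),
         Edge("a", "c", 2), Edge("b", "a", 3)]
train, test = tie_safe_ratio_split(edges, train_ratio=0.6)
print([e.timestamp for e in train], [e.timestamp for e in test])
```

An equivalent approach would take the timestamp at the nominal index as the cutoff and reuse cutoff mode, which already keeps ties together.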
diff --git a/astroml/training/train_gcn.py b/astroml/training/train_gcn.py
index 755dca8..9e9780c 100644
--- a/astroml/training/train_gcn.py
+++ b/astroml/training/train_gcn.py
@@ -2,6 +2,7 @@
 import torch.nn.functional as F
 from torch_geometric.datasets import Planetoid
 from torch_geometric.transforms import NormalizeFeatures
+
 from astroml.models.gcn import GCN
 
 
@@ -13,15 +14,15 @@ def train():
 
     model = GCN(
         input_dim=dataset.num_node_features,
-        hidden_dims=[64],
+        hidden_dim=16,
         output_dim=dataset.num_classes,
         dropout=0.5,
     ).to(device)
 
     optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=5e-4)
 
-    model.train()
-    for epoch in range(200):
+    for epoch in range(1, 201):
+        model.train()
         optimizer.zero_grad()
         out = model(data.x, data.edge_index)
         loss = F.nll_loss(out[data.train_mask], data.y[data.train_mask])
@@ -29,20 +30,18 @@ def train():
         optimizer.step()
 
         if epoch % 20 == 0:
-            print(f"Epoch {epoch}, Loss: {loss.item():.4f}")
+            val_acc = _accuracy(model, data, data.val_mask)
+            print(f"Epoch {epoch:3d} | Loss: {loss.item():.4f} | Val Acc: {val_acc:.4f}")
 
-    test(model, data)
+    print(f"Test Accuracy: {_accuracy(model, data, data.test_mask):.4f}")
 
 
-def test(model, data):
+def _accuracy(model: GCN, data, mask) -> float:
     model.eval()
-    out = model(data.x, data.edge_index)
-    pred = out.argmax(dim=1)
-
-    correct = (pred[data.test_mask] == data.y[data.test_mask]).sum()
-    acc = int(correct) / int(data.test_mask.sum())
-    print(f"Test Accuracy: {acc:.4f}")
+    with torch.no_grad():
+        pred = model(data.x, data.edge_index).argmax(dim=1)
+    return float((pred[mask] == data.y[mask]).sum()) / float(mask.sum())
 
 
 if __name__ == "__main__":
-    train()
\ No newline at end of file
+    train()
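Moving `model.train()` inside the loop matters because `_accuracy` now calls `model.eval()` every 20 epochs. With the old placement, every update after the first validation check would run with dropout disabled. The toy sketch below needs no PyTorch. `ToyModule` is a hypothetical stand-in that mimics only the `training` flag toggled by `torch.nn.Module.train()` and `.eval()`. `run_epochs` records the mode each epoch's update runs in under both loop layouts.

```python
from typing import List


class ToyModule:
    """Hypothetical stand-in for torch.nn.Module's train/eval flag only."""

    def __init__(self) -> None:
        self.training = True

    def train(self) -> "ToyModule":
        self.training = True
        return self

    def eval(self) -> "ToyModule":
        self.training = False
        return self


def run_epochs(
    model: ToyModule, epochs: int, reset_each_epoch: bool, eval_every: int = 20
) -> List[bool]:
    """Return, per epoch, whether the update step ran in training mode."""
    modes: List[bool] = []
    if not reset_each_epoch:
        model.train()  # old layout: enter train mode once, before the loop
    for epoch in range(1, epochs + 1):
        if reset_each_epoch:
            model.train()  # new layout: re-enter train mode every epoch
        modes.append(model.training)  # dropout is active only when True
        if epoch % eval_every == 0:
            model.eval()  # what _accuracy() does before scoring
    return modes
```

Under the old layout, epochs 21 onward train in eval mode. Under the new layout, every epoch trains in train mode.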
diff --git a/configs/config.yaml b/configs/config.yaml
index e400ba1..afb7974 100644
--- a/configs/config.yaml
+++ b/configs/config.yaml
@@ -13,6 +13,14 @@ experiment:
   save_dir: "outputs"
   log_level: "INFO"
 
+# MLflow tracking settings
+mlflow:
+  enabled: true
+  tracking_uri: "mlruns"   # local directory; set to a remote URI for a tracking server
+  experiment_name: "${experiment.name}"
+  run_name: null            # auto-generated when null
+  log_model_weights: true   # log model artifact at the end of training
+
 # Hydra settings
 hydra:
   run:
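Below is a minimal sketch of how the `mlflow` block might be consumed. It assumes the Hydra config has already been converted to a plain dict with interpolations resolved, for example via `OmegaConf.to_container(cfg, resolve=True)`. `mlflow_settings` and `start_tracking` are hypothetical helper names. The MLflow calls (`set_tracking_uri`, `set_experiment`, `start_run`) are the library's public tracking API.

```python
from typing import Any, Dict, Optional


def mlflow_settings(cfg: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Translate the `mlflow` config section into tracking arguments.

    Returns None when tracking is disabled or the section is absent.
    Assumes `${experiment.name}` has already been resolved to a string.
    """
    section = cfg.get("mlflow", {})
    if not section.get("enabled", False):
        return None
    return {
        "tracking_uri": section.get("tracking_uri", "mlruns"),
        "experiment_name": section["experiment_name"],
        # A null run_name lets MLflow auto-generate one.
        "run_name": section.get("run_name"),
        "log_model_weights": bool(section.get("log_model_weights", False)),
    }


def start_tracking(settings: Dict[str, Any]):
    """Open an MLflow run from resolved settings (requires the mlflow package)."""
    import mlflow  # imported lazily so mlflow_settings stays dependency-free

    mlflow.set_tracking_uri(settings["tracking_uri"])
    mlflow.set_experiment(settings["experiment_name"])
    return mlflow.start_run(run_name=settings["run_name"])
```

Separating config resolution from the MLflow side effects keeps the resolution step testable without a tracking backend.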
diff --git a/configs/training/default.yaml b/configs/training/default.yaml
index 7736585..83c5a64 100644
--- a/configs/training/default.yaml
+++ b/configs/training/default.yaml
@@ -16,6 +16,13 @@ val_split: 0.1
 test_split: 0.1
 shuffle: true
 
+# Temporal split (set shuffle: false when enabled to prevent leakage)
+temporal_split:
+  enabled: false        # set true to train on the past and evaluate on the future
+  time_col: "timestamp" # DataFrame column or Edge attribute used as timestamp
+  train_ratio: 0.8      # fraction of (sorted) data assigned to train
+  cutoff: null          # optional explicit cutoff value; overrides train_ratio
+
 # Logging
 log_interval: 20
 save_best_only: true
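The `cutoff`/`train_ratio` precedence described in the comments can be sketched as follows. `resolve_cutoff` and `check_shuffle` are hypothetical helpers. Two rules here are assumptions for illustration, not necessarily how `temporal_graph_split` resolves the boundary: rows with time `<=` cutoff go to train, and the ratio is rounded down to a whole row count.

```python
from typing import Optional, Sequence


def resolve_cutoff(
    timestamps: Sequence[float],
    train_ratio: float = 0.8,
    cutoff: Optional[float] = None,
) -> float:
    """Return the timestamp boundary for a temporal train/eval split.

    An explicit cutoff wins. Otherwise the boundary is the timestamp of the
    last row that falls inside the train_ratio share of the sorted data.
    """
    if cutoff is not None:
        return cutoff  # explicit value overrides train_ratio
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    ordered = sorted(timestamps)
    if not ordered:
        raise ValueError("cannot resolve a cutoff from no timestamps")
    n_train = int(train_ratio * len(ordered))  # round down to whole rows
    return ordered[max(n_train - 1, 0)]


def check_shuffle(temporal_enabled: bool, shuffle: bool) -> None:
    """Reject the leaky combination the config comment warns about."""
    if temporal_enabled and shuffle:
        raise ValueError("set shuffle: false when temporal_split.enabled is true")
```

Because every row at the cutoff value trains, tied timestamps never straddle the split. As a result, the train share can slightly exceed `train_ratio` when ties sit at the boundary.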
diff --git a/docs/schema.md b/docs/schema.md
index f781698..5777b94 100644
--- a/docs/schema.md
+++ b/docs/schema.md
@@ -1,152 +1,239 @@
-# AstroML Raw Data Storage Schema
+# AstroML Database Schema
 
 ## Overview
 
-AstroML stores raw Stellar blockchain data in PostgreSQL. The schema models the five core entities needed for dynamic graph ML: **ledgers**, **transactions**, **operations**, **accounts**, and **assets**.
+AstroML uses two complementary PostgreSQL schema layers:
 
-The graph mapping is:
+1. Raw Stellar ingestion tables used by the current stream pipeline.
+2. A normalized graph-mirror schema for account timelines and graph analytics.
 
-| Blockchain Concept | Graph Representation | Table |
-|--------------------|---------------------|-------|
-| Accounts | Nodes | `accounts` |
-| Operations | Directed edges | `operations` |
-| Assets | Edge types | `assets` |
-| Time (ledger close) | Dynamic dimension | `ledgers` |
+The split is intentional. The raw layer preserves source fidelity, while the
+graph mirror provides stable surrogate keys, typed edges, and explicit integrity
+constraints, with indexes tuned for time-series retrieval.
 
-## ER Diagram
+## Raw Storage Layer
+
+The existing raw schema remains unchanged:
+
+| Blockchain concept | Graph meaning | Table |
+|--------------------|---------------|-------|
+| Ledger close       | Temporal anchor | `ledgers` |
+| Transaction        | Container for operations | `transactions` |
+| Operation          | Raw directed edge/event | `operations` |
+| Account snapshot   | Latest observed node state | `accounts` |
+| Asset registry     | Canonical asset dimension | `assets` |
+
+These tables continue to support ingestion from Horizon and raw feature
+extraction.
+
+## Graph Mirror Layer
+
+The new graph mirror is optimized for normalized, account-centric reads:
+
+| Mirror concept | Purpose | Table |
+|----------------|---------|-------|
+| Canonical node | One row per unique account | `graph_accounts` |
+| Shared edge/event | Time-series fact table for transactions, claims, and payments | `graph_edges` |
+| Transaction subtype | Transaction-only attributes | `graph_transaction_details` |
+| Claim subtype | Claim-only attributes | `graph_claim_details` |
+| Payment subtype | Payment-only attributes | `graph_payment_details` |
+
+`graph_edges.asset_id` reuses the existing `assets` table so asset identity is
+not duplicated across the raw and mirror layers.
+
+## Relationship Diagram
 
 ```mermaid
 erDiagram
-    ledgers ||--o{ transactions : contains
-    transactions ||--o{ operations : contains
-
-    ledgers {
-        int sequence PK
-        varchar hash UK
-        varchar prev_hash
-        timestamptz closed_at
-        int successful_transaction_count
-        int failed_transaction_count
-        int operation_count
-        numeric total_coins
-        numeric fee_pool
-        int base_fee_in_stroops
-        int protocol_version
-    }
-
-    transactions {
-        varchar hash PK
-        int ledger_sequence FK
-        varchar source_account
+    assets |o--o{ graph_edges : categorizes
+    graph_accounts ||--o{ graph_edges : source
+    graph_accounts |o--o{ graph_edges : destination
+    graph_edges ||--o| graph_transaction_details : specializes
+    graph_edges ||--o| graph_claim_details : specializes
+    graph_edges ||--o| graph_payment_details : specializes
+
+    graph_accounts {
+        bigint id PK
+        varchar account_address UK
+        varchar account_type
+        timestamptz first_seen_at
+        timestamptz last_seen_at
         timestamptz created_at
-        bigint fee
-        smallint operation_count
-        boolean successful
-        varchar memo_type
-        text memo
+        timestamptz updated_at
     }
 
-    operations {
+    graph_edges {
         bigint id PK
-        varchar transaction_hash FK
-        smallint application_order
-        varchar type
-        varchar source_account
-        varchar destination_account
+        varchar edge_type
+        bigint source_account_id FK
+        bigint destination_account_id FK
+        int asset_id FK
+        timestamptz occurred_at
+        int ledger_sequence
+        int event_index
+        varchar transaction_hash
+        varchar external_event_id
         numeric amount
-        varchar asset_code
-        varchar asset_issuer
+        varchar status
         timestamptz created_at
-        jsonb details
     }
 
-    accounts {
-        varchar account_id PK
-        numeric balance
-        bigint sequence
-        varchar home_domain
-        int flags
-        int last_modified_ledger
-        timestamptz created_at
-        timestamptz updated_at
+    graph_transaction_details {
+        bigint edge_id PK,FK
+        varchar edge_type
+        boolean successful
+        smallint operation_count
+        bigint fee
+        varchar memo_type
+        text memo
+        jsonb details
     }
 
-    assets {
-        int id PK
-        varchar asset_type
-        varchar asset_code
-        varchar asset_issuer
-        int first_seen_ledger
+    graph_claim_details {
+        bigint edge_id PK,FK
+        varchar edge_type
+        varchar claim_reference
+        varchar claim_status
+        timestamptz expires_at
+        jsonb details
     }
-```
-
-## Table Details
-
-### `ledgers`
-
-Temporal anchor — one row per closed Stellar ledger (~5-6 seconds apart).
-
-**Indexes:**
-- `PK` on `sequence`
-- `UNIQUE` on `hash`
-- `ix_ledgers_closed_at` on `closed_at`
-
-### `transactions`
-
-One row per Stellar transaction. Linked to a ledger via `ledger_sequence`.
-
-**Indexes:**
-- `PK` on `hash`
-- `ix_transactions_source_account_created_at` on `(source_account, created_at)` — composite index for account+timestamp queries
-- `ix_transactions_ledger_sequence` on `ledger_sequence`
-
-### `operations`
-
-One row per operation — the primary graph-edge table. Common columns (`source_account`, `destination_account`, `amount`, `asset_code`, `asset_issuer`) cover the majority of graph-relevant operation types. The `details` JSONB column stores type-specific fields.
-
-`created_at` is denormalized from the parent transaction to support efficient temporal range queries without JOINs.
-
-**Indexes:**
-- `PK` on `id`
-- `ix_operations_source_created_at` on `(source_account, created_at)` — composite index for account+timestamp queries
-- `ix_operations_dest_created_at` on `(destination_account, created_at)` — partial index (WHERE destination_account IS NOT NULL)
-- `ix_operations_transaction_hash` on `transaction_hash`
-- `ix_operations_type` on `type`
 
-### `accounts`
-
-Latest known state of a Stellar account.
-
-**Indexes:**
-- `PK` on `account_id`
-- `ix_accounts_updated_at` on `updated_at`
-
-### `assets`
-
-Asset registry — unique by (code, issuer). Native XLM has `asset_issuer = NULL`.
-
-**Indexes:**
-- `PK` on `id`
-- `ix_assets_code_issuer` on `(asset_code, COALESCE(asset_issuer, ''))` — unique expression index handling NULL issuer for native XLM
-
-## Relationships
-
-```
-ledgers  1 ──< N  transactions  (ledger_sequence → sequence)
-transactions  1 ──< N  operations  (transaction_hash → hash)
+    graph_payment_details {
+        bigint edge_id PK,FK
+        varchar edge_type
+        varchar payment_reference
+        varchar payment_status
+        numeric fee_amount
+        timestamptz settled_at
+        jsonb details
+    }
 ```
 
-`accounts` and `assets` are reference tables — not FK-constrained from operations to keep bulk ingestion fast and avoid ordering dependencies.
+## Table-by-Table Notes
+
+### `graph_accounts`
+
+Canonical node table for the mirror.
+
+- `id` is a surrogate key for compact foreign keys and stable joins.
+- `account_address` is the natural key and is unique.
+- `first_seen_at` and `last_seen_at` track observation windows for the account
+  inside the mirror, not on-chain account creation semantics.
+- `account_type` is optional because some upstream sources may classify
+  accounts while others may not.
+
+### `graph_edges`
+
+Shared event table for graph traversal and timeline queries.
+
+- `edge_type` is constrained to `transaction`, `claim`, or `payment`.
+- `source_account_id` is required; `destination_account_id` is nullable only to
+  support edge types where the counterparty is not yet known or not applicable.
+- `occurred_at` is the business time for ordering and time-window filtering.
+- `created_at` is ingestion time for operational observability.
+- `external_event_id` is unique per `edge_type` and is the intended
+  idempotency/upsert key.
+- `ledger_sequence` and `event_index` provide deterministic tie-breaking for
+  sources that expose ledger order.
+
+### Detail Tables
+
+The mirror keeps the shared edge table narrow and pushes subtype attributes into
+1:1 detail tables.
+
+- `graph_transaction_details` stores fields such as `fee`, `memo`, and
+  `successful`.
+- `graph_claim_details` stores claim lifecycle fields such as
+  `claim_reference`, `claim_status`, and `expires_at`.
+- `graph_payment_details` stores payment lifecycle fields such as
+  `payment_reference`, `payment_status`, `fee_amount`, and `settled_at`.
+- Each detail table has a composite foreign key to `(graph_edges.id,
+  graph_edges.edge_type)` plus a check constraint on its fixed subtype. This
+  prevents a payment detail row from attaching to a claim or transaction edge.
+
+The optional `details` JSONB column in each subtype table is reserved for
+source-specific attributes that do not justify new columns yet.
+
+## Indexing Strategy
+
+The graph mirror is tuned for the expected read paths:
+
+- `ix_graph_edges_occurred_at`:
+  supports global time-range scans.
+- `ix_graph_edges_source_occurred_at`:
+  supports outbound activity for one account ordered by time.
+- `ix_graph_edges_destination_occurred_at`:
+  supports inbound activity for one account over a time window.
+- `ix_graph_edges_type_occurred_at`:
+  supports per-type timelines for `transaction`, `claim`, or `payment`.
+- `ix_graph_edges_asset_occurred_at`:
+  supports asset-filtered timelines and rollups.
+- `ix_graph_edges_status_occurred_at`:
+  supports state-aware filtering in a time window.
+- `ix_graph_edges_tx_hash` (partial, `WHERE transaction_hash IS NOT NULL`):
+  supports reverse lookup from a known transaction hash.
+- `ix_graph_edges_ledger_event`:
+  supports deterministic replay and incremental ingestion windows.
+- `ix_graph_accounts_last_seen_at`:
+  supports recency-oriented node lookups.
+- `ix_graph_accounts_account_type`:
+  supports filtering nodes by classification.
+- `ix_graph_claim_details_claim_status` and
+  `ix_graph_payment_details_payment_status`: support lifecycle-state filtering
+  on the subtype tables.
+
+PostgreSQL B-tree indexes can be scanned in reverse order, so the composite
+indexes also serve `ORDER BY occurred_at DESC` queries without declaring
+separate `DESC` indexes in the migration.
+
+## Query Paths Supported
+
+This design is meant to support queries such as:
+
+- all activity for one account ordered by `occurred_at`
+- inbound or outbound edges over a time window
+- all claims, payments, or transactions in a time range
+- latest activity for a specific account
+- filtering by `asset_id`, `edge_type`, or `status` within a window
+- incremental ingestion keyed by `(edge_type, external_event_id)` and replay
+  ordered by `(ledger_sequence, event_index)`
+
+## Normalization Rationale
+
+- Account identity is stored once in `graph_accounts`.
+- Shared edge attributes live once in `graph_edges`.
+- Type-specific attributes are isolated in dedicated detail tables instead of a
+  single sparse table with many nullable columns.
+- Asset identity is reused from the existing `assets` dimension.
+
+This keeps the schema normalized while remaining practical for operational reads.
+
+## Ingestion and Idempotency Notes
+
+- Use `external_event_id` together with `edge_type` as the upsert key.
+- When the source does not expose a single immutable event ID, the ingestion
+  layer should derive one deterministically, for example from transaction hash
+  plus operation or event order.
+- `occurred_at` and `created_at` are intentionally separate and should not be
+  conflated in application code.
+- `destination_account_id` should remain null only when the source truly lacks a
+  destination relationship.
+
+## Assumptions
+
+- The project remains Stellar-oriented, so account identifiers use the existing
+  56-character account-address convention.
+- `assets` remains the canonical asset dimension for both raw and mirror data.
+- Claim and payment sources may vary, so subtype tables expose a few strongly
+  typed lifecycle columns plus a JSONB extension field for source-specific data.
+
+## Future Recommendations
+
+- Add monthly or quarterly partitioning on `graph_edges.occurred_at` once row
+  counts justify it.
+- Consider materialized daily rollups for high-volume account analytics.
+- If ingestion begins mirroring directly from the raw `operations` table, add a
+  dedicated lineage column or foreign key back to the raw record that produced
+  each graph edge.
 
 ## Running Migrations
 
 ```bash
-# Apply all migrations
 alembic upgrade head
-
-# Rollback all migrations
 alembic downgrade base
-
-# Create a new migration
-alembic revision --autogenerate -m "description"
 ```
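The ingestion notes in `docs/schema.md` call for a deterministic `external_event_id` and an upsert keyed on `(edge_type, external_event_id)`. Below is a minimal sketch of both. It assumes a `<transaction_hash>:<event_index>` ID format, which is an illustrative choice and not an established project convention. It uses an in-memory SQLite table from the Python standard library as a stand-in for PostgreSQL. `derive_external_event_id` and `upsert_edge` are hypothetical names.

```python
import sqlite3


def derive_external_event_id(transaction_hash: str, event_index: int) -> str:
    """Build a deterministic event ID from a transaction hash plus event order.

    A 64-character hash, a separator, and the index stay well under the
    128-character limit on graph_edges.external_event_id.
    """
    if event_index < 0:
        raise ValueError("event_index must be non-negative")
    # Normalize hex case so one hash always maps to one ID.
    return f"{transaction_hash.lower()}:{event_index}"


# Trimmed SQLite stand-in for graph_edges; the ON CONFLICT clause below is
# valid in both PostgreSQL and SQLite (3.24+).
SCHEMA = """
CREATE TABLE graph_edges (
    id INTEGER PRIMARY KEY,
    edge_type TEXT NOT NULL,
    external_event_id TEXT NOT NULL,
    amount NUMERIC,
    status TEXT,
    UNIQUE (edge_type, external_event_id)
)
"""

UPSERT = """
INSERT INTO graph_edges (edge_type, external_event_id, amount, status)
VALUES (?, ?, ?, ?)
ON CONFLICT (edge_type, external_event_id)
DO UPDATE SET amount = excluded.amount, status = excluded.status
"""


def upsert_edge(
    conn: sqlite3.Connection,
    edge_type: str,
    tx_hash: str,
    event_index: int,
    amount: float,
    status: str,
) -> str:
    """Insert an edge, or update it in place if the event was seen before."""
    event_id = derive_external_event_id(tx_hash, event_index)
    conn.execute(UPSERT, (edge_type, event_id, amount, status))
    return event_id
```

Replaying the same event twice leaves one row whose mutable fields reflect the latest write. This is the idempotency property that `uq_graph_edges_type_external_event_id` is meant to provide.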
diff --git a/migrations/versions/002_graph_mirror_schema.py b/migrations/versions/002_graph_mirror_schema.py
new file mode 100644
index 0000000..7f9b582
--- /dev/null
+++ b/migrations/versions/002_graph_mirror_schema.py
@@ -0,0 +1,262 @@
+"""Add normalized PostgreSQL graph mirror schema.
+
+Revision ID: 002
+Revises: 001
+Create Date: 2026-03-24
+
+Adds a normalized graph mirror alongside the existing raw Stellar tables:
+
+- graph_accounts: canonical account nodes
+- graph_edges: shared directed edge/event rows
+- graph_transaction_details: transaction-only attributes
+- graph_claim_details: claim-only attributes
+- graph_payment_details: payment-only attributes
+"""
+from typing import Sequence, Union
+
+from alembic import op
+import sqlalchemy as sa
+from sqlalchemy.dialects import postgresql
+
+# revision identifiers, used by Alembic.
+revision: str = "002"
+down_revision: Union[str, None] = "001"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    op.create_table(
+        "graph_accounts",
+        sa.Column("id", sa.BigInteger(), autoincrement=True, nullable=False),
+        sa.Column("account_address", sa.String(length=56), nullable=False),
+        sa.Column("account_type", sa.String(length=32), nullable=True),
+        sa.Column("first_seen_at", sa.DateTime(timezone=True), nullable=False),
+        sa.Column("last_seen_at", sa.DateTime(timezone=True), nullable=False),
+        sa.Column(
+            "created_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.text("CURRENT_TIMESTAMP"),
+        ),
+        sa.Column(
+            "updated_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.text("CURRENT_TIMESTAMP"),
+        ),
+        sa.PrimaryKeyConstraint("id"),
+        sa.UniqueConstraint("account_address"),
+    )
+    op.create_index(
+        "ix_graph_accounts_last_seen_at",
+        "graph_accounts",
+        ["last_seen_at"],
+    )
+    op.create_index(
+        "ix_graph_accounts_account_type",
+        "graph_accounts",
+        ["account_type"],
+    )
+
+    op.create_table(
+        "graph_edges",
+        sa.Column("id", sa.BigInteger(), autoincrement=True, nullable=False),
+        sa.Column("edge_type", sa.String(length=16), nullable=False),
+        sa.Column("source_account_id", sa.BigInteger(), nullable=False),
+        sa.Column("destination_account_id", sa.BigInteger(), nullable=True),
+        sa.Column("asset_id", sa.Integer(), nullable=True),
+        sa.Column("occurred_at", sa.DateTime(timezone=True), nullable=False),
+        sa.Column("ledger_sequence", sa.Integer(), nullable=True),
+        sa.Column("event_index", sa.Integer(), nullable=True),
+        sa.Column("transaction_hash", sa.String(length=64), nullable=True),
+        sa.Column("external_event_id", sa.String(length=128), nullable=False),
+        sa.Column("amount", sa.Numeric(), nullable=True),
+        sa.Column("status", sa.String(length=32), nullable=True),
+        sa.Column(
+            "created_at",
+            sa.DateTime(timezone=True),
+            nullable=False,
+            server_default=sa.text("CURRENT_TIMESTAMP"),
+        ),
+        sa.CheckConstraint(
+            "edge_type IN ('transaction', 'claim', 'payment')",
+            name="ck_graph_edges_edge_type",
+        ),
+        sa.CheckConstraint(
+            "source_account_id <> destination_account_id OR destination_account_id IS NULL",
+            name="ck_graph_edges_distinct_accounts",
+        ),
+        sa.ForeignKeyConstraint(
+            ["asset_id"],
+            ["assets.id"],
+        ),
+        sa.ForeignKeyConstraint(
+            ["destination_account_id"],
+            ["graph_accounts.id"],
+        ),
+        sa.ForeignKeyConstraint(
+            ["source_account_id"],
+            ["graph_accounts.id"],
+        ),
+        sa.PrimaryKeyConstraint("id"),
+        sa.UniqueConstraint(
+            "edge_type",
+            "external_event_id",
+            name="uq_graph_edges_type_external_event_id",
+        ),
+        sa.UniqueConstraint(
+            "id",
+            "edge_type",
+            name="uq_graph_edges_id_edge_type",
+        ),
+    )
+    op.create_index("ix_graph_edges_occurred_at", "graph_edges", ["occurred_at"])
+    op.create_index(
+        "ix_graph_edges_source_occurred_at",
+        "graph_edges",
+        ["source_account_id", "occurred_at"],
+    )
+    op.create_index(
+        "ix_graph_edges_destination_occurred_at",
+        "graph_edges",
+        ["destination_account_id", "occurred_at"],
+    )
+    op.create_index(
+        "ix_graph_edges_type_occurred_at",
+        "graph_edges",
+        ["edge_type", "occurred_at"],
+    )
+    op.create_index(
+        "ix_graph_edges_asset_occurred_at",
+        "graph_edges",
+        ["asset_id", "occurred_at"],
+    )
+    op.create_index(
+        "ix_graph_edges_status_occurred_at",
+        "graph_edges",
+        ["status", "occurred_at"],
+    )
+    op.create_index(
+        "ix_graph_edges_tx_hash",
+        "graph_edges",
+        ["transaction_hash"],
+        postgresql_where=sa.text("transaction_hash IS NOT NULL"),
+    )
+    op.create_index(
+        "ix_graph_edges_ledger_event",
+        "graph_edges",
+        ["ledger_sequence", "event_index"],
+    )
+
+    op.create_table(
+        "graph_transaction_details",
+        sa.Column("edge_id", sa.BigInteger(), nullable=False),
+        sa.Column(
+            "edge_type",
+            sa.String(length=16),
+            nullable=False,
+            server_default="transaction",
+        ),
+        sa.Column("successful", sa.Boolean(), nullable=True),
+        sa.Column("operation_count", sa.SmallInteger(), nullable=True),
+        sa.Column("fee", sa.BigInteger(), nullable=True),
+        sa.Column("memo_type", sa.String(length=16), nullable=True),
+        sa.Column("memo", sa.Text(), nullable=True),
+        sa.Column("details", postgresql.JSONB(), nullable=True),
+        sa.CheckConstraint(
+            "edge_type = 'transaction'",
+            name="ck_graph_transaction_details_edge_type",
+        ),
+        sa.ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+        sa.PrimaryKeyConstraint("edge_id"),
+    )
+
+    op.create_table(
+        "graph_claim_details",
+        sa.Column("edge_id", sa.BigInteger(), nullable=False),
+        sa.Column(
+            "edge_type",
+            sa.String(length=16),
+            nullable=False,
+            server_default="claim",
+        ),
+        sa.Column("claim_reference", sa.String(length=128), nullable=True),
+        sa.Column("claim_status", sa.String(length=32), nullable=True),
+        sa.Column("expires_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("details", postgresql.JSONB(), nullable=True),
+        sa.CheckConstraint(
+            "edge_type = 'claim'",
+            name="ck_graph_claim_details_edge_type",
+        ),
+        sa.ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+        sa.PrimaryKeyConstraint("edge_id"),
+    )
+    op.create_index(
+        "ix_graph_claim_details_claim_status",
+        "graph_claim_details",
+        ["claim_status"],
+    )
+
+    op.create_table(
+        "graph_payment_details",
+        sa.Column("edge_id", sa.BigInteger(), nullable=False),
+        sa.Column(
+            "edge_type",
+            sa.String(length=16),
+            nullable=False,
+            server_default="payment",
+        ),
+        sa.Column("payment_reference", sa.String(length=128), nullable=True),
+        sa.Column("payment_status", sa.String(length=32), nullable=True),
+        sa.Column("fee_amount", sa.Numeric(), nullable=True),
+        sa.Column("settled_at", sa.DateTime(timezone=True), nullable=True),
+        sa.Column("details", postgresql.JSONB(), nullable=True),
+        sa.CheckConstraint(
+            "edge_type = 'payment'",
+            name="ck_graph_payment_details_edge_type",
+        ),
+        sa.CheckConstraint(
+            "fee_amount >= 0 OR fee_amount IS NULL",
+            name="ck_graph_payment_details_fee_amount_non_negative",
+        ),
+        sa.ForeignKeyConstraint(
+            ["edge_id", "edge_type"],
+            ["graph_edges.id", "graph_edges.edge_type"],
+            ondelete="CASCADE",
+        ),
+        sa.PrimaryKeyConstraint("edge_id"),
+    )
+    op.create_index(
+        "ix_graph_payment_details_payment_status",
+        "graph_payment_details",
+        ["payment_status"],
+    )
+
+
+def downgrade() -> None:
+    op.drop_index("ix_graph_payment_details_payment_status", table_name="graph_payment_details")
+    op.drop_table("graph_payment_details")
+    op.drop_index("ix_graph_claim_details_claim_status", table_name="graph_claim_details")
+    op.drop_table("graph_claim_details")
+    op.drop_table("graph_transaction_details")
+    op.drop_index("ix_graph_edges_ledger_event", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_tx_hash", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_status_occurred_at", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_asset_occurred_at", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_type_occurred_at", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_destination_occurred_at", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_source_occurred_at", table_name="graph_edges")
+    op.drop_index("ix_graph_edges_occurred_at", table_name="graph_edges")
+    op.drop_table("graph_edges")
+    op.drop_index("ix_graph_accounts_account_type", table_name="graph_accounts")
+    op.drop_index("ix_graph_accounts_last_seen_at", table_name="graph_accounts")
+    op.drop_table("graph_accounts")
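Each detail table carries a composite foreign key on `(edge_id, edge_type)` plus a fixed-subtype CHECK. Together they keep a payment detail row from attaching to a claim edge. This is also why `graph_edges` declares the otherwise redundant `uq_graph_edges_id_edge_type`: a foreign key must reference a unique column set. The sketch below reproduces the mechanism in SQLite (standard library) as a stand-in for PostgreSQL, with trimmed column lists. `build_db` and `attach_payment_detail` are hypothetical helpers.

```python
import sqlite3

# Trimmed SQLite stand-in for the constraints in migration 002.
DDL = """
CREATE TABLE graph_edges (
    id INTEGER PRIMARY KEY,
    edge_type TEXT NOT NULL CHECK (edge_type IN ('transaction', 'claim', 'payment')),
    external_event_id TEXT NOT NULL,
    UNIQUE (edge_type, external_event_id),
    UNIQUE (id, edge_type)
);
CREATE TABLE graph_payment_details (
    edge_id INTEGER PRIMARY KEY,
    edge_type TEXT NOT NULL DEFAULT 'payment' CHECK (edge_type = 'payment'),
    payment_status TEXT,
    FOREIGN KEY (edge_id, edge_type)
        REFERENCES graph_edges (id, edge_type) ON DELETE CASCADE
);
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(DDL)
    return conn


def attach_payment_detail(conn: sqlite3.Connection, edge_id: int, status: str) -> None:
    """Insert a payment detail row; edge_type falls back to its 'payment' default."""
    conn.execute(
        "INSERT INTO graph_payment_details (edge_id, payment_status) VALUES (?, ?)",
        (edge_id, status),
    )
```

Attaching a detail to a payment edge succeeds. Attaching it to a claim edge fails because no `(id, 'payment')` parent exists. Deleting the parent edge cascades to its detail row.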
diff --git a/requirements.txt b/requirements.txt
index 548e1d7..2baf6f0 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,5 @@
 
+mlflow>=2.10.0
 torch>=2.0.0
 torch-geometric>=2.3.0
 
diff --git a/tests/test_frequency.py b/tests/test_frequency.py
index caa3003..373328e 100644
--- a/tests/test_frequency.py
+++ b/tests/test_frequency.py
@@ -2,11 +2,14 @@
 import pandas as pd
 import pytest
 from hypothesis import given, strategies as st
+from hypothesis.extra.pandas import column, data_frames, range_indexes
 
 from astroml.features.frequency import (
     _compute_burstiness,
     _extract_daily_counts,
     _validate_dataframe,
+    compute_account_frequency,
+    compute_frequency_metrics,
 )
 
 
@@ -278,3 +281,179 @@ def test_invalid_timestamps_raise_value_error(self):
 
         with pytest.raises(ValueError, match="must contain datetime values"):
             _validate_dataframe(df, timestamp_col="timestamp", account_col="account")
+
+
+class TestComputeAccountFrequency:
+    """Unit tests for compute_account_frequency."""
+
+    def test_valid_account_id_returns_expected_metrics(self):
+        """Should return the required keys and expected values for a known account."""
+        df = pd.DataFrame({
+            "account": ["acct-1", "acct-1", "acct-1", "acct-2"],
+            "timestamp": pd.to_datetime([
+                "2024-01-01 10:00:00",
+                "2024-01-01 14:00:00",
+                "2024-01-03 09:00:00",
+                "2024-01-02 08:00:00",
+            ]),
+        })
+
+        result = compute_account_frequency(df, "acct-1")
+
+        assert isinstance(result, dict)
+        assert set(result.keys()) == {
+            "mean_tx_per_day",
+            "std_tx_per_day",
+            "burstiness",
+        }
+        assert result["mean_tx_per_day"] == pytest.approx(1.0)
+        assert result["std_tx_per_day"] == pytest.approx(np.sqrt(1.0))
+        assert result["burstiness"] == pytest.approx(0.0)
+
+    def test_invalid_account_id_raises_value_error(self):
+        """Should raise ValueError when the account is absent."""
+        df = pd.DataFrame({
+            "account": ["acct-1"],
+            "timestamp": ["2024-01-01"],
+        })
+
+        with pytest.raises(ValueError, match="not found"):
+            compute_account_frequency(df, "acct-missing")
+
+    def test_single_day_transactions_match_batch_behavior(self):
+        """Single-day histories should match the batch path exactly."""
+        df = pd.DataFrame({
+            "account": ["acct-1", "acct-1", "acct-1", "acct-2"],
+            "timestamp": pd.to_datetime([
+                "2024-01-01 10:00:00",
+                "2024-01-01 12:00:00",
+                "2024-01-01 18:00:00",
+                "2024-01-02 09:00:00",
+            ]),
+        })
+
+        batch_row = compute_frequency_metrics(df).set_index("account").loc["acct-1"]
+        single = compute_account_frequency(df, "acct-1")
+
+        assert single["mean_tx_per_day"] == pytest.approx(float(batch_row["mean_tx_per_day"]))
+        assert single["std_tx_per_day"] == pytest.approx(float(batch_row["std_tx_per_day"]))
+        assert single["burstiness"] == pytest.approx(float(batch_row["burstiness"]))
+
+    def test_custom_column_names_are_supported(self):
+        """Custom account and timestamp columns should follow the batch API."""
+        df = pd.DataFrame({
+            "wallet": ["acct-1", "acct-1", "acct-2"],
+            "block_time": ["2024-01-01", "2024-01-03", "2024-01-02"],
+        })
+
+        result = compute_account_frequency(
+            df,
+            "acct-1",
+            timestamp_col="block_time",
+            account_col="wallet",
+        )
+
+        assert set(result.keys()) == {
+            "mean_tx_per_day",
+            "std_tx_per_day",
+            "burstiness",
+        }
+        expected_std = float(np.std(np.array([1, 0, 1]), ddof=1))
+        assert result["mean_tx_per_day"] == pytest.approx(2.0 / 3.0)
+        assert result["std_tx_per_day"] == pytest.approx(expected_std)
+        assert result["burstiness"] == pytest.approx(
+            (expected_std - (2.0 / 3.0)) / (expected_std + (2.0 / 3.0))
+        )
+
+    def test_batch_and_single_account_consistency(self):
+        """Single-account output should match the corresponding batch row."""
+        df = pd.DataFrame({
+            "account": ["acct-1", "acct-1", "acct-2", "acct-2", "acct-3"],
+            "timestamp": pd.to_datetime([
+                "2024-01-01",
+                "2024-01-03",
+                "2024-01-01",
+                "2024-01-02",
+                "2024-01-04",
+            ]),
+        })
+
+        batch = compute_frequency_metrics(df).set_index("account")
+
+        for account_id in ["acct-1", "acct-2", "acct-3"]:
+            single = compute_account_frequency(df, account_id)
+            batch_row = batch.loc[account_id]
+            assert single == pytest.approx({
+                "mean_tx_per_day": float(batch_row["mean_tx_per_day"]),
+                "std_tx_per_day": float(batch_row["std_tx_per_day"]),
+                "burstiness": float(batch_row["burstiness"]),
+            })
+
+
+@st.composite
+def transaction_data_frames(draw):
+    """Generate realistic transaction DataFrames for frequency tests."""
+    frame = draw(
+        data_frames(
+            index=range_indexes(min_size=1, max_size=12),
+            columns=[
+                column("account", elements=st.sampled_from(["acct-1", "acct-2", "acct-3"])),
+                column(
+                    "timestamp",
+                    elements=st.datetimes(
+                        min_value=pd.Timestamp("2024-01-01").to_pydatetime(),
+                        max_value=pd.Timestamp("2024-01-10").to_pydatetime(),
+                    ),
+                ),
+            ],
+        )
+    )
+    return frame
+
+
+class TestComputeAccountFrequencyProperties:
+    """Property-based tests for compute_account_frequency."""
+
+    @given(transaction_data_frames())
+    def test_single_account_matches_batch_output(self, df):
+        """Property: single-account metrics equal the matching batch row."""
+        target_account = df["account"].iloc[0]
+
+        single = compute_account_frequency(df, target_account)
+        batch_row = compute_frequency_metrics(df).set_index("account").loc[target_account]
+
+        assert single == pytest.approx({
+            "mean_tx_per_day": float(batch_row["mean_tx_per_day"]),
+            "std_tx_per_day": float(batch_row["std_tx_per_day"]),
+            "burstiness": float(batch_row["burstiness"]),
+        })
+
+    @given(transaction_data_frames())
+    def test_custom_columns_preserve_consistency(self, df):
+        """Property: renamed account/timestamp columns still behave consistently."""
+        renamed = df.rename(columns={"account": "wallet", "timestamp": "block_time"})
+        target_account = renamed["wallet"].iloc[0]
+
+        single = compute_account_frequency(
+            renamed,
+            target_account,
+            timestamp_col="block_time",
+            account_col="wallet",
+        )
+        batch_row = compute_frequency_metrics(
+            renamed,
+            timestamp_col="block_time",
+            account_col="wallet",
+        ).set_index("wallet").loc[target_account]
+
+        assert single == pytest.approx({
+            "mean_tx_per_day": float(batch_row["mean_tx_per_day"]),
+            "std_tx_per_day": float(batch_row["std_tx_per_day"]),
+            "burstiness": float(batch_row["burstiness"]),
+        })
+
+    @given(transaction_data_frames())
+    def test_missing_account_always_raises_value_error(self, df):
+        """Property: absent accounts are rejected consistently."""
+        with pytest.raises(ValueError, match="not found"):
+            compute_account_frequency(df, "acct-missing")
diff --git a/tests/test_link_prediction.py b/tests/test_link_prediction.py
new file mode 100644
index 0000000..9ab5238
--- /dev/null
+++ b/tests/test_link_prediction.py
@@ -0,0 +1,257 @@
+"""Tests for self-supervised link prediction task and model."""
+from __future__ import annotations
+
+import random
+from dataclasses import dataclass
+from typing import List
+
+import pytest
+import torch
+
+from astroml.features.graph.snapshot import Edge
+from astroml.tasks.link_prediction_task import (
+    LedgerSplit,
+    LinkPredictionTask,
+    sample_negative_edges,
+)
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _make_edges(n: int = 20, max_ledger: int = 10) -> List[Edge]:
+    """Return n edges with ledger timestamps spread across max_ledger values."""
+    rng = random.Random(42)
+    accounts = [f"account_{i}" for i in range(6)]
+    edges = []
+    for i in range(n):
+        src = rng.choice(accounts)
+        dst = rng.choice([a for a in accounts if a != src])
+        ts = rng.randint(1, max_ledger)
+        edges.append(Edge(src=src, dst=dst, timestamp=ts))
+    return edges
+
+
+# ---------------------------------------------------------------------------
+# LedgerSplit
+# ---------------------------------------------------------------------------
+
+class TestLedgerSplit:
+    def test_num_nodes(self):
+        edges = _make_edges(10)
+        accounts = {e.src for e in edges} | {e.dst for e in edges}
+        node_index = {a: i for i, a in enumerate(sorted(accounts))}
+        split = LedgerSplit(context_edges=edges, future_edges=[], node_index=node_index)
+        assert split.num_nodes == len(accounts)
+
+    def test_to_edge_index_shape(self):
+        edges = [Edge("a", "b", 1), Edge("b", "c", 2)]
+        node_index = {"a": 0, "b": 1, "c": 2}
+        split = LedgerSplit(context_edges=edges, future_edges=[], node_index=node_index)
+        ei = split.to_edge_index(edges)
+        assert ei.shape == (2, 2)
+        assert ei.dtype == torch.long
+
+    def test_to_edge_index_unknown_accounts_skipped(self):
+        edges = [Edge("a", "b", 1), Edge("x", "y", 2)]  # x, y not in index
+        node_index = {"a": 0, "b": 1}
+        split = LedgerSplit(context_edges=edges, future_edges=[], node_index=node_index)
+        ei = split.to_edge_index(edges)
+        assert ei.shape == (2, 1)  # only the a→b edge
+
+    def test_to_edge_index_empty(self):
+        node_index = {"a": 0, "b": 1}
+        split = LedgerSplit(context_edges=[], future_edges=[], node_index=node_index)
+        ei = split.to_edge_index([])
+        assert ei.shape == (2, 0)
+
+
+# ---------------------------------------------------------------------------
+# sample_negative_edges
+# ---------------------------------------------------------------------------
+
+class TestSampleNegativeEdges:
+    def test_no_positives_in_negatives(self):
+        pos_set = {(0, 1), (1, 2)}
+        negs = sample_negative_edges(num_nodes=5, positive_set=pos_set, num_samples=10, rng=random.Random(0))
+        for u, v in negs:
+            assert (u, v) not in pos_set
+
+    def test_no_self_loops(self):
+        negs = sample_negative_edges(num_nodes=10, positive_set=set(), num_samples=20, rng=random.Random(0))
+        for u, v in negs:
+            assert u != v
+
+    def test_returns_at_most_num_samples(self):
+        negs = sample_negative_edges(num_nodes=4, positive_set=set(), num_samples=100, rng=random.Random(0))
+        # 4 nodes → at most 4*3=12 directed non-self-loop pairs
+        assert len(negs) <= 12
+
+    def test_uniqueness(self):
+        negs = sample_negative_edges(num_nodes=20, positive_set=set(), num_samples=30, rng=random.Random(1))
+        assert len(negs) == len(set(negs))
+
+
+# ---------------------------------------------------------------------------
+# LinkPredictionTask.build_splits
+# ---------------------------------------------------------------------------
+
+class TestBuildSplits:
+    def test_splits_are_produced(self):
+        edges = _make_edges(30, max_ledger=15)
+        task = LinkPredictionTask(edges, n_future=3, seed=0)
+        splits = task.build_splits()
+        assert len(splits) > 0
+
+    def test_context_strictly_before_future(self):
+        edges = _make_edges(30, max_ledger=15)
+        task = LinkPredictionTask(edges, n_future=3, seed=0)
+        for split in task.build_splits():
+            ctx_max = max(e.timestamp for e in split.context_edges)
+            fut_min = min(e.timestamp for e in split.future_edges)
+            assert ctx_max < fut_min, (
+                f"Context max={ctx_max} is not < future min={fut_min}"
+            )
+
+    def test_future_window_bounded_by_n_future(self):
+        edges = _make_edges(40, max_ledger=20)
+        n_future = 4
+        task = LinkPredictionTask(edges, n_future=n_future, seed=0)
+        for split in task.build_splits():
+            fut_min = min(e.timestamp for e in split.future_edges)
+            fut_max = max(e.timestamp for e in split.future_edges)
+            assert fut_max < fut_min + n_future
+
+    def test_context_ledgers_restricts_window(self):
+        edges = _make_edges(50, max_ledger=20)
+        task = LinkPredictionTask(edges, n_future=3, context_ledgers=2, seed=0)
+        for split in task.build_splits():
+            ctx_ledgers = {e.timestamp for e in split.context_edges}
+            assert len(ctx_ledgers) <= 2
+
+    def test_empty_edges_returns_no_splits(self):
+        task = LinkPredictionTask([], n_future=5, seed=0)
+        assert task.build_splits() == []
+
+    def test_node_index_covers_all_accounts(self):
+        edges = _make_edges(20, max_ledger=10)
+        task = LinkPredictionTask(edges, n_future=3, seed=0)
+        for split in task.build_splits():
+            all_accounts = (
+                {e.src for e in split.context_edges} |
+                {e.dst for e in split.context_edges} |
+                {e.src for e in split.future_edges} |
+                {e.dst for e in split.future_edges}
+            )
+            assert all_accounts <= set(split.node_index.keys())
+
+    def test_invalid_n_future_raises(self):
+        with pytest.raises(ValueError, match="n_future"):
+            LinkPredictionTask(_make_edges(), n_future=0)
+
+    def test_invalid_neg_ratio_raises(self):
+        with pytest.raises(ValueError, match="neg_sampling_ratio"):
+            LinkPredictionTask(_make_edges(), neg_sampling_ratio=0.0)
+
+
+# ---------------------------------------------------------------------------
+# LinkPredictionTask.sample_negatives
+# ---------------------------------------------------------------------------
+
+class TestSampleNegatives:
+    def _make_split(self) -> LedgerSplit:
+        edges = [Edge("a", "b", 1), Edge("b", "c", 1)]
+        future = [Edge("a", "c", 2)]
+        node_index = {"a": 0, "b": 1, "c": 2}
+        return LedgerSplit(context_edges=edges, future_edges=future, node_index=node_index)
+
+    def test_returns_tensor_shape(self):
+        task = LinkPredictionTask(_make_edges(), n_future=3, seed=0)
+        split = self._make_split()
+        neg = task.sample_negatives(split)
+        assert neg.dim() == 2
+        assert neg.shape[0] == 2
+
+    def test_negatives_not_in_future_edges(self):
+        task = LinkPredictionTask(_make_edges(), n_future=3, seed=42)
+        split = self._make_split()
+        neg = task.sample_negatives(split)
+        future_pairs = {(0, 2)}  # a→c in node_index
+        for i in range(neg.size(1)):
+            assert (neg[0, i].item(), neg[1, i].item()) not in future_pairs
+
+
+# ---------------------------------------------------------------------------
+# LinkPredictor model (unit tests — no GCN forward, just decoder logic)
+# ---------------------------------------------------------------------------
+
+class TestLinkPredictorDecoder:
+    """Test the decoder and loss without running a full GCN forward."""
+
+    def _dummy_embeddings(self, n=8, dim=16) -> torch.Tensor:
+        torch.manual_seed(0)
+        return torch.randn(n, dim)
+
+    def test_dot_decoder_shape(self):
+        try:
+            from astroml.models.link_prediction import LinkPredictor
+        except ImportError:
+            pytest.skip("torch_geometric not installed")
+
+        model = LinkPredictor(input_dim=8, hidden_dims=[16], embedding_dim=16, decoder="dot")
+        z = self._dummy_embeddings()
+        edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]], dtype=torch.long)
+        scores = model.decode(z, edge_index)
+        assert scores.shape == (3,)
+
+    def test_mlp_decoder_shape(self):
+        try:
+            from astroml.models.link_prediction import LinkPredictor
+        except ImportError:
+            pytest.skip("torch_geometric not installed")
+
+        model = LinkPredictor(input_dim=8, hidden_dims=[16], embedding_dim=16, decoder="mlp")
+        z = self._dummy_embeddings()
+        edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)
+        scores = model.decode(z, edge_index)
+        assert scores.shape == (2,)
+
+    def test_loss_is_scalar(self):
+        try:
+            from astroml.models.link_prediction import LinkPredictor
+        except ImportError:
+            pytest.skip("torch_geometric not installed")
+
+        model = LinkPredictor(input_dim=8, hidden_dims=[16], embedding_dim=16)
+        z = self._dummy_embeddings()
+        pos = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)
+        neg = torch.tensor([[0, 2], [3, 4]], dtype=torch.long)
+        loss = model.loss(z, pos, neg)
+        assert loss.dim() == 0
+        assert loss.item() > 0
+
+    def test_loss_decreases_with_training(self):
+        try:
+            from astroml.models.link_prediction import LinkPredictor
+        except ImportError:
+            pytest.skip("torch_geometric not installed")
+
+        torch.manual_seed(1)
+        model = LinkPredictor(input_dim=8, hidden_dims=[16], embedding_dim=16, decoder="mlp")
+        optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
+
+        # Fixed embeddings simulating encoder output; only the MLP decoder's weights are trained.
+        z = self._dummy_embeddings()
+        pos = torch.tensor([[0, 1, 2], [1, 2, 3]], dtype=torch.long)
+        neg = torch.tensor([[4, 5, 6], [5, 6, 7]], dtype=torch.long)
+
+        losses = []
+        for _ in range(20):
+            optimizer.zero_grad()
+            loss = model.loss(z, pos, neg)
+            loss.backward()
+            optimizer.step()
+            losses.append(loss.item())
+
+        assert losses[-1] < losses[0], "Loss should decrease over training steps"
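The `TestSampleNegativeEdges` cases above fix the contract for negative sampling: no positive pairs, no self-loops, no duplicates, and at most `num_samples` results, capped by how many admissible pairs exist. One way to meet that contract is sketched below. `sample_negative_pairs` is a hypothetical stand-in, not the actual `astroml.tasks.link_prediction_task.sample_negative_edges`, and it enumerates every candidate pair, so it only suits small graphs.

```python
import random
from typing import List, Optional, Set, Tuple

Pair = Tuple[int, int]


def sample_negative_pairs(
    num_nodes: int,
    positive_set: Set[Pair],
    num_samples: int,
    rng: Optional[random.Random] = None,
) -> List[Pair]:
    """Uniformly sample distinct directed pairs that are neither self-loops nor positives."""
    if rng is None:
        rng = random.Random()
    # Enumerate every admissible candidate: O(num_nodes**2), fine for small graphs.
    candidates = [
        (u, v)
        for u in range(num_nodes)
        for v in range(num_nodes)
        if u != v and (u, v) not in positive_set
    ]
    # random.Random.sample draws without replacement, so the result has no duplicates;
    # asking for more pairs than exist returns every candidate instead of failing.
    return rng.sample(candidates, min(num_samples, len(candidates)))


# 4 nodes give 4 * 3 = 12 directed non-self-loop pairs; (0, 1) is positive, so 11 remain.
negs = sample_negative_pairs(4, {(0, 1)}, num_samples=100, rng=random.Random(0))
print(len(negs))  # 11
```

For large ledgers, rejection sampling (draw random pairs and discard self-loops, positives, and repeats) avoids materializing all pairs, at the cost of needing a retry bound when the graph is dense.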
diff --git a/tests/test_schema.py b/tests/test_schema.py
index 08c25cd..c51d33e 100644
--- a/tests/test_schema.py
+++ b/tests/test_schema.py
@@ -8,12 +8,18 @@
 
 import pytest
 from sqlalchemy import create_engine, inspect, text
+from sqlalchemy.exc import IntegrityError
 from sqlalchemy.orm import Session
 
 from astroml.db.schema import (
     Account,
     Asset,
     Base,
+    GraphAccount,
+    GraphClaimDetail,
+    GraphEdge,
+    GraphPaymentDetail,
+    GraphTransactionDetail,
     Ledger,
     Operation,
     Transaction,
@@ -45,7 +51,18 @@ def session(engine):
 
 def test_models_importable():
-    """All five model classes import cleanly."""
-    for cls in (Ledger, Transaction, Operation, Account, Asset):
+    """All ORM model classes import cleanly."""
+    for cls in (
+        Ledger,
+        Transaction,
+        Operation,
+        Account,
+        Asset,
+        GraphAccount,
+        GraphEdge,
+        GraphTransactionDetail,
+        GraphClaimDetail,
+        GraphPaymentDetail,
+    ):
         assert hasattr(cls, "__tablename__")
 
 
@@ -53,7 +70,20 @@ def test_create_all_tables(engine):
     """metadata.create_all() succeeds and produces the expected tables."""
     inspector = inspect(engine)
     table_names = set(inspector.get_table_names())
-    assert table_names == {"ledgers", "transactions", "operations", "accounts", "assets", "normalized_transactions"}
+    assert table_names == {
+        "accounts",
+        "assets",
+        "effects",
+        "graph_accounts",
+        "graph_claim_details",
+        "graph_edges",
+        "graph_payment_details",
+        "graph_transaction_details",
+        "ledgers",
+        "normalized_transactions",
+        "operations",
+        "transactions",
+    }
 
 
 def test_table_names():
@@ -63,6 +93,8 @@ def test_table_names():
     assert Operation.__tablename__ == "operations"
     assert Account.__tablename__ == "accounts"
     assert Asset.__tablename__ == "assets"
+    assert GraphAccount.__tablename__ == "graph_accounts"
+    assert GraphEdge.__tablename__ == "graph_edges"
 
 
 # ---------------------------------------------------------------------------
@@ -135,6 +167,65 @@ def test_asset_columns(engine):
     assert expected <= cols
 
 
+def test_graph_account_columns(engine):
+    inspector = inspect(engine)
+    cols = {c["name"] for c in inspector.get_columns("graph_accounts")}
+    expected = {
+        "id",
+        "account_address",
+        "account_type",
+        "first_seen_at",
+        "last_seen_at",
+        "created_at",
+        "updated_at",
+    }
+    assert expected <= cols
+
+
+def test_graph_edge_columns(engine):
+    inspector = inspect(engine)
+    cols = {c["name"] for c in inspector.get_columns("graph_edges")}
+    expected = {
+        "id",
+        "edge_type",
+        "source_account_id",
+        "destination_account_id",
+        "asset_id",
+        "occurred_at",
+        "ledger_sequence",
+        "event_index",
+        "transaction_hash",
+        "external_event_id",
+        "amount",
+        "status",
+        "created_at",
+    }
+    assert expected <= cols
+
+    fks = inspector.get_foreign_keys("graph_edges")
+    assert any(
+        fk["referred_table"] == "graph_accounts"
+        and fk["referred_columns"] == ["id"]
+        for fk in fks
+    )
+    assert any(
+        fk["referred_table"] == "assets"
+        and fk["referred_columns"] == ["id"]
+        for fk in fks
+    )
+
+
+def test_graph_detail_columns(engine):
+    inspector = inspect(engine)
+    transaction_cols = {c["name"] for c in inspector.get_columns("graph_transaction_details")}
+    claim_cols = {c["name"] for c in inspector.get_columns("graph_claim_details")}
+    payment_cols = {c["name"] for c in inspector.get_columns("graph_payment_details")}
+
+    assert {"edge_id", "edge_type", "successful", "operation_count", "fee", "memo_type", "memo", "details"} <= transaction_cols
+    assert {"edge_id", "edge_type", "claim_reference", "claim_status", "expires_at", "details"} <= claim_cols
+    assert {"edge_id", "edge_type", "payment_reference", "payment_status", "fee_amount", "settled_at", "details"} <= payment_cols
+
+
 # ---------------------------------------------------------------------------
 # Relationships
 # ---------------------------------------------------------------------------
@@ -174,6 +265,60 @@ def test_relationships(session):
     assert tx.ledger is ledger
 
 
+def test_graph_relationships(session):
+    """Graph edges connect graph accounts and subtype detail rows 1:1."""
+    now = datetime.now(timezone.utc)
+
+    asset = Asset(asset_type="native", asset_code="XLM")
+    source = GraphAccount(
+        account_address="G" + "S" * 55,
+        account_type="wallet",
+        first_seen_at=now,
+        last_seen_at=now,
+    )
+    destination = GraphAccount(
+        account_address="G" + "R" * 55,
+        account_type="merchant",
+        first_seen_at=now,
+        last_seen_at=now,
+    )
+
+    session.add_all([asset, source, destination])
+    session.flush()
+
+    edge = GraphEdge(
+        edge_type="payment",
+        source_account_id=source.id,
+        destination_account_id=destination.id,
+        asset_id=asset.id,
+        occurred_at=now,
+        ledger_sequence=123,
+        event_index=1,
+        transaction_hash="e" * 64,
+        external_event_id="payment:e" + "1" * 16,
+        amount=25,
+        status="settled",
+    )
+    session.add(edge)
+    session.flush()
+
+    detail = GraphPaymentDetail(
+        edge_id=edge.id,
+        payment_reference="invoice-42",
+        payment_status="settled",
+        fee_amount=1.5,
+    )
+    session.add(detail)
+    session.flush()
+
+    session.refresh(edge)
+
+    assert edge.source_account is source
+    assert edge.destination_account is destination
+    assert edge.payment_detail is detail
+    assert detail.edge is edge
+
+
 # ---------------------------------------------------------------------------
 # Round-trip insert & query
 # ---------------------------------------------------------------------------
@@ -235,3 +380,69 @@ def test_insert_and_query(session):
     assert session.get(Operation, op.id) is op
     assert session.get(Account, "G" + "D" * 55) is account
     assert session.get(Asset, asset.id) is asset
+
+
+def test_graph_edge_external_event_id_must_be_unique_per_type(session):
+    """Composite uniqueness supports idempotent edge ingestion."""
+    now = datetime.now(timezone.utc)
+
+    source = GraphAccount(
+        account_address="G" + "E" * 55,
+        first_seen_at=now,
+        last_seen_at=now,
+    )
+    destination = GraphAccount(
+        account_address="G" + "F" * 55,
+        first_seen_at=now,
+        last_seen_at=now,
+    )
+    session.add_all([source, destination])
+    session.flush()
+
+    first = GraphEdge(
+        edge_type="transaction",
+        source_account_id=source.id,
+        destination_account_id=destination.id,
+        occurred_at=now,
+        external_event_id="tx:duplicate-key",
+    )
+    duplicate = GraphEdge(
+        edge_type="transaction",
+        source_account_id=source.id,
+        destination_account_id=destination.id,
+        occurred_at=now,
+        external_event_id="tx:duplicate-key",
+    )
+
+    session.add(first)
+    session.flush()
+    session.add(duplicate)
+
+    with pytest.raises(IntegrityError):
+        session.flush()
+
+
+def test_graph_edge_detail_type_constraints_are_declared():
+    """Subtype tables pin detail rows to a single graph edge type."""
+    transaction_checks = {c.name for c in GraphTransactionDetail.__table__.constraints}
+    claim_checks = {c.name for c in GraphClaimDetail.__table__.constraints}
+    payment_checks = {c.name for c in GraphPaymentDetail.__table__.constraints}
+
+    assert "ck_graph_transaction_details_edge_type" in transaction_checks
+    assert "ck_graph_claim_details_edge_type" in claim_checks
+    assert "ck_graph_payment_details_edge_type" in payment_checks
+
+
+def test_graph_edge_indexes_cover_time_series_paths():
+    """The graph mirror exposes the expected timeline-oriented indexes."""
+    index_names = {index.name for index in GraphEdge.__table__.indexes}
+    assert {
+        "ix_graph_edges_occurred_at",
+        "ix_graph_edges_source_occurred_at",
+        "ix_graph_edges_destination_occurred_at",
+        "ix_graph_edges_type_occurred_at",
+        "ix_graph_edges_asset_occurred_at",
+        "ix_graph_edges_status_occurred_at",
+        "ix_graph_edges_tx_hash",
+        "ix_graph_edges_ledger_event",
+    } <= index_names
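`test_graph_edge_external_event_id_must_be_unique_per_type` appears to rely on a composite unique key over `(edge_type, external_event_id)`. The sketch below shows, with the standard-library `sqlite3` module, how an ingestion job could lean on such a key so that replaying the same event is a no-op. The cut-down `graph_edges` table and the `make_db`/`ingest` helpers are hypothetical; they are not `astroml.db.schema`, and the ORM path in this PR surfaces duplicates as `IntegrityError` rather than ignoring them.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical, cut-down graph_edges table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE graph_edges (
            id INTEGER PRIMARY KEY,
            edge_type TEXT NOT NULL,
            external_event_id TEXT NOT NULL,
            amount REAL,
            -- the same event id may recur under a different edge type
            UNIQUE (edge_type, external_event_id)
        )
        """
    )
    return conn


def ingest(conn: sqlite3.Connection, edge_type: str, external_event_id: str, amount: float) -> bool:
    """Insert an edge at most once; return True only if a new row was written."""
    before = conn.total_changes
    # INSERT OR IGNORE skips rows that would violate the composite UNIQUE key.
    conn.execute(
        "INSERT OR IGNORE INTO graph_edges (edge_type, external_event_id, amount) "
        "VALUES (?, ?, ?)",
        (edge_type, external_event_id, amount),
    )
    return conn.total_changes > before


conn = make_db()
first = ingest(conn, "transaction", "tx:duplicate-key", 1.0)   # True: new row
replay = ingest(conn, "transaction", "tx:duplicate-key", 1.0)  # False: ignored
other = ingest(conn, "payment", "tx:duplicate-key", 1.0)       # True: different type
```

Ignoring, upserting, or raising on a replay is a policy choice: raising (as the test expects from the ORM) makes duplicates visible, while `INSERT OR IGNORE` suits at-least-once delivery from an upstream stream.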
diff --git a/tests/test_temporal_split.py b/tests/test_temporal_split.py
new file mode 100644
index 0000000..896871d
--- /dev/null
+++ b/tests/test_temporal_split.py
@@ -0,0 +1,253 @@
+"""Tests for astroml.training.temporal_split."""
+import warnings
+from dataclasses import dataclass
+from typing import Any
+
+import pytest
+
+from astroml.training.temporal_split import (
+    GraphSplitResult,
+    LeakageError,
+    TemporalSplitter,
+    temporal_graph_split,
+    validate_graph_split,
+)
+
+
+# ---------------------------------------------------------------------------
+# Minimal edge stub (mirrors astroml.features.graph.snapshot.Edge)
+# ---------------------------------------------------------------------------
+
+@dataclass(frozen=True)
+class FakeEdge:
+    src: str
+    dst: str
+    timestamp: int
+
+
+def _make_edges(n: int = 10) -> list:
+    """Return n sequential edges with timestamps 0..n-1."""
+    return [FakeEdge(src=f"a{i}", dst=f"b{i}", timestamp=i) for i in range(n)]
+
+
+# ---------------------------------------------------------------------------
+# temporal_graph_split — ratio mode
+# ---------------------------------------------------------------------------
+
+class TestTemporalGraphSplitRatio:
+    def test_basic_split(self):
+        edges = _make_edges(10)
+        result = temporal_graph_split(edges, train_ratio=0.8)
+
+        assert len(result.train_edges) == 8
+        assert len(result.test_edges) == 2
+
+    def test_train_strictly_before_test(self):
+        edges = _make_edges(10)
+        result = temporal_graph_split(edges, train_ratio=0.7)
+
+        train_max = max(e.timestamp for e in result.train_edges)
+        test_min = min(e.timestamp for e in result.test_edges)
+        assert train_max < test_min
+
+    def test_shuffled_input_still_splits_temporally(self):
+        import random
+        edges = _make_edges(20)
+        rng = random.Random(0)
+        rng.shuffle(edges)
+
+        result = temporal_graph_split(edges, train_ratio=0.8)
+
+        train_max = max(e.timestamp for e in result.train_edges)
+        test_min = min(e.timestamp for e in result.test_edges)
+        assert train_max < test_min
+
+    def test_invalid_ratio_raises(self):
+        edges = _make_edges(5)
+        with pytest.raises(ValueError, match="train_ratio"):
+            temporal_graph_split(edges, train_ratio=0.0)
+        with pytest.raises(ValueError, match="train_ratio"):
+            temporal_graph_split(edges, train_ratio=1.0)
+
+    def test_empty_edges_returns_empty(self):
+        result = temporal_graph_split([], train_ratio=0.8)
+        assert result.train_edges == []
+        assert result.test_edges == []
+
+
+# ---------------------------------------------------------------------------
+# temporal_graph_split — cutoff mode
+# ---------------------------------------------------------------------------
+
+class TestTemporalGraphSplitCutoff:
+    def test_cutoff_partitioning(self):
+        edges = _make_edges(10)
+        result = temporal_graph_split(edges, cutoff=5)
+
+        assert all(e.timestamp < 5 for e in result.train_edges)
+        assert all(e.timestamp >= 5 for e in result.test_edges)
+        assert len(result.train_edges) + len(result.test_edges) == 10
+
+    def test_cutoff_before_all_warns(self):
+        edges = _make_edges(5)
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            result = temporal_graph_split(edges, cutoff=0)
+        assert result.train_edges == []
+        assert any("empty" in str(w.message).lower() for w in caught)
+
+    def test_cutoff_after_all_warns(self):
+        edges = _make_edges(5)
+        with warnings.catch_warnings(record=True) as caught:
+            warnings.simplefilter("always")
+            result = temporal_graph_split(edges, cutoff=999)
+        assert result.test_edges == []
+        assert any("empty" in str(w.message).lower() for w in caught)
+
+    def test_resolved_cutoff_stored(self):
+        edges = _make_edges(10)
+        result = temporal_graph_split(edges, cutoff=7)
+        assert result.cutoff == 7
+
+
+# ---------------------------------------------------------------------------
+# temporal_graph_split — edge validation
+# ---------------------------------------------------------------------------
+
+class TestTemporalGraphSplitValidation:
+    def test_missing_time_attr_raises(self):
+        @dataclass
+        class BadEdge:
+            src: str
+            dst: str
+            # no timestamp
+
+        edges = [BadEdge(src="a", dst="b")]
+        with pytest.raises(ValueError, match="no attribute"):
+            temporal_graph_split(edges)
+
+    def test_custom_time_attr(self):
+        @dataclass(frozen=True)
+        class TimedEdge:
+            src: str
+            dst: str
+            created_at: int
+
+        edges = [TimedEdge(src=f"a{i}", dst=f"b{i}", created_at=i) for i in range(10)]
+        result = temporal_graph_split(edges, train_ratio=0.7, time_attr="created_at")
+
+        assert len(result.train_edges) == 7
+        assert len(result.test_edges) == 3
+
+
+# ---------------------------------------------------------------------------
+# validate_graph_split
+# ---------------------------------------------------------------------------
+
+class TestValidateGraphSplit:
+    def test_clean_split_returns_true(self):
+        edges = _make_edges(10)
+        result = temporal_graph_split(edges, train_ratio=0.8)
+        assert validate_graph_split(result) is True
+
+    def test_overlap_raises_leakage_error(self):
+        edges = _make_edges(10)
+        # Manually construct an overlapping result.
+        bad = GraphSplitResult(
+            train_edges=edges[:7],   # timestamps 0-6
+            test_edges=edges[5:],    # timestamps 5-9 — overlap at 5, 6
+            cutoff=5,
+        )
+        with pytest.raises(LeakageError, match="overlap"):
+            validate_graph_split(bad)
+
+    def test_empty_partitions_are_valid(self):
+        edges = _make_edges(5)
+        empty_result = GraphSplitResult(train_edges=[], test_edges=edges, cutoff=0)
+        assert validate_graph_split(empty_result) is True
+
+
+# ---------------------------------------------------------------------------
+# TemporalSplitter — DataFrame
+# ---------------------------------------------------------------------------
+
+class TestTemporalSplitterDataFrame:
+    def test_dataframe_split(self):
+        import pandas as pd
+        import numpy as np
+
+        df = pd.DataFrame({
+            "timestamp": pd.date_range("2024-01-01", periods=10, freq="D"),
+            "value": np.arange(10, dtype=float),
+        })
+        splitter = TemporalSplitter(train_ratio=0.8, time_col="timestamp")
+        train, test = splitter.split_dataframe(df)
+
+        assert len(train) == 8
+        assert len(test) == 2
+        assert train["timestamp"].max() < test["timestamp"].min()
+
+    def test_dataframe_cutoff(self):
+        import pandas as pd
+        import numpy as np
+
+        df = pd.DataFrame({
+            "ts": pd.date_range("2024-01-01", periods=10, freq="D"),
+            "v": np.arange(10),
+        })
+        cutoff = pd.Timestamp("2024-01-06")
+        splitter = TemporalSplitter(cutoff=cutoff, time_col="ts")
+        train, test = splitter.split_dataframe(df)
+
+        assert (train["ts"] < cutoff).all()
+        assert (test["ts"] >= cutoff).all()
+
+    def test_dataframe_overlap_raises(self):
+        """validate_temporal_split raises LeakageError on overlapping DataFrames."""
+        import pandas as pd
+        import numpy as np
+        from astroml.training.temporal_split import validate_temporal_split
+
+        df = pd.DataFrame({
+            "timestamp": pd.date_range("2024-01-01", periods=10, freq="D"),
+            "v": np.arange(10),
+        })
+        # Directly call validate_temporal_split with overlapping frames.
+        train = df.iloc[:7].copy()
+        test = df.iloc[5:].copy()
+        with pytest.raises(LeakageError, match="overlap"):
+            validate_temporal_split(train, test, "timestamp")
+
+
+# ---------------------------------------------------------------------------
+# TemporalSplitter — graph edges
+# ---------------------------------------------------------------------------
+
+class TestTemporalSplitterEdges:
+    def test_edge_split_via_splitter(self):
+        edges = _make_edges(20)
+        splitter = TemporalSplitter(train_ratio=0.75)
+        result = splitter.split_edges(edges)
+
+        assert len(result.train_edges) == 15
+        assert len(result.test_edges) == 5
+        assert validate_graph_split(result) is True
+
+    def test_splitter_invalid_ratio(self):
+        with pytest.raises(ValueError, match="train_ratio"):
+            TemporalSplitter(train_ratio=1.5)
+
+    def test_no_leakage_guarantee(self):
+        """Core property: no test edge timestamp precedes any train timestamp."""
+        import random
+        edges = _make_edges(100)
+        rng = random.Random(42)
+        rng.shuffle(edges)
+
+        splitter = TemporalSplitter(train_ratio=0.8)
+        result = splitter.split_edges(edges)
+
+        train_timestamps = {e.timestamp for e in result.train_edges}
+        test_timestamps = {e.timestamp for e in result.test_edges}
+        # The maximum training timestamp must be strictly less than every test timestamp.
+        assert max(train_timestamps) < min(test_timestamps)
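Taken together, these tests specify a leakage-free contract: every training edge must be strictly earlier than every test edge, whether the split is driven by a ratio or by an explicit cutoff. A minimal sketch of one way to meet it, assuming integer timestamps; `TimedEdge`, `split_by_time`, and `check_no_leakage` are hypothetical names rather than the `astroml.training.temporal_split` API.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class TimedEdge:
    src: str
    dst: str
    timestamp: int


def split_by_time(
    edges: Sequence[TimedEdge],
    train_ratio: float = 0.8,
    cutoff: Optional[int] = None,
) -> Tuple[List[TimedEdge], List[TimedEdge], Optional[int]]:
    """Split edges so every train timestamp is strictly before every test timestamp."""
    if cutoff is None:
        if not 0.0 < train_ratio < 1.0:
            raise ValueError("train_ratio must be in (0, 1)")
        ordered = sorted(edges, key=lambda e: e.timestamp)
        if not ordered:
            return [], [], None
        # Resolve the ratio to a timestamp cutoff instead of slicing by position,
        # so edges sharing the boundary timestamp all land on the test side.
        k = min(int(len(ordered) * train_ratio), len(ordered) - 1)
        cutoff = ordered[k].timestamp
    train = [e for e in edges if e.timestamp < cutoff]
    test = [e for e in edges if e.timestamp >= cutoff]
    return train, test, cutoff


def check_no_leakage(train: Sequence[TimedEdge], test: Sequence[TimedEdge]) -> bool:
    """Raise ValueError if any train edge is not strictly earlier than every test edge."""
    if train and test and max(e.timestamp for e in train) >= min(e.timestamp for e in test):
        raise ValueError("train and test time ranges overlap")
    return True


edges = [TimedEdge("a", "b", t) for t in (1, 1, 2, 2, 2, 3)]
train, test, cut = split_by_time(edges, train_ratio=0.5)
# A positional slice at index 3 would put one t=2 edge in train and two in test;
# resolving to cutoff=2 keeps all three t=2 edges together on the test side.
print(cut, len(train), len(test))  # 2 2 4
check_no_leakage(train, test)
```

The trade-off of resolving to a cutoff is that the train side can hold fewer than `train_ratio` of the edges when many share the boundary timestamp. The fixtures in this file use distinct timestamps, so they do not exercise that case.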
diff --git a/train.py b/train.py
index 86b5d4c..d6d46d3 100644
--- a/train.py
+++ b/train.py
@@ -21,6 +21,8 @@
 from hydra.utils import instantiate, get_original_cwd
 
 from astroml.models.gcn import GCN
+from astroml.tracking import MLflowTracker
+from astroml.training.temporal_split import TemporalSplitter
 
 # Set up logging
 logging.basicConfig(level=logging.INFO)
@@ -38,6 +40,69 @@ def set_device(device_config: str) -> torch.device:
     return device
 
 
+def apply_temporal_masks(data: Any, cfg: DictConfig) -> Any:
+    """Replace dataset masks with strict temporal train/val/test splits.
+
+    When ``training.temporal_split.enabled`` is true, the existing random masks
+    on *data* are discarded and rebuilt so that:
+
+    * ``train_mask`` covers the earliest ``train_ratio`` fraction of nodes
+      (sorted by node index as a proxy for ingestion order when no explicit
+      timestamp is available on the graph data object).
+    * ``val_mask`` covers the next ``training.val_split`` fraction.
+    * ``test_mask`` covers the remaining nodes.
+
+    If the graph data carries a ``node_timestamps`` attribute it is used for
+    sorting instead of node index.
+    """
+    split_cfg = cfg.training.get("temporal_split", {})
+    if not split_cfg.get("enabled", False):
+        return data
+
+    n = data.num_nodes
+    train_ratio = split_cfg.get("train_ratio", 0.8)
+    val_split = cfg.training.get("val_split", 0.1)
+
+    # Determine sort order: prefer an explicit timestamp attribute, else use index.
+    if hasattr(data, "node_timestamps") and data.node_timestamps is not None:
+        order = data.node_timestamps.argsort()
+        logger.info("Temporal split: sorting nodes by node_timestamps attribute")
+    else:
+        order = torch.arange(n)
+        logger.info(
+            "Temporal split: no node_timestamps found — using node index as "
+            "temporal proxy (assumes nodes were appended in time order)"
+        )
+
+    train_end = int(n * train_ratio)
+    val_end = train_end + int(n * val_split)
+
+    train_nodes = order[:train_end]
+    val_nodes = order[train_end:val_end]
+    test_nodes = order[val_end:]
+
+    train_mask = torch.zeros(n, dtype=torch.bool)
+    val_mask = torch.zeros(n, dtype=torch.bool)
+    test_mask = torch.zeros(n, dtype=torch.bool)
+
+    train_mask[train_nodes] = True
+    val_mask[val_nodes] = True
+    test_mask[test_nodes] = True
+
+    data.train_mask = train_mask
+    data.val_mask = val_mask
+    data.test_mask = test_mask
+
+    logger.info(
+        "Temporal masks applied: train=%d val=%d test=%d (total %d nodes)",
+        train_mask.sum().item(),
+        val_mask.sum().item(),
+        test_mask.sum().item(),
+        n,
+    )
+    return data
+
+
 def load_dataset(cfg: DictConfig) -> Any:
     """Load and prepare the dataset."""
     logger.info(f"Loading dataset: {cfg.data.name}")
@@ -125,33 +190,69 @@ def train(cfg: DictConfig) -> Dict[str, Any]:
     """Main training function."""
     # Set up device
     device = set_device(cfg.experiment.device)
-    
+
+    # Build MLflow tracker (no-op when disabled)
+    mlflow_cfg = cfg.get("mlflow", {})
+    tracker = MLflowTracker(
+        enabled=mlflow_cfg.get("enabled", False),
+        tracking_uri=mlflow_cfg.get("tracking_uri", "mlruns"),
+        experiment_name=mlflow_cfg.get("experiment_name", cfg.experiment.name),
+        run_name=mlflow_cfg.get("run_name", None),
+        log_model_weights=mlflow_cfg.get("log_model_weights", True),
+    )
+
+    # Log hyperparameters once, before training starts
+    tracker.log_params({
+        "model": cfg.model.get("_target_", "gcn"),
+        "hidden_dims": str(cfg.model.get("hidden_dims", [])),
+        "dropout": cfg.model.get("dropout", None),
+        "optimizer": cfg.training.optimizer,
+        "lr": cfg.training.lr,
+        "weight_decay": cfg.training.weight_decay,
+        "epochs": cfg.training.epochs,
+        "seed": cfg.experiment.seed,
+    })
+
     # Load dataset
     dataset, data = load_dataset(cfg)
+    # Rebuild the masks before .to(device) so the new CPU mask tensors move with data.
+    data = apply_temporal_masks(data, cfg)
     data = data.to(device)
-    
+
     # Create model
     model = create_model(cfg, dataset)
     model = model.to(device)
-    
+
     # Create optimizer
     optimizer = create_optimizer(cfg, model)
-    
+
     # Training loop
     logger.info(f"Starting training for {cfg.training.epochs} epochs")
-    
+
     best_val_acc = 0.0
     patience_counter = 0
-    
+    best_model_path = Path(cfg.experiment.save_dir) / "best_model.pth"
+
     for epoch in range(cfg.training.epochs):
         # Train
         train_loss = train_epoch(model, data, optimizer, device)
-        
+
         # Evaluate
         train_metrics = evaluate(model, data, device, "train_mask")
         val_metrics = evaluate(model, data, device, "val_mask")
-        
-        # Log progress
+
+        # Log metrics to MLflow every epoch
+        tracker.log_metrics(
+            {
+                "train_loss": train_loss,
+                "train_acc": train_metrics["accuracy"],
+                "val_loss": val_metrics["loss"],
+                "val_acc": val_metrics["accuracy"],
+            },
+            step=epoch,
+        )
+
+        # Log progress to the console every cfg.training.log_interval epochs
         if epoch % cfg.training.log_interval == 0:
             logger.info(
                 f"Epoch {epoch:3d} | "
@@ -159,41 +260,53 @@ def train(cfg: DictConfig) -> Dict[str, Any]:
                 f"Train Acc: {train_metrics['accuracy']:.4f} | "
                 f"Val Acc: {val_metrics['accuracy']:.4f}"
             )
-        
+
         # Early stopping
         if val_metrics['accuracy'] > best_val_acc:
             best_val_acc = val_metrics['accuracy']
             patience_counter = 0
-            
+
             # Save best model
             if cfg.training.save_best_only:
-                torch.save(model.state_dict(), 
-                          Path(cfg.experiment.save_dir) / "best_model.pth")
+                torch.save(model.state_dict(), best_model_path)
         else:
             patience_counter += 1
-        
-        if (cfg.training.early_stopping.patience > 0 and 
-            patience_counter >= cfg.training.early_stopping.patience):
+
+        if (cfg.training.early_stopping.patience > 0 and
+                patience_counter >= cfg.training.early_stopping.patience):
             logger.info(f"Early stopping at epoch {epoch}")
             break
-    
+
     # Final evaluation
     test_metrics = evaluate(model, data, device, "test_mask")
     logger.info(f"Test Accuracy: {test_metrics['accuracy']:.4f}")
-    
+
+    # Log final test metrics
+    tracker.log_metrics({
+        "test_acc": test_metrics["accuracy"],
+        "test_loss": test_metrics["loss"],
+        "best_val_acc": best_val_acc,
+    })
+
     # Save final model
+    last_model_path = Path(cfg.experiment.save_dir) / "last_model.pth"
     if cfg.training.save_last:
-        torch.save(model.state_dict(), 
-                  Path(cfg.experiment.save_dir) / "last_model.pth")
-    
+        torch.save(model.state_dict(), last_model_path)
+
+    # Log model artifact: prefer the best checkpoint on disk, else the last one
+    checkpoint = best_model_path if best_model_path.exists() else last_model_path
+    tracker.log_model_artifact(model, artifact_path="model", checkpoint_path=str(checkpoint))
+
     # Save configuration
     OmegaConf.save(cfg, Path(cfg.experiment.save_dir) / "config.yaml")
-    
+
+    tracker.end()
+
     return {
         "test_accuracy": test_metrics['accuracy'],
         "test_loss": test_metrics['loss'],
         "best_val_accuracy": best_val_acc,
-        "epochs_trained": epoch + 1
+        "epochs_trained": epoch + 1,
     }