Transconnectome/digital-phenotyping-fm

Digital Phenotyping Foundation Model (DPFM)

Foundation model for lifelog time series: representation learning, behavioral prediction, multi-modal alignment, and clinical outcome prediction.

Overview figures: Executive Summary; DPFM Vision; Core Hypotheses; 4-Stage Pipeline; Data: 6 Modalities; 3 Research Designs; SOTA vs DPFM; Roadmap 2026-2028


📚 Knowledge Base (765 sources, 3 NotebookLM notebooks)

This project is backed by a systematically curated knowledge base spanning 482 papers and 765 total sources across three NotebookLM notebooks.

| Notebook | Sources | Focus |
|---|---|---|
| Allostasis Theory | 424 | Theoretical foundation: allostatic regulation, interoception, autonomic control, brain-body interaction |
| Lifelog AI/FM Research | 322 | Computational methods: time-series FM, wearable FM, digital phenotyping, multimodal health AI, SSL, clinical prediction, lifestyle-omics, edge AI, missingness handling, smart ring |
| Predictive Coding | 19 | Cognitive measurement: prediction error, precision weighting, EMA methodology |

The 482-paper literature survey covers 11 categories (A-K) from top venues: NeurIPS, ICML, ICLR, Nature Medicine, Lancet Digital Health, npj Digital Medicine, IEEE JBHI, IMWUT/UbiComp. See literature/references.bib and the 30-query NLM synthesis for the full SOTA analysis and Top 10 research questions.


🧬 Why Lifelog Data Matters: The Allostasis Framework

Allostasis β€” "stability through change" β€” is the body's continuous process of predicting metabolic needs and mobilizing resources before they are needed. Unlike homeostasis (reactive correction), allostasis is predictive regulation: the brain constantly generates forecasts about the body's upcoming energy demands and adjusts physiology proactively.

A smartwatch captures this allostatic regulation in real time:

| Watch Signal | What It Reflects |
|---|---|
| Heart rate & HRV | Autonomic regulation, cardiac allostatic control |
| Sleep architecture | Restorative prediction, metabolic recovery cycles |
| Stress score | Sympathetic-parasympathetic balance |
| Activity/Steps | Energy expenditure and behavioral regulation |
| SpO2 | Respiratory-metabolic coupling |
| Body composition | Long-term energy balance outcomes |

When allostatic regulation works well, the body adapts efficiently: HR recovers quickly after stress, sleep architecture is resilient, and circadian rhythms are stable. When it breaks down (allostatic overload), metabolic dysregulation accumulates, leading to hypertension, diabetes, cardiovascular disease, and other chronic conditions.

This gives us a principled scientific basis: lifelog time series are not arbitrary sensor streams; they are a continuous readout of the body's allostatic regulation. A foundation model trained on this data learns representations of how well (or poorly) an individual's regulatory system operates.

🧠 Behavioral Prediction: Predictive Coding Framework

Building on allostasis theory, we introduce the Predictive Coding EMA (Ecological Momentary Assessment) module β€” the first Galaxy Watch-based system for measuring cognitive prediction abilities in daily life.

Core Hypothesis

Allostatic regulation and cognitive prediction share fundamental mechanisms: both are predictive processes that minimize future uncertainty. If the body's allostatic system breaks down, does cognitive prediction also become less precise?

Predictive Coding Microtasks (Galaxy Watch 7 Ultra)

| Task | Input | Duration | Measures |
|---|---|---|---|
| Trajectory Prediction | Rotating Bezel | ~45s | Spatial prediction error, precision weighting |
| Temporal Prediction | Haptic + Tap | ~40s | Temporal accuracy, rhythm internalization |
| Sequence Prediction | Bezel | ~35s | Statistical learning, volatility tracking |
| Sensorimotor Tracking | Bezel | ~35s | Online prediction, sensorimotor integration |
| Oddball Detection | Tap + Haptic | ~25s | Deviance detection (behavioral MMN) |

EMA Protocol: 16 hourly sessions/day (08:00-23:00), ~30-60 seconds each

Theoretical Basis: the brain as a Bayesian prediction machine (Friston, 2005; Clark, 2013)
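
The hourly protocol above (08:00 through 23:00, inclusive) works out to 16 session slots per day. The following is a minimal sketch of how such a schedule could be generated; the function name `ema_session_times` and its parameters are hypothetical and are not taken from the watch app or the `dpfm` package.

```python
from datetime import datetime, time, timedelta


def ema_session_times(day, first_hour=8, last_hour=23):
    """Return one session start per hour from first_hour to last_hour, inclusive."""
    start = datetime.combine(day.date(), time(hour=first_hour))
    n_sessions = last_hour - first_hour + 1
    return [start + timedelta(hours=i) for i in range(n_sessions)]


# Illustrative date only; any calendar day gives the same number of slots.
sessions = ema_session_times(datetime(2026, 1, 5))
print(len(sessions), sessions[0].time(), sessions[-1].time())
```

A real deployment would presumably also handle missed prompts and sleep periods; the sketch only covers the fixed hourly grid.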

Predictive Coding ↔ Allostasis Integration

| Level | Predictive Coding | Allostatic Regulation |
|---|---|---|
| Temporal Scale | Milliseconds-seconds | Minutes-hours |
| Prediction Target | Sensory input | Metabolic demand |
| Error Signal | Prediction error (PE) | Allostatic load |
| Update Mechanism | Precision weighting | Autonomic adjustment |
| Pathology | Aberrant precision | Allostatic overload |

🔬 Research: AI on Lifelog + Behavioral Data

Core Problem

Given continuous lifelog time series + behavioral prediction data from wearable devices:

  1. Learn general-purpose representations that capture allostatic regulation patterns
  2. Measure cognitive prediction abilities through ecological momentary assessment
  3. Align these representations with genetic predisposition (multi-omics) and clinical state
  4. Predict clinical outcomes and quantify how lifestyle + cognitive factors affect disease risk

Four-Stage Pipeline

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Stage 1: PRETRAIN - Self-supervised lifelog foundation model
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

 24h watch data → 15-min patches (96 tokens/day) → Temporal Transformer
 Objective: Masked Patch Modeling (reconstruct masked time segments)
 Output: z_lifelog, a per-subject, per-day representation

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Stage 2: BEHAVIORAL - Predictive coding assessment  🆕
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

 EMA tasks → Prediction Error metrics → z_behavioral representation
 Constructs: PE magnitude, precision weighting, learning rate, temporal accuracy
 Integration: z_behavioral ↔ z_lifelog circadian correlation

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Stage 3: ALIGN - Cross-modal contrastive alignment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

 z_lifelog    ←─┐
 z_behavioral ←─┼── InfoNCE contrastive loss ──→ Shared representation space
 z_omics      ←─┤
 z_clinical   ←─┘

 Question: How does daily regulation (lifelog) + cognitive prediction (behavioral)
           relate to genetic predisposition (omics) and health state (clinical)?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 Stage 4: PREDICT - Clinical outcome & future risk
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

 [z_lifelog ⊕ z_behavioral ⊕ z_omics ⊕ z_clinical] → Gated Fusion → Risk Prediction
 Targets: HTN | T2DM | ASCVD | Dementia | Depression | Insomnia | Obesity
 + Lifestyle + Cognitive intervention effect quantification (Causal Forest)
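
To make Stage 1 concrete, the sketch below splits one day of minute-resolution data into 15-minute patches (1,440 / 15 = 96 tokens) and computes a reconstruction loss over masked patches only. It is an illustration in plain Python, not the project's training code: the function names, the 40% mask ratio, the zero-fill masking, and the synthetic signal are all assumptions, and a real model would replace the "echo" baseline with a Temporal Transformer.

```python
import math
import random

MINUTES_PER_DAY = 24 * 60   # minute-resolution samples in one day
PATCH_MINUTES = 15          # patch length from the Stage 1 description


def patchify(day_signal, patch_minutes=PATCH_MINUTES):
    """Split one day of minute-level samples into non-overlapping patches."""
    if len(day_signal) % patch_minutes != 0:
        raise ValueError("signal length must be a multiple of the patch size")
    return [day_signal[i:i + patch_minutes]
            for i in range(0, len(day_signal), patch_minutes)]


def mask_patches(patches, mask_ratio=0.4, seed=0):
    """Zero out a random subset of patches; return the corrupted copy and masked indices."""
    rng = random.Random(seed)
    n_masked = max(1, round(len(patches) * mask_ratio))  # mask ratio is an assumption
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    masked = set(masked_idx)
    corrupted = [[0.0] * len(p) if i in masked else list(p)
                 for i, p in enumerate(patches)]
    return corrupted, masked_idx


def masked_mse(reconstruction, target, masked_idx):
    """Mean squared error computed over the masked patches only."""
    sq_errors = [(r - t) ** 2
                 for i in masked_idx
                 for r, t in zip(reconstruction[i], target[i])]
    return sum(sq_errors) / len(sq_errors)


# Synthetic heart-rate-like channel with a 24 h rhythm (illustrative data only).
heart_rate = [60.0 + 10.0 * math.sin(2 * math.pi * m / MINUTES_PER_DAY)
              for m in range(MINUTES_PER_DAY)]

patches = patchify(heart_rate)
corrupted, masked_idx = mask_patches(patches)

# An untrained "model" that echoes its corrupted input gives a baseline loss;
# a perfect reconstruction gives zero.
baseline_loss = masked_mse(corrupted, patches, masked_idx)
perfect_loss = masked_mse(patches, patches, masked_idx)
print(len(patches), len(masked_idx), baseline_loss > perfect_loss)
```

Restricting the loss to masked positions is what makes the objective a reconstruction task rather than an identity mapping over visible patches.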

Research Questions

| # | Question | Method |
|---|---|---|
| RQ1 | Can a foundation model learn meaningful representations of allostatic regulation? | Self-supervised pretraining + probing tasks |
| RQ2 | Are cognitive prediction abilities correlated with allostatic regulation patterns? | z_lifelog ↔ z_behavioral circadian correlation |
| RQ3 | How do lifelog + behavioral representations relate to genetic risk profiles? | Contrastive alignment (multimodal) |
| RQ4 | Can aligned representations predict clinical phenotype trajectories? | Downstream finetuning on longitudinal outcomes |
| RQ5 | Which omics markers are sensitive to lifestyle + cognitive changes? | Longitudinal marker analysis + causal inference |
| RQ6 | Can we quantify the effect of behavioral intervention per individual? | Causal forest, what-if simulation |
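
The contrastive alignment named for RQ3 (Stage 3 of the pipeline) relies on an InfoNCE loss. The sketch below shows a one-directional InfoNCE between two modalities in plain Python, where row i of each batch comes from the same subject and the other rows act as negatives. The function names, the temperature of 0.1, and the toy embeddings are assumptions; how the project combines four modalities (for example, by summing pairwise losses) is not specified here.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; positives[i] is the match for anchors[i], all others are negatives."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    losses = []
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings for three subjects (illustrative values only).
z_lifelog = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_omics_aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
z_omics_shuffled = [[0.0, 1.0], [-1.0, 0.0], [1.0, 0.0]]

aligned_loss = info_nce(z_lifelog, z_omics_aligned)
shuffled_loss = info_nce(z_lifelog, z_omics_shuffled)
print(aligned_loss < shuffled_loss)
```

The loss is low when each subject's embeddings agree across modalities and high when the pairing is scrambled, which is the property the alignment stage exploits.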

📊 Data Modalities

| Modality | Source | Dimensionality | Sampling |
|---|---|---|---|
| Lifelog | Samsung Health Watch | 10+ channels | Minute-resolution, continuous |
| Behavioral | Galaxy Watch 7 Ultra EMA | 5 tasks × 16 sessions/day | Hourly microtasks |
| Genomics | WGS + PGS | 287+ East Asian polygenic scores | One-time |
| Proteomics | Olink Explore HT | ~5,400 protein markers (NPX) | Periodic |
| Microbiome | 16S rRNA | OTU abundance profiles | Periodic |
| Clinical | SMC Health Checkup | InBody, BP, CGM, blood chemistry | Periodic |

Cohort: n=1,250, longitudinal follow-up (Samsung Medical Center, 2025–2028)

Behavioral Data Schema (Predictive Coding EMA)

trajectory_prediction:
  prediction_error: {unit: degrees, range: [0, 180]}
  response_time: {unit: ms, range: [500, 3000]}
  adjustment_count: {description: "uncertainty proxy"}
  learning_rate: {description: "Rescorla-Wagner alpha"}

temporal_prediction:
  temporal_error: {unit: ms, description: "produced - expected interval"}
  tap_variability: {unit: ms, description: "temporal precision"}
  tempo_condition: {values: [fast_500ms, medium_750ms, slow_1000ms]}

derived_metrics:
  precision_weighting: {description: "inverse variance of PE"}
  circadian_pe_variation: {description: "PE variation across day"}
  
context_integration:
  heart_rate_at_task: {unit: bpm}
  stress_score_pre_task: {range: [0, 100]}
  hours_since_wake: {unit: hours}
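
Several schema fields have simple textbook definitions, sketched below under stated assumptions: an angular prediction error folded into [0, 180] degrees, a Rescorla-Wagner update with learning rate alpha, and precision as the inverse variance of prediction errors. The function names and example values are hypothetical; the project's actual estimators (for instance, how alpha is fitted per participant) are not described in this README.

```python
from statistics import pvariance


def angular_error(predicted_deg, actual_deg):
    """Absolute angular difference folded into the range [0, 180] degrees."""
    return abs((predicted_deg - actual_deg + 180.0) % 360.0 - 180.0)


def rescorla_wagner(outcomes, alpha, v0=0.0):
    """Apply V <- V + alpha * (outcome - V); return predictions and prediction errors."""
    value = v0
    predictions, errors = [], []
    for outcome in outcomes:
        pe = outcome - value          # signed prediction error
        predictions.append(value)
        errors.append(pe)
        value += alpha * pe           # learning-rate-weighted update
    return predictions, errors


def precision(errors):
    """Inverse (population) variance of prediction errors."""
    variance = pvariance(errors)
    if variance == 0:
        raise ValueError("precision is undefined for zero-variance errors")
    return 1.0 / variance


# A constant target of 10 tracked from an initial estimate of 0 (illustrative only).
predictions, errors = rescorla_wagner([10.0, 10.0, 10.0, 10.0], alpha=0.5)
print(errors)                      # [10.0, 5.0, 2.5, 1.25]
print(angular_error(350.0, 10.0))  # 20.0
print(precision(errors) > 0)
```

With alpha = 0.5 the error halves on every trial, which is the behavior the `learning_rate` field is meant to summarize.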

🎯 Target Clinical Phenotypes

| Category | Target | Key Measurements | Behavioral Hypothesis |
|---|---|---|---|
| Primary | Hypertension | Office BP, Ambulatory BP | Impaired cardiovascular prediction |
| Primary | Type 2 Diabetes | FBS, HbA1c, CGM TIR, HOMA-IR | Metabolic prediction dysregulation |
| Primary | ASCVD | ASCVD risk estimator | Vascular allostatic overload |
| Secondary | Dementia risk | Memory functional Q, Emocog | Cognitive prediction decline |
| Secondary | Depression/Anxiety | Mental Health Q | Aberrant precision weighting |
| Secondary | Insomnia | Watch sleep + PSQI | Circadian prediction disruption |
| Secondary | Dyslipidemia | TC, TG, HDL, LDL | Lipid regulatory prediction |
| Secondary | Obesity | BMI, BIA, WHR, VFA | Energy balance prediction |
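
Stage 4 of the pipeline combines the four modality embeddings through gated fusion before predicting risk for targets like those in the table above. One common form of gating, shown below as a hedged sketch, scores each modality, turns the scores into softmax weights, and sums the weighted embeddings; a logistic head then maps the fused vector to a probability. All weights, embedding values, and function names are hypothetical, and the project's actual fusion layer may differ (for example, element-wise sigmoid gates).

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to one."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_weights):
    """Weight each modality embedding by a softmax gate and sum them."""
    names = sorted(embeddings)
    scores = [sum(w * x for w, x in zip(gate_weights[n], embeddings[n])) for n in names]
    gates = softmax(scores)
    dim = len(embeddings[names[0]])
    fused = [sum(g * embeddings[n][d] for g, n in zip(gates, names)) for d in range(dim)]
    return fused, dict(zip(names, gates))


def risk_probability(fused, weights, bias=0.0):
    """Logistic head mapping the fused vector to a probability."""
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 3-dimensional embeddings and parameters (illustrative values only).
embeddings = {
    "lifelog": [0.2, -0.1, 0.4],
    "behavioral": [0.0, 0.3, 0.1],
    "omics": [0.5, 0.5, -0.2],
    "clinical": [-0.3, 0.2, 0.6],
}
gate_weights = {name: [1.0, 1.0, 1.0] for name in embeddings}

fused, gates = gated_fusion(embeddings, gate_weights)
risk = risk_probability(fused, weights=[0.8, -0.4, 1.2])
print(gates, risk)
```

Inspecting the gate values per subject gives a rough indication of which modality drives a given prediction, which is one reason gating is often preferred over plain concatenation.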

🏗️ Project Structure

digital-phenotyping-fm/
├── literature/             # Paper study, NotebookLM knowledge bases
│   ├── notes/              # Literature review notes
│   ├── reviews/            # Review summaries
│   ├── references.bib      # Bibliography
│   ├── seeds/              # 🆕 Predictive coding core papers
│   └── notebooks.md        # 🆕 NotebookLM KB index
├── src/dpfm/               # Core ML library
│   ├── data/               # Data processors (lifelog, omics, clinical)
│   ├── behavioral/         # 🆕 Predictive coding EMA module
│   │   ├── analysis/       # PE metrics, time series analysis
│   │   ├── tasks/          # EMA task engine (watch integration)
│   │   └── visualization/  # Behavioral data visualization
│   ├── models/             # Foundation model, alignment, predictors
│   ├── training/           # Lightning modules (pretrain, align, finetune)
│   └── evaluation/         # Metrics and benchmarks
├── watch-app/              # 🆕 Galaxy Watch 7 Ultra EMA app
│   ├── app/src/            # Kotlin + Jetpack Compose for Wear OS
│   └── gradle/             # Android build configuration
├── configs/                # Hydra YAML configs
│   ├── default.yaml        # Main configuration
│   └── behavioral/         # 🆕 EMA protocol configs
├── data/                   # Data (gitignored except schemas/)
│   └── schemas/            # Data dictionary (tracked)
├── experiments/            # Per-experiment directories
├── notebooks/              # Jupyter (exploration, analysis, figures)
│   ├── analysis/           # Primary analysis notebooks
│   ├── exploration/        # Exploratory data analysis
│   ├── behavioral/         # 🆕 Predictive coding analysis
│   └── figures/            # Publication figures
├── reports/                # Presentations, progress, papers
├── scripts/                # CLI entry points
└── tests/                  # Unit tests

🚀 Setup & Installation

Core Package

# Clone and install
git clone https://github.com/Transconnectome/digital-phenotyping-fm.git
cd digital-phenotyping-fm
pip install -e ".[dev]"

# Optional: Install behavioral analysis module
pip install -e ".[behavioral]"

# Optional: Install omics processing dependencies
pip install -e ".[omics]"

Galaxy Watch App Development

# Install Android SDK and JDK 17
# Build watch app
cd watch-app
./gradlew build

NotebookLM Knowledge Bases (765 sources across 3 notebooks)

| Notebook | ID | Sources | Focus |
|---|---|---|---|
| Allostasis Theory | 1846219f-a072-4544-9721-65a6aa89904f | 424 | Brain-body regulation, interoception, autonomic control |
| Lifelog AI/FM Research | ebbba35c-09c6-4e2d-8e13-103c1b3a3676 | 322 | 11 categories, including TS-FM, wearable FM, digital phenotyping, multimodal health, SSL, clinical prediction, lifestyle-omics, edge AI, missingness, smart ring |
| Predictive Coding | b4946642-4c70-4d14-9758-82573eead20a | 19 | Prediction error, precision weighting, EMA methodology |

Query using CLI or MCP:

# Allostasis framework
nlm notebook query 1846219f-a072-4544-9721-65a6aa89904f "allostasis wearable lifelog"

# Lifelog AI/FM research (482 papers indexed)
nlm notebook query ebbba35c-09c6-4e2d-8e13-103c1b3a3676 "wearable foundation model health prediction"

# Predictive coding theory
nlm notebook query b4946642-4c70-4d14-9758-82573eead20a "prediction error precision weighting EMA"

🧪 Usage Examples

1. Lifelog Foundation Model Training

dpfm-train configs/pretrain.yaml

2. Behavioral Analysis

# Analyze EMA session data
dpfm-behavioral-analyze --data-path ./data/behavioral/sessions.json

# Plot circadian prediction error patterns
python -m dpfm.behavioral.visualization --plot-type circadian

3. Multi-Modal Alignment

dpfm-align configs/align.yaml

4. Clinical Prediction

dpfm-predict configs/predict.yaml

🔬 Scientific Foundations & Knowledge Bases

Three Pillars (765 sources)

1. Allostasis (424-source KB)

Allostasis is the body's predictive regulation system. Lifelog time series provide a continuous readout of allostatic regulation, and the foundation model learns representations of regulatory quality.

  • Sterling (2012), McEwen (1998), Kleckner et al. (2017), Barrett (2017)

2. Lifelog AI/FM (322-source KB)

A 482-paper systematic survey across 11 categories. Key SOTA findings: behavioral representations outperform raw sensor inputs (ICML 2025), with LSM-2, ECGFounder, and SleepFM leading. A 30-query NLM synthesis produced the Top 10 research questions and 3 actionable research designs.

  • Top 5 open FM: PaPaGei, Pulse-PPG, NormWear, ECG-FM, Step2Heart
  • See literature/notes/nlm_query_synthesis.md for full synthesis

3. Predictive Coding (19-source KB)

Predictive coding treats the brain as a Bayesian prediction machine. EMA microtasks measure prediction error and precision weighting in daily life, linking cognitive prediction to health regulation.

  • Friston (2005), Clark (2013), Rao & Ballard (1999), Shiffman et al. (2008)

📈 Timeline & Development

| Year | Phase | Focus |
|---|---|---|
| 2026 Q1-Q2 | Foundation | Data infrastructure, EMA app deployment, lifelog FM pretrain |
| 2026 Q3-Q4 | Behavioral | EMA data collection, predictive coding analysis, z_behavioral |
| 2027 | Integration | Cross-modal alignment, lifestyle↔omics validation, longitudinal analysis |
| 2028 | Prediction | Clinical outcome models, intervention quantification, PoC API |

🀝 Collaboration

Samsung Medical Center × Samsung MX Health (2025.10–2028.06)

  • Principal Investigator: SNU Connectome Lab
  • Cohort: n=1,250 participants
  • Data: Lifelog + Behavioral + Multi-omics + Clinical longitudinal

To our knowledge, DPFM is the first framework to integrate allostatic regulation theory with predictive coding assessment in a lifelog foundation model. This unified approach is designed to relate physiological regulation, cognitive prediction, and health outcomes within a single set of learned representations.
