Foundation model for lifelog time series: representation learning, behavioral prediction, multi-modal alignment, and clinical outcome prediction.
This project is backed by a systematically curated knowledge base spanning 482 papers and 765 total sources across three NotebookLM notebooks.
| Notebook | Sources | Focus |
|---|---|---|
| Allostasis Theory | 424 | Theoretical foundation: allostatic regulation, interoception, autonomic control, brain-body interaction |
| Lifelog AI/FM Research | 322 | Computational methods: time-series FM, wearable FM, digital phenotyping, multimodal health AI, SSL, clinical prediction, lifestyle-omics, edge AI, missingness handling, smart ring |
| Predictive Coding | 19 | Cognitive measurement: prediction error, precision weighting, EMA methodology |
The 482-paper literature survey covers 11 categories (A-K) and draws on top venues: NeurIPS, ICML, ICLR, Nature Medicine, Lancet Digital Health, npj Digital Medicine, IEEE JBHI, IMWUT/UbiComp. See literature/references.bib and the 30-query NLM synthesis for the full SOTA analysis and Top 10 research questions.
Allostasis ("stability through change") is the body's continuous process of predicting metabolic needs and mobilizing resources before they are needed. Unlike homeostasis (reactive correction), allostasis is predictive regulation: the brain constantly generates forecasts about the body's upcoming energy demands and adjusts physiology proactively.
A smartwatch captures this allostatic regulation in real time:
| Watch Signal | What It Reflects |
|---|---|
| Heart rate & HRV | Autonomic regulation, cardiac allostatic control |
| Sleep architecture | Restorative prediction, metabolic recovery cycles |
| Stress score | Sympathetic-parasympathetic balance |
| Activity/Steps | Energy expenditure and behavioral regulation |
| SpO2 | Respiratory-metabolic coupling |
| Body composition | Long-term energy balance outcomes |
When allostatic regulation works well, the body adapts efficiently: HR recovers quickly after stress, sleep architecture is resilient, and circadian rhythms are stable. When it breaks down (allostatic overload), metabolic dysregulation accumulates, leading to hypertension, diabetes, cardiovascular disease, and other chronic conditions.
This gives us a principled scientific basis: lifelog time series are not arbitrary sensor streams; they are a continuous readout of the body's allostatic regulation. A foundation model trained on this data learns representations of how well (or poorly) an individual's regulatory system operates.
Building on allostasis theory, we introduce the Predictive Coding EMA (Ecological Momentary Assessment) module, the first Galaxy Watch-based system for measuring cognitive prediction abilities in daily life.
Allostatic regulation and cognitive prediction share fundamental mechanisms: both are predictive processes that minimize future uncertainty. If the body's allostatic system breaks down, does cognitive prediction also become less precise?
| Task | Input | Duration | Measures |
|---|---|---|---|
| Trajectory Prediction | Rotating Bezel | ~45s | Spatial prediction error, precision weighting |
| Temporal Prediction | Haptic + Tap | ~40s | Temporal accuracy, rhythm internalization |
| Sequence Prediction | Bezel | ~35s | Statistical learning, volatility tracking |
| Sensorimotor Tracking | Bezel | ~35s | Online prediction, sensorimotor integration |
| Oddball Detection | Tap + Haptic | ~25s | Deviance detection (behavioral MMN) |
EMA Protocol: 16 hourly sessions/day (08:00-23:00), ~30-60 seconds each
Theoretical Basis: Friston (2005), Clark (2013): brain as Bayesian prediction machine
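As a quick sanity check on the protocol numbers, the sketch below enumerates the hourly session times and the implied daily participant burden. The function name `hourly_sessions` is hypothetical and not part of the `dpfm` package. It assumes sessions start exactly on the hour, which the protocol line does not state.

```python
from datetime import time


def hourly_sessions(first_hour: int = 8, last_hour: int = 23) -> list[time]:
    """Return one session start time per hour, both endpoints included.

    Assumption: sessions start on the hour; real prompts may be jittered.
    """
    return [time(hour=h) for h in range(first_hour, last_hour + 1)]


sessions = hourly_sessions()
print(len(sessions))  # 16
print(sessions[0].strftime("%H:%M"), sessions[-1].strftime("%H:%M"))  # 08:00 23:00

# Daily burden implied by ~30-60 seconds per session, in minutes.
burden_min = (len(sessions) * 30 / 60, len(sessions) * 60 / 60)
print(burden_min)  # (8.0, 16.0)
```

Under these assumptions, the 08:00-23:00 window gives exactly 16 sessions and roughly 8-16 minutes of task time per day.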
| Level | Predictive Coding | Allostatic Regulation |
|---|---|---|
| Temporal Scale | Milliseconds-seconds | Minutes-hours |
| Prediction Target | Sensory input | Metabolic demand |
| Error Signal | Prediction error (PE) | Allostatic load |
| Update Mechanism | Precision weighting | Autonomic adjustment |
| Pathology | Aberrant precision | Allostatic overload |
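To make the behavioral constructs concrete, here is a minimal sketch of two of them: a spatial prediction error on a circular bezel, and precision weighting taken as the inverse variance of prediction errors. The function names are hypothetical and not taken from the `dpfm` package. It also assumes precision is computed over unsigned errors, which the project may define differently.

```python
from statistics import pvariance


def angular_error(predicted_deg: float, actual_deg: float) -> float:
    """Smallest absolute angle between two bezel positions, in [0, 180] degrees."""
    diff = abs(predicted_deg - actual_deg) % 360.0
    return min(diff, 360.0 - diff)


def precision_weighting(errors: list[float]) -> float:
    """Inverse (population) variance of prediction errors.

    Higher values mean more consistent responses. Undefined when all
    errors are identical, so that case raises instead of dividing by zero.
    """
    var = pvariance(errors)
    if var == 0.0:
        raise ValueError("precision is undefined when all errors are identical")
    return 1.0 / var


# Hypothetical (predicted, actual) bezel angles for four trials.
trials = [(350, 10), (90, 80), (180, 200), (45, 15)]
errors = [angular_error(p, a) for p, a in trials]
print(errors)                       # [20.0, 10.0, 20.0, 30.0]
print(precision_weighting(errors))  # 0.02
```

Wrapping at 360 degrees matters for a rotating bezel: a prediction at 350 degrees against a target at 10 degrees is a 20 degree error, not 340.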
Given continuous lifelog time series + behavioral prediction data from wearable devices:
- Learn general-purpose representations that capture allostatic regulation patterns
- Measure cognitive prediction abilities through ecological momentary assessment
- Align these representations with genetic predisposition (multi-omics) and clinical state
- Predict clinical outcomes and quantify how lifestyle + cognitive factors affect disease risk
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 1: PRETRAIN - Self-supervised lifelog foundation model
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
24h watch data → 15-min patches (96 tokens/day) → Temporal Transformer
Objective: Masked Patch Modeling (reconstruct masked time segments)
Output: z_lifelog - per-subject, per-day representation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 2: BEHAVIORAL - Predictive coding assessment 🆕
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
EMA tasks → Prediction Error metrics → z_behavioral representation
Constructs: PE magnitude, precision weighting, learning rate, temporal accuracy
Integration: z_behavioral ↔ z_lifelog circadian correlation
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 3: ALIGN - Cross-modal contrastive alignment
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
z_lifelog    ──┐
z_behavioral ──┼── InfoNCE contrastive loss ──→ Shared representation space
z_omics      ──┤
z_clinical   ──┘
Question: How does daily regulation (lifelog) + cognitive prediction (behavioral)
relate to genetic predisposition (omics) and health state (clinical)?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Stage 4: PREDICT - Clinical outcome & future risk
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[z_lifelog ⊕ z_behavioral ⊕ z_omics ⊕ z_clinical] → Gated Fusion → Risk Prediction
Targets: HTN | T2DM | ASCVD | Dementia | Depression | Insomnia | Obesity
+ Lifestyle + Cognitive intervention effect quantification (Causal Forest)
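The Stage 3 objective can be illustrated with a dependency-free sketch of a symmetric InfoNCE loss for one pair of modalities (for example z_lifelog and z_behavioral). This is not the project's training code: the function name, the temperature of 0.1, and the pairwise formulation are assumptions. A four-modality setup would presumably sum such terms over modality pairs.

```python
import math


def _normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(za: list[list[float]], zb: list[list[float]], temperature: float = 0.1) -> float:
    """Symmetric InfoNCE over a batch of paired embeddings.

    za[i] and zb[i] are a positive pair (same subject-day); every other
    pairing in the batch is treated as a negative.
    """
    za = [_normalize(v) for v in za]
    zb = [_normalize(v) for v in zb]
    n = len(za)
    # Cosine-similarity logits, scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(za[i], zb[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows: list[list[float]]) -> float:
        """Mean cross-entropy where the correct class of row i is column i."""
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a stable log-sum-exp
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(col) for col in zip(*logits)]  # transpose: zb -> za direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))


# Toy embeddings: aligned pairs should score a lower loss than shuffled ones.
z = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = info_nce(z, z)
shuffled = info_nce(z, [z[1], z[2], z[0]])
print(aligned < shuffled)  # True
```

The loss is low when each embedding is most similar to its own partner in the other modality, which is what pulls the modalities into a shared representation space.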
| # | Question | Method |
|---|---|---|
| RQ1 | Can a foundation model learn meaningful representations of allostatic regulation? | Self-supervised pretraining + probing tasks |
| RQ2 | Are cognitive prediction abilities correlated with allostatic regulation patterns? | z_lifelog ↔ z_behavioral circadian correlation |
| RQ3 | How do lifelog + behavioral representations relate to genetic risk profiles? | Contrastive alignment (multimodal) |
| RQ4 | Can aligned representations predict clinical phenotype trajectories? | Downstream finetuning on longitudinal outcomes |
| RQ5 | Which omics markers are sensitive to lifestyle + cognitive changes? | Longitudinal marker analysis + causal inference |
| RQ6 | Can we quantify the effect of behavioral intervention per individual? | Causal forest, what-if simulation |
| Modality | Source | Dimensionality | Sampling |
|---|---|---|---|
| Lifelog | Samsung Health Watch | 10+ channels | Minute-resolution, continuous |
| Behavioral | Galaxy Watch 7 Ultra EMA | 5 tasks × 16 sessions/day | Hourly microtasks |
| Genomics | WGS + PGS | 287+ East Asian polygenic scores | One-time |
| Proteomics | Olink Explore HT | ~5,400 protein markers (NPX) | Periodic |
| Microbiome | 16S rRNA | OTU abundance profiles | Periodic |
| Clinical | SMC Health Checkup | InBody, BP, CGM, blood chemistry | Periodic |
Cohort: n=1,250, longitudinal follow-up (Samsung Medical Center, 2025–2028)
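The Stage 1 tokenization follows directly from the sampling rate: 1,440 minutes per day split into 15-minute patches gives 96 tokens. The sketch below shows that step for a single channel, plus a random patch mask of the kind masked patch modeling needs. The helper names, the 40% mask ratio, and the synthetic signal are assumptions for illustration, not values from the project configs.

```python
import random

MINUTES_PER_DAY = 24 * 60
PATCH_MINUTES = 15


def to_patches(day: list[float], patch_minutes: int = PATCH_MINUTES) -> list[list[float]]:
    """Split one channel of minute-resolution data into non-overlapping patches."""
    if len(day) % patch_minutes != 0:
        raise ValueError("day length must be a multiple of the patch length")
    return [day[i:i + patch_minutes] for i in range(0, len(day), patch_minutes)]


def sample_mask(n_patches: int, mask_ratio: float, rng: random.Random) -> list[bool]:
    """Pick which patches are hidden from the encoder (True = masked)."""
    n_masked = round(n_patches * mask_ratio)
    masked = set(rng.sample(range(n_patches), n_masked))
    return [i in masked for i in range(n_patches)]


# Synthetic heart-rate-like channel, one value per minute (illustrative only).
day = [60.0 + (m % 60) / 10 for m in range(MINUTES_PER_DAY)]
patches = to_patches(day)
mask = sample_mask(len(patches), mask_ratio=0.4, rng=random.Random(0))  # ratio is an assumption
print(len(patches), len(patches[0]), sum(mask))  # 96 15 38
```

During pretraining, the model would see only the unmasked patches and be trained to reconstruct the masked ones. With 10+ channels, each patch would carry one such segment per channel.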
trajectory_prediction:
  prediction_error: {unit: degrees, range: [0, 180]}
  response_time: {unit: ms, range: [500, 3000]}
  adjustment_count: {description: "uncertainty proxy"}
  learning_rate: {description: "Rescorla-Wagner alpha"}
temporal_prediction:
  temporal_error: {unit: ms, description: "produced - expected interval"}
  tap_variability: {unit: ms, description: "temporal precision"}
  tempo_condition: {values: [fast_500ms, medium_750ms, slow_1000ms]}
derived_metrics:
  precision_weighting: {description: "inverse variance of PE"}
  circadian_pe_variation: {description: "PE variation across day"}
context_integration:
  heart_rate_at_task: {unit: bpm}
  stress_score_pre_task: {range: [0, 100]}
  hours_since_wake: {unit: hours}

| Category | Target | Key Measurements | Behavioral Hypothesis |
|---|---|---|---|
| Primary | Hypertension | Office BP, Ambulatory BP | Impaired cardiovascular prediction |
| | Type 2 Diabetes | FBS, HbA1c, CGM TIR, HOMA-IR | Metabolic prediction dysregulation |
| | ASCVD | ASCVD risk estimator | Vascular allostatic overload |
| Secondary | Dementia risk | Memory functional Q, Emocog | Cognitive prediction decline |
| | Depression/Anxiety | Mental Health Q | Aberrant precision weighting |
| | Insomnia | Watch sleep + PSQI | Circadian prediction disruption |
| | Dyslipidemia | TC, TG, HDL, LDL | Lipid regulatory prediction |
| | Obesity | BMI, BIA, WHR, VFA | Energy balance prediction |
digital-phenotyping-fm/
├── literature/                 # Paper study, NotebookLM knowledge bases
│   ├── notes/                  # Literature review notes
│   ├── reviews/                # Review summaries
│   ├── references.bib          # Bibliography
│   ├── seeds/                  # 🆕 Predictive coding core papers
│   └── notebooks.md            # 🆕 NotebookLM KB index
├── src/dpfm/                   # Core ML library
│   ├── data/                   # Data processors (lifelog, omics, clinical)
│   ├── behavioral/             # 🆕 Predictive coding EMA module
│   │   ├── analysis/           # PE metrics, time series analysis
│   │   ├── tasks/              # EMA task engine (watch integration)
│   │   └── visualization/      # Behavioral data visualization
│   ├── models/                 # Foundation model, alignment, predictors
│   ├── training/               # Lightning modules (pretrain, align, finetune)
│   └── evaluation/             # Metrics and benchmarks
├── watch-app/                  # 🆕 Galaxy Watch 7 Ultra EMA app
│   ├── app/src/                # Kotlin + Jetpack Compose for Wear OS
│   └── gradle/                 # Android build configuration
├── configs/                    # Hydra YAML configs
│   ├── default.yaml            # Main configuration
│   └── behavioral/             # 🆕 EMA protocol configs
├── data/                       # Data (gitignored except schemas/)
│   └── schemas/                # Data dictionary (tracked)
├── experiments/                # Per-experiment directories
├── notebooks/                  # Jupyter (exploration, analysis, figures)
│   ├── analysis/               # Primary analysis notebooks
│   ├── exploration/            # Exploratory data analysis
│   ├── behavioral/             # 🆕 Predictive coding analysis
│   └── figures/                # Publication figures
├── reports/                    # Presentations, progress, papers
├── scripts/                    # CLI entry points
└── tests/                      # Unit tests
# Clone and install
git clone https://github.com/Transconnectome/digital-phenotyping-fm.git
cd digital-phenotyping-fm
pip install -e ".[dev]"
# Optional: Install behavioral analysis module
pip install -e ".[behavioral]"
# Optional: Install omics processing dependencies
pip install -e ".[omics]"
# Install Android SDK and JDK 17
# Build watch app
cd watch-app
./gradlew build

| Notebook | ID | Sources | Focus |
|---|---|---|---|
| Allostasis Theory | 1846219f-a072-4544-9721-65a6aa89904f | 424 | Brain-body regulation, interoception, autonomic control |
| Lifelog AI/FM Research | ebbba35c-09c6-4e2d-8e13-103c1b3a3676 | 322 | 11 categories: TS-FM, wearable FM, digital phenotyping, multimodal health, SSL, clinical prediction, lifestyle-omics, edge AI, missingness, smart ring |
| Predictive Coding | b4946642-4c70-4d14-9758-82573eead20a | 19 | Prediction error, precision weighting, EMA methodology |
Query using CLI or MCP:
# Allostasis framework
nlm notebook query 1846219f-a072-4544-9721-65a6aa89904f "allostasis wearable lifelog"
# Lifelog AI/FM research (482 papers indexed)
nlm notebook query ebbba35c-09c6-4e2d-8e13-103c1b3a3676 "wearable foundation model health prediction"
# Predictive coding theory
nlm notebook query b4946642-4c70-4d14-9758-82573eead20a "prediction error precision weighting EMA"
dpfm-train configs/pretrain.yaml
# Analyze EMA session data
dpfm-behavioral-analyze --data-path ./data/behavioral/sessions.json
# Plot circadian prediction error patterns
python -m dpfm.behavioral.visualization --plot-type circadian
dpfm-align configs/align.yaml
dpfm-predict configs/predict.yaml

1. Allostasis: 424-source KB
The body's predictive regulation system. Lifelog time series = continuous readout of allostatic regulation. Foundation model learns representations of regulatory quality.
- Sterling (2012), McEwen (1998), Kleckner+ (2017), Barrett (2017)
2. Lifelog AI/FM: 322-source KB
482-paper systematic survey across 11 categories. SOTA: behavioral representation > raw sensor (ICML 2025), LSM-2/ECGFounder/SleepFM leading. 30-query NLM synthesis produced Top 10 research questions and 3 actionable research designs.
- Top 5 open FM: PaPaGei, Pulse-PPG, NormWear, ECG-FM, Step2Heart
- See literature/notes/nlm_query_synthesis.md for full synthesis
3. Predictive Coding: 19-source KB
Brain as Bayesian prediction machine. EMA microtasks measure prediction error and precision weighting in daily life. Links cognitive prediction to health regulation.
- Friston (2005), Clark (2013), Rao & Ballard (1999), Shiffman+ (2008)
| Year | Phase | Focus |
|---|---|---|
| 2026 Q1-Q2 | Foundation | Data infrastructure, EMA app deployment, lifelog FM pretrain |
| 2026 Q3-Q4 | Behavioral | EMA data collection, predictive coding analysis, z_behavioral |
| 2027 | Integration | Cross-modal alignment, lifestyle-omics validation, longitudinal analysis |
| 2028 | Prediction | Clinical outcome models, intervention quantification, PoC API |
Samsung Medical Center × Samsung MX Health (2025.10–2028.06)
- Principal Investigator: SNU Connectome Lab
- Cohort: n=1,250 participants
- Data: Lifelog + Behavioral + Multi-omics + Clinical longitudinal
DPFM represents the first integration of allostatic regulation theory with predictive coding assessment in a lifelog foundation model framework. This unified approach makes it possible to study, within the same individuals, how physiological regulation, cognitive prediction, and health outcomes relate to one another.






