This repository contains code and synthetic data for the manuscript:
Joo, Y. Y., Kim, B. G., Kim, G., Lee, E., Seo, J., & Cha, J. (2025).
Polygenic architecture of brain structure and function, behaviors, and psychopathologies in children.
- Overview
- Repository Structure
- Getting Started
- Data Availability
- Usage
- Authors and Contributors
- License
- Contact
- Citation
This study examines the genetic architecture of brain structure, function, behaviors, and psychopathologies in a large cohort of preadolescent children. Using genome-wide polygenic scores (GPSs) across multiple traits and advanced multivariate analyses, we uncover key connections between genetic risk profiles and a wide range of brain imaging-derived phenotypes, cognitive measures, and psychological traits.
- code/: Jupyter Notebooks and scripts for reproducing key aspects of the analysis.
- Note: For heritability calculations, only sample code is included due to restrictions on sharing real SNP data.
We do not host raw participant data from the ABCD Study due to data use agreements.
- Clone the repository
  git clone https://github.com/your-username/abcd-gps.git
- Install required packages
  - Refer to code/requirements.txt (or any environment file) for a list of Python/R packages.
  - We recommend using a virtual environment or Conda environment to avoid dependency conflicts.
- Reproduce analyses with synthetic data
  - Each main analysis step is in a separate notebook or script in code/.
  - Since only synthetic data are provided, results will not match those in the manuscript exactly, but the workflow and structure remain the same.
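After installing packages, it can help to confirm that the active environment actually contains them. The following is a minimal sketch, not part of the repository: the helper name `missing_requirements` is hypothetical, it assumes a plain pip-style requirements file with one package per line, and it only covers Python packages (R packages are not checked).

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return the package names in `lines` that are not installed.

    `lines` is any iterable of requirement strings, e.g. an open file.
    Version specifiers are ignored; only presence is checked.
    """
    missing = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the distribution name, before extras or version specifiers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Example (assumes the file exists at this path in your clone):
# with open("code/requirements.txt") as handle:
#     print(missing_requirements(handle))
```

An empty list means every listed Python package was found in the current environment; it does not verify that the installed versions satisfy the pinned ones.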
- ABCD Study: Real data can be accessed from the Adolescent Brain Cognitive Development (ABCD) Study upon approval.
- Synthetic Data: This repository uses synthetic data generated from the real cohort using CTGAN and other generative methods.
  - The synthetic dataset consists of 100 randomly sampled participants and mimics the structure of the ABCD dataset.
  - Due to GitHub file size limits, data are externally hosted and can be downloaded from the following link:
- GWAS Summary Statistics: Publicly available GWAS summary statistics are referenced in the manuscript. However, only sample code is provided here for heritability-related analyses due to privacy and data-use constraints.
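Once the synthetic dataset has been downloaded, a quick sanity check can confirm that it has the expected shape (100 participants, per the description above). This is a rough sketch under stated assumptions: the data are assumed to be in CSV format, and the function name `summarize_table`, the file name, and the ID column name `subject_id` are all hypothetical and should be adjusted to the actual files.

```python
import csv


def summarize_table(path, id_column="subject_id"):
    """Return (row count, column names, unique-ID count) for a CSV file.

    `id_column` is an assumed name for the participant identifier column.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)
        columns = reader.fieldnames or []
    if id_column not in columns:
        raise KeyError(f"column {id_column!r} not found in {path}")
    unique_ids = {row[id_column] for row in rows}
    return len(rows), columns, len(unique_ids)


# Example (hypothetical file name):
# n_rows, columns, n_ids = summarize_table("synthetic_data.csv")
# print(n_rows, n_ids, len(columns))
```

If the row count and unique-ID count differ, the table has repeated participants (for example, multiple visits per participant) and may need reshaping before analysis.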
- Preprocessing
  - Scripts for basic data cleaning, quality control, and merging across modalities in the synthetic dataset.
- Analysis
  - Main analyses (e.g., SGCCA, heritability scripts, GPS-based predictions) are demonstrated with sample code in code/.
- Visualization
  - Example scripts for generating figures from the synthetic dataset.
Please note that the synthetic data do not reflect actual results or distributions of the ABCD Study participants.
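To illustrate what "merging across modalities" in the preprocessing step can look like, here is a minimal sketch of an inner join on a shared participant identifier. It is not the repository's implementation: the function `merge_on_id`, the column name `subject_id`, and the example variables are hypothetical, and the actual scripts in code/ may use a different approach.

```python
def merge_on_id(tables, id_column="subject_id"):
    """Inner-join several tables (lists of row dicts) on `id_column`.

    Only participants present in every table are kept. If a participant
    appears more than once in a table, the last row wins; if two tables
    share a non-ID column name, the later table's value overwrites.
    """
    if not tables:
        return []
    # Index each table by participant ID for fast lookup.
    indexed = [{row[id_column]: row for row in table} for table in tables]
    shared = set(indexed[0]).intersection(*indexed[1:])
    merged = []
    for key in sorted(shared):
        combined = {}
        for index in indexed:
            combined.update(index[key])
        merged.append(combined)
    return merged


# Toy rows standing in for two modalities (values are made up).
imaging = [
    {"subject_id": "A", "thickness": "2.5"},
    {"subject_id": "B", "thickness": "2.7"},
]
scores = [
    {"subject_id": "B", "gps": "0.1"},
    {"subject_id": "C", "gps": "-0.3"},
]
merged = merge_on_id([imaging, scores])
```

An inner join is used here so that downstream multivariate steps see complete rows only; a different join would be needed if participants with missing modalities should be retained.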
- Yoonjung (Yoonie) Joo
- Bo-Gyeom Kim
- Gakyung Kim
- Eunji Lee
- Jungwoo Seo
- Jiook Cha (Corresponding Author)
For questions, please contact Jiook Cha.
This project is licensed under the terms of the MIT License.
Please see the LICENSE file for details.
For inquiries or suggestions related to this repository, please reach out to:
Jiook Cha, Ph.D.
connectome@snu.ac.kr
If you use this code (or any part of it) for your research, please cite our work: