Automated data ingestion pipelines for the ARCHIMEDES study using the loris-php-api-client library.
This repository provides production-ready pipelines for clinical and imaging data ingestion, with features like bulk CSV upload, automated candidate creation, email notifications, and comprehensive logging.
The pipelines are expected to be installed on the predefined data mount for the collection of projects. That mount follows a fixed directory structure, and each project must include its own project.json file.
Features:
- Clinical Data Ingestion - Automated CSV processing and upload
- Clinical Instrument Install - Install REDCap or LINST instruments
- Bulk Operations - Process multiple files and projects
- Email Notifications - Success/failure reports via email
- Comprehensive Logging - Detailed execution logs with rotation
- Dry Run Mode - Test without making actual changes
- Imaging Data Ingestion - BIDS dataset ingestion
- Multi-Project Support - Handle multiple projects and collections
Requirements:
- PHP >= 8.1
- Composer
- loris-php-api-client (installed automatically)
- MySQL/MariaDB (for database fallback operations)
- PHP extensions: curl, json, pdo, mbstring
cd /opt
git clone https://github.com/aces/archimedes-pipelines.git
cd archimedes-pipelines
composer install
This will automatically install:
- aces/loris-php-api-client - Auto-generated LORIS API client
- guzzlehttp/guzzle - HTTP client
- monolog/monolog - Logging
- phpmailer/phpmailer - Email notifications
Copy the example config file, then edit the copy to set your LORIS credentials and collections:
cp config/loris_client_config.json.example config/loris_client_config.json
nano config/loris_client_config.json
Each project requires a project.json file at its root. See config/project.json.example for reference.
The clinical pipeline follows this process:
1. Load Collections from Config
   └── Read collections array from loris_client_config.json
       ├── Collection A
       │   ├── Project 1 (enabled)
       │   └── Project 2 (disabled)
       └── Collection B
           └── Project 1 (enabled)

2. For each enabled Collection:
   └── For each enabled Project:
       ├── Load project.json configuration
       ├── Check if modality (clinical) is enabled
       └── Continue to instrument processing

3. For each Instrument:
   ├── Check if instrument is installed in LORIS
   ├── If NOT installed:
   │   ├── Look for definition in documentation/data_dictionary/
   │   ├── Find .linst file OR REDCap data dictionary CSV
   │   └── Install instrument via API
   └── If installed:
       └── Continue to data ingestion

4. Data Ingestion:
   ├── Read CSV from deidentified-raw/clinical/
   ├── Validate data against instrument schema
   ├── Create candidates (if not exist)
   ├── Create visits (if not exist)
   └── Upload instrument data via API

5. Post-Processing:
   ├── Move processed files to processed/clinical/
   ├── Log results
   └── Send email notification (if enabled)
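The file move in step 5, combined with dry-run mode, can be sketched roughly as follows. This is a minimal Python illustration and not the pipeline's PHP implementation: the function name `archive_processed_file` and its parameters are hypothetical, and only the `deidentified-raw/clinical/` and `processed/clinical/` folder names come from this README.

```python
import shutil
from pathlib import Path


def archive_processed_file(project_root: Path, csv_name: str, dry_run: bool = False) -> Path:
    """Move an ingested CSV from deidentified-raw/clinical/ to processed/clinical/.

    Hypothetical helper: returns the destination path. In dry-run mode the
    move is only reported and nothing on disk changes.
    """
    source = project_root / "deidentified-raw" / "clinical" / csv_name
    target_dir = project_root / "processed" / "clinical"
    target = target_dir / csv_name

    if not source.is_file():
        raise FileNotFoundError(f"No such CSV: {source}")

    if dry_run:
        print(f"[dry-run] would move {source} -> {target}")
        return target

    # Create processed/clinical/ on demand, then move the file.
    target_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target
```

Returning the destination in both modes lets a caller log the same message whether or not the move actually happened.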
Collections and projects are defined in loris_client_config.json. Each collection has a base path and a list of projects that can be individually enabled or disabled. See config/loris_client_config.json.example for reference.
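To illustrate how enabled collections and projects might be selected, here is a small Python sketch. The JSON keys used (`collections`, `name`, `base_path`, `enabled`, `projects`) and the rule that a missing `enabled` flag counts as enabled are assumptions made for the example; the authoritative schema is the one in config/loris_client_config.json.example.

```python
import json
from typing import Iterator, Tuple


def enabled_projects(config: dict) -> Iterator[Tuple[str, str, str]]:
    """Yield (collection name, project name, project path) for enabled entries.

    All key names are assumptions, not the real configuration schema.
    A missing "enabled" flag is treated as enabled (also an assumption).
    """
    for collection in config.get("collections", []):
        if not collection.get("enabled", True):
            continue
        base_path = collection["base_path"].rstrip("/")
        for project in collection.get("projects", []):
            if project.get("enabled", True):
                # Project folders live at {collection_base_path}/{ProjectName}/.
                yield collection["name"], project["name"], f"{base_path}/{project['name']}"


# Hypothetical configuration mirroring the Collection A / Collection B example.
example = json.loads("""
{
  "collections": [
    {"name": "CollectionA", "base_path": "/data/CollectionA",
     "projects": [{"name": "Project1", "enabled": true},
                  {"name": "Project2", "enabled": false}]},
    {"name": "CollectionB", "base_path": "/data/CollectionB",
     "projects": [{"name": "Project1", "enabled": true}]}
  ]
}
""")

for collection, project, path in enabled_projects(example):
    print(collection, project, path)
```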
Instrument definitions should be placed in the project's documentation/data_dictionary/ folder. The pipeline automatically detects the format (LINST or REDCap CSV) and installs accordingly.
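One plausible way to detect the definition format is by file extension. The Python sketch below assumes files are named `<instrument>.linst` or `<instrument>.csv` and that a LINST file takes precedence when both exist. Neither rule is confirmed by this README, and the function name is hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def find_instrument_definition(dictionary_dir: Path, instrument: str) -> Optional[Tuple[str, Path]]:
    """Return (format, path) for an instrument definition, or None if absent.

    Assumed naming: <instrument>.linst or <instrument>.csv inside
    documentation/data_dictionary/. LINST is checked first (an assumption).
    """
    candidates = [("linst", ".linst"), ("redcap_csv", ".csv")]
    for fmt, suffix in candidates:
        path = dictionary_dir / f"{instrument}{suffix}"
        if path.is_file():
            return fmt, path
    return None
```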
php scripts/run_clinical_pipeline.php --all --dry-run --verbose
php scripts/run_clinical_pipeline.php --all
php scripts/run_clinical_pipeline.php --collection=COLLECTION_NAME --project=PROJECT_NAME
php scripts/run_clinical_pipeline.php --collection=COLLECTION_NAME --project=PROJECT_NAME --instrument=INSTRUMENT_NAME

| Option | Description |
|---|---|
| --all | Process all projects |
| --collection=NAME | Specific collection |
| --project=NAME | Specific project |
| --instrument=NAME | Specific instrument |
| --dry-run | Test without changes |
| --verbose | Detailed output |
| --help | Show help |
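When the pipeline is launched from a scheduler or wrapper script, the options above can be assembled programmatically. The Python sketch below uses only the flags listed in the table. The function name is hypothetical, and falling back to `--all` when neither a collection nor a project is given is a choice made for this example.

```python
from typing import List, Optional


def build_clinical_command(
    collection: Optional[str] = None,
    project: Optional[str] = None,
    instrument: Optional[str] = None,
    dry_run: bool = False,
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for scripts/run_clinical_pipeline.php (hypothetical wrapper)."""
    command = ["php", "scripts/run_clinical_pipeline.php"]
    if collection is None and project is None:
        # No target given: process everything (assumption of this sketch).
        command.append("--all")
    if collection is not None:
        command.append(f"--collection={collection}")
    if project is not None:
        command.append(f"--project={project}")
    if instrument is not None:
        command.append(f"--instrument={instrument}")
    if dry_run:
        command.append("--dry-run")
    if verbose:
        command.append("--verbose")
    return command


print(" ".join(build_clinical_command(dry_run=True, verbose=True)))
```

The resulting list can be passed to `subprocess.run(..., check=True)` from the repository root. Passing a list instead of a single string avoids shell quoting problems with collection or project names.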
{collection_base_path}/{ProjectName}/
├── project.json                 # Project configuration
│
├── deidentified-raw/            # De-identified participant data
│   ├── clinical/                # Clinical instrument CSVs
│   ├── imaging/
│   │   └── dicoms/
│   ├── bids/
│   └── genomics/
│
├── deidentified-lorisid/        # LORIS-relabelled data
│   ├── clinical/
│   ├── imaging/
│   ├── bids/
│   └── genomics/
│
├── processed/                   # Pipeline outputs
│   ├── clinical/
│   ├── imaging/
│   ├── bids/
│   │   └── derivatives/
│   └── freesurfer-output/
│
├── logs/                        # Execution logs
│
└── documentation/
    ├── data_dictionary/         # Instrument definitions (.linst, REDCap CSV)
    └── readme.txt
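Before a first run it can help to check that a project folder matches this layout. The Python sketch below lists the entries the clinical pipeline touches according to this README and reports which are missing. Treating exactly these entries as required is an assumption, and the names `CLINICAL_LAYOUT` and `missing_layout_entries` are hypothetical.

```python
from pathlib import Path
from typing import List

# Paths the clinical pipeline reads or writes according to this README.
# Treating exactly these as required is an assumption of this sketch.
CLINICAL_LAYOUT = [
    "project.json",
    "deidentified-raw/clinical",
    "processed/clinical",
    "logs",
    "documentation/data_dictionary",
]


def missing_layout_entries(project_root: Path) -> List[str]:
    """Return the expected paths that do not exist under project_root."""
    return [entry for entry in CLINICAL_LAYOUT if not (project_root / entry).exists()]
```

An empty list means every checked entry is present. Otherwise the returned names can be reported or created before ingestion starts.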
Logs are stored in each project's logs/ directory.
# View today's log
tail -f logs/clinical_$(date +%Y-%m-%d).log
# Search for errors
grep "ERROR" logs/clinical_YYYY-MM-DD.log
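The same search can be scripted, for example to feed a monitoring job. This Python sketch relies only on the `clinical_YYYY-MM-DD.log` naming shown above. The function name is hypothetical, and the sketch assumes nothing about the log line format beyond the presence of the string `ERROR`.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def error_lines(logs_dir: Path, day: Optional[date] = None) -> List[str]:
    """Return lines containing "ERROR" from clinical_YYYY-MM-DD.log for the given day.

    Defaults to today's log; returns an empty list if the file does not exist.
    """
    day = day or date.today()
    log_file = logs_dir / f"clinical_{day.isoformat()}.log"
    if not log_file.is_file():
        return []
    with log_file.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if "ERROR" in line]
```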