Testing on our large test nodes, the commands work quite well for a single subject, and I would like to parallelize them to process my entire study. Participants each have around 30 sessions.
Attempting to run one subject per job on our GPU cluster fails: the jobs keep getting killed for running out of memory. In addition, bidsmreye takes an extremely long time just to begin, on the order of several hours before processing starts.
#!/bin/bash -l
#SBATCH --job-name=[bidsmreye]
#SBATCH -o log/bidsmreye_%a.txt
#SBATCH -e log/bidsmreye_%a.err
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=8G
#SBATCH --account=DBIC
#SBATCH --partition=gpuq
#SBATCH --gres=gpu:2
#SBATCH --time=7-01:00:00
#SBATCH --mail-type=FAIL,END
#SBATCH --requeue
#SBATCH --array=0-11

# Output and error log directories
output_log_dir="log"
error_log_dir="log"

# Create the directories if they don't exist
mkdir -p "$output_log_dir"
mkdir -p "$error_log_dir"

# Must run on a GPU node
module load cuda
module load TensorRT
nvidia-smi
echo "$CUDA_VISIBLE_DEVICES"
hostname
# bidsmreye requires the input fMRI data (fmriprep outputs) to be at least realigned,
# with filenames and structure that conform to a BIDS derivative dataset.

# Had to add these lines to initialize conda
conda init bash
source ~/.bashrc
conda activate deepmreye
# Check if SLURM_ARRAY_TASK_ID is not set or is empty
if [ -z "$SLURM_ARRAY_TASK_ID" ]; then
# Set SLURM_ARRAY_TASK_ID to a default value, e.g., 0
SLURM_ARRAY_TASK_ID=0
fi
bids_dir="/dartfs-hpc/rc/lab/C/CANlab/labdata/data/WASABI/derivatives/fmriprep-try2"
output_dir="/dartfs-hpc/rc/lab/C/CANlab/labdata/data/WASABI/derivatives/deepmreye"
SUBJECTS=(SID000002 SID000743 SID001567 SID001651 SID001804 SID001907 SID001641 SID001684 SID001852 SID002035 SID002263 SID002328)
SUBJ=${SUBJECTS[$SLURM_ARRAY_TASK_ID]}
echo "processing bidsmreye for ${SUBJ}..."

# Prepare the data, then compute the eye movements (action prepare; action generalize).
# Prepare: registers the data to MNI if this is not the case already, registers the
# data to the deepmreye template, and extracts data from the eyes mask.
bidsmreye --action all \
${bids_dir} \
${output_dir} \
participant --participant_label ${SUBJ}

# Group-level summary
bidsmreye --action qc \
${bids_dir} \
${output_dir} \
participant --participant_label ${SUBJ}

echo "processing complete"
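One small hardening step for the script above (a sketch, not part of the original job file): derive the array upper bound from the subject list itself, so the hard-coded `--array=0-11` can never drift out of sync with `SUBJECTS`. An out-of-range `SLURM_ARRAY_TASK_ID` silently yields an empty `SUBJ`, which would make bidsmreye fail in a confusing way. The filename `bidsmreye_job.sh` below is a hypothetical placeholder for the job script.

```shell
# Hypothetical helper: size the job array from the subject list.
SUBJECTS=(SID000002 SID000743 SID001567)   # example subset of the full list
MAX_INDEX=$(( ${#SUBJECTS[@]} - 1 ))       # last valid zero-based array index
echo "submit with: sbatch --array=0-${MAX_INDEX} bidsmreye_job.sh"
```

Submitting with a computed range keeps the array size and the subject count defined in exactly one place.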
To further clarify: this occurs when using bidsmreye installed in a conda environment. The following messages appear before processing begins:
2023-09-18 12:41:18.717612: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-09-18 12:41:25.070354: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
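If part of the problem is TensorFlow pre-allocating nearly all GPU memory at startup (an assumption about the cause, not a confirmed diagnosis), one thing worth trying is TensorFlow's documented environment variable for on-demand allocation, exported in the job script before the `bidsmreye` calls:

```shell
# Ask TensorFlow to grow GPU allocations on demand instead of
# reserving most of the card's memory up front at import time.
export TF_FORCE_GPU_ALLOW_GROWTH=true
```

This does not change what bidsmreye computes; it only changes how TensorFlow claims GPU memory, which can matter when several array tasks land on the same node.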
Remi-Gau changed the title to "very slow start time when parallelizing" on Sep 18, 2023.