
Add Multi-Node Parallelism with MPI #22

@helvecioneto

Description


Currently, pyfortracc.track(name_list, read_function) runs using multiprocessing.Pool, which limits parallelism to a single node. This feature request aims to add an option for users to run pyfortracc.track in a cluster environment with multi-node parallelism using MPI.

Proposed Solution

Introduce a parameter in name_list (e.g., name_list['mpi'] = True) that enables MPI-based parallelism. When this flag is set, the workload will be distributed across multiple nodes using mpi4py.

Implementation Example

Modify the processing logic to use MPI when name_list['mpi'] is True:

if name_list.get('mpi'):
    # MPI: distribute files among MPI processes
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    size = comm.Get_size()   # Total number of processes
    rank = comm.Get_rank()   # Rank of the current process

    # Round-robin distribution: file i is handled by rank i % size
    local_files = [f for i, f in enumerate(input_files) if i % size == rank]

    # Each rank independently processes its assigned files
    for file_path in local_files:
        process_file(file_path)
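The round-robin split above can be factored into a small pure function, which makes the distribution easy to unit-test without launching MPI. This is a sketch; `split_round_robin` is a hypothetical helper name, not part of the pyfortracc API:

```python
def split_round_robin(files, size, rank):
    """Return the subset of files assigned to one MPI rank.

    Mirrors the `i % size == rank` scheme above: file i goes to
    rank i mod size, giving near-equal shares with no communication.
    """
    return [f for i, f in enumerate(files) if i % size == rank]

# Illustrative file names; with 4 ranks, rank 0 gets frames 0, 4, 8,
# rank 1 gets 1, 5, 9, and so on.
files = [f"frame_{i:03d}.nc" for i in range(10)]
parts = [split_round_robin(files, 4, r) for r in range(4)]
assert sorted(sum(parts, [])) == sorted(files)  # every file assigned exactly once
```

Because the split depends only on the globally known file list, `size`, and `rank`, every rank computes its own share deterministically, with no scatter step needed.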

Usage

Users can enable multi-node parallelism by setting name_list['mpi'] = True:

import pyfortracc

name_list = {}
name_list['mpi'] = True  # enable MPI-based multi-node parallelism

def read_function(path):
    # User-defined reader that loads a single file into the expected format
    return example_read(path)

pyfortracc.track(name_list, read_function)
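Since MPI programs must be started under an MPI launcher, the script above would then be run with `mpiexec` (or `mpirun`) rather than plain `python`. A minimal sketch, assuming the snippet is saved as a hypothetical `run_track.py` with 8 ranks requested:

```shell
# Launch 8 MPI ranks of the tracking script
# (script name and rank count are illustrative)
mpiexec -n 8 python run_track.py
```

Exact launcher flags (hostfile, ranks per node) depend on the cluster's MPI distribution and scheduler (e.g. Slurm's srun).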

Expected Benefits

  • Scalability: Ability to distribute workload across multiple nodes.
  • Performance: Faster processing for large datasets.

Labels

enhancement (New feature or request)