
Consistent C++ Memory Alloc Error #4871

@Sanderson1887

Description


Summary

When attempting to use a DeePMD-trained model with LAMMPS on an HPC system, using any more than a few compute cores consistently leads to a "C++ memory allocation failed: std::bad_alloc" error.
If the total processor count is kept low (fewer than about 20), the LAMMPS run starts fine; with many more than that it does not.
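
As a first check (a minimal sketch, assuming a compressed model file named `graph-compress.pb` and an arbitrary five-atom test frame, neither of which comes from the report), the model can be loaded and evaluated in serial through DeePMD-kit's Python inference API. If this works on one process, the problem is more likely the per-rank memory footprint under MPI than the model itself:

```python
# Serial sanity check: load the model and evaluate one frame on a single
# process. The filename, coordinates, and type indices are placeholders.
import numpy as np
from deepmd.infer import DeepPot

dp = DeepPot("graph-compress.pb")        # hypothetical model filename
print("type map:", dp.get_type_map())    # order of the 5 element types

natoms = 5
coord = np.random.rand(1, natoms * 3) * 10.0       # one frame, flattened xyz
cell = np.diag([10.0, 10.0, 10.0]).reshape(1, 9)   # 10 A cubic box
atype = [0, 1, 2, 3, 4]                            # one atom of each type

e, f, v = dp.eval(coord, cell, atype)
print("energy:", e, "max |force|:", np.abs(f).max())
```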

DeePMD-kit Version

DeePMD-kit v3.0.2

Backend and its version

TensorFlow 2.15.0

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

Python 3.9.2
CUDA 12.2
GCC 9.1.0
LAMMPS 29 Aug 2024 - Update 1

Details

I am using a model with 5 different elements. type_one_side is set to its default value of false and, when the model is compressed, the step size is also left at its default. These settings give a (seemingly) fairly large potential file of ~300 MB, which I assume is a likely cause of the error described below.
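
For context on why the file is this large (a back-of-the-envelope sketch, not numbers read from the actual model): with the common se_e2_a descriptor, type_one_side = false keeps one embedding network per (center type, neighbor type) pair rather than one per neighbor type, so compression tabulates ntypes² = 25 networks instead of 5, and each table also lengthens as the step size shrinks.

```python
# Rough scaling of the compressed model size. Only the ~300 MB figure comes
# from the report above; everything else is an illustrative assumption, and
# the non-table part of the graph is ignored.
ntypes = 5
tables_two_side = ntypes * ntypes   # type_one_side = false -> 25 tabulated nets
tables_one_side = ntypes            # type_one_side = true  -> 5 tabulated nets

observed_size_mb = 300.0
per_table_mb = observed_size_mb / tables_two_side

print(f"rough size per table: {per_table_mb:.0f} MB")
print(f"estimated size with type_one_side = true: "
      f"{per_table_mb * tables_one_side:.0f} MB")
# Doubling the compression step would roughly halve each table again.
```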

When attempting to use this trained potential in a LAMMPS script, I cannot use more than ~20 compute nodes without getting a C++ memory allocation error. The error occurs at the first step at which the model is called (minimization, run, etc.). Might this be a DeePMD issue caused by the model file size, or is it more likely a problem on the HPC side?
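
One plausible explanation (a sketch under assumed numbers, not a confirmed diagnosis): each LAMMPS MPI rank loads its own copy of the graph plus the backend runtime into host memory, so the per-node footprint grows roughly linearly with ranks per node and can exceed node RAM well before the job looks large by core count.

```python
# Check whether per-rank copies of the model alone could exhaust node RAM.
# Only the 300 MB model size comes from the report; the per-rank runtime
# overhead, ranks per node, and node RAM are assumptions to adapt.
model_file_mb = 300
runtime_overhead_mb = 2000   # rough guess for the backend runtime per rank
ranks_per_node = 128         # assumption
node_ram_gb = 256            # assumption

per_rank_gb = (model_file_mb + runtime_overhead_mb) / 1024
needed_gb = per_rank_gb * ranks_per_node
print(f"~{needed_gb:.0f} GB needed vs {node_ram_gb} GB available -> "
      f"{'likely OOM' if needed_gb > node_ram_gb else 'should fit'}")
```

If an estimate like this comes out over budget for your nodes, running fewer MPI ranks per node is the quickest way to test the hypothesis.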
