
Consistent C++ Memory Alloc Error #4871

@Sanderson1887

Description


Summary

When attempting to use a DeePMD-trained model with LAMMPS on an HPC system, using any more than a few compute cores consistently leads to a "C++ memory allocation failed: std::bad_alloc" error.
If the total processor count is kept low (fewer than about 20), the LAMMPS run starts fine; with many more than that it does not.
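
As a first check (a minimal sketch, assuming a compressed model file named `graph-compress.pb` and an arbitrary five-atom test frame, neither of which comes from the report), the model can be loaded and evaluated in serial through DeePMD-kit's Python inference API. If this works on one process, the problem is more likely the per-rank memory footprint under MPI than the model itself:

```python
# Serial sanity check: load the model and evaluate one frame on a single
# process. The filename, coordinates, and type indices are placeholders.
import numpy as np
from deepmd.infer import DeepPot

dp = DeepPot("graph-compress.pb")        # hypothetical model filename
print("type map:", dp.get_type_map())    # order of the 5 element types

natoms = 5
coord = np.random.rand(1, natoms * 3) * 10.0       # one frame, flattened xyz
cell = np.diag([10.0, 10.0, 10.0]).reshape(1, 9)   # 10 A cubic box
atype = [0, 1, 2, 3, 4]                            # one atom of each type

e, f, v = dp.eval(coord, cell, atype)
print("energy:", e, "max |force|:", np.abs(f).max())
```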

DeePMD-kit Version

DeePMD-kit v3.0.2

Backend and its version

TensorFlow 2.15.0

Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

Python 3.9.2
CUDA 12.2
GCC 9.1.0
LAMMPS 29 Aug 2024 - Update 1

Details

I am using a model with 5 different elements. type_one_side is set to its default value of false and, when the model is compressed, the step size is also left at its default. These settings give a (seemingly) fairly large potential file of ~300 MB, which I assume is a likely cause of the error described below.
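
For context on why the file is this large (a back-of-the-envelope sketch, not numbers read from the actual model): with the common se_e2_a descriptor, type_one_side = false keeps one embedding network per (center type, neighbor type) pair rather than one per neighbor type, so compression tabulates ntypes² = 25 networks instead of 5, and each table also lengthens as the step size shrinks.

```python
# Rough scaling of the compressed model size. Only the ~300 MB figure comes
# from the report above; everything else is an illustrative assumption, and
# the non-table part of the graph is ignored.
ntypes = 5
tables_two_side = ntypes * ntypes   # type_one_side = false -> 25 tabulated nets
tables_one_side = ntypes            # type_one_side = true  -> 5 tabulated nets

observed_size_mb = 300.0
per_table_mb = observed_size_mb / tables_two_side

print(f"rough size per table: {per_table_mb:.0f} MB")
print(f"estimated size with type_one_side = true: "
      f"{per_table_mb * tables_one_side:.0f} MB")
# Doubling the compression step would roughly halve each table again.
```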

When attempting to use this trained potential in a LAMMPS script, I cannot use more than ~20 compute nodes without getting a C++ memory allocation error. The error occurs at the first step at which the model is called (minimization, run, etc.). Might this be a DeePMD issue caused by the model file size, or is it more likely a problem on the HPC side?
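
One plausible explanation (a sketch under assumed numbers, not a confirmed diagnosis): each LAMMPS MPI rank loads its own copy of the graph plus the backend runtime into host memory, so the per-node footprint grows roughly linearly with ranks per node and can exceed node RAM well before the job looks large by core count.

```python
# Check whether per-rank copies of the model alone could exhaust node RAM.
# Only the 300 MB model size comes from the report; the per-rank runtime
# overhead, ranks per node, and node RAM are assumptions to adapt.
model_file_mb = 300
runtime_overhead_mb = 2000   # rough guess for the backend runtime per rank
ranks_per_node = 128         # assumption
node_ram_gb = 256            # assumption

per_rank_gb = (model_file_mb + runtime_overhead_mb) / 1024
needed_gb = per_rank_gb * ranks_per_node
print(f"~{needed_gb:.0f} GB needed vs {node_ram_gb} GB available -> "
      f"{'likely OOM' if needed_gb > node_ram_gb else 'should fit'}")
```

If an estimate like this comes out over budget for your nodes, running fewer MPI ranks per node is the quickest way to test the hypothesis.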
