Summary
When attempting to use a DeePMD-trained model with LAMMPS on an HPC system, running on more than a few compute cores consistently leads to a "C++ memory allocation failed: std::bad_alloc" error.
If the total processor count is low (fewer than about 20), the LAMMPS run starts fine, but it fails with many more cores than that.
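For concreteness, a sketch of the kind of job launch that shows this behaviour; the executable name, input file, and rank counts below are placeholders rather than the exact values from my runs:

```sh
# A small rank count starts and runs normally (placeholder values):
mpirun -np 16 lmp -in in.lammps

# Scaling the same job to many more ranks fails at the first force call with
#   C++ memory allocation failed: std::bad_alloc
mpirun -np 128 lmp -in in.lammps
```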
DeePMD-kit Version
DeePMD-kit v3.0.2
Backend and its version
2.15.0
Python Version, CUDA Version, GCC Version, LAMMPS Version, etc
Python 3.9.2
CUDA 12.2
GCC 9.1.0
LAMMPS 29 Aug 2024 - Update 1
Details
I am using a model with 5 different elements. type_one_side is set to its default value of false and, when the model is compressed, the step size is also left at its default. These settings give a (seemingly) fairly large potential file of ~300 MB, which I assume is a likely cause of the error described below.
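For reference, a sketch of the settings and compression step described above, assuming the standard `dp compress` interface; the file names here are placeholders:

```sh
# type_one_side is left at its default (false) in the descriptor section of the
# training input, i.e.  "type_one_side": false
# Compression is run without overriding the default table stride (-s/--step):
dp compress -i frozen_model.pb -o frozen_model_compressed.pb
```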
When attempting to use this trained potential in a LAMMPS script, I cannot use more than ~20 compute nodes without getting a C++ memory allocation error. This occurs at the first step where the model would be called (minimization, run, etc.). Might this be a DeePMD error due to the model file size, or is it more likely an issue on the HPC side?
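For completeness, the relevant part of the LAMMPS input looks roughly like the following; the model file name and element names are placeholders for my actual 5-element type map:

```
# Failure appears at the first step that evaluates forces (minimize, run, ...).
pair_style      deepmd frozen_model_compressed.pb
pair_coeff      * * A B C D E
minimize        1.0e-6 1.0e-8 1000 10000
```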