GFN-FF timing information, optimization for MD #1195
Comments
I get about 80% CPU efficiency on 24 CPU cores for pure GFN-FF on an 800-atom system. A quick-and-dirty way to check this is to run your MD for a couple of minutes under the `time` command and look at the output.
Then divide the user time by the real time, and divide the result by the number of CPUs. For me, this comes out to about 80%. Could you please do the same trick for your calculation?
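For reference, a minimal sketch of that check (the input file name, core count, and the example numbers are placeholders, not measured data):

```bash
# Run a short MD under `time`, then efficiency ≈ user / (real * ncores).
export OMP_NUM_THREADS=24
time xtb coords.pdb --gfnff --md > md.log
# Example reading: user = 230 min, real = 12 min on 24 cores
#   230 / (12 * 24) ≈ 0.80  -> roughly 80% CPU efficiency
```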
Thank you for your reply - this is a helpful start. My system also reports high CPU utilization, similar to your ~80%, but the time to evaluate GFN-FF on the whole system (the bottleneck) does not improve with more cores: the runtime is about 7 seconds whether I use 4 cores or 24. In other words, GFN-FF on the whole system appears to occupy the cores without actually benefitting from them. Could you please share some details about the environment variables and xtb build instructions I can pass on to my computing-center specialists? Here is an example job I am running to benchmark (xtb version 6.6.1), where ncores is 24 or 4: `ncores=24; export OMP_STACKSIZE=20G; xtb my_6000_atoms.pdb --oniom gfn2:gfnff my_inner_region_indices --alpb water --verbose`
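For completeness, a hedged sketch of the OpenMP environment setup usually recommended for xtb (the 20G stack size and file names are taken from the command above; adjust them to your machine):

```bash
export OMP_NUM_THREADS=24      # xtb reads this; a plain shell variable like `ncores` is not picked up
export OMP_STACKSIZE=20G       # per-thread stack for the OpenMP runtime
ulimit -s unlimited            # avoid stack overflows in the main thread
xtb my_6000_atoms.pdb --oniom gfn2:gfnff my_inner_region_indices --alpb water --verbose > oniom.log
```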
It looks like xtb spends almost all of its time on data sharing and context switching between threads, especially with larger thread counts. With 4 cores I see similar numbers, again with about 80% CPU efficiency. So you can try to find an optimal number of threads. I always use 4 threads and submit more tasks instead to occupy the whole node.
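One possible way to follow that advice and still occupy a full 24-core node is to run several independent 4-thread jobs side by side (a sketch; directory and file names are placeholders, and a scheduler array job is usually cleaner):

```bash
export OMP_NUM_THREADS=4
export OMP_STACKSIZE=4G
for d in job1 job2 job3 job4 job5 job6; do
    ( cd "$d" && xtb input.pdb --gfnff --md > md.log 2>&1 ) &
done
wait    # six 4-thread runs ≈ 24 cores in total
```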
Ok, this is good to know. I guess I should use 4 or fewer cores for my workload. If it helps for benchmarking/knowledge purposes, here are some detailed timings for a GFN-FF energy/gradient evaluation:
If I have time I can look into the implementation, but is there any particular reason why some terms in the Hamiltonian benefit from threading while others do not?
I did not update
Good to know. I also notice that the SHAKE algorithm dominates the memory usage: shake=2 (all bonds) segfaults with 32 GB of memory, while shake=1 or 0 seem fine even with a modest 16 GB. Is this what you would expect? In your opinion, how challenging would it be to parallelize the bottleneck terms like D3, GBSA, and EEQ? I may try to do so myself.
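For context, the SHAKE mode mentioned here is selected through the `$md` block of the detailed input file; a minimal sketch (file names are placeholders, and only the `shake` key is shown):

```bash
# shake=0 turns SHAKE off, shake=1 constrains only X-H bonds, shake=2 constrains all bonds
cat > md.inp << 'EOF'
$md
   shake=1
$end
EOF
xtb my_6000_atoms.pdb --input md.inp --md --oniom gfn2:gfnff my_inner_region_indices --alpb water
```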
For D3 and EEQ, we have library implementations with much better code quality, and correspondingly I am fairly certain that their parallelization is also much better. In the long term, we plan to replace the separate implementations in xtb with these libraries, but this will obviously take some time.
I am running ONIOM molecular dynamics with GFN-FF as the low level and GFN2-xTB as the high level in ALPB solvent. The total system is ~5000 atoms and the SQM region is ~60 atoms, using 24 CPU cores. The timings reported in the xtb logs for each energy+gradient evaluation are quite fast (0.003 s for GFN-FF, 0.4 s for GFN2-xTB), but in practice each MD step takes ~5 seconds and, from my testing, seems to be dominated by the GFN-FF energy+gradient time. According to the GFN-FF publication, the expected time per step for a similar system should be around 1 second, and I am using more resources than that; I would be very happy with 0.5-1 seconds per step on these resources. Are there any further optimizations or tricks I can pursue to speed up my MD simulations, or to establish an accurate baseline here? Any help would be greatly appreciated!