Travis CI containers only give me 4 GB of memory. Grappa is allocating all of that in the shared heap (or 4x that if the printout is the per-thread amount).
I read through the source and cannot see how to set this myself. Is that supported?
I guess I could run only 2 procs, but I like testing with 4 even though that oversubscribes the cores, because oversubscription tends to surface more bugs in parallel runtimes.
+/home/travis/PRK-deps/mpich/bin/mpirun -n 4 GRAPPA/Synch_p2p/p2p 10 1024 1024
I0107 00:46:42.978490 81814 Allocator.hpp:185] Allocator is responsible for addresses from 0 to 0xeb860000
I0107 00:46:42.978878 81814 GlobalMemory.cpp:67] Initialized GlobalMemory with 3951427584 bytes of shared heap.
I0107 00:46:42.990459 81815 GlobalMemory.cpp:67] Initialized GlobalMemory with 3951427584 bytes of shared heap.
I0107 00:46:42.990805 81816 GlobalMemory.cpp:67] Initialized GlobalMemory with 3951427584 bytes of shared heap.
I0107 00:46:43.006459 81817 GlobalMemory.cpp:67] Initialized GlobalMemory with 3951427584 bytes of shared heap.
I0107 00:46:43.009131 81814 Grappa.cpp:647]
-------------------------
Shared memory breakdown:
node total: 29.4405 GB
locale shared heap total: 14.7202 GB
locale shared heap per core: 3.68006 GB
communicator per core: 0.125 GB
tasks per core: 0.0156631 GB
global heap per core: 0.920013 GB
aggregator per core: 0.00247955 GB
shared_pool current per core: 4.76837e-07 GB
shared_pool max per core: 0.920015 GB
free per locale: 10.475 GB
free per core: 2.61876 GB
-------------------------
Parallel Research Kernels version 2.16
Grappa pipeline execution on 2D grid
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 81815 RUNNING AT testing-worker-linux-docker-8535467c-3182-linux-2
= EXIT CODE: 135
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Bus error (signal 7)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
You can get the memory-related options with $YOUR_APPLICATION_COMMAND --help | grep 'global\|locale\|heap\|memory' -A1
You may want to experiment with the --global_heap_fraction option. With your configuration of 4 cores (i.e. Grappa processes) on 1 node, it does look like roughly all of your 4 GB is going to the global heap alone (0.92 GB per core × 4 processes ≈ 3.7 GB).
There are three main flags that control the way the node memory gets divided up. The main one is
--locale_shared_fraction (Fraction of total node memory to allocate for
Grappa) type: double default: 0.5
There are a couple of other pools that are allocated out of that locale shared heap; they are controlled with
-global_heap_fraction (Fraction of locale shared memory to set aside for
global shared heap) type: double default: 0.25
-shared_pool_memory_fraction (Fraction of locale shared heap to use for
shared pool) type: double default: 0.25
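For reference, those defaults seem to account for the numbers in the log above (a rough back-of-the-envelope; I am assuming Grappa is seeing the host's full ~29.4 GB of physical memory, per the "node total" line, rather than the container's 4 GB limit):
0.5 × 29.44 GB node total ≈ 14.72 GB locale shared heap
14.72 GB ÷ 4 processes ≈ 3.68 GB locale shared heap per core
0.25 × 3.68 GB ≈ 0.92 GB global heap per core (and likewise for shared_pool max)
0.92 GB × 4 processes ≈ 3.68 GB, which appears to be the 3951427584 bytes reported by GlobalMemory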
I would suggest first setting --locale_shared_fraction=0.25 or slightly less and seeing what happens; see the sketch below. We did not design for oversubscription, though, so we may not have exposed all the necessary flags. I can take a look in a couple of days if this doesn't work for you immediately.
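For example, something like this might work for the p2p run from your log (a sketch, assuming Grappa parses its gflags-style options straight off the application's command line, so the flag can simply be appended after the kernel's own arguments):
/home/travis/PRK-deps/mpich/bin/mpirun -n 4 GRAPPA/Synch_p2p/p2p 10 1024 1024 --locale_shared_fraction=0.25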
(This is another thing I hope to simplify this month)