The manual for the DRMAA for Simple Linux Utility for Resource Management (SLURM) is available at project wiki: http://apps.man.poznan.pl/trac/slurm-drmaa/wiki
- master - slurm-drmaa 1.0.7 (from source tarball, not
SVN) with mostly-complete Slurm multicluster
-M
/--clusters
support (it is possible to specify 1 cluster with this option, but not multiple, see Limitations below). - slurm-drmaa-1.2.0 - slurm-drmaa 1.2.0 from SVN, no modifications
- slurm-drmaa-1.2.0-clusters - slurm-drmaa 1.2.0 with the same cluster modifications as in the master branch.
- slurm-drmaa-1.2.0-multisubmit -
slurm-drmaa 1.2.0 with full multicluster submission (i.e.
sbatch --clusters=cluster1,cluster2
) functionality (See Limitations below) - slurm-drmaa-1.2.0-multisubmit-15.08 -
slurm-drmaa 1.2.0 with full multicluster submission (i.e.
sbatch --clusters=cluster1,cluster2
) functionality, requires Slurm 15.08 or later.
- Get the features in the various branches enabled by libslurm version (at runtime?) and merged back to a single branch.
- Reimport the upstream repository from PSNC with full attributed SVN history
I added a slurm-drmaa.spec
file in the master branch (but it can be used with any branch) for use with rpmbuild
to create an RPM. My process (on CentOS 6) is:
% ./autogen.sh
% make distclean
% find . -type d -name autom4te.cache -print0 | xargs -0 rm -rf
% ragel -o drmaa_utils/drmaa_utils/timedelta.c drmaa_utils/drmaa_utils/timedelta.rl
% cp slurm-drmaa.spec ~/rpmbuild/SPECS
% cd ..
% tar zcf slurm-drmaa-1.2.0.tar.gz --exclude=.git\* slurm-drmaa-1.2.0
% cp slurm-drmaa-1.2.0.tar.gz ~/rpmbuild/SOURCES
% rpmbuild -bb ~/rpmbuild/SPECS/slurm-drmaa.spec
Multicluster support will not work on a standard Slurm installation prior to
Slurm 14.11. slurm-drmaa needs access to Slurm's working_cluster_rec
global
in libslurmdb
, which was not extern/public in older versions. However, older
versions can be used if you compile a new libslurmdb.so
. To do so, apply the
following patch to the Slurm source, reconfigure, and recompile:
--- src/db_api/Makefile.am.orig 2014-05-06 11:24:19.000000000 -0500
+++ src/db_api/Makefile.am 2014-10-10 11:48:12.730845279 -0500
@@ -95,6 +95,7 @@
(echo "{ global:"; \
echo " slurm_*;"; \
echo " slurmdb_*;"; \
+ echo " working_cluster_rec;"; \
echo " local: *;"; \
echo "};") > $(VERSION_SCRIPT)
--- src/db_api/Makefile.in.orig 2014-05-06 11:24:19.000000000 -0500
+++ src/db_api/Makefile.in 2014-10-10 11:48:22.765938016 -0500
@@ -915,6 +915,7 @@
(echo "{ global:"; \
echo " slurm_*;"; \
echo " slurmdb_*;"; \
+ echo " working_cluster_rec;"; \
echo " local: *;"; \
echo "};") > $(VERSION_SCRIPT)
In my instance I then copied the resulting .libs/libslurdb*.so*
to
slurm-drmaa's lib directory and configured slurm-drmaa with
LDFLAGS='-Wl,-rpath=/path/to/slurm-drmaa/lib' ./configure
, but putting
libslurmdb
elsewhere and/or simply setting $LD_LIBRARY_PATH
at application
runtime also work.
Note that root privileges are not required for this to work. The canonical copy of libslurmdbd.so does not need to be modified, only the one that libdrmaa links to at runtime.
Multiple clusters specified in a single -M
/--clusters
option (e.g. -M cluster1,cluster2
) are not supported in the master
and
slurm-drmaa-1.2.0-clusters
branches.
However, support for it has been hacked in to the
slurm-drmaa-1.2.0-multisubmit
branch. From the commit message:
This is fairly hacky because the functionality to do this properly is not available via Slurm's public API. And:
- I wasn't going to take the time to figure out the mess of private headers I had to extract from the slurm source and include here, so a copy of the Slurm source is required at compile time.
- I have about 1 hours' worth of experience with autoconf/automake at this point, so the changes I made there are crude.
- I'm not going to put a lot of effort into making this nicer since the Slurm development team has put implementing the features necessary for doing this properly on their roadmap for Slurm 15.08: http://bugs.schedmd.com/show_bug.cgi?id=1234
That said, once compiled, it will work with a standard Slurm 14.11 (or earlier
versions with the working_cluster_rec
patch).
Proper support for multiple cluster submission with Slurm 15.08 and later is
available in the slurm-drmaa-1.2.0-multicluster-15.08
branch.
Because DRMAA does not provide a means for reporting back which cluster is
selected, I've chosen to modify the format of the Job ID returned by the submit
function. If the native specification does not contain -M
/--clusters
, Job
IDs are numeric as before. If -M/--clusters
is passed, the Job ID is appended
with a '.'
, followed by the cluster name to which the job was submitted (e.g.
42.cluster1
). The functions which check job state and perform job controls
will also accept Job IDs in this format.