Batch timeseries analysis in slurm #120

Merged
merged 29 commits into from
Nov 28, 2023
Changes from 18 commits
c530c9f
working lotus script for single time series
ledm Nov 24, 2023
5fb37d9
created parrellel_timeseries analysis
ledm Nov 24, 2023
93a4a1a
working here.
ledm Nov 24, 2023
9a25b7e
Update .gitignore
ledm Nov 24, 2023
da0f257
testing sbatch
ledm Nov 24, 2023
19465fc
working and debuggin.
ledm Nov 24, 2023
70f7bee
Merge branch 'dev_parrallel_ts' of github.com:valeriupredoi/bgcval2 i…
ledm Nov 24, 2023
b8da78c
debugs and tests and prints.
ledm Nov 24, 2023
89f113a
Merge remote-tracking branch 'origin/main' into dev_parrallel_ts
ledm Nov 24, 2023
860a6b3
working fixes
ledm Nov 24, 2023
b18cb8e
on-going TF runs
ledm Nov 24, 2023
7e92e4a
Added end of command print"
ledm Nov 24, 2023
da23923
renamed parrallel to batch as per decreed by @valeriupredoi
ledm Nov 27, 2023
431b54d
Update bgcval2/batch_timeseries.py
ledm Nov 27, 2023
1825a3b
local changes.
ledm Nov 27, 2023
489475b
added command line arguments for output.
ledm Nov 27, 2023
1c567e9
added documentation.
ledm Nov 27, 2023
7780674
reverted text deletion.
ledm Nov 27, 2023
923154b
Update README.md
ledm Nov 27, 2023
b759b33
add test to new cmd line func
valeriupredoi Nov 27, 2023
841eeb5
add GA call to cmd line func
valeriupredoi Nov 27, 2023
6a19048
Merge branch 'dev_parrallel_ts' of https://github.com/valeriupredoi/b…
valeriupredoi Nov 27, 2023
07ecc30
additional doc
ledm Nov 27, 2023
7d92af4
Changed logo
ledm Nov 27, 2023
4f82bfd
changed comments
ledm Nov 28, 2023
31cc3c4
Update README.md
ledm Nov 28, 2023
25b7f9c
Update README.md
ledm Nov 28, 2023
c0aee37
Update README.md
ledm Nov 28, 2023
355c8c4
Update README.md
ledm Nov 28, 2023
4 changes: 4 additions & 0 deletions .gitignore
@@ -21,3 +21,7 @@ local_test/BGC_data/valeriu
mass_scripts
CompareReports2
.idea/workspace.xml
*.iml
.idea/inspectionProfiles/profiles_settings.xml
.idea/misc.xml
.idea/vcs.xml
32 changes: 32 additions & 0 deletions README.md
@@ -68,6 +68,7 @@ pip install -e .[develop]
Test that the tool has been installed correctly with:
```
analysis_compare -h
compare -h
```
which should print the module information and instructions on how to run the tool.

@@ -115,6 +116,7 @@ Executable name | What it does | Command
`bgcval` | runs time series and point to point. | bgcval jobID
`bgcval2_make_report` | makes the single model HTML report. | bgcval2_make_report jobID
`analysis_compare` | runs comparison of multiple single jobs | analysis_compare
`batch_timeseries` | Submits single job time series analysis to slurm | batch_timeseries


### Checking out development branches
@@ -319,6 +321,36 @@ then the report will appear on the [JASMIN public facing page](https://gws-acces
which is public facing but password protected.


Batch time series analysis
==========================

The `batch_timeseries` tool takes an `analysis_compare` input yaml file and,
instead of running the time series analysis for each job in series
in the interactive shell, uses slurm to submit
each one as an independent job.

On JASMIN, users can run up to five jobs simultaneously,
so this can significantly speed up the analysis.

The command to run it is:
```
batch_timeseries -y comparison_recipe.yml
```

The optional `-d` or `--dry_run` flag tests `batch_timeseries`:
it prints the submission commands to screen but does not submit the jobs.

Note that this task does not run the `analysis_compare` suite, so it will
not generate the html report. However, the html report can then be generated more quickly
by using the `-s` flag to skip the `analysis_timeseries` step, as
described above.

In addition, note that this will not run the `download_from_mass`
script, so jobs added here will not be included in the automated download.
However, these jobs are added for automated download when `analysis_compare`
is used.


Downloading data using MASS
===========================

3 changes: 2 additions & 1 deletion bgcval2/analysis_timeseries.py
@@ -733,7 +733,8 @@ def applyLandMask1e3(nc, keys):
gridFile=av[name]['gridFile'],
clean=False,
)

print("analysis_timeseries:\tINFO:\tEnd of the timeseries analysis", jobID, suites)


def get_args():
"""Parse command line arguments. """
152 changes: 152 additions & 0 deletions bgcval2/batch_timeseries.py
@@ -0,0 +1,152 @@
#!/usr/bin/env python
#
# Copyright 2015, Plymouth Marine Laboratory
#
# This file is part of the bgc-val library.
#
# bgc-val is free software: you can redistribute it and/or modify it
# under the terms of the Revised Berkeley Software Distribution (BSD) 3-clause license.

# bgc-val is distributed in the hope that it will be useful, but
# without any warranty; without even the implied warranty of merchantability
# or fitness for a particular purpose. See the revised BSD license for more details.
# You should have received a copy of the revised BSD license along with bgc-val.
# If not, see <http://opensource.org/licenses/BSD-3-Clause>.
#
# Address:
# Plymouth Marine Laboratory
# Prospect Place, The Hoe
# Plymouth, PL1 3DH, UK
#
# Email:
# [email protected]
#
"""
.. module:: batch_timeseries
   :platform: Unix
   :synopsis: A script to submit slurm scripts time series.

.. moduleauthor:: Lee de Mora <[email protected]>

"""
import argparse
import subprocess
import os
import sys

from getpass import getuser

from bgcval2.analysis_compare import load_comparison_yml


def get_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description=__doc__,
        formatter_class=argparse.RawDescriptionHelpFormatter)

    parser.add_argument('-y',
                        '--compare_yml',
                        nargs='+',
                        type=str,
                        help='One or more Comparison Analysis configuration file, for examples see bgcval2 input_yml directory.',
                        required=True,
                        )

    parser.add_argument('-c',
                        '--config-file',
                        default=os.path.join(os.path.dirname(os.path.realpath(__file__)),
                                             'default-bgcval2-config.yml'),
                        help='User configuration file (for paths).',
                        required=False)

    parser.add_argument('--dry_run',
                        '-d',
                        default=False,
                        help='When True: Do not submit the jobs to lotus.',
                        action=argparse.BooleanOptionalAction,
                        required=False)

    args = parser.parse_args()
    return args
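
A note on the `--dry_run` flag defined above: `argparse.BooleanOptionalAction` (available from Python 3.9) also registers a negated `--no-dry_run` form alongside `--dry_run` and `-d`. The sketch below mirrors only two of the arguments from `get_args`; the `build_parser` helper name is hypothetical and is not part of bgcval2.

```python
import argparse


def build_parser():
    """Hypothetical helper mirroring two of the arguments in get_args()."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-y', '--compare_yml', nargs='+', type=str, required=True)
    parser.add_argument('--dry_run', '-d',
                        default=False,
                        action=argparse.BooleanOptionalAction)
    return parser


parser = build_parser()

# No flag given: the default applies.
default_args = parser.parse_args(['-y', 'comparison_recipe.yml'])
# Short form switches the dry run on.
short_args = parser.parse_args(['-y', 'comparison_recipe.yml', '-d'])
# Negated long form, generated automatically by BooleanOptionalAction.
negated_args = parser.parse_args(['-y', 'comparison_recipe.yml', '--no-dry_run'])

print(default_args.dry_run, short_args.dry_run, negated_args.dry_run)
```

Because `-y` uses `nargs='+'`, several comparison yaml files can be passed in one call, and each is then handled in turn by `main`.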


def submits_lotus(compare_yml, config_user, dry_run=False):

Review comment (collaborator, PR author): We're not actually using config_user here, but I'm keeping it for completeness, just in case.

    """
    Loads the yaml file and submits individual time series to sbatch.
    """
    # Load details from yml file
    details = load_comparison_yml(compare_yml)

    # list of job IDS
    jobs = details['jobs']

    # username
    user = getuser()

    # Load current on-going list of this users slurm jobs:
    out = str(subprocess.check_output(["squeue", "--user="+user]))

    # loop over jobs:
    for job in jobs:
        # Check whether there's already a job running for this jobID
        if out.find(job) > -1:
            print("That job exists already: skipping", job)
            continue

        # Get list of suites for each job
        suites = details['suites'][job]

        # Make it a list:
        if isinstance(suites, str):
            suites = suites.split(' ')

        # prepare the command
        command_txt = ['sbatch',
                       '-J', job,
                       ''.join(['--error=logs/', job,'.err']),
                       ''.join(['--output=logs/', job,'.out']),
                       'lotus_timeseries.sh', job]
        for suite in suites:
            command_txt.append(suite)

        # Send it!
        if dry_run:
            print('Not submitting (dry-run):', ' '.join(command_txt))
        else:
            # Submit job:
            print('Submitting:', ' '.join(command_txt))
            #command1 = subprocess.Popen(command_txt)
            command1 = subprocess.Popen(
                command_txt,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
            )
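
The queue check and the command assembly in `submits_lotus` can be isolated into small pure functions, which makes them testable without a slurm installation. This is a sketch, not the merged implementation: the names `job_is_queued` and `build_sbatch_command` are hypothetical, and the exact-match check assumes `squeue` is called with `--noheader --format=%j` so that each output line holds one job name. The merged code instead searches the default `squeue` output for a substring.

```python
def job_is_queued(job, squeue_names):
    """Return True if `job` exactly matches a job name in the squeue output.

    Assumes `squeue_names` is the text printed by
    `squeue --user=<user> --noheader --format=%j` (one job name per line).
    """
    names = {line.strip() for line in squeue_names.splitlines()}
    return job in names


def build_sbatch_command(job, suites, script='lotus_timeseries.sh'):
    """Assemble the sbatch argument list for one job, as in submits_lotus."""
    # A space-separated string of suites is split into a list.
    if isinstance(suites, str):
        suites = suites.split(' ')
    return ['sbatch',
            '-J', job,
            f'--error=logs/{job}.err',
            f'--output=logs/{job}.out',
            script, job, *suites]


# Illustrative values only: these job IDs and suite names are made up.
queued = "u-aa111\nu-bb222\n"
print(job_is_queued('u-aa111', queued))
print(job_is_queued('u-aa11', queued))
print(' '.join(build_sbatch_command('u-cc333', 'physics bgc')))
```

An exact match avoids skipping a job whose ID happens to be a substring of another queued job's name, at the cost of depending on a specific `squeue` output format.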


def main():

    """Run the main routine."""
    args = get_args()

    # This has a sensible default value.
    config_user=args.config_file

    # This shouldn't fail as it's a required argument.
    compare_ymls = args.compare_yml

    for compare_yml in compare_ymls:
        print(f"analysis_timeseries: Comparison config file {compare_yml}")

        if not os.path.isfile(compare_yml):
            print(f"analysis_timeseries: Could not find comparison config file {compare_yml}")
            sys.exit(1)
        dry_run = args.dry_run
        submits_lotus(compare_yml, config_user, dry_run)


if __name__ == "__main__":
    from ._version import __version__
    print(f'BGCVal2: {__version__}')
    main()
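
The `subprocess.Popen` call in `submits_lotus` pipes the `sbatch` output but never reads it, so whatever `sbatch` prints on submission, including any error, is not shown to the user. One possible alternative, sketched here under the assumption that waiting for each `sbatch` call to return is acceptable, is `subprocess.run` with captured output. The helper name `submit` is hypothetical, and a Python one-liner stands in for `sbatch` so that the example runs without slurm.

```python
import subprocess
import sys


def submit(command):
    """Run a submission command and return (return code, combined output)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in the merged code
        text=True,
        check=False,
    )
    return result.returncode, result.stdout.strip()


# Stand-in for ['sbatch', ...]: a child Python process that prints one line.
stand_in = [sys.executable, '-c', 'print("stand-in for sbatch output")']
code, output = submit(stand_in)
print(code, output)
```

Returning the exit code lets the caller report a failed submission for one job and carry on with the remaining jobs.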

2 changes: 1 addition & 1 deletion bgcval2/timeseries/timeseriesAnalysis.py
@@ -137,7 +137,7 @@ def loadModel(self):
if self.debug: print("timeseriesAnalysis:\tloadModel.")
####
# load and calculate the model info
if glob.glob(self.shelvefn):
if glob.glob(self.shelvefn+'*'): # shelve files have .bak .dat .dir files now
sh = shOpen(self.shelvefn)
print('seems fine:', self.shelvefn)
sh = shOpen(self.shelvefn)
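
The wildcard in the new `glob.glob(self.shelvefn+'*')` check matters because some `dbm` backends used by `shelve` store their data in several files with added extensions, rather than in one file with the exact name passed in. The sketch below demonstrates this with the `dbm.dumb` backend, chosen here only because it is always available in the standard library; which backend `shOpen` ends up using in bgcval2 depends on the environment and is not shown in this diff.

```python
import dbm.dumb
import glob
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    shelvefn = os.path.join(tmpdir, 'example.shelve')

    # Force the dbm.dumb backend, which writes files with added extensions.
    with shelve.Shelf(dbm.dumb.open(shelvefn, 'c')) as sh:
        sh['count'] = 1

    # The exact file name does not exist on disk...
    exact_matches = glob.glob(shelvefn)
    # ...but files sharing the prefix do.
    wildcard_matches = sorted(
        os.path.basename(path) for path in glob.glob(shelvefn + '*'))

print(exact_matches)
print(wildcard_matches)
```

With a backend that writes a single file under the exact name, both patterns match, so the wildcard form covers either case.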