
Multi-parameter runs #19

Open
lumlauf opened this issue Feb 16, 2021 · 12 comments

Comments

lumlauf commented Feb 16, 2021

Hi all,

I don't know if this is actually an "issue": GOTM is often used for multi-parameter runs. In this case, it is repeatedly called from some external software (Matlab, Python, shell, ...) with slightly modified namelist parameters to cover a large parameter space step by step.

Perhaps we have this functionality already, and I just don't know? Perhaps it can be improved via the new yaml files?
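
For illustration, such a driver can be a very small script. A rough Python sketch of the pattern (purely illustrative - the gotm executable name, the yaml key being varied and the directory layout are assumptions, not something GOTM prescribes):

# Sweep one gotm.yaml parameter and run GOTM once per value (sequentially).
import copy
import subprocess
from pathlib import Path

import yaml  # PyYAML

template = yaml.safe_load(Path("gotm.yaml").read_text())

for i, k_min in enumerate([1e-7, 1e-6, 1e-5]):            # hypothetical parameter values
    rundir = Path(f"run_{i:04d}")
    rundir.mkdir(exist_ok=True)
    cfg = copy.deepcopy(template)
    cfg["turbulence"]["turb_param"]["k_min"] = k_min       # hypothetical yaml key
    (rundir / "gotm.yaml").write_text(yaml.safe_dump(cfg))
    # Assumes the forcing files referenced by gotm.yaml are reachable from rundir.
    subprocess.run(["gotm"], cwd=rundir, check=True)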

Best, Lars

bolding commented Feb 16, 2021

Hi Lars

The parsac Python package does that - and more on top. It is a very complete package that is used extensively by the lake users of GOTM. It can do sensitivity analysis and auto-calibration as well as ensemble simulations.

parsac supports both namelist files and yaml files, and it also runs on HPC - ~50,000 simulations are not uncommon.

Karsten

lumlauf commented Feb 16, 2021 via email

bolding commented Feb 16, 2021

It requires some work to get to know the workflow, but once you have made your configuration file (in .xml), everything else is done by executing parsac with different command line options. Even non-technical biologists have cracked the code.

But depending on the question you have, it might be easier to just create 20 different gotm.yaml files and make 20 runs from a script.

You can ask Peter or Marvin for more information, as they used the tool when we had a workshop in DK.

Qing has used a Jupyter Notebook with Python to manipulate gotm.yaml files; you can do the same in R. But if you really need to run many simulations, you need something that works in parallel.
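
For what it's worth, a parallel variant needs only the Python standard library. A rough sketch, assuming pre-generated run directories run_0000, run_0001, ... (each holding a complete setup with its own gotm.yaml and forcing files) and a gotm executable on the PATH - all of that is assumed here, not GOTM-specific:

# Run many pre-generated GOTM configurations in parallel on one machine.
import subprocess
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def run_one(rundir: Path) -> int:
    # Write each member's screen output to its own log file to avoid interleaving.
    with open(rundir / "gotm.log", "w") as log:
        return subprocess.run(["gotm"], cwd=rundir,
                              stdout=log, stderr=subprocess.STDOUT).returncode

if __name__ == "__main__":
    rundirs = sorted(Path(".").glob("run_*"))
    with ProcessPoolExecutor(max_workers=8) as pool:
        codes = list(pool.map(run_one, rundirs))
    print(f"{sum(c != 0 for c in codes)} of {len(codes)} runs failed")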

bolding commented Mar 24, 2021

Picking this up again

Jorn and I have been working on a tool that will do ensemble simulations and data assimilation using GOTM as the model. It will be a standalone tool using GOTM as a submodule - i.e. it will run with an unmodified, vanilla version of GOTM. To facilitate that, a few changes to GOTM would be nice to have - not need to have.

  1. GOTM writes to unit 0 (stderr) and flexout writes to stdout. Running only one instance, GOTM will just write to the screen. Running in parallel it becomes a mess, because all members write to the screen in an unordered way, so redirecting to a file for each member makes a lot of sense. This can be done in two ways: 1) as part of GOTM core, i.e. provide filenames in gotm.yaml - if the length is 0, write to the screen, otherwise open the files; 2) open the files outside core GOTM.

  2. The progress variable in GOTM must be a module-level variable set in init_gotm() instead of only in time_loop().

  3. jul2 and secs2 should be made public in time.F90, so that a data-assimilation tool can know about the stop time.

  4. Are there any objections to renamings like init_gotm() -> initialize_gotm(), time_loop() -> integrate_gotm() and clean_up() -> finalize_gotm()?

jornbr commented Mar 24, 2021

Hi Karsten,

I expect that the ensemble/DA functionality would be implemented by wrapping GOTM? (since it would require MPI, and GOTM itself should not depend on MPI). If so, I imagine that some of what you propose could be implemented in that wrapper, without touching GOTM itself. Point by point:

  1. I'd imagine that redirecting can be done with minimal changes to GOTM, as long as it picks up the unit numbers to write to from some publicly accessible shared module (e.g. util). I don't see why the names would have to go into gotm.yaml, though. Why not use command line switches and/or some logic in the wrapper to change the unit numbers, pointing them to a newly opened file? flexout already allows the host to provide custom fatal_error/log_message routines, so GOTM can make those use the same unit numbers (by implementing those routines as part of type_gotm_host).

  2. In principle, I guess you could manipulate MinN and MaxN from the wrapper (like GOTM-GUI does too) - and then only the wrapper might want to keep track of the original MinN and MaxN (i.e., no changes to GOTM core needed)?

  3. That's already the case, right? jul2, secs2 are public

  4. That seems like a good idea to me. There could be a benefit in additionally splitting init_gotm into a configure_gotm and an initialize_gotm.

Cheers,

Jorn

bolding commented Mar 24, 2021

add 1) - the way redirecting to a file is done now is by opening a file with e.g. unit=0 (or, in modern Fortran, error_unit). At the moment the redirection is done outside GOTM - in the worker wrapper. The question is whether it should be moved into GOTM.

write(output_id, "(A,I0.4)") '_', member
write(strbuf, "(A,I0.4)") 'gotm_', member
yaml_file = TRIM(strbuf) // '.yaml'
fname = TRIM(strbuf) // '.stderr'
open(error_unit, file=fname)
fname = TRIM(strbuf) // '.stdout'
open(output_unit, file=fname)
call init_gotm()

Diagnostics and standard output could be handled similarly to yaml_file - i.e. by directly setting an internal GOTM variable and letting GOTM do the opening.

Alternatively, introduce stderr and stdout as variables and assign them error_unit and output_unit if no filenames are specified - otherwise get stderr and stdout via newunit= in the open call.

One advantage of redirecting error_unit directly is that flushing is taken care of.

So I'm not 100% sure what is best.

add 2) - that is what is being done - the server does the date calculations, calculates MinN and MaxN and sends them to the workers, where they are used directly in time_loop().

add 3) - it was jul1 and secs2, as that will allow observation_handler to skip observations before the simulation start.

add 4) - I'll do that in master directly, as I think only you (Jorn) will see side effects :-)

jornbr commented Mar 24, 2021

Re 1, I'd go for the "Introducing stderr and stdout as variables and [initializing those] to error_unit and output_unit" option. The wrapper could then open files instead and assign their units to stdout and stderr (instead of reopening error_unit and output_unit). It keeps the GOTM core simpler, and redirecting to a file for a serial run can just be done on the command line as usual (e.g., gotm &> output.log) - no need for built-in support.

Regarding buffering - that seems compiler-specific; ifort, for instance, line-buffers output_unit (good enough for us) - https://community.intel.com/t5/Intel-Fortran-Compiler/Enabling-buffered-I-O-to-stdout-with-Intel-ifort-compiler/td-p/993203. I'd write everything to stdout except error messages, like I think most tools would - https://en.wikipedia.org/wiki/Standard_stream.

bolding commented Mar 24, 2021

Let's see if others have comments - otherwise I'll make the changes to GOTM soon.

lumlauf commented Mar 24, 2021

Hans and I discussed some planned applications with multiple instances of GOTM that may be related to this topic. We are collaborating with two groups that run atmospheric models. Looking for a simple representation of two-way atmosphere-ocean coupling, we thought about running an instance of GOTM underneath each grid point of the atmospheric model. GOTM and the atmospheric model would then feed back via the atmosphere-ocean fluxes on each time step (or perhaps, if this turns out to be more efficient, only every couple of time steps). We are talking about on the order of 10^4-10^5 grid points (= GOTM instances) to start with.

In view of the changes discussed above, is there anything worth considering already at this point? What are your thoughts on this?

Thanks!

bolding commented Mar 25, 2021

I think that requires a different concept. You don't really want to create and open 10^5 gotm.yaml files. The ensemble case runs the same setup in different configurations, and only on the order of 10^2 of them; here you want different setups (e.g. different lat, lon), but only one incarnation of each. There are two options:

  1. Add the necessary routines to the meteo model and let it handle 'everything'.

  2. Create a dedicated program that calls GOTM on a grid and let it interact with the meteo model via MPI (see the rough sketch below).

I would personally go for 2.
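
To make option 2 a bit more concrete, here is a very rough sketch of the communication pattern using mpi4py. Everything GOTM-specific below is a placeholder: step_columns stands in for whatever call advances a rank's GOTM water columns, and the atmospheric side is faked on rank 0 - no existing GOTM or meteo-model interface is implied.

# Option 2 as a communication sketch: each rank holds a set of GOTM columns and
# exchanges coupling fields with an "atmosphere" (faked here on rank 0) via MPI.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def step_columns(fluxes):
    # Hypothetical placeholder: advance this rank's GOTM columns one coupling
    # interval with the given surface fluxes and return sea surface temperatures.
    return np.full_like(fluxes, 15.0)

n_points = 100_000   # order 10^4-10^5 atmospheric grid points in total
n_steps = 10         # number of coupling steps in this sketch

for _ in range(n_steps):
    if rank == 0:
        all_fluxes = np.random.rand(n_points)      # would come from the meteo model
        chunks = np.array_split(all_fluxes, size)
    else:
        chunks = None
    fluxes = comm.scatter(chunks, root=0)           # distribute grid points over ranks
    sst = comm.gather(step_columns(fluxes), root=0) # SST back to the meteo side (unused here)

Run with e.g. mpiexec -n 8 python driver.py. In a real coupling the atmospheric model would live in its own executable and the two sides would exchange data via a shared or inter-communicator rather than faking each other.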

bolding commented Mar 8, 2022

Just stumbled on this

As part of the development of the rewrite of GETM, Jorn has actually done exactly what you ask for - i.e. running a (large) number of GOTM models on a grid. The two-way coupling with the atmospheric model is still to be done.

bolding commented Jan 17, 2024

Back to the original question: EAT (https://github.com/BoldingBruggeman/eat/wiki) can do exactly what you asked for initially - i.e. run a large number of differently configured GOTM simulations.
