
Commit c1c9619

Jupyter: don't allow overwriting data files, add documentation
Signed-off-by: Lance-Drane <[email protected]>
1 parent 69e477d commit c1c9619

File tree

3 files changed: +167 −0 lines changed

doc/user_guides/jupyter.rst

+160
@@ -0,0 +1,160 @@
Jupyter
=======

The IPS Framework supports automatically creating Jupyter-based workflows. If the IPS simulation is executed on a platform with JupyterHub installed, you can automatically add notebooks and data to your JupyterHub directory.

**Configuration File**

The following additional configuration variables are mandatory for an IPS simulation that wants to use the Jupyter workflow; they have no default values.

*PORTAL_URL* - The hostname of the IPS web portal you are interacting with (do not include any subpath). The IPS Portal will associate your run with a specific ID, which is used on JupyterHub/JupyterLab.

*JUPYTERHUB_DIR* - The base directory of your JupyterHub or JupyterLab web server. This MUST be an absolute path.

*JUPYTERHUB_URL* - The base URL of your JupyterHub web server, e.g. "https://yourdomain.com/lab/tree/var/www/jupyterlab".

It is recommended that you configure *INPUT_DIR* as well and place any notebook templates among your IPS input files.
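
For reference, a minimal sketch of how these entries might appear in a simulation configuration file (all values below are placeholders, reusing the example URLs from this page):

.. code-block:: text

    PORTAL_URL = https://example-portal-url.com
    JUPYTERHUB_DIR = /absolute/path/to/your/jupyterhub/directory
    JUPYTERHUB_URL = https://yourdomain.com/lab/tree/var/www/jupyterlab
    INPUT_DIR = /path/to/simulation/inputs    # contains your template notebooks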

**Configuration File - NERSC-specific information**

The IPS Framework is agnostic about *specific* JupyterHub implementations, but at the time of writing we expect most users will be running simulations and viewing Jupyter Notebooks on NERSC. Below is some NERSC-specific information (a combined configuration sketch follows these notes):

*JUPYTERHUB_DIR* - You can generally just set this to ${PSCRATCH}, which is an environment variable pre-set on NERSC systems.

*JUPYTERHUB_URL* - You can usually just set this to https://jupyter.nersc.gov/user/${USER}/perlmutter-login-node-base/lab/tree${PSCRATCH} . Some notes on this:

- The "/user/${USER}" URL path authenticates through NERSC Shibboleth as ${USER}, so you will need to make sure that anyone who clicks on this URL can authenticate as that user or knows to replace the username with their own.
- By default, the notebooks will be executed on the login nodes. If the notebooks should be executed on a different node, replace "perlmutter-login-node-base" with the appropriate node name.
- The directory path after "/lab/tree" needs to have read and execute permissions for the NERSC Shibboleth user. For users to access the Jupyter Notebook either through JupyterHub or directly on the server, you will have to manually ``chmod 755`` or ``chmod 750`` your $PSCRATCH/ipsframework/runs directory and set Unix group ownerships as necessary.
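
Putting these notes together, the NERSC-flavored entries might look like this (the portal URL is still a placeholder, and the node name in the URL may need adjusting as described above):

.. code-block:: text

    PORTAL_URL = https://example-portal-url.com
    JUPYTERHUB_DIR = ${PSCRATCH}
    JUPYTERHUB_URL = https://jupyter.nersc.gov/user/${USER}/perlmutter-login-node-base/lab/tree${PSCRATCH}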

**Notebook Input File information**

You can place template notebooks in your input directory; the framework uses them to automatically generate analyses visible on a remote JupyterHub instance. The IPS Framework will copy your template notebook and add some initialization code in a new cell at the beginning.

In your template code, you can reference the variable ``DATA_FILES`` to load the current state mapping. This state mapping is a dictionary of timesteps (floating point) to filepaths of the data files.
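
For example, a template-notebook cell might iterate over the mapping like this (a minimal sketch: ``DATA_FILES`` is assumed to be provided by the initialization cell the framework prepends, and ``json`` is only appropriate if your state files happen to be JSON):

.. code-block:: python

    import json

    # DATA_FILES maps floating-point timesteps to data file paths
    for timestep in sorted(DATA_FILES):
        with open(DATA_FILES[timestep]) as f:
            state = json.load(f)  # parse the state file for this timestep
        print(f'timestep {timestep}: {len(state)} top-level entries')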
35+
36+
**IPS Framework Usage**
37+
38+
In an IPS Component which only executes once, you should call:
39+
40+
.. code-block:: python
41+
42+
from ipsframework import Component
43+
44+
SOURCE_NOTEBOOK_NAME='base_notebook.ipynb'
45+
46+
class Driver(Component):
47+
def step(self, timestamp=0.0):
48+
# ...
49+
# assumes your notebooks are configured in the input directory
50+
# if you have an absolute path on the filesystem to your notebook, staging the input notebook is not required
51+
self.services.stage_input_files([SOURCE_NOTEBOOK_NAME])
52+
self.services.initialize_jupyter_notebook(
53+
dest_notebook_name='jupyterhub_visible_notebook.ipynb',
54+
source_notebook_path=SOURCE_NOTEBOOK_NAME,
55+
)
56+
# call self.services.initialize_jupyter_notebook for EACH notebook you want to initialize
57+
# ...
58+
59+
This code initializes JupyterHub to work with this run and contacts the web portal to associate a runid with this specific run.

----

For updating data files, we generally accommodate two workflows: one where you add a new data file for each timestep, and one where you maintain a single data file that is replaced at each timestep.

Data files will generally be derived from IPS state files.

For the workflow where multiple data files are maintained, the code below provides an example of adding one from a state file:

.. code-block:: python

    import os

    from ipsframework import Component

    class Monitor(Component):
        def step(self, timestamp=0.0):
            # ... get state file pathname
            self.services.add_analysis_data_file(
                current_data_file_path=state_file,
                new_data_file_name=f'{timestamp}_{os.path.basename(state_file)}',
                timestamp=timestamp,
            )

Or, if you only want to maintain a single data file:

.. code-block:: python

    import os

    from ipsframework import Component

    class Monitor(Component):
        def step(self, timestamp=0.0):
            # ... get state file pathname
            self.services.add_analysis_data_file(
                current_data_file_path=state_file,
                new_data_file_name=os.path.basename(state_file),
                replace=True,
            )

Note that if you attempt to overwrite an existing data file without setting ``replace=True``, a ``ValueError`` will be raised.
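
Continuing with the hypothetical ``Monitor`` component above, one way to tolerate a duplicate name instead of aborting the step might be the sketch below (choosing a unique name per timestep, as in the first example, is usually the simpler fix):

.. code-block:: python

    try:
        self.services.add_analysis_data_file(
            current_data_file_path=state_file,
            new_data_file_name=os.path.basename(state_file),
            timestamp=timestamp,
        )
    except ValueError:
        # the file already exists in this run's Jupyter data directory;
        # pass replace=True if overwriting it was actually intended
        pass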

**JupyterHub Filesystem Notes**

Inside ${JUPYTERHUB_DIR}/ipsframework/runs, the directory structure may look like this:

.. code-block:: bash

    .
    ├── https://example-portal-url.com
    └── https://lb.ipsportal.development.svc.spin.nersc.org/
        ├── 1
        │   ├── basic.ipynb
        │   ├── bokeh-plots.ipynb
        │   ├── data
        │   │   ├── 10.666666666666666_state.json
        │   │   ├── 1.0_state.json
        │   │   ├── 11.633333333333333_state.json
        │   │   ├── 12.6_state.json
        │   │   ├── 13.566666666666666_state.json
        │   │   ├── 14.533333333333333_state.json
        │   │   ├── 15.5_state.json
        │   │   ├── 16.46666666666667_state.json
        │   │   ├── 17.433333333333334_state.json
        │   │   ├── 18.4_state.json
        │   │   ├── 19.366666666666667_state.json
        │   │   ├── 1.9666666666666668_state.json
        │   │   ├── 20.333333333333332_state.json
        │   │   ├── 21.3_state.json
        │   │   ├── 22.266666666666666_state.json
        │   │   ├── 23.233333333333334_state.json
        │   │   ├── 24.2_state.json
        │   │   ├── 25.166666666666668_state.json
        │   │   ├── 26.133333333333333_state.json
        │   │   ├── 27.1_state.json
        │   │   ├── 28.066666666666666_state.json
        │   │   ├── 29.033333333333335_state.json
        │   │   ├── 2.9333333333333336_state.json
        │   │   ├── 30.0_state.json
        │   │   ├── 3.9_state.json
        │   │   ├── 4.866666666666667_state.json
        │   │   ├── 5.833333333333333_state.json
        │   │   ├── 6.8_state.json
        │   │   ├── 7.766666666666667_state.json
        │   │   ├── 8.733333333333334_state.json
        │   │   └── 9.7_state.json
        │   └── data_listing.py
        ├── 2
        │   ├── basic.ipynb
        │   ├── data
        │   │   └── 0.0_state.json
        │   └── data_listing.py
        ├── api_v1_notebook.ipynb
        └── api_v1.py

- The IPS Framework will only modify files inside ${JUPYTERHUB_DIR}/ipsframework/runs/
- From this directory, runs are divided by web portal hostname, since runids are determined by a specific web portal.
- From the ${JUPYTERHUB_DIR}/ipsframework/runs/${PORTAL_URL} directory, the directory tree continues based on runids. Note that files titled ``api_v*.py`` and ``api_v*_notebook.ipynb`` will be added to this directory as well. These files may potentially be overwritten by the framework, but any such overwrite should always be backwards compatible.
- From the ${JUPYTERHUB_DIR}/ipsframework/runs/${PORTAL_URL}/${RUNID} directory, a few additional files will be added:

  - Notebooks generated from your input notebooks.
  - A ``data_listing.py`` Python module which is imported from (by the generated notebooks) and which exports a dictionary mapping timesteps to state file names (an illustrative example follows this list). Note that this file is likely to be modified during a run, so do NOT change it yourself unless you are sure the run has been finalized.
  - A ``data`` directory which will contain all state files you added during the run. (Note that the state files are determined on the domain science side and can be of any content type, not just JSON.)
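
As an illustrative example only (the exact variable names and layout of the generated module are managed by the framework and may differ), ``data_listing.py`` conceptually contains something like:

.. code-block:: python

    # hypothetical contents; the real file is generated and updated by the
    # framework during a run, so do not edit it by hand
    DATA_FILES = {
        0.0: 'data/0.0_state.json',
        1.0: 'data/1.0_state.json',
    }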

doc/user_guides/user_guides.rst

+4
Original file line numberDiff line numberDiff line change
@@ -38,6 +38,9 @@ This directory has all of the user guides for using the IPS (see the component a
3838
 :doc:`Using the IPS Portal<portal_guides>`
    How to setup simulation to use the IPS portal.
 
+:doc:`Jupyter<jupyter>`
+   How to setup Jupyter workflows for your simulation.
+
 .. toctree::
    :maxdepth: 1
 
@@ -49,4 +52,5 @@ This directory has all of the user guides for using the IPS (see the component a
    migration
    nersc_conda
    portal_guides
+   jupyter
    dask

ipsframework/services.py

+3
@@ -1956,6 +1956,7 @@ def add_analysis_data_file(self, current_data_file_path: str, new_data_file_name
         - new_data_file_name: name of the new data file (relative to Jupyterhub data directory, should be unique per run)
         - timestamp: label to assign to the data (currently must be a floating point value)
         - replace: If True, replace the last data file added with the new data file. If False, simply append the new data file. (default: False)
+          Note that if replace is not True but you attempt to overwrite an existing data file, a ValueError will be raised.
         """
         if not self._jupyterhub_dir:
             if not self._init_jupyter():
@@ -1966,6 +1967,8 @@ def add_analysis_data_file(self, current_data_file_path: str, new_data_file_name
         new_data_file_name = os.path.basename(new_data_file_name)
 
         jupyter_data_file = os.path.join(self._jupyterhub_dir, 'data', new_data_file_name)
+        if not replace and os.path.exists(jupyter_data_file):
+            raise ValueError(f'Replacing existing filename {jupyter_data_file}, set replace to equal True in add_analysis_data_file if this was intended.')
         # this may raise an OSError, it is the responsibility of the caller to handle it.
         shutil.copyfile(current_data_file_path, jupyter_data_file)