
Commit 3bc4324

Authored by j-atkins, VeckoTheGecko, pre-commit-ci[bot] and erikvansebille
v0.3 dev (#226)
Introduces substantial refactoring, new features, and documentation updates to VirtualShip. Overall, the changes aim to unify configuration, centralise instrument logic and overhaul data ingestion.

---

* Unify config files to expedition.yaml (#217): consolidates/unifies the old dual `ship_config.yaml` and `schedule.yaml` config files into one `expedition.yaml` file, in line with v1 dev objectives
* Update link to website (#215)
* first refactoring step, parent instrument classes
* ignore refactoring notes in gitignore
* add note to remove upon completing v1 dev
* scratch inputdataset objects integration to _fetch
* add call to download_data()
* Add new instrument classes and update InputDataset to include depth parameters
* Refactor instrument handling in _fetch and update imports for consistency
* Refactor instrument classes and re-add (temporary) simulation functions across multiple files
* improve/clarify comments and notes
* Refactor ArgoFloat and XBT classes to include depth parameters and remove outdated comments
* avoid circular import issues
* make tests for InputDataset base class
* refactor instrument handling in _fetch.py and update expedition model to include get_instruments method
* refactor instrument error handling in Expedition model and remove Schedule and ShipConfig classes
* add is_underway property to InstrumentType and filter instruments in plan UI
* enhance CLI output for fetching
* general fixes and new error class
* refactor test cases to use Expedition object
* move instruments base classes out of models/ dir
* update base class imports
* make get_instruments_registry more robust with testing
* update mock reanalysis period and refactor tests to use expedition fixture
* refactor: reorganize instrument classes and update imports for clarity
* implement instrument registration and input dataset retrieval
* refactor: reorganize imports in instrument test files for consistency
* further refactoring: instrument classes to use a unified InputDataset and Instrument structure
* evaporate simulate_measurements.py; centralise run logic
* draft up check land using bathymetry
* small bug fixes
* patch copernicus product id search logic to new instrument classes, plus more debugging; verbose INFO is outstanding
* adding U and V to instruments where missing
* enhanced error messaging for XBT in too shallow regions
* bug fixes
* version U and V downloaded
* dummy U and V
* Neaten up logging output
* small bug fixes
* tidy up
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Refactor type hints and improve test coverage for instruments and utils
* Remove TODO comments and tidy up imports in test files
* Refactor bathymetry error handling, update verification methods and remove unused function
* move product id selection logic to utils
* update
* first draft direct ingestion via copernicusmarine (CTD only)
* refactor bathymetry data handling and add (temporary) timing for performance evaluation
* update instrument constructors for Copernicus Marine ingestion
* move expedition/do_expedition.py to cli/_run.py, rename Instrument.run() to Instrument.execute()
* move checkpoint class to models, move expedition_cost() to utils.py
* update imports for expedition_cost
* working with drifters (bodge), CTD_BGC not yet working
* remove fetch and all associated logic
* update docstrings/--help info
* add buffers to drifters and argos, plus depth limits for drifters
* remove _creds.py
* CTD_BGC fieldset bug fix
* fixing bugs associated with BGC data access
* update dependencies
* logic for handling copernicus credentials
* add support for taking local pre-downloaded data with --from-data optional flag in virtualship run
* update drifter to be at -1m depth, to avoid out of bounds at surface
* tidy up
* bug fixes for unnecessary copernicusmarine call when using pre-downloaded data
* remove redundant tests
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* update tests (not yet instrument subclasses)
* update instrument tests
* test is on land tests in schedule.verify()
* Run pre-commit
* update pre-download ingestion methods to take files split by time
* fix bug
* fix bug in ingesting bgc data from disk
* tidy up
* add test for data directory structure compliance
* update docs
* edits to docs
* add more checks to docs compliance testing
* TODO in readme
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* Update docs/user-guide/documentation/pre_download_data.md (Co-authored-by: Erik van Sebille <[email protected]>)
* Apply suggestions from code review (Co-authored-by: Erik van Sebille <[email protected]>)
* Set t_min to first day of month for monthly resolution: adjust t_min to the first day of the month based on schedule start date
* remove redundant parameters from instrument classes
* change variable name
* make _find_files_in_timerange standalone from Instrument base class
* Update error docstring
* update plan UI logic to update space-time region dynamically
* fix xbt bug
* add warnings to ADCP max depth config if exceeds authentic limits
* [pre-commit.ci] auto fixes from pre-commit.com hooks (see https://pre-commit.ci)
* error messaging for case where measurements cause schedule to be missed
* revert to using ScheduleProblem class
* add more informative messaging on ScheduleProblem
* change test to mock using data from disk to avoid copernicus calls
* remove accidental breakpoint

---

Co-authored-by: Nick Hodgskin <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Erik van Sebille <[email protected]>
1 parent 7cf63fe commit 3bc4324


64 files changed: +4190 additions, −4017 deletions

README.md

Lines changed: 2 additions & 0 deletions

```diff
@@ -30,6 +30,8 @@
 </tr>
 </table>
 
+<!-- TODO: README needs updating for v1-dev! -->
+
 <!-- Insert catchy summary -->
 
 VirtualShipParcels is a command line simulator allowing students to plan and conduct a virtual research expedition, receiving measurements as if they were coming from actual oceanographic instruments including:
```

docs/user-guide/documentation/copernicus_products.md

Lines changed: 5 additions & 5 deletions

````diff
@@ -2,7 +2,7 @@
 
 VirtualShip supports running experiments anywhere in the global ocean from 1993 through to the present day (and approximately two weeks into the future), using the suite of products available from the [Copernicus Marine Data Store](https://data.marine.copernicus.eu/products).
 
-The data sourcing task is handled by the `virtualship fetch` command. The three products relied on by `fetch` to source data for all [VirtualShip instruments](https://virtualship.readthedocs.io/en/latest/user-guide/assignments/Research_proposal_intro.html#Measurement-Options) (both physical and biogeochemical) are:
+The data sourcing task is handled by the `virtualship run` command, which in turn relies on the [copernicusmarine toolbox](https://github.com/mercator-ocean/copernicus-marine-toolbox?tab=readme-ov-file) for 'streaming' data from the Copernicus Marine Data Store. The three products relied on in `run` to source data for all [VirtualShip instruments](https://virtualship.readthedocs.io/en/latest/user-guide/assignments/Research_proposal_intro.html#Measurement-Options) (both physical and biogeochemical) are:
 
 1. **Reanalysis** (or "hindcast" for biogeochemistry).
 2. **Reanalysis interim** (or "hindcast interim" for biogeochemistry).
@@ -15,7 +15,7 @@ The Copernicus Marine Service describe the differences between the three product
 As a general rule of thumb the three different products span different periods across the historical period to present and are intended to allow for continuity across the previous ~ 30 years.
 
 ```{note}
-The ethos for automated dataset selection in `virtualship fetch` is to prioritise the Reanalysis/Hindcast products where possible (the 'work horse'), then _interim products where possible for continuity, and finally filling the very near-present (and near-future) temporal range with the Analysis & Forecast products.
+The ethos for automated dataset selection in `virtualship run` is to prioritise the Reanalysis/Hindcast products where possible (the 'work horse'), then _interim products where possible for continuity, and finally filling the very near-present (and near-future) temporal range with the Analysis & Forecast products.
 ```
 
 ```{warning}
@@ -24,13 +24,13 @@ In the rare situation where the start and end times of an expedition schedule sp
 
 ### Data availability
 
-The following tables summarise which Copernicus product is selected by `virtualship fetch` per combination of time period and variable (see legend below).
+The following tables summarise which Copernicus product is selected by `virtualship run` per combination of time period and variable (see legend below).
 
 For biogeochemical variables `ph` and `phyc`, monthly products are required for hindcast and hindcast interim periods. For all other variables, daily products are available.
 
 #### Physical products
 
-| Period              | Product ID                               | Temporal Resolution | Typical Years Covered               | Variables                  |
+| Period              | Dataset ID                               | Temporal Resolution | Typical Years Covered               | Variables                  |
 | :------------------ | :--------------------------------------- | :------------------ | :---------------------------------- | :------------------------- |
 | Reanalysis          | `cmems_mod_glo_phy_my_0.083deg_P1D-m`    | Daily               | ~30 years ago to ~5 years ago       | `uo`, `vo`, `so`, `thetao` |
 | Reanalysis Interim  | `cmems_mod_glo_phy_myint_0.083deg_P1D-m` | Daily               | ~5 years ago to ~2 months ago       | `uo`, `vo`, `so`, `thetao` |
@@ -40,7 +40,7 @@ For biogeochemical variables `ph` and `phyc`, monthly products are required for
 
 #### Biogeochemical products
 
-| Period                        | Product ID                                 | Temporal Resolution | Typical Years Covered               | Variables                         | Notes                                  |
+| Period                        | Dataset ID                                 | Temporal Resolution | Typical Years Covered               | Variables                         | Notes                                  |
 | :---------------------------- | :----------------------------------------- | :------------------ | :---------------------------------- | :-------------------------------- | :------------------------------------- |
 | Hindcast                      | `cmems_mod_glo_bgc_my_0.25deg_P1D-m`       | Daily               | ~30 years ago to ~5 years ago       | `o2`, `chl`, `no3`, `po4`, `nppv` | Most BGC variables except `ph`, `phyc` |
 | Hindcast (monthly)            | `cmems_mod_glo_bgc_my_0.25deg_P1M-m`       | Monthly             | ~30 years ago to ~5 years ago       | `ph`, `phyc`                      | Only `ph`, `phyc` (monthly only)       |
````
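The selection ethos in the note above (Reanalysis/Hindcast first, then interim, then Analysis & Forecast) can be sketched as a simple rule over the age of the requested date. The function, its return labels, and the cut-off ages (~5 years, ~2 months) are illustrative assumptions drawn from the "Typical Years Covered" column, not VirtualShip's actual selection code:

```python
from datetime import date


def select_physical_product(requested: date, today: date) -> str:
    """Illustrative product choice by age of the requested date.

    The cut-offs approximate the 'typical years covered' in the tables
    above; VirtualShip's real selection logic may differ.
    """
    age_days = (today - requested).days
    if age_days > 5 * 365:
        return "reanalysis"          # cmems_mod_glo_phy_my_0.083deg_P1D-m
    if age_days > 60:
        return "reanalysis-interim"  # cmems_mod_glo_phy_myint_0.083deg_P1D-m
    return "analysis-forecast"       # near-present and ~2 weeks into the future
```

With today taken as 2025-06-01, a date in 2015 would map to "reanalysis", early 2024 to "reanalysis-interim", and late May 2025 to "analysis-forecast".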
docs/user-guide/documentation/example_copernicus_download.ipynb

Lines changed: 193 additions & 0 deletions (new file)

# Example Copernicus data download

This notebook provides a rough, non-optimised example of how to download Copernicus Marine data using the `copernicusmarine` Python package.

This will download:
- Global bathymetry data (static)
- Global biogeochemical monthly data (0.25 degree hindcast)
- Global physical daily data (0.25 degree reanalysis)

For a single year (2023) and two months (June and July).

This notebook is intended as a basic example only. Modifications will be needed to adapt this to your own use case.

```python
import os
from datetime import datetime

import copernicusmarine
```

```python
YEAR = "2023"
MONTHS = ["06", "07"]
DAYS = [f"{day:02d}" for day in range(1, 32)]  # "01" .. "31"
```

```python
### PHYSICAL DAILY FILES

# expanduser: os.chdir does not expand "~" by itself
os.chdir(os.path.expanduser("~/data/phys/"))
DATASET_ID = "cmems_mod_glo_phy-all_my_0.25deg_P1D-m"

for month in MONTHS:
    for day in DAYS:
        # skip invalid dates (e.g. 31 June)
        try:
            datetime(year=int(YEAR), month=int(month), day=int(day), hour=0)
        except ValueError:
            continue

        filename = f"{DATASET_ID}_global_fulldepth_{YEAR}_{month}_{day}.nc"

        if os.path.exists(filename):
            print(f"File {filename} already exists, skipping...")
            continue

        copernicusmarine.subset(
            dataset_id=DATASET_ID,
            variables=["uo_glor", "vo_glor", "thetao_glor", "so_glor"],
            minimum_longitude=-180,
            maximum_longitude=179.75,
            minimum_latitude=-80,
            maximum_latitude=90,
            start_datetime=f"{YEAR}-{month}-{day}T00:00:00",
            end_datetime=f"{YEAR}-{month}-{day}T00:00:00",
            minimum_depth=0.5057600140571594,
            maximum_depth=5902.0576171875,
            output_filename=filename,
        )
```

```python
### BIOGEOCHEMICAL MONTHLY FILES

os.chdir(os.path.expanduser("~/data/bgc/"))
DATASET_ID = "cmems_mod_glo_bgc_my_0.25deg_P1M-m"
DAY = "01"

for month in MONTHS:
    try:
        datetime(year=int(YEAR), month=int(month), day=int(DAY), hour=0)
    except ValueError:
        continue

    filename = f"{DATASET_ID}_global_fulldepth_{YEAR}_{month}_{DAY}.nc"

    if os.path.exists(filename):
        print(f"File {filename} already exists, skipping...")
        continue

    copernicusmarine.subset(
        dataset_id=DATASET_ID,
        variables=["chl", "no3", "nppv", "o2", "ph", "phyc", "po4"],
        minimum_longitude=-180,
        maximum_longitude=179.75,
        minimum_latitude=-80,
        maximum_latitude=90,
        start_datetime=f"{YEAR}-{month}-{DAY}T00:00:00",
        end_datetime=f"{YEAR}-{month}-{DAY}T00:00:00",
        minimum_depth=0.5057600140571594,
        maximum_depth=5902.05810546875,
        output_filename=filename,
    )
```

```python
### BATHYMETRY FILE

os.chdir(os.path.expanduser("~/data/bathymetry/"))
DATASET_ID = "cmems_mod_glo_phy_anfc_0.083deg_static"
filename = "cmems_mod_glo_phy_anfc_0.083deg_static_bathymetry.nc"

copernicusmarine.subset(
    dataset_id=DATASET_ID,
    dataset_part="bathy",
    variables=["deptho"],
    minimum_longitude=-180,
    maximum_longitude=179.91668701171875,
    minimum_latitude=-80,
    maximum_latitude=90,
    minimum_depth=0.49402499198913574,
    maximum_depth=0.49402499198913574,
    output_filename=filename,
)
```
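Note that the filenames produced by this notebook include a `_global_fulldepth` infix, whereas the pre-download documentation expects `<COPERNICUS_DATASET_NOMENCLATURE>_<YYYY_MM_DD>.nc`. A minimal sketch of a rename pass to bridge the two (the helper is ours, not part of VirtualShip, and it assumes the notebook's exact naming scheme):

```python
def to_virtualship_name(filename: str) -> str:
    """Drop the notebook's '_global_fulldepth' infix so the file matches the
    <COPERNICUS_DATASET_NOMENCLATURE>_<YYYY_MM_DD>.nc convention expected
    for pre-downloaded data. Hypothetical helper, not part of VirtualShip."""
    return filename.replace("_global_fulldepth", "")


# to_virtualship_name("cmems_mod_glo_phy-all_my_0.25deg_P1D-m_global_fulldepth_2023_06_01.nc")
# -> "cmems_mod_glo_phy-all_my_0.25deg_P1D-m_2023_06_01.nc"
```

A batch rename could then walk each data subdirectory and apply `os.rename` per file.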
docs/user-guide/documentation/pre_download_data.md

Lines changed: 87 additions & 0 deletions (new file)

# Pre-downloading data

By default, VirtualShip will automatically 'stream' data from the Copernicus Marine Service via the [copernicusmarine toolbox](https://github.com/mercator-ocean/copernicus-marine-toolbox?tab=readme-ov-file). However, for users who wish to manage data locally, it is possible to pre-download the required datasets and feed them into VirtualShip simulations.

<!-- TODO: quickstart guide needs full update! -->

As outlined in the [Quickstart Guide](https://virtualship.readthedocs.io/en/latest/user-guide/quickstart.html), the `virtualship run` command supports an optional `--from-data` argument, which allows users to specify a local directory containing the necessary data files.

```{tip}
See the [for example...](#for-example) section for an example data download workflow.
```
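A minimal usage sketch for this workflow (the expedition directory name `my_expedition` is a placeholder; only the `--from-data` flag itself is taken from the documentation):

```shell
# Create the expected local data layout (see the directory structure section).
mkdir -p data/bathymetry data/bgc data/phys

# Point VirtualShip at the local data instead of streaming from Copernicus.
# "my_expedition" is a placeholder expedition directory; the call is shown
# commented out because it requires virtualship to be installed:
# virtualship run my_expedition --from-data ./data
```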
### Data requirements

For pre-downloaded data, VirtualShip only supports daily and monthly resolution physical and biogeochemical data, along with a static bathymetry file.

In addition, all pre-downloaded data must be split into separate files per timestep (i.e. one .nc file per day or month).

```{note}
**Monthly data**: when using monthly data, ensure that your final .nc file download is for the month *after* your expedition schedule end date. This is to ensure that a Parcels FieldSet can be generated under-the-hood which fully covers the expedition period. For example, if your expedition runs from 1st May to 15th May, your final monthly data file should be in June. Daily data files only need to cover the expedition period exactly.
```

Further, VirtualShip expects pre-downloaded data to be organised in a specific directory and filename structure within the specified local data directory, as outlined in the following sections.

#### Directory structure

Assuming the local data directory (as supplied in the `--from-data` argument) is named `data/`, the expected subdirectory structure is:

```bash
.
└── data
    ├── bathymetry   # containing the single bathymetry .nc file
    ├── bgc          # containing biogeochemical .nc files
    └── phys         # containing physical .nc files
```

#### Filename conventions

Within these subdirectories, the expected filename conventions are:

- Physical data files (in `data/phys/`) should be named as follows:
  - `<COPERNICUS_DATASET_NOMENCLATURE>_<YYYY_MM_DD>.nc`
  - e.g. `cmems_mod_glo_phy-all_my_0.25deg_P1D-m_1998_05_01.nc`, and so on for each timestep.
- Biogeochemical data files (in `data/bgc/`) should be named as follows:
  - `<COPERNICUS_DATASET_NOMENCLATURE>_<YYYY_MM_DD>.nc`
  - e.g. `cmems_mod_glo_bgc_my_0.25deg_P1M-m_1998_05_01.nc`, and so on for each timestep.
- The bathymetry data file (in `data/bathymetry/`) should be named:
  - `cmems_mod_glo_phy_anfc_0.083deg_static_bathymetry.nc`

```{tip}
Take care to use an underscore (`_`) as the separator between date components in the filenames (i.e. `YYYY_MM_DD`).
```

```{note}
Using the `<COPERNICUS_DATASET_NOMENCLATURE>` in the filenames is vital in order to correctly identify the temporal resolution of the data (daily or monthly). The `P1D` in the example above indicates daily data, whereas `P1M` would indicate monthly data.

See [here](https://help.marine.copernicus.eu/en/articles/6820094-how-is-the-nomenclature-of-copernicus-marine-data-defined#h_34a5a6f21d) for more information on Copernicus dataset nomenclature.

See also our own [documentation](https://virtualship.readthedocs.io/en/latest/user-guide/documentation/copernicus_products.html#data-availability) on the Copernicus datasets used natively by VirtualShip when 'streaming' data if you wish to use the same datasets for pre-download.
```

```{note}
**Monthly data**: the `DD` component of the date in the filename for monthly .nc files should always be `01`, representing the first day of the month. This ensures that a Parcels FieldSet can be generated under-the-hood which fully covers the expedition period from the start.
```

#### Further assumptions

The following assumptions are also made about the data:

1. All pre-downloaded data files must be in NetCDF format (`.nc`).
2. Physical data files must contain the following variables: `uo`, `vo`, `so`, `thetao`.
   - Alternatively, these strings may appear as substrings within the variable names (e.g. `uo_glor` is acceptable for `uo`).
3. If using BGC instruments (e.g. `CTD_BGC`), the relevant biogeochemical data files must contain the following variables: `o2`, `chl`, `no3`, `po4`, `nppv`, `ph`, `phyc`.
   - Alternatively, these strings may appear as substrings within the variable names (e.g. `o2_glor` is acceptable for `o2`).
4. Bathymetry data files must contain a variable named `deptho`.

#### Also of note

1. While it is not mandatory to use data downloaded from the Copernicus Marine Service (any existing data you hold can be re-organised accordingly), the assumptions VirtualShip makes about directory structure and filename conventions are motivated by alignment with the Copernicus Marine Service's practices.
   - If you want to use pre-existing data from a different source with VirtualShip, you can do so by restructuring and/or renaming your data files as necessary.
2. The whole VirtualShip pre-downloaded data workflow supports global data or subsets thereof, provided the data files contain the necessary variables and are structured as outlined above.

#### For example...

Example Python code for automating the data download from Copernicus Marine can be found in [Example Copernicus Download](example_copernicus_download.ipynb).

<!-- TODO: replace with URL? -->
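The directory and filename rules above can be sanity-checked before running with `--from-data`. The following is a rough sketch based on the documented conventions; the function and its messages are ours, not VirtualShip's own validation logic:

```python
import re
from pathlib import Path

# Documented pattern: <COPERNICUS_DATASET_NOMENCLATURE>_<YYYY_MM_DD>.nc,
# where the nomenclature encodes the resolution (P1D daily, P1M monthly).
FILENAME_RE = re.compile(r"^(?P<dataset>.+)_(?P<date>\d{4}_\d{2}_\d{2})\.nc$")


def check_data_dir(root: str) -> list[str]:
    """Return a list of problems found in a --from-data directory.

    A sketch of the documented layout checks only; it does not open the
    files or verify the variables they contain.
    """
    problems = []
    root_path = Path(root)
    # expected subdirectories: bathymetry, bgc, phys
    for sub in ("bathymetry", "bgc", "phys"):
        if not (root_path / sub).is_dir():
            problems.append(f"missing subdirectory: {sub}")
    # per-timestep files must follow the naming convention
    for sub in ("bgc", "phys"):
        subdir = root_path / sub
        if not subdir.is_dir():
            continue
        for f in subdir.glob("*.nc"):
            match = FILENAME_RE.match(f.name)
            if match is None:
                problems.append(f"unexpected filename: {sub}/{f.name}")
            elif not any(res in match.group("dataset") for res in ("P1D", "P1M")):
                problems.append(f"cannot infer temporal resolution: {sub}/{f.name}")
    return problems
```

Running this against an empty directory reports the three missing subdirectories; against a correctly laid-out tree it returns an empty list.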

docs/user-guide/index.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -15,4 +15,6 @@ assignments/index
 :maxdepth: 1
 
 documentation/copernicus_products.md
+documentation/pre_download_data.md
+documentation/example_copernicus_download.ipynb
 ```
````
