The Global ECMWF Fire Forecasting (GEFF) system, implemented in Fortran 90, is based on empirical models conceptualised several decades ago. Recent advances in GIS and Machine Learning could, in theory, be used to boost these models' performance or replace the current forecasting system entirely. However, thorough benchmarking is needed to compare GEFF to Deep Learning based prediction techniques.
The project intends to reproduce the Fire Forecasting capabilities of GEFF using Deep Learning and to develop further improvements in accuracy, geography and time scale through the inclusion of additional variables or the optimisation of the model architecture & hyperparameters. Finally, a preliminary fire-spread prediction tool is proposed to support monitoring activities.
## TL; DR
This codebase (and this README) is a work-in-progress. The `master` branch is a stable release, and we aim to address issues and introduce enhancements on a rolling basis. If you encounter a bug, please [file an issue](https://github.com/esowc/wildfire-forecasting/issues/new). Here are a few quick pointers that *just work* to get you going with the project:
* Clone & navigate into the repo and create a conda environment using `environment.yml` on Ubuntu 18.04 and 20.04 only.
* All EDA and Inference notebooks must be run within this environment. Use `conda activate wildfire-dl`
* Check out the EDA notebooks titled [`EDA_XXX_mini_sample.ipynb`](data/EDA). We recommend `jupyterlab`.
* Check out the Inference notebook titled [`Inference_4_10.ipynb`](examples/Inference_4_10.ipynb).
* The notebooks also include code to download mini-samples of the dataset (`~1.5GiB`). A condensed quick-start sketch follows this list.
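The steps above, condensed into a minimal quick-start sketch (assuming a Linux shell with `git` and `conda` available; the JupyterLab invocation is illustrative):

```bash
# Quick-start sketch: clone the repo, create and activate the environment, open the notebooks.
git clone https://github.com/esowc/wildfire-forecasting.git
cd wildfire-forecasting
conda env create -f environment.yml   # creates the wildfire-dl environment
conda activate wildfire-dl
jupyter lab                           # then browse data/EDA/ or examples/Inference_4_10.ipynb
```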
**Next:**
* See [Getting Started](#getting-started) for how to set up your local environment for training or inference
* For a detailed description of the project codebase, check out the [Code_Structure_Overview](Code_Structure_Overview.md)
* Read the [Running Inference](#running-inference) section for testing pre-trained models on sample data.
* See [Implementation Overview](#implementation-overview) for details on tools & frameworks and how to retrain the model.
The work-in-progress documentation can be viewed online on [wildfire-forecasting.readthedocs.io](https://wildfire-forecasting.readthedocs.io/en/latest/).
While we have included support for launching the repository on [Binder](https://mybinder.org/v2/gh/esowc/wildfire-forecasting/master), the limited memory offered by Binder means that you might end up with crashed or dead kernels while trying to test the `Inference` or the `Forecast` notebooks. At this point, we don't have a workaround for this issue.
## Getting Started

Once you have cloned and navigated into the repository, you can set up a development environment using either `conda` or `docker`. Refer to the relevant instructions below and then skip to the next section on [Running Inference](#running-inference).
### Using conda
To create the environment, run:
```bash
conda env create -f environment.yml
conda clean -a
conda activate wildfire-dl
```
>The setup is tested on Ubuntu 18.04, 20.04 and Windows 10 only. On systems with a CUDA-supported GPU and the CUDA drivers set up, the conda environment and the code ensure that GPUs are used by default for training and inference. If there isn't sufficient GPU memory, this will typically lead to Out of Memory runtime errors. As a rule of thumb, around 4 GiB of GPU memory is needed for inference and around 12 GiB for training.
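As a quick sanity check that PyTorch can see your GPU from inside the environment (a generic check, not project-specific code):

```python
import torch

# Report whether a CUDA-capable GPU is visible to PyTorch.
if torch.cuda.is_available():
    print(f"CUDA available: {torch.cuda.get_device_name(0)}")
else:
    print("CUDA not available; training and inference will fall back to the CPU.")
```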
### Using Docker
We include a `Dockerfile` & `docker-compose.yml` and provide detailed instructions for setting up your development environment using Docker for training on both CPUs and GPUs. Please head over to the [Docker README](docker/README.md) for more details.
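If you only need a rough idea of the workflow, the standard Compose commands apply; the service name below is a placeholder, so refer to the [Docker README](docker/README.md) and `docker-compose.yml` for the actual service names and the GPU-specific instructions:

```bash
# Illustrative only: build the images and open a shell in the project container.
# Replace <service> with the service name defined in docker-compose.yml.
docker-compose build
docker-compose run --rm <service> bash
```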
## Running Inference
***Examples**:<br>
The [Inference_2_1.ipynb](examples/Inference_2_1.ipynb) and [Inference_4_10.ipynb](examples/Inference_4_10.ipynb) notebooks demonstrate the end-to-end procedure of loading data, creating a model from a saved checkpoint, and generating predictions for the 2 day input, 1 day output and 4 day input, 10 day output experiments, respectively.
***Testing data**:<br>
Ensure access to the fwi-forcings and fwi-reanalysis data. A limited data sample is available at `gs://deepfwi-mini-sample` (released for educational purposes only).
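To pull the sample data locally (assuming the Google Cloud SDK's `gsutil` is installed; the destination directory is only an example):

```bash
# Copy the public mini-sample bucket into a local directory.
mkdir -p data/mini-sample
gsutil -m cp -r "gs://deepfwi-mini-sample/*" data/mini-sample/
```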
***Pre-trained model**:<br>
Pre-trained models are stored in [this](src/model/checkpoints/pre_trained) directory. Set `$CHECKPOINT_FILE` or pass the checkpoint path through the corresponding argument.
***Run the inference script**:<br>
Set `$FORCINGS_DIR` and `$REANALYSIS_DIR` or pass the directory paths through the arguments.
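A sketch of a typical invocation is shown below; the flag names come from the option list further down, but the exact combination and the checkpoint path are assumptions, so adjust them to your setup:

```bash
# Illustrative inference run: point the script at the data directories and a pre-trained checkpoint.
export FORCINGS_DIR=/path/to/fwi-forcings
export REANALYSIS_DIR=/path/to/fwi-reanalysis
export CHECKPOINT_FILE=src/model/checkpoints/pre_trained/4_10/epoch_99_100.ckpt

python src/test.py \
  -forcings-dir "$FORCINGS_DIR" \
  -reanalysis-dir "$REANALYSIS_DIR" \
  -checkpoint-file "$CHECKPOINT_FILE"
```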
## Implementation Overview

We implement a modified U-Net style Deep Learning architecture using [PyTorch 1.6](https://pytorch.org/docs/stable/index.html). We use [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning) for code organisation and to reduce boilerplate. The mammoth size of the total original dataset (~1TB) means we use extensive GPU acceleration in the code via the [NVIDIA CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit). For a GeForce RTX 2080 with 12GB memory and 40 vCPUs with 110 GB RAM, this translates to a 25x speedup over using only 8 vCPUs with 52GB RAM.
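To illustrate how PyTorch Lightning organises such a model (a generic sketch with made-up layer sizes and synthetic data, not the project's actual `unet_tapered` implementation):

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl


class TinyFWIModel(pl.LightningModule):
    """Minimal LightningModule sketch: maps a stack of input-day channels to output-day channels."""

    def __init__(self, in_channels: int = 4, out_channels: int = 10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, out_channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)


if __name__ == "__main__":
    # Fit on random tensors for one epoch, purely to show the Trainer workflow.
    x, y = torch.randn(8, 4, 32, 32), torch.randn(8, 10, 32, 32)
    loader = torch.utils.data.DataLoader(torch.utils.data.TensorDataset(x, y), batch_size=4)
    pl.Trainer(max_epochs=1).fit(TinyFWIModel(), loader)
```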
For reading geospatial datasets, we use [`xarray`](http://xarray.pydata.org/en/stable/quick-overview.html) and [`netcdf4`](https://unidata.github.io/netcdf4-python/netCDF4/index.html). The [`imbalanced-learn`](https://imbalanced-learn.readthedocs.io/en/stable/under_sampling.html) library is useful for undersampling to tackle the high data skew. Code linting and formatting are done using [`black`](https://black.readthedocs.io/en/stable/) and [`flake8`](https://flake8.pycqa.org/en/latest/).
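For example, inspecting one of the NetCDF files with `xarray` looks roughly like this (the filename is a placeholder, not a file shipped with the repo):

```python
import xarray as xr

# Open a (hypothetical) forcings file and inspect its structure.
ds = xr.open_dataset("path/to/fwi-forcings-sample.nc")
print(ds)            # dimensions, coordinates and data variables
print(ds.data_vars)  # the geophysical fields stored in the file
```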
* The entry point for training is [src/train.py](src/train.py)
***Dataset**: We train our model on 1 year of global data. The `gs://deepfwi-mini-sample` dataset demonstrated in the various EDA and Inference notebooks is not intended for use with `src/train.py`; the script will fail if used with those small samples. If you intend to re-run the training, reach out to us for access to the larger dataset required by the scripts.
***Logging**: We use [Weights & Biases](https://www.wandb.com/) for logging our training. When running the training script, you can either provide a `wandb API key` or choose to skip logging altogether. W&B logging is free and lets you monitor your training remotely. You can sign up for an account and then use `wandb login` from inside the environment to supply the key.
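A sketch of how a training run might be launched (the flag names are taken from the option list below, but the specific values and data paths are assumptions):

```bash
# Log in to Weights & Biases once (optional; you can also skip logging when prompted).
wandb login

# Illustrative training run using the tapered U-Net on FWI reanalysis targets.
python src/train.py \
  -model unet_tapered \
  -out fwi_reanalysis \
  -forcings-dir /path/to/fwi-forcings \
  -reanalysis-dir /path/to/fwi-reanalysis
```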
* The entry point for inference is [src/test.py](src/test.py)
<pre>
-dry-run              False                      Use a small amount of data for sanity check [Bool]
-case-study           False                      The case-study region to use for inference: australia, california, portugal, siberia, chile, uk [Bool/str]
-clip-output          False                      Limit the inference to output values within the supplied range (e.g. 0.5,60) [Bool/list]
-boxcox               0.1182                     Apply the boxcox transformation with the specified lambda while training and the inverse boxcox transformation during inference [Bool/float]
-binned               "0,5.2,11.2,21.3,38.0,50"  Show the extended metrics for the supplied comma-separated binned FWI value ranges [Bool/list]
-undersample          False                      Undersample the datapoints having FWI smaller than the specified value (e.g. -undersample=10) [Bool/float]
-round-to-zero        False                      Round off the target values below the specified threshold to zero [Bool/float]
-date_range           False                      Filter the data with the specified date range, e.g. 2019-04-01,2019-05-01 [Bool/float]
-cb_loss              False                      Use Class-Balanced loss with the supplied beta parameter [Bool/float]
-chronological_split  False                      Do a chronological train-test split in the specified ratio [Bool/float]
-model                unet_tapered               Model to use: unet, unet_downsampled, unet_snipped, unet_tapered, unet_interpolated [str]
-out                  fwi_reanalysis             Output data for training: gfas_frp or fwi_reanalysis [str]
-smos_input           False                      Use soil-moisture input data [Bool]
-forecast-dir         ${FORECAST_DIR}            Directory containing forecast data. Alternatively set $FORECAST_DIR [str]
-forcings-dir         ${FORCINGS_DIR}            Directory containing forcings data. Alternatively set $FORCINGS_DIR [str]
-reanalysis-dir       ${REANALYSIS_DIR}          Directory containing reanalysis data. Alternatively set $REANALYSIS_DIR [str]
-mask                 src/dataloader/mask.npy    File containing the mask stored as a numpy array [str]
-benchmark            False                      Benchmark the FWI-Forecast data against FWI-Reanalysis [Bool]
-comment              Comment of choice!         Used for logging [str]
-checkpoint-file                                 Path to the test model checkpoint [Bool/str]
</pre>
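For instance, a case-study evaluation with binned metrics might be invoked as below (an assumed combination of flags, shown only to illustrate how they compose; it presumes `$FORCINGS_DIR` and `$REANALYSIS_DIR` are already exported):

```bash
# Illustrative: restrict inference to the California case-study region and report binned FWI metrics.
python src/test.py \
  -case-study california \
  -binned "0,5.2,11.2,21.3,38.0,50" \
  -checkpoint-file src/model/checkpoints/pre_trained/4_10/epoch_99_100.ckpt
```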
* The [src/](src) directory contains the architecture implementation.
* The [src/dataloader/](src/dataloader) directory contains the implementation specific to the training data.
* The [src/model/](src/model) directory contains the model implementation.
* The [src/model/base_model.py](src/model/base_model.py) script has the common implementation used by every model.
* The [data/EDA/](data/EDA/) directory contains the Exploratory Data Analysis and preprocessing required for each dataset, demonstrated via Jupyter notebooks.
* A walk-through of the codebase is in the [Code_Structure_Overview.md](Code_Structure_Overview.md).
To build the documentation locally, run the following (typically from within the [`docs/`](docs/) directory):

```bash
make html
```
Once the docs are built, you can access them inside [`docs/build/html/`](docs/build/html/index.html).
## Acknowledgements
This project tackles [Challenge #26](https://github.com/esowc/challenges_2020/issues/10) from Stream 2: Machine Learning and Artificial Intelligence, as part of the [ECMWF Summer of Weather Code 2020](https://esowc.ecmwf.int/) Program.
Team: Roshni Biswas, Anurag Saha Roy, Tejasvi S Tomar.