Merge branch 'carpentries-incubator:gh-pages' into workbench

carpentries-incubator · May 26, 2023 · 8c7d16d · 8c7d16d
2 parents ce98734 + b268587
commit 8c7d16d
Show file tree

Hide file tree

Showing 7 changed files with 74 additions and 71 deletions.
diff --git a/_episodes/05-access-data.md b/_episodes/05-access-data.md
@@ -1,7 +1,7 @@
 ---
 title: "Access satellite imagery using Python"
-teaching: 30
-exercises: 15
+teaching: 70
+exercises: 30
 questions:
 - "Where can I find open-access satellite data?"
 - "How do I search for satellite imagery with the STAC API?"

diff --git a/_episodes/06-raster-intro.md b/_episodes/06-raster-intro.md
@@ -1,7 +1,7 @@
 ---
 title: "Read and visualize raster data"
-teaching: 60
-exercises: 20
+teaching: 70
+exercises: 30
 questions:
 -  "How is a raster represented by rioxarray?"
 -  "How do I read and plot raster data in Python?"

diff --git a/_episodes/08-crop-raster-data.md b/_episodes/08-crop-raster-data.md
@@ -1,7 +1,7 @@
 ---
 title: "Crop raster data with rioxarray and geopandas"
-teaching: 40
-exercises: 20
+teaching: 70
+exercises: 30
 questions:
 - "How can I crop my raster data to the area of interest?"
 objectives:

diff --git a/_episodes/10-zonal-statistics.md b/_episodes/10-zonal-statistics.md
@@ -3,7 +3,7 @@ title: "Calculating Zonal Statistics on Rasters"
 teaching: 40
 exercises: 20
 questions:
-- "How to compute raster statistics on different zones delineated by a vector data?"
+- "How to compute raster statistics on different zones delineated by vector data?"
 objectives:
 - "Extract zones from the vector dataset"
 - "Convert vector data to raster"
@@ -19,76 +19,41 @@ keypoints:
 
 Statistics on predefined zones of the raster data are commonly used for analysis and to better understand the data. These zones are often provided within a single vector dataset, identified by certain vector attributes. For example, in the previous episodes, we used the crop field polygon dataset. The fields with the same crop type can be identified as a "zone", resulting in multiple zones in one vector dataset. One may be interested in performing statistical analysis over these crop zones.
 
-In this episode, we will explore how to calculate zonal statistics based on the types of crops in `cropped_field.shp` . To do this, we will first identify zones from the vector data, then rasterize these vector zones. Finally the zonal statistics for `ndvi` will be calculated over the rasterized zones.
+In this episode, we will explore how to calculate zonal statistics based on the types of crops in `fields_cropped.shp`. To do this, we will first identify zones from the vector data, then rasterize these vector zones. Finally the zonal statistics for `ndvi` will be calculated over the rasterized zones.
 
 
 # Making vector and raster data compatible
 First, let's load the `NDVI.tif` file saved in the previous episode to obtained our calculated raster `ndvi` data. We also use the `squeeze()` function in order to reduce our raster data `ndvi` dimension to 2D by removing the singular `band` dimension - this is necessary for use with the `rasterize` and `zonal_stats` functions:
 
 ~~~
 import rioxarray
-ndvi = rioxarray.open_rasterio("NDVI.tif")
-ndvi_sq = ndvi.squeeze()
-~~~
-
-Let's also read the crop fields vector data from our saved `cropped_field.shp` file and view the CRS information.
-
-~~~
-field = gpd.read_file('cropped_field.shp')
-field.crs
+ndvi = rioxarray.open_rasterio("NDVI.tif").squeeze()
 ~~~
 {: .language-python}
 
-~~~
-<Derived Projected CRS: EPSG:28992>
-Name: Amersfoort / RD New
-Axis Info [cartesian]:
-- X[east]: Easting (metre)
-- Y[north]: Northing (metre)
-Area of Use:
-- name: Netherlands - onshore, including Waddenzee, Dutch Wadden Islands and 12-mile offshore coastal zone.
-- bounds: (3.2, 50.75, 7.22, 53.7)
-Coordinate Operation:
-- name: RD New
-- method: Oblique Stereographic
-Datum: Amersfoort
-- Ellipsoid: Bessel 1841
-- Prime Meridian: Greenwich
-~~~
-{: .output}
+Let's also read the crop fields vector data from our saved `fields_cropped.shp` file.
 
-In order to use the vector data as a classifier for our raster, we need to convert the vector data to the appropriate CRS. We can perform the CRS conversion from the vector CRS (EPSG:28992) to our raster `ndvi` CRS (EPSG:32631) and view the data with:
 ~~~
-field_to_raster_crs = field.to_crs(ndvi.rio.crs)
-field_to_raster_crs
+import geopandas as gpd
+fields = gpd.read_file('fields_cropped.shp')
 ~~~
 {: .language-python}
 
+In order to use the vector data as a classifier for our raster, we need to convert the vector data to the appropriate CRS. We can perform the CRS conversion from the vector CRS (EPSG:28992) to our raster `ndvi` CRS (EPSG:32631) with:
 ~~~
-category	gewas	gewascode	jaar	status	geometry
-0	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((634234.009 5807461.338, 634232.049 5...
-1	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((634514.198 5807699.177, 634504.207 5...
-2	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((633115.463 5808493.238, 633109.078 5...
-3	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((634803.514 5808081.449, 634809.802 5...
-4	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((634184.289 5807370.958, 634200.036 5...
-...	...	...	...	...	...	...
-4867	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((631384.726 5809352.385, 631383.343 5...
-4868	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((635240.367 5806904.896, 635245.819 5...
-4869	Grasland	Grasland, blijvend	265	2020	Definitief	POLYGON ((636074.093 5816782.787, 636123.922 5...
-4870	Grasland	Grasland, tijdelijk	266	2020	Definitief	POLYGON ((627526.751 5816828.877, 627674.251 5...
-4871	Grasland	Grasland, natuurlijk. Hoofdfunctie landbouw.	331	2020	Definitief	POLYGON ((642317.485 5813024.516, 642326.058 5...
-4872 rows × 6 columns
+# Uniform CRS
+fields_utm = fields.to_crs(ndvi.rio.crs)
 ~~~
-{: .output}
+{: .language-python}
 
-# Rasterizing our vector data
+# Rasterizing the vector data
 
-Before calculating zonal statistics, we first need to rasterize our `field_to_raster_crs` vector geodataframe with the `rasterio.features.rasterize` function. With this function, we aim to produce a grid with numerical values representing the types of crop as defined by the column `gewascode` from `field_cropped` - `gewascode` stands for the crop codes as defined by the Netherlands Enterprise Agency (RVO) for different types of crops or `gewas` (Grassland, permanent; Grassland, temporary; corn fields; etc.). This grid of values thus defines the zones for the `xrspatial.zonal_stats` function, where each pixel in the zone grid overlaps with a corresponding pixel in our NDVI raster.
+Before calculating zonal statistics, we first need to rasterize our `field_to_raster_crs` vector geodataframe with the `rasterio.features.rasterize` function. With this function, we aim to produce a grid with numerical values representing the types of crops as defined by the column `gewascode` from `field_cropped` - `gewascode` stands for the crop codes as defined by the Netherlands Enterprise Agency (RVO) for different types of crop or `gewas` (Grassland, permanent; Grassland, temporary; corn fields; etc.). This grid of values thus defines the zones for the `xrspatial.zonal_stats` function, where each pixel in the zone grid overlaps with a corresponding pixel in our NDVI raster.
 
 We can generate the `geometry, gewascode` pairs for each vector feature to be used as the first argument to `rasterio.features.rasterize` as:
 
 ~~~
-geom = field_to_raster_crs[['geometry', 'gewascode']].values.tolist()
+geom = fields_utm[['geometry', 'gewascode']].values.tolist()
 geom
 ~~~
 {: .language-python}
@@ -109,34 +74,66 @@ geom
 ~~~
 {: .output}
 
-This generates a list of the shapely geometries from the `geometry` column, and the unique field ID from the `gewascode` column in the `field_to_raster_crs` geodataframe.
+This generates a list of the shapely geometries from the `geometry` column, and the unique field ID from the `gewascode` column in the `fields_utm` geodataframe.
 
 We can now rasterize our vector data using `rasterio.features.rasterize`:
 
 ~~~
 from rasterio import features
-field_cropped_raster = features.rasterize(geom, out_shape=ndvi_sq.shape, fill=0, transform=ndvi.rio.transform())
+fields_rasterized = features.rasterize(geom, out_shape=ndvi.shape, transform=ndvi.rio.transform())
+~~~
+{: .language-python}
+
+The argument `out_shape` specifies the shape of the output grid in pixel units, while `transform` represents the projection from pixel space to the projected coordinate space. By default, the pixels that are not contained within a polygon in our shapefile will be filled with 0. It's important to pick a fill value that is not the same as any value already defined in `gewascode` or else we won't distinguish between this zone and the background.
+
+Let's inspect the results of rasterization:
+
+~~~
+import numpy as np
+print(fields_rasterized.shape)
+print(np.unique(fields_rasterized))
 ~~~
 {: .language-python}
 
-The argument `out_shape` specifies the shape of the output grid in pixel units, while `transform` represents the projection from pixel space to the projected coordinate space. We also need to specify the fill value for pixels that are not contained within a polygon in our shapefile, which we do with `fill = 0`. It's important to pick a fill value that is not the same as any value already defined in `gewascode` or else we won't distinguish between this zone and the background.
+~~~
+(500, 500)
+[  0 259 265 266 331 332 335 863]
+~~~
+{: .output}
+
+The output `fields_rasterized` is an `np.ndarray` with the same shape as `ndvi`. It contains `gewascode` values at the location of fields, and 0 outside the fields. Let's visualize it:
 
-We convert the output of the `rasterio.features.rasterize` function, which generates a numpy array `np.ndarray`, to `xarray.DataArray` which will be used further:
+~~~
+from matplotlib import pyplot as plt
+plt.imshow(fields_rasterized)
+plt.colorbar()
+~~~
+{: .language-python}
+
+![Rasterization results](../fig/E10-01-rasterization-results.png)
+
+We will convert the output to `xarray.DataArray` which will be used further. To do this, we will "borrow" the coordinates from `ndvi`, and fill in the rasterization data:
 ~~~
 import xarray as xr
-field_cropped_raster_xarr = xr.DataArray(field_cropped_raster)
+fields_rasterized_xarr = ndvi.copy()
+fields_rasterized_xarr.data = fields_rasterized
+
+# visualize
+fields_rasterized_xarr.plot(robust=True)
 ~~~
 {: .language-python}
 
+![Rasterization results Xarray](../fig/E10-02-rasterization-results-xr.png)
+
 # Calculate zonal statistics
 
 In order to calculate the statistics for each crop zone, we call the function, `xrspatial.zonal_stats`. The `xrspatial.zonal_stats` function takes as input `zones`, a 2D `xarray.DataArray`, that defines different zones, and `values`, a 2D `xarray.DataArray` providing input values for calculating statistics.
 
-We call the `zonal_stats` function with `field_cropped_raster_xarr` as our classifier and the 2D raster with our values of interest `ndvi_sq` to obtain the NDVI statistics for each crop type:
+We call the `zonal_stats` function with `fields_rasterized_xarr` as our classifier and the 2D raster with our values of interest `ndvi` to obtain the NDVI statistics for each crop type:
 
 ~~~
 from xrspatial import zonal_stats
-zonal_stats(field_cropped_raster_xarr, ndvi_sq)
+zonal_stats(fields_rasterized_xarr, ndvi)
 ~~~
 {: .language-python}
 
@@ -153,31 +150,37 @@ zonal_stats(field_cropped_raster_xarr, ndvi_sq)
 ~~~
 {: .output}
 
-The `zonal_stats` function calculates the minimum, maximum, and sum for each zone along with statistical measures such as the mean, variance and standard deviation for each rasterized vector zone. In our raster data-set `zone = 0`, corresponding to non-crop areas, has the highest count followed by `zone = 265` which corresponds to 'Grasland, blijvend' or 'Grassland, permanent'. The highest mean NDVI is observed for `zone = 266` for 'Grasslands, temporary' with the lowest mean, aside from non-crop area, going to `zone = 863` representing 'Forest without replanting obligation'. Thus, the `zonal_stats` function can be used to analyse and understand different sections of our raster data. The definition of the zones can be derived from vector data or from classified raster data as presented in the challenge below:
+The `zonal_stats` function calculates the minimum, maximum, and sum for each zone along with statistical measures such as the mean, variance and standard deviation for each rasterized vector zone. In our raster dataset `zone = 0`, corresponding to non-crop areas, has the highest count followed by `zone = 265` which corresponds to 'Grasland, blijvend' or 'Grassland, permanent'. The highest mean NDVI is observed for `zone = 266` for 'Grasslands, temporary' with the lowest mean, aside from non-crop area, going to `zone = 863` representing 'Forest without replanting obligation'. Thus, the `zonal_stats` function can be used to analyze and understand different sections of our raster data. The definition of the zones can be derived from vector data or from classified raster data as presented in the challenge below:
 
 > ## Challenge: Calculate zonal statistics for zones defined by `ndvi_classified`
 >
 > Let's calculate NDVI zonal statistics for the different zones as classified by `ndvi_classified` in the previous episode.
 >
-> Load both raster data-sets and convert into 2D `xarray.DataArray`.
-> Then, calculate zonal statistics for each `class_bins`. Inspect the output of the `zonal_stats` function.
+> Load both raster datasets: `NDVI.tif` and `NDVI_classified.tif`. Then, calculate zonal statistics for each `class_bins`. Inspect the output of the `zonal_stats` function.
 >
 >
 > > ## Answers
 > > 1) Load and convert raster data into suitable inputs for `zonal_stats`:
 > >
 > > ```python
-> > ndvi = rioxarray.open_rasterio("NDVI.tif")
-> > ndvi_classified = rioxarray.open_rasterio("NDVI_classified.tif")
-> > ndvi_sq = ndvi.squeeze()
-> > ndvi_classified_sq = ndvi_classified.squeeze()
+> > ndvi = rioxarray.open_rasterio("NDVI.tif").squeeze()
+> > ndvi_classified = rioxarray.open_rasterio("NDVI_classified.tif").squeeze()
 > > ```
 > >
 > > 2) Create and display the zonal statistics table.
 > >
 > > ```python
-> > zonal_stats(ndvi_classified_sq, ndvi_sq)
+> > zonal_stats(ndvi_classified, ndvi)
 > > ```
+> > 
+> > ~~~
+> > 	zone    mean       max        min           sum         std      var	  count  
+> > 0     1  -0.355660  -0.000257  -0.998648  -12838.253906  0.145916  0.021291  36097.0  
+> > 1     2   0.110731   0.199839   0.000000    1754.752441  0.055864  0.003121  15847.0  
+> > 2     3   0.507998   0.700000   0.200000   50410.167969  0.140193  0.019654  99233.0  
+> > 3     4   0.798281   0.999579   0.700025   78888.523438  0.051730  0.002676  98823.0
+> > ~~~
+> > {: .output}
 > {: .solution}
 {: .challenge}
 

diff --git a/_episodes/11-parallel-raster-computations.md b/_episodes/11-parallel-raster-computations.md
@@ -1,7 +1,7 @@
 ---
 title: "Parallel raster computations using Dask"
-teaching: 40
-exercises: 20
+teaching: 45
+exercises: 25
 questions:
 - "How can I parallelize computations on rasters with Dask?"
 - "How can I determine if parallelization improves calculation speed?"

diff --git a/fig/E10-01-rasterization-results.png b/fig/E10-01-rasterization-results.png
diff --git a/fig/E10-02-rasterization-results-xr.png b/fig/E10-02-rasterization-results-xr.png