generate watersurfaces_hab v6, based on
- watersurfaces 2024
- habitatmap_stdized 2023_v1
cecileherr committed Dec 11, 2024
1 parent 265fbe0 commit 9f9ce86
Showing 3 changed files with 47 additions and 45 deletions.
@@ -8,7 +8,7 @@ We will create a map with watersurfaces that contain aquatic habitat and rib. Th

* The water surfaces map of Flanders `watersurfaces`

To be sure we will use the correct version of the data sources (version 2023 for the processed habitatmap and version 1.2 for watersurfaces), we will first derive the md5 file hashes and compare them to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853)
To be sure we will use the correct version of the data sources (version 2023 for the processed habitatmap and version 2024 for watersurfaces), we will first derive the md5 file hashes and compare them to the file hashes in the [data source version overview table](https://docs.google.com/spreadsheets/d/1E8ERlfYwP3OjluL8d7_4rR1W34ka4LRCE35JTxf3WMI/edit#gid=2100595853)
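
A minimal sketch of the hash check described above, using only base R. The file is a temporary stand-in and the reference hash is taken from the file itself, purely for illustration; in the real workflow the reference comes from the data source version overview table.

```r
library(tools)  # provides md5sum()

# Stand-in for a data source file (hypothetical; not one of the real sources)
filepath <- tempfile(fileext = ".txt")
writeLines("example content", filepath)

# Reference hash: here derived from the file itself, for illustration only
md5_ref <- unname(md5sum(filepath))

# Derive the hash again and compare it with the reference
md5 <- unname(md5sum(filepath))
match <- identical(md5, md5_ref)

# Changing the file changes its hash, so the comparison then fails
writeLines("modified content", filepath)
match_after_change <- identical(unname(md5sum(filepath)), md5_ref)

c(match = match, match_after_change = match_after_change)
```

A mismatch therefore signals that the file on disk is not the version that the reference hash was recorded for.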

### Processed habitatmap

@@ -29,7 +29,7 @@ hashes <-
md5 = map(filepath, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("5e9a0cb2a53f88001796bd7457a343ac"),
mutate(md5_ref = c("5e9a0cb2a53f88001796bd7457a343ac"), # version 2023_v1
match = md5 == md5_ref) %>%
select(filename,
md5,
@@ -70,7 +70,7 @@ hashes <-
md5 = map(filepath, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("72f6575ae7095622cd92eb2be720c7cb"), # version 1.2
mutate(md5_ref = c("d862df5b5e9ee8a2de4c333a7dcd7645"), # version 2024
match = md5 == md5_ref) %>%
select(filename,
md5,
@@ -84,7 +84,7 @@ if (!all.equal(hashes$md5, hashes$md5_ref)) {
stop(cat("The source map is NOT up to date ! Please check the datasource. "))
}
devtools::load_all("../../../n2khab") # n2khab not updated yet, I load my dev version

florisvdh (Member) commented on Dec 13, 2024:

Hm, this may become difficult since you want to have a reproducible workflow in place before inbo/n2khab#192 is merged, while using its `fix_geom` and other 'newest' features to fix things on the latest watersurfaces.

So you'll need `n2khab::whatever()` anyway, assuming an installed version.

To avoid the need to make two n2khab releases (first one to release `read_watersurfaces()`), install the appropriate n2khab dev version with `remotes::install_github()` and register it with renv.

You may have intended all this; just marking it here for safety.
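
The suggestion above could look roughly as follows. This is a sketch under assumptions: it presumes that the project already uses renv, and that installing straight from the pull request (the `#192` shorthand accepted by `remotes::install_github()`) is acceptable. Pinning a specific commit instead is equally possible.

```r
# Install the n2khab development version from pull request inbo/n2khab#192
# (assumption: network access and the remotes package are available)
remotes::install_github("inbo/n2khab#192")

# Record the installed package versions in renv.lock so others can restore them
renv::snapshot()
```

With the dev version installed and registered, the workflow can call `n2khab::read_watersurfaces(fix_geom = TRUE)` from the installed package instead of relying on `devtools::load_all()`.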

# load watersurfaces with corrected geometry
# (argument fix_geom available since n2khab 0.9.0)
watersurfaces <- read_watersurfaces(fix_geom = TRUE)
82 changes: 42 additions & 40 deletions src/generate_watersurfaces_hab/20_check_result.Rmd
@@ -66,59 +66,59 @@ sum(!validities | is.na(validities)) == 0

## Compare with the previous version

We load the previous version of watersurfaces_hab (version 4 based on watersurfaces_v1.1 and habitatmap_stdized_2020_v1)
We load the previous version of watersurfaces_hab (version 5 based on watersurfaces_v1.2 and habitatmap_stdized_2023_v1)

florisvdh (Member) commented on Dec 16, 2024:

Note that you will also need to update the reference to v6 (instead of v5) in the previous title `## The new version v5` and in the text underneath (lines 3 and 5).


```{r paged.print=FALSE, warning=FALSE}
filepath_v4 <- file.path(path,"20_processed/_versions/watersurfaces_hab/watersurfaces_hab_v4/watersurfaces_hab.gpkg")
filepath_v5 <- file.path(path,"20_processed/_versions/watersurfaces_hab/watersurfaces_hab_v5/watersurfaces_hab.gpkg")
hashes <-
tibble(filepath_v4) %>%
mutate(filename = basename(filepath_v4),
md5 = map(filepath_v4, function(x) {
tibble(filepath_v5) %>%
mutate(filename = basename(filepath_v5),
md5 = map(filepath_v5, function(x) {
x %>% md5sum() %>% str_c(collapse = '')
}) %>% as.character) %>%
mutate(md5_ref = c("96b6a7abc3d637d71052900970e70904"),
mutate(md5_ref = c("e7d9930938f5111de33de6ecaec31a66"),
match = md5 == md5_ref) %>%
select(filename,
md5,
md5_ref,
match)
if (!all.equal(hashes$md5, hashes$md5_ref)) {
stop(cat("The source map is NOT v4 ! Please check the datasource. "))
stop(cat("The source map is NOT v5 ! Please check the datasource. "))
}
(pol_v4 <- read_sf(filepath_v4,
(pol_v5 <- read_sf(filepath_v5,
layer = "watersurfaces_hab_polygons"))
(types_v4 <- read_sf(filepath_v4,
(types_v5 <- read_sf(filepath_v5,
layer = "watersurfaces_hab_types"))
```
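
A note on the check used in these chunks, offered as a suggestion rather than as part of this commit: `all.equal()` returns a character description instead of `FALSE` when its arguments differ, so negating it with `!` raises an unrelated error, and `cat()` returns `NULL`, so the text is only printed while the error raised by `stop()` carries no message. A sketch of a stricter variant, with made-up hash values and a hypothetical helper `check_hashes()`:

```r
# Hypothetical hashes, for illustration only
hashes <- data.frame(
  md5     = c("aaa", "bbb"),
  md5_ref = c("aaa", "ccc")
)
hashes$match <- hashes$md5 == hashes$md5_ref

# Stop with an informative message when any hash differs from its reference
check_hashes <- function(hashes) {
  if (!all(hashes$match)) {
    stop("The source map is NOT up to date! Please check the data source.")
  }
  invisible(TRUE)
}

# tryCatch() captures the error so that its message can be inspected
result <- tryCatch(check_hashes(hashes), error = function(e) conditionMessage(e))
result
```

Relying on the `match` column keeps the test a plain logical, so the error message passed to `stop()` is the one the user actually sees.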

- Are there differences between version v4 and version v5 and where are they located?
- Are there differences between version v5 and version v6 and where are they located?

In the table below we check the differences between both versions.
Note that for a large number of records only polygon_habitatmap_id changes, but the geometry and the type description remain the same.

```{r}
# polygons with polygon_id in v5 but not in version v4
check_polygon_id_v5_v4 <- pol %>%
anti_join(pol_v4 %>%
# polygons with polygon_id in v6 but not in version v5
check_polygon_id_v6_v5 <- pol %>%
anti_join(pol_v5 %>%
st_drop_geometry(),
by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
left_join(pol_v4 %>%
mutate(geom_text_v4 = st_as_text(geom)) %>%
left_join(pol_v5 %>%
mutate(geom_text_v5 = st_as_text(geom)) %>%
st_drop_geometry() %>%
select(polygon_id, description_orig_v4 = description_orig, geom_text_v4),
select(polygon_id, description_orig_v5 = description_orig, geom_text_v5),
by = c("polygon_id")) %>%
mutate(new_polygon_id = !(polygon_id %in% pol_v4$polygon_id),
new_polygon_id_ws = !(polygon_id_ws %in% pol_v4$polygon_id_ws),
new_polygon_id_habitatmap = !(polygon_id_habitatmap %in% pol_v4$polygon_id_habitatmap),
description_orig_update = description_orig != description_orig_v4,
geom_text_v5 = st_as_text(geom),
geom_update = geom_text_v5 != geom_text_v4)
check_polygon_id_v5_v4 %>%
mutate(new_polygon_id = !(polygon_id %in% pol_v5$polygon_id),
new_polygon_id_ws = !(polygon_id_ws %in% pol_v5$polygon_id_ws),
new_polygon_id_habitatmap = !(polygon_id_habitatmap %in% pol_v5$polygon_id_habitatmap),
description_orig_update = description_orig != description_orig_v5,
geom_text_v6 = st_as_text(geom),
geom_update = geom_text_v6 != geom_text_v5)
check_polygon_id_v6_v5 %>%
st_drop_geometry() %>%
group_by(new_polygon_id, new_polygon_id_ws, new_polygon_id_habitatmap, geom_update, description_orig_update) %>%
summarise(n_records = n()) %>%
@@ -128,26 +128,28 @@ check_polygon_id_v5_v4 %>%
```
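
To illustrate the logic of the comparison above on a toy example: `anti_join()` keeps the rows of the new version whose key combination does not occur in the previous version, and `%in%` then tells which of the ids is actually new. The two small tables are invented for illustration and only mimic the id columns of the real layers.

```r
library(dplyr)

# Invented stand-ins for the previous and the new polygons table
pol_old <- tibble(
  polygon_id_habitatmap = c("h1", "h2"),
  polygon_id_ws = c("w1", "w2")
)
pol_new <- tibble(
  polygon_id_habitatmap = c("h1", "h9", "h3"),
  polygon_id_ws = c("w1", "w2", "w3")
)

# Rows of the new table without a matching key pair in the old table,
# flagged by which of the two ids did not exist before
changed <- pol_new %>%
  anti_join(pol_old, by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
  mutate(
    new_polygon_id_ws = !(polygon_id_ws %in% pol_old$polygon_id_ws),
    new_polygon_id_habitatmap =
      !(polygon_id_habitatmap %in% pol_old$polygon_id_habitatmap)
  )
changed
```

The row with `h9` and `w2` corresponds to the case noted above, where only the habitatmap id changes while the water surface itself was already present.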

We check some of the polygons for which the geometry has changed.
Changes are minor.
We check some of the polygons for which the geometry has changed.

In this case there are only 2 polygons with a modified geometry.
The changes are minor for Stappersven and larger in extent for Houtsaegerduinen.

```{r}
check_geom <- check_polygon_id_v5_v4 %>%
check_geom <- check_polygon_id_v6_v5 %>%
filter(geom_update & !is.na(geom_update)) %>%
slice_head(n = 5) %>%
st_transform(4326)
check_geom_v4 <- pol_v4 %>%
check_geom_v5 <- pol_v5 %>%
filter(polygon_id %in% check_geom$polygon_id) %>%
st_transform(4326)
check_geom %>%
leaflet() %>%
addTiles() %>%
addPolygons(group = "v5") %>%
addPolygons(data = check_geom_v4, color = "yellow", group = "v4") %>%
addPolygons(group = "v6") %>%
addPolygons(data = check_geom_v5, color = "red", group = "v5") %>%
addLayersControl(
overlayGroups = c("v5", "v4"),
overlayGroups = c("v6 (blue)", "v5 (red)"),
options = layersControlOptions(collapsed = FALSE)
)
```
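
One detail that may be worth checking in the chunk above: to my understanding, the names passed to `overlayGroups` have to be identical to the `group` values of the layers for the checkboxes to control them. A minimal sketch with matching names, using two invented squares instead of the real polygons:

```r
library(leaflet)

# Invented coordinates (longitude/latitude) of two small squares
lng_v6 <- c(4.00, 4.01, 4.01, 4.00)
lat_v6 <- c(51.00, 51.00, 51.01, 51.01)
lng_v5 <- lng_v6 + 0.005
lat_v5 <- lat_v6 + 0.005

# The same strings are used for `group` and for `overlayGroups`
groups <- c("v6 (blue)", "v5 (red)")

m <- leaflet() %>%
  addTiles() %>%
  addPolygons(lng = lng_v6, lat = lat_v6, group = groups[1]) %>%
  addPolygons(lng = lng_v5, lat = lat_v5, color = "red", group = groups[2]) %>%
  addLayersControl(
    overlayGroups = groups,
    options = layersControlOptions(collapsed = FALSE)
  )
m
```

Keeping the group names in one vector avoids the labels in the control drifting away from the names given to the layers.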
@@ -156,43 +158,43 @@ check_geom %>%


```{r}
check_polygon_id_v4_v5 <- pol_v4 %>%
check_polygon_id_v5_v6 <- pol_v5 %>%
anti_join(pol %>%
st_drop_geometry(),
by = c("polygon_id_habitatmap", "polygon_id_ws")) %>%
mutate(ws_removed = !(polygon_id_ws %in% pol$polygon_id_ws))
nrow(check_polygon_id_v4_v5)
nrow(check_polygon_id_v5_v6)
```

Here we show:

+ new polygons from the watersurfaces layer that are included in `watersurfaces_hab_v5` (blue polygons)
+ polygons from the watersurfaces layer that are removed in `watersurfaces_hab_v5` compared to `watersurfaces_hab_v4` (black polygons)
+ new polygons from the watersurfaces layer that are included in `watersurfaces_hab_v6` (blue polygons)
+ polygons from the watersurfaces layer that are removed in `watersurfaces_hab_v6` compared to `watersurfaces_hab_v5` (black polygons)


```{r}
ws_new <- check_polygon_id_v5_v4 %>%
ws_new <- check_polygon_id_v6_v5 %>%
filter(new_polygon_id_ws) %>%
st_transform(crs = 4326)
ws_removed <- check_polygon_id_v4_v5 %>%
ws_removed <- check_polygon_id_v5_v6 %>%
filter(ws_removed) %>%
st_transform(crs = 4326)
leaflet() %>%
addTiles(group = "OSM (default)") %>%
addPolygons(data = ws_new,
group = "in v5 and not v4 (blue)",
group = "in v6 and not v5 (blue)",
popup = paste("polygon_id_habitatmap:", ws_new$polygon_id_habitatmap, "<br>",
"polygon_id_ws:", ws_new$polygon_id_ws)) %>%
addPolygons(data = ws_removed,
color = "black", group = "in v4 and not v5 (black)",
color = "black", group = "in v5 and not v6 (black)",
popup = paste("polygon_id_habitatmap:", ws_removed$polygon_id_habitatmap, "<br>",
"polygon_id_ws:", ws_removed$polygon_id_ws)) %>%
addLayersControl(
overlayGroups = c("in v5 and not v4 (blue)", "in v4 and not v5 (black)"),
overlayGroups = c("in v6 and not v5 (blue)", "in v5 and not v6 (black)"),
options = layersControlOptions(collapsed = FALSE)
)
```
2 changes: 1 addition & 1 deletion src/generate_watersurfaces_hab/index.Rmd
@@ -50,7 +50,7 @@ library(leaflet)
# ISO8601 timestamp to set as fixed value in the GeoPackage
# (to be UPDATED to the actual creation date; at least update for each version):
Sys.setenv(OGR_CURRENT_DATE = "2024-05-15T00:00:00.000Z")
Sys.setenv(OGR_CURRENT_DATE = "2024-12-11T00:00:00.000Z")
# This is used to keep results reproducible, as the timestamp is otherwise
# updated each time.
# Above environment variable OGR_CURRENT_DATE is used by the GDAL driver.