Categorical & continuous sampling methods #17

Open
alpha-beta-soup opened this issue Aug 30, 2024 · 0 comments
Labels
enhancement New feature or request

@alpha-beta-soup
Member

Mechenich, M.F., Žliobaitė, I. Eco-ISEA3H, a machine learning ready spatial database for ecometric and species distribution modeling. Sci Data 10, 77 (2023). https://doi.org/10.1038/s41597-023-01966-x

This paper details the various sampling strategies it uses to index raster data.

Categorical

  • Centroid: record the categorical value occurring at each cell centroid. Nulls are carried over.
  • Fraction: record the proportion of each cell's area covered by each categorical value. There would be a fraction attribute for each class for each cell. (A sparse data structure could help manage this.)
  • Mode: what it says on the tin, except that a null value is recorded wherever the fraction attributes sum to less than 0.2 of the cell's area. (I think this is probably wrong: it loses data for cells on the edge of nodata areas. Perhaps there should be a switch for whether null counts as a valid modal value, or a threshold such as 0.2 should be exposed as a parameter.)
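The three categorical methods can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the raster is a plain north-up list of lists, each cell's overlap with the raster is assumed to be already computed as hypothetical `(class_value, overlap_area)` pairs, and `None` stands for nodata. The `min_coverage` and `null_is_valid` parameters are hypothetical names for the two options floated in the Mode bullet; their defaults reproduce the 0.2 rule as described.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical representation: one (class_value, overlap_area) pair per pixel
# that intersects the cell; class_value None means nodata.
Overlap = Tuple[Optional[int], float]


def centroid_sample(raster: Sequence[Sequence[Optional[int]]],
                    origin_x: float, origin_y: float,
                    pixel_w: float, pixel_h: float,
                    cx: float, cy: float) -> Optional[int]:
    """Centroid: value of the pixel containing the cell centroid (cx, cy).

    Assumes a north-up raster whose top-left corner is (origin_x, origin_y).
    Nodata (None) is carried over; centroids outside the raster yield None.
    """
    col = int((cx - origin_x) // pixel_w)
    row = int((origin_y - cy) // pixel_h)
    if 0 <= row < len(raster) and 0 <= col < len(raster[row]):
        return raster[row][col]
    return None


def class_fractions(overlaps: Sequence[Overlap], cell_area: float) -> Dict[int, float]:
    """Fraction: sparse mapping class -> share of the cell's area (nodata omitted)."""
    totals: Dict[int, float] = defaultdict(float)
    for value, area in overlaps:
        if value is not None:
            totals[value] += area
    return {value: area / cell_area for value, area in totals.items()}


def modal_class(overlaps: Sequence[Overlap], cell_area: float,
                min_coverage: float = 0.2,
                null_is_valid: bool = False) -> Optional[int]:
    """Mode: the class covering the largest share of the cell.

    With the defaults: null when the non-null fractions cover less than
    min_coverage of the cell. With null_is_valid=True, nodata competes as an
    ordinary class and the threshold is ignored. Ties go to the class seen
    first (null loses ties).
    """
    fractions = class_fractions(overlaps, cell_area)
    if null_is_valid:
        null_fraction = sum(area for value, area in overlaps if value is None) / cell_area
        candidates: Dict[Optional[int], float] = {**fractions, None: null_fraction}
        return max(candidates, key=candidates.__getitem__)
    if not fractions or sum(fractions.values()) < min_coverage:
        return None
    return max(fractions, key=fractions.__getitem__)
```

For an edge cell such as `[(5, 0.15), (None, 0.85)]`, the default rule returns `None`, while `min_coverage=0.1` returns `5`, which shows the data loss the Mode bullet worries about. Returning a `dict` from `class_fractions` keeps the per-class fraction attributes sparse: classes absent from a cell are simply not stored.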

Continuous

  • Centroid: as above.
  • Mean: area-weighted arithmetic mean. The authors take care to perform the conversion in the data's native coordinate reference system. For data in an authalic (equal-area) coordinate reference system, every pixel has the same area, so the area-weighted mean reduces to the simple mean. For data in WGS84, however, pixel area varies with latitude, so they calculate the area of each pixel and use it as the weight when computing the mean. See #14 (VRT warping method causing spatial inconsistencies) for other discussion of how we currently handle reprojection; that approach may need revision.
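A minimal sketch of the area-weighted mean for WGS84 pixels follows. It assumes a spherical Earth with the WGS84 authalic radius and uses the closed-form area of a latitude/longitude band, R² · Δλ · (sin φ_north − sin φ_south). The paper may compute pixel areas differently (for example on the ellipsoid), so treat this as an approximation. The pixel record layout is hypothetical, and the fraction of each pixel falling inside the cell is assumed to be computed elsewhere.

```python
import math
from typing import Iterable, Optional, Tuple

# Radius of the authalic (equal-area) sphere for the WGS84 ellipsoid, in metres.
AUTHALIC_RADIUS_M = 6_371_007.2

# Hypothetical pixel record: (value, lat_north_deg, lat_south_deg,
# width_deg, fraction_of_pixel_inside_cell); value None means nodata.
Pixel = Tuple[Optional[float], float, float, float, float]


def pixel_area_m2(lat_north: float, lat_south: float, width_deg: float,
                  radius: float = AUTHALIC_RADIUS_M) -> float:
    """Area of a lat/lon pixel on a sphere: R^2 * dlon * (sin(north) - sin(south))."""
    dlon = math.radians(width_deg)
    return radius ** 2 * dlon * abs(math.sin(math.radians(lat_north))
                                    - math.sin(math.radians(lat_south)))


def area_weighted_mean(pixels: Iterable[Pixel]) -> Optional[float]:
    """Mean of pixel values weighted by (pixel area x fraction inside the cell).

    Nodata pixels are skipped; returns None if no valid pixel overlaps the cell.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for value, lat_north, lat_south, width_deg, fraction in pixels:
        if value is None:
            continue
        weight = pixel_area_m2(lat_north, lat_south, width_deg) * fraction
        weighted_sum += weight * value
        total_weight += weight
    return weighted_sum / total_weight if total_weight > 0 else None
```

For a 1°×1° pixel at the equator with value 10 and one at 60–61°N with value 20, the simple mean is 15, but the weighted mean is about 13.3 because the equatorial pixel covers roughly twice the area. When all pixels have equal area, as in an equal-area CRS, the weights cancel and the result matches the simple mean.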

This issue should be closed when this tool is capable of reproducing all of these cases.

alpha-beta-soup added the enhancement (New feature or request) label Aug 30, 2024