In #15, some first tests of Xarray on CESNET with Swift are proposed. They show (thanks to @sebastian-luna-valero) that writing to the Swift object storage using Zarr and Dask is quite easy, even though some manual steps are still required.
One thing we might want to do now is to run some light but comprehensive benchmarks to identify what performance we can get on this infrastructure.
A classical benchmark could be:
- Define some example datasets: small, medium, large (up to a few TiB?).
  - 10 GiB
  - 100 GiB
  - 1 TiB
- Write these datasets on Dask clusters varying in size:
  - 5 workers
  - 10 workers
  - 20 workers
- Read the datasets back on Dask clusters of varying size.
- Compute things like throughput or other stats (maybe by analyzing the Dask task report).
- Analyze the results.
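To make the plan above a bit more concrete, here is a minimal Python sketch of a harness that walks the size/worker matrix and computes throughput. It is only an illustration under assumptions: the names `run_benchmark`, `timed`, `throughput_mib_s`, `Result`, and the `write_fn`/`read_fn` callables are hypothetical and not taken from #15 or from the pangeo-data benchmarking repository. The actual Dask cluster setup and the Zarr write/read against Swift are deliberately left to the caller.

```python
import itertools
import time
from dataclasses import dataclass

GIB = 1024 ** 3
TIB = 1024 ** 4

# Dataset sizes and cluster sizes taken from the list above.
DATASET_SIZES = {"10GiB": 10 * GIB, "100GiB": 100 * GIB, "1TiB": 1 * TIB}
WORKER_COUNTS = [5, 10, 20]


@dataclass
class Result:
    dataset: str
    workers: int
    operation: str  # "write" or "read"
    seconds: float
    mib_per_s: float


def throughput_mib_s(nbytes, seconds):
    """Return throughput in MiB/s for nbytes moved in the given wall time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return nbytes / (1024 ** 2) / seconds


def timed(fn, *args):
    """Run fn(*args) and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def run_benchmark(write_fn, read_fn, sizes=DATASET_SIZES, workers=WORKER_COUNTS):
    """Time write_fn and read_fn for every (dataset size, worker count) pair.

    write_fn and read_fn are hypothetical callables taking (nbytes, n_workers).
    They are assumed to block until the write or read has fully completed.
    """
    results = []
    for (name, nbytes), n in itertools.product(sizes.items(), workers):
        for operation, fn in (("write", write_fn), ("read", read_fn)):
            seconds = timed(fn, nbytes, n)
            results.append(
                Result(name, n, operation, seconds, throughput_mib_s(nbytes, seconds))
            )
    return results
```

In a real run, `write_fn` would presumably scale a Dask cluster to `n_workers` and write a synthetic dataset of roughly `nbytes` to the Swift store as Zarr, and `read_fn` would open that store and force a full read. Wall-clock throughput measured this way includes scheduling overhead, so it complements rather than replaces what the Dask task report shows.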
We might want to start from something like https://github.com/pangeo-data/benchmarking, to which @tinaok contributed.
We also need to ask CESNET about any limits or constraints they might have, @sebastian-luna-valero.