
Performance tests on CESNET infra #16

@guillaumeeb


In #15, some first tests of using Xarray on CESNET with the Swift object storage are proposed. Thanks to @sebastian-luna-valero, they show that writing to Swift with Zarr and Dask is quite easy, even if a few steps are still manual.

One thing we might want to do now is run some light but comprehensive benchmarks, to find out what performance we can expect from this infrastructure.

A classical benchmark could be:

  • Define example datasets of increasing size (small, medium, large, up to a few TiB?):
    • 10 GiB
    • 100 GiB
    • 1 TiB
  • Write these datasets using Dask clusters of varying size:
    • 5 workers
    • 10 workers
    • 20 workers
  • Read the datasets back using Dask clusters of varying size.
  • Compute throughput and other statistics (maybe by analyzing the Dask task report).
  • Analyze the results.
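A minimal sketch of how the benchmark matrix and the basic metrics above could be organized, in plain Python. The 100 MiB chunk size, the float64 item size, the `Run` record and every function name are hypothetical choices for illustration, not anything agreed on for CESNET. The actual Xarray/Zarr write and read calls against Swift are deliberately left out; only the run enumeration, the dataset sizing and the throughput arithmetic are shown.

```python
import time
from dataclasses import dataclass
from itertools import product

GIB = 1024 ** 3
MIB = 1024 ** 2

# Benchmark matrix from the plan: dataset sizes x Dask cluster sizes.
DATASET_SIZES_GIB = [10, 100, 1024]  # 10 GiB, 100 GiB, 1 TiB
WORKER_COUNTS = [5, 10, 20]

# Assumptions (hypothetical): float64 data and ~100 MiB Zarr chunks.
ITEMSIZE = 8
CHUNK_BYTES = 100 * MIB


@dataclass(frozen=True)
class Run:
    """One benchmark run: dataset size, number of Dask workers, operation."""
    dataset_gib: int
    workers: int
    operation: str  # "write" or "read"


def benchmark_runs():
    """Enumerate every (size, workers, operation) combination to execute."""
    return [
        Run(size, workers, op)
        for size, workers, op in product(
            DATASET_SIZES_GIB, WORKER_COUNTS, ("write", "read")
        )
    ]


def n_elements(size_gib):
    """Number of float64 values needed to reach the target dataset size."""
    return size_gib * GIB // ITEMSIZE


def n_chunks(size_gib, chunk_bytes=CHUNK_BYTES):
    """Number of chunks for a single-variable dataset (ceiling division).

    With Zarr, each chunk is stored as its own object, so this also
    approximates how many objects one variable puts into Swift.
    """
    total = size_gib * GIB
    return -(-total // chunk_bytes)


def throughput_mib_s(nbytes, elapsed_s):
    """Aggregate throughput in MiB/s for one run."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return nbytes / MIB / elapsed_s


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `len(benchmark_runs())` gives 18 runs (3 sizes x 3 cluster sizes x write/read), and the 10 GiB dataset needs `n_chunks(10) == 103` chunks of 100 MiB. Counting chunks this way gives a first idea of how many objects (and requests) each run will send to Swift, which may matter for any limits CESNET imposes.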

We could start from something like https://github.com/pangeo-data/benchmarking, to which @tinaok contributed.

@sebastian-luna-valero, we also need to ask CESNET about any limits or constraints they may have.
