Filesystem concept (Feb.) #1313

Closed
8 of 11 tasks
Tracked by #1333
SCA-ZMT opened this issue Mar 28, 2024 · 16 comments
Assignees
Labels
PO issue (Created by Product owners), y8 (NIH SPARC Y8, originally in wrike)

Comments

@SCA-ZMT
Contributor

SCA-ZMT commented Mar 28, 2024

NIH milestone

This is also a milestone for NIH, due by Y8Q2 (Feb. 2025)

Large Files

Improve handling of large files and reduce file-transfer operations as much as possible. This requires some research. The solution should ideally also work on-premise. Storage might need to be completely overhauled.

  • Mount
  • Cache

Data Accessibility [This is a Milestone for NIH Year 8 Q2]

Provide infrastructure for users to inspect/download/zip/delete files. This is a Milestone in NIH Year 8 Q2 (see #1635)

  • Extend webapi
  • Create an amazing frontend

Shared Folder(s) - Advanced

Provide users the possibility to mount arbitrary data into their services:

  • S3 data from other studies
  • other cloud storage (dropbox/google)
  • data from their own computer (sftp)

Tasks

  1. matusdrobuliak66
  2. 4 of 6
    a:agent a:dynamic-sidecar
    GitHK matusdrobuliak66
  3. t:enhancement
    matusdrobuliak66
  4. Feedback type:enhancement y8
    ignapas jsaq007
  5. a:dynamic-sidecar a:simcore-sdk
    matusdrobuliak66

Eisbock

  1. matusdrobuliak66
  2. Working Group
    GitHK matusdrobuliak66
    mguidon sanderegg
  3. matusdrobuliak66
  4. a:infra+ops efs-guardian
    matusdrobuliak66
  5. 2 of 2
    a:dask-service a:simcore-sdk
    GitHK matusdrobuliak66
@SCA-ZMT SCA-ZMT added the PO issue Created by Product owners label Mar 28, 2024
@SCA-ZMT
Contributor Author

SCA-ZMT commented Mar 28, 2024

@mguidon could you please edit the issue description with more info / tasks?

@SCA-ZMT SCA-ZMT added this to the Leeroy Jenkins milestone May 9, 2024
@SCA-ZMT SCA-ZMT added Budget PO issue Created by Product owners and removed PO issue Created by Product owners Budget labels May 9, 2024
@matusdrobuliak66
Contributor

matusdrobuliak66 commented May 14, 2024

First brainstorming took place with @sanderegg, @mguidon, @matusdrobuliak66. We would like to experiment with AWS Elastic File System in three stages:

  1. Caching images
  2. Caching workspace
  3. Reusing the same workspace across multiple EC2 instances

We may potentially investigate the AWS DataSync option, but for now, we prefer to stick with RClone.

Why?

  • Potentially huge service start/stop time and cost savings (no need for buffer machines).

@GitHK
Contributor

GitHK commented May 14, 2024

@matusdrobuliak66 @sanderegg @mguidon why was rclone discarded? Has anybody thought about rclone mount, which uses FUSE, streams data on access, and uses a cache for writing back to S3?

This would allow us to start services without waiting. Saving would also go to S3 directly, so once the user is done interacting with the FS, it's as if the file were already on S3.
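
For reference, a minimal sketch of what that could look like (remote name, bucket, and mount path are placeholders):

    # stream objects on access; buffer writes in a local cache and upload them back to S3
    rclone mount s3remote:my-bucket/workspace /mnt/workspace \
        --vfs-cache-mode writes \
        --vfs-cache-max-size 10G \
        --daemon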

@sanderegg
Member

@GitHK rclone is not discarded; we go stepwise. And until fixed, rclone has a bad habit of blowing up without anyone knowing why.

@mguidon
Member

mguidon commented May 14, 2024

@GitHK Indeed, we did not discard it; we go step by step. For now, rclone sync should do exactly what we want without the need for FUSE.

@GitHK
Contributor

GitHK commented May 16, 2024

After some issues with rclone, it will be phased out in favour of aws s3 sync. If we require FUSE, we could use s3fs-fuse, which is also managed by AWS.
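
A minimal sketch of that replacement, with placeholder bucket and paths:

    # pull the project data on service start
    aws s3 sync s3://my-bucket/project-data /home/user/workspace
    # push changes back on save/stop (only new or changed files are transferred)
    aws s3 sync /home/user/workspace s3://my-bucket/project-data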

@matusdrobuliak66
Contributor

matusdrobuliak66 commented May 21, 2024

Investigation

Caching images

We want to have a quick startup time and get rid of buffer machines.

1. EFS (docker image save/load)

  • Mount EFS to EC2
  • Save images to a tar archive on the EFS. On start, load them via docker image load
  • It is slow for big images

2. EFS (moving Image data to EFS via symbolic link)

  • From the Docker documentation, it is not recommended to put the whole /var/lib/docker directory on a distributed file system, since Docker also stores container state there and this might be problematic when used by multiple hosts. Therefore I was looking for a way to move only the .../image and .../overlay2 directories via symbolic links. In any case, it seems this is not possible on EFS (nfs4 type), because I encountered an issue with moving the "backingFsBlockDev" file: https://forums.docker.com/t/moving-var-lib-docker-to-amazon-efs/102070

3. EBS snapshot/volume (pre-baked AMI)

  • We can create a snapshot of an EBS volume with already-pulled images.
  • From this snapshot, a volume can be created almost instantly
  • This volume can be attached to the EC2 machine, and the Docker daemon can be restarted and pointed to that volume, similarly to what we do now via the boot script.
  • Basically, we will do something similar to what we do now, but add additional logic to manage the EBS volumes ourselves (see the sketch after this list).

4. Multi-attach EBS

  • Not useful in our case.
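
A rough sketch of option 3 with the AWS CLI; all IDs, the device name, and the data-root path are placeholders:

    # one-off: snapshot a volume that already holds the pulled images
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-pulled docker images"
    # on scale-up: create a fresh volume from that snapshot (near-instant) and attach it
    aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone us-east-1a --volume-type gp3
    aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf
    # on the instance: mount the volume and point dockerd at it, as the boot script does today
    mount /dev/nvme1n1 /mnt/docker-cache
    dockerd --data-root /mnt/docker-cache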

Caching workspace

1. s3fs

  • https://surajblog.medium.com/mount-aws-s3-bucket-on-amazon-ec2-9f18b48d4f04
  • Generally, S3 cannot offer the same performance or semantics as a local file system. More specifically:
    • random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy
    • metadata operations such as listing directories have poor performance due to network latency
    • non-AWS providers may have eventual consistency so reads can temporarily yield stale data (AWS offers read-after-write consistency since Dec 2020)
    • no atomic renames of files or directories
    • no coordination between multiple clients mounting the same bucket
    • no hard links
    • inotify detects only local modifications, not external ones by other clients or tools
  • It highlights a few important considerations when using s3fs, namely related to the inherent limitations of S3:
    • no file can be over 5GB
    • you can't partially update a file so changing a single byte will re-upload the entire file.
    • operations on many small files are very efficient (each is a separate S3 object, after all), but large files are very inefficient
    • Though S3 supports partial/chunked downloads, s3fs doesn't take advantage of this so if you want to read just one byte of a 1GB file, you'll have to download the entire GB.
  • Some say it is not production-ready
  • Repeated comments: "S3 is not a file system!"
  • Bye bye s3fs-fuse, hello S3 Mountpoint: https://blog.davidjeddy.com/2023/08/10/bye-bye-s3fs-fuse-hello-s3-mountpoint/

2. Mountpoint S3

  • pretty new + (repeated comments so we do not forget: "S3 is not a file system!")
  • Mountpoint for Amazon S3 allows your applications to access objects stored in Amazon S3 through file operations like open and read. This file access is optimized for applications that need high read throughput to large objects, potentially from many clients at once, and to write new objects sequentially from a single client at a time. While this model suits a wide range of applications, Mountpoint does not implement all the features of a POSIX file system, and there are some differences that may affect compatibility with your application.
  • While the rest of this document gives details on specific file system behaviors, we can summarize the Mountpoint approach in three high-level tenets:
    • Mountpoint does not support file behaviors that cannot be implemented efficiently against S3's object APIs. It does not emulate operations like rename that would require many API calls to S3 to perform.
    • Mountpoint presents a common view of S3 object data through both file and object APIs. It does not emulate POSIX file features that have no close analog in S3's object APIs, such as ownership and permissions.
    • When these tenets conflict with POSIX requirements, Mountpoint fails early and explicitly. We would rather cause applications to fail with IO errors than silently accept operations that Mountpoint will never successfully persist, such as extended attributes.

3. EFS

  • Serverless, fully elastic file storage
  • We would like to use it for caching user data
  • It can be shared across multiple EC2 instances (we can introduce a shared folder between nodes/projects)
  • We can use aws s3 sync, rclone, or AWS DataSync to sync between S3 and EFS (see the sketch after this list).
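
A sketch of option 3 on an EC2 node; the file-system ID, region, and paths are placeholders:

    # mount the EFS file system via NFS (standard EFS mount options)
    sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
        fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/efs
    # seed the workspace from S3 and sync it back when the service stops
    aws s3 sync s3://my-bucket/project-data /mnt/efs/project-data
    aws s3 sync /mnt/efs/project-data s3://my-bucket/project-data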

General S3 comment

As mentioned by @sanderegg, if we correctly set up an S3 Endpoint in the Region where the EC2 <-> S3 traffic takes place, we can have zero S3 transfer costs!
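
In practice this means adding a gateway VPC endpoint for S3, so EC2 <-> S3 traffic stays inside the region and avoids NAT data-processing charges. A sketch with the plain AWS CLI and placeholder IDs (in our setup this would live in Terraform):

    # gateway endpoints for S3 are free of charge
    aws ec2 create-vpc-endpoint \
        --vpc-id vpc-0123456789abcdef0 \
        --service-name com.amazonaws.us-east-1.s3 \
        --route-table-ids rtb-0123456789abcdef0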

Conclusion/Recommendation

  • Caching images --> a self-managed EBS volume should do the job
  • Caching workspace --> if we want a proper file system, I would not experiment with mounting S3 directly, but rather go with EFS.

@sanderegg
Member

sanderegg commented May 21, 2024

@matusdrobuliak66
Contributor

matusdrobuliak66 commented May 22, 2024

Investigation (Part 2)

Caching Images

  • Testing docker pull (Sim4Life core image):
    • Our registry (current state) --> 3m58s
    • ECR registry --> 5m6s / 4m56s
    • In both cases, the extraction step takes more than half of that time.
  • I found an article about optimizing docker pull: Docker Speed Test. They achieved a speed of 0.7 sec / 100 MB. In our case, it would still take minutes.
  • time docker load --input /mnt/efs/fs1/docker_images/android.tar --> a 6.5 GB image (pulled from ECR and saved as a tar file on EFS) took 4 minutes to load.
  • Testing stopping EC2 instances, where EBS volumes stay active:
    • After starting the EC2 machine (with the potential option to change the machine type), images are instantly available to run using docker run. This is a good approach.
    • EBS volumes are relatively cheap: gp3 storage costs $0.08/GB-month, so 200 GB would cost $16/month. I suggest we keep buffer EBS volumes with multiple cached images so they are instantly ready.
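
A sketch of the stop/resize/start cycle used in the test above (instance ID and type are placeholders):

    # stop the instance; its EBS volumes (and the cached images on them) persist
    aws ec2 stop-instances --instance-ids i-0123456789abcdef0
    # optionally change the machine type while it is stopped
    aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --instance-type "{\"Value\": \"g4dn.4xlarge\"}"
    # start it again; cached images are immediately available to docker run
    aws ec2 start-instances --instance-ids i-0123456789abcdef0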

Caching Workspace

Two working examples (mounting EFS into a Docker container):

  • 1:
    docker volume create --driver local --opt type=none --opt device=/mnt/efs/fs1/matus_docker_volume --opt o=bind my_limited_volume
    docker run -d --name my_container -v my_limited_volume:/data redis
  • 2:
    version: "3.7"
    services:
      wp_gary_gitton:
        image: wordpress:6.3
        volumes:
          - wp_gary_gitton:/var/www/html/wp-content

    volumes:
      wp_gary_gitton:
        driver_opts:
          type: nfs
          o: addr=fs-<some-number>.efs.us-east-1.amazonaws.com,rw,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport
          device: :/docker_compose_test

It seems there is no straightforward way to limit a Docker volume's size when mounting, and EFS doesn't support quotas. In any case, I don't think it is necessary: we can use CloudWatch and, for example, Lambda to monitor and manage EFS usage programmatically, e.g. as sketched after the list below.

  • Script to Monitor Directory Sizes: A script runs periodically to check the size of each project's directory and sends this data to CloudWatch.
  • CloudWatch Alarms: Alarms are set up to monitor the directory sizes and trigger actions when limits are exceeded.
  • Lambda Function: A Lambda function handles the alarms and takes appropriate actions, such as enforcing policies.
    • For example: Using EFS Access Points with IAM Policies -> we can restrict access based on some policy.
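
A rough sketch of such a monitoring script; the namespace, metric name, and project path are hypothetical:

    # report each project's directory size on EFS as a CloudWatch custom metric
    for dir in /mnt/efs/fs1/projects/*/; do
        size_mb=$(du -sm "$dir" | cut -f1)
        aws cloudwatch put-metric-data \
            --namespace "osparc/efs" \
            --metric-name DirectorySizeMB \
            --dimensions Project="$(basename "$dir")" \
            --value "$size_mb"
    done

A CloudWatch alarm on that metric can then invoke the Lambda that enforces the per-project policy.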

@sanderegg
Member

@matusdrobuliak66 regarding the differences between shutdown and termination, please look here: AWS reference; this link was also already referenced before.
Now, regarding what you say about the EBS volume staying "active": since with the start/stop mechanism the actual "machine" changes, are you sure that the access times after you restart the machine are on par with a hot machine?
There are simple tests (like the AWS pre-warming test) to check that. If reading the blocks takes a long time, that means the volume is there but 10-100 times slower. Please check it. Keeping volumes around is costly.

Another question: how do you set up the EBS volumes without EC2s?

@matusdrobuliak66
Contributor

matusdrobuliak66 commented May 23, 2024

Conclusion/Summary

  • Introduce S3 endpoint (decrease costs/improve performance)
    • Changes: Terraform
  • Replace buffer machines with buffer EBS volumes containing cached docker images (significantly decreases costs: no need for buffer machines)
    • Changes: Autoscaling, AMIs
  • Introduce EFS (decreases project start/stop time + we can share data between user services)
    • Changes: Sidecar, director-v2, monitoring/management
  • Replace RClone with aws s3 sync (decreases project start/stop time)

@mguidon
Member

mguidon commented May 23, 2024

Thanks @matusdrobuliak66. This looks like a plan. Let's keep rclone on the radar for mounting other filesystems (e.g. sftp and 3rd-party providers).
I suggest we immediately go ahead with 1, 2 and 4 and continue investigating 3.
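
For the record, a sketch of how rclone could cover those cases (remote names, host, and user are placeholders; OAuth setup for the drive remote is omitted):

    # configure remotes for sftp and a third-party provider
    rclone config create my-sftp sftp host sftp.example.com user alice
    rclone config create my-gdrive drive
    # mount a remote into a service's input folder
    rclone mount my-sftp:/data /mnt/external-data --vfs-cache-mode writes --daemon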

@matusdrobuliak66
Contributor

matusdrobuliak66 commented May 24, 2024

EFS Experimentation

testing:

  • 14GB smash file
  • g4dn.4xlarge machine

Outputs:

  • open smash file locally (on Manuel's super computer) -> 4 min 17 sec

EFS (Bursting Mode)

  • copy between EFS mnt and root folder -> 2min 30 sec
  • open the smash file from EFS mnt -> 14min 25 sec

EFS (Elastic Mode)

  • copy between EFS mnt and root folder -> 2min 40 sec
  • open the smash file from EFS mnt -> 15 min

Note:

  • The ZIP operation takes ages; I guess NFS is not optimized for that.

@matusdrobuliak66
Contributor

@GitHK GitHK added this to the South Island Iced Tea milestone Jun 12, 2024
@matusdrobuliak66
Contributor

This is how you can enable it for testing purposes:
https://git.speag.com/oSparc/osparc-ops-deployment-configuration/-/merge_requests/639

@GitHK GitHK modified the milestones: South Island Iced Tea, Tom Bombadil Jul 10, 2024
@elisabettai elisabettai added the y8 NIH SPARC Y8 (originally in wrike) label Aug 23, 2024
@elisabettai elisabettai mentioned this issue Aug 23, 2024
@elisabettai elisabettai changed the title Filesystem concept Filesystem concept (Feb.) Aug 26, 2024
@GitHK GitHK removed their assignment Sep 5, 2024
@mrnicegyu11 mrnicegyu11 removed this from the Tom Bombadil milestone Sep 11, 2024
@mrnicegyu11
Member

mrnicegyu11 commented Sep 20, 2024

As agreed in the pre-planning meeting of the MartinKippenberger sprint, this ticket was split into three separate tickets:
