Alternative cluster pinning option #48
Comments
Hey @teknomunk, 0.14 was released and includes batched pinning, which is basically what you wanted to do manually: merging multiple changes into one update to the cluster. Sadly, it's still not a manual transaction, which I had asked to have implemented (the ticket tracking this request: ipfs-cluster/ipfs-cluster#1018 (comment)). Nevertheless, the batching should in theory make the first mode of operation more viable again, as we produce far fewer commits. The space cleanup issue was also tackled in 0.14, and the space used by the db dropped significantly (ipfs-cluster/ipfs-cluster#1320 (comment)). This means the boilerplate that was previously necessary should no longer be needed. :)
I don't see any way to do this cleanly without a lot of additional effort. Additionally, single packages that don't update for a year or two would unnecessarily block all other packages from being cleaned up. So we would have to traverse all folders and clean up already-deleted packages every once in a while.
I recently updated ipfs-cluster-follow to 0.14.0. If it automatically does batching to only add the new files, that would be appreciated. If I remember, I'll look at this later today.
@teknomunk well, it would do batching and combine many new pins into one operation. But since we currently use a recursive pin of a folder, this doesn't change anything. We would need to switch back to pinning individual files to the cluster to take advantage of it. While the source code for that is still in the repo, I really don't like switching back. It feels pretty hacky tbh. I would rather have the IPFS team investigate why traversing between two folder versions and fetching the changes is so hard on I/O.
Maybe you could highlight our use case in a bug report on https://github.com/ipfs/go-ipfs?
This idea was based on the discussion at #42 about High I/O usage and the two ways that the cluster has distributed pin instructions to cluster follower nodes up to now:
As I run a cluster node, this affects the utilization of my hardware. I never noticed the high disk space utilization as a problem because of the amount of disk space I have (>10TB), but I have noticed the high disk I/O utilization, and because it slowed down other processes I am running, I have taken steps to mitigate it (an SSD cache for the logical volume the data resides on).
This is an attempt to describe an idea that should have neither the high disk I/O utilization of pinning the root folder hash nor the high disk space utilization of pinning each updated file.
Under this option, the folder structure under /ipns/x86-64.archlinux.pkg.pacman.store/ is not changed at all from its current state. Instead, we create a completely separate directory structure that contains the same package files, laid out so that cluster members pin just the new packages without having to check all the other packages and directories in the repo.
As an example, consider an update containing only the packages abiword and go-ipfs. You would create a directory like this:
/2021-01-22-001/
/2021-01-22-001/abiword-3.0.4-4-x86_64.pkg.tar.zst
/2021-01-22-001/go-ipfs-0.7.0-1-x86_64.pkg.tar.zst
in addition to updating /extra/ and /community/, then add the hash of the folder /2021-01-22-001/ to the cluster. This folder would exist only in the cluster, and only for the purpose of having the cluster members pin those two new packages. People not part of the cluster should never see these directories.
If you then got another set of package updates, you would create another folder for only those additional packages:
/2021-01-22-002/
/2021-01-22-002/dbus-broker-26-1-x86_64.pkg.tar.zst
/2021-01-22-002/fftw-3.3.9-1-x86_64.pkg.tar.zst
/2021-01-22-002/xorg-docs-1.7.1-3-any.pkg.tar.zst
/2021-01-22-002/yasm-1.3.0-4-x86_64.pkg.tar.zst
There are a number of ways to decide when to remove these update directories from the cluster:
Looking at rsync2ipfs-cluster/bin/rsync2cluster.sh, to implement this idea, I think you will only need to modify ipfs_mfs_add_file() to take a third parameter (the update folder path in MFS) along with adding the file's CID to the update folder, and add the update folder to the cluster pin set.
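To make the idea above more concrete, here is a rough sketch of the extra steps in shell. It is not taken from rsync2cluster.sh: the helper names (update_dir_path, add_to_update_dir, pin_update_dir), the IPFS_BIN and CLUSTER_CTL_BIN variables, and the "update-" pin name prefix are all made up for illustration. Only the ipfs files and ipfs-cluster-ctl pin add subcommands are real, and the file is assumed to have been added to IPFS already so that its CID is known.

```shell
# Hypothetical sketch; helper and variable names are assumptions, not code from the repo.
IPFS_BIN="${IPFS_BIN:-ipfs}"
CLUSTER_CTL_BIN="${CLUSTER_CTL_BIN:-ipfs-cluster-ctl}"

# Build the MFS path of an update folder, e.g. "/2021-01-22-001".
# The sequence number must be a plain decimal without leading zeros
# (printf would read "008" as an invalid octal number).
update_dir_path() {
    printf '/%s-%03d\n' "$1" "$2"
}

# Link an already-added file into the update folder by its CID.
# Arguments: update folder path in MFS, file name, CID of the file.
add_to_update_dir() {
    "$IPFS_BIN" files mkdir -p "$1"
    "$IPFS_BIN" files cp "/ipfs/$3" "$1/$2"
}

# Look up the root CID of the update folder and pin it in the cluster.
# The pin name is only a label to find the pin again later for removal.
pin_update_dir() {
    update_cid="$("$IPFS_BIN" files stat --hash "$1")"
    "$CLUSTER_CTL_BIN" pin add --name "update-${1#/}" "$update_cid"
}
```

In use, the sync script would call add_to_update_dir once per new or changed package, next to the existing MFS update of /extra/ or /community/, and then call pin_update_dir once at the end of the run. Since the update folder only references CIDs that are already in the local repo, creating it should not copy any package data.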