-
Notifications
You must be signed in to change notification settings - Fork 61
Fix repo serialization with default commit metadata #863
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems OK, but I'm not very confident around locks.
icechunk/src/repository.rs
Outdated
| virtual_chunk_credentials: HashMap<ContainerName, Credentials>, | ||
| #[serde(skip)] | ||
| default_commit_metadata: Arc<RwLock<Option<SnapshotProperties>>>, | ||
| default_commit_metadata: Arc<Mutex<Option<SnapshotProperties>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Mutex over RwLock here? I guess in general we don't expect multiple readers here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah exactly. And here we can't use the async versions because of serialization. I can use rwlock if @paraseba would prefer tho
icechunk/src/repository.rs
Outdated
| virtual_chunk_credentials: HashMap<ContainerName, Credentials>, | ||
| #[serde(skip)] | ||
| default_commit_metadata: Arc<RwLock<Option<SnapshotProperties>>>, | ||
| default_commit_metadata: Arc<Mutex<Option<SnapshotProperties>>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need locking? Can't the set_default_commit_metadata just take a &mut self? I think that would be the cleanest option, then PyRepository is the one that needs locking to call this method on the repo, which seems also right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we dont currently lock PyRepository at all and its a big change to introduce it. We use internal mutability everywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think what i'm proposing is maybe a smaller change? Repository today doesn't have internal mutability (which I like), it can easily be used from Rust in "the usual" way. But things like PySession hold a lock to the underlying Rust datastructure. What I'm proposing is we do the same for PyRepository, bringing these two to the same style:
pub struct PyRepository(Arc<Repository>);
pub struct PySession(pub Arc<RwLock<Session>>);
I may be missing something that would make this hard...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry thats what i meant. I was trying to avoid having to lock the Repository but i can do if we want it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can merge and I can give it a try tomorrow if you are busy. But, i'd prefer if we don't introduce internal mutability into the Repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can do it in this pr then! Thanks for the feedback
|
Need to do python next but have to switch off |
|
Ok @paraseba let me know what you think |
paraseba
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @mpiannucci !! Only minor comments.
| let lock = self.0.read().await; | ||
| ( | ||
| lock.resolve_version(&version).await?, | ||
| Arc::clone(lock.asset_manager()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could do the Arc::clone once the lock is released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How? We need the lock to get the reference to the asset manager right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lock -> get the &Arc<AssetManager> -> unlock -> Clone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, I didn't know you could keep a reference after unlocking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I didn't really think about that... I'm not sure now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah i have to clone from a scope where the lock is held. Which i handled by making an inner scope so it is released immediately after cloning the asset manager
|
Scoped down the locks where it seemed possible/appropriate. Hope it makes sense! |
* main: (29 commits) Release version 0.2.11 (#879) Release version v0.2.10 (#877) One more GC bugfix (#878) Remove cache entries during GC (#875) Add lookup_snapshot (#876) Add logging to GC and expiration (#874) Fix ref delete during ref expiration (#873) Bump the rust-dependencies group across 1 directory with 3 updates (#872) `expire_ref` can now edit snapshot pointed by refs (#870) Fix repo serialization with default commit metadata (#863) Fix bug in expiration that creates a commit loop (#869) Uncomment `delete_branch` in stateful repo ops test (#866) Add upstream dev CI (#862) Add optional default commit metadata to `Repository` (#860) Update GC docstrings (#858) Add expiration/GC notebook (#857) Small docs polish (#856) Release version 0.2.9 (#855) Add support for virtual chunks in GCS (#853) Cargo deny is smarter in new Rust version (#854) ...
No description provided.