Abstractify the Path Factory #55
@isaak-willett great job! Let me get a few minor changes on this PR before we can approve it. Just FYI, this will also be necessary to support the new AWS S3 FUSE driver.
wicker/core/storage.py
Outdated
"""
s3_config = get_config().aws_s3_config
store_concatenated_bytes_files_in_dataset = s3_config.store_concatenated_bytes_files_in_dataset
if s3_root_path is None:
nit for: s3_root_path = s3_root_path or s3_config.s3_datasets_path
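For context on this nit, here is a minimal sketch comparing the two forms. `DEFAULT_DATASETS_PATH` and both function names are hypothetical stand-ins (the real default comes from `s3_config.s3_datasets_path`). The two forms agree when the argument is `None`, but `or` also replaces an empty string, which may or may not be the desired behaviour.

```python
from typing import Optional

# Hypothetical stand-in for s3_config.s3_datasets_path.
DEFAULT_DATASETS_PATH = "s3://example-bucket/datasets"


def resolve_with_is_none(s3_root_path: Optional[str] = None) -> str:
    """Explicit form: only None falls back to the default."""
    if s3_root_path is None:
        s3_root_path = DEFAULT_DATASETS_PATH
    return s3_root_path


def resolve_with_or(s3_root_path: Optional[str] = None) -> str:
    """Short form from the nit: any falsy value (None or "") falls back."""
    return s3_root_path or DEFAULT_DATASETS_PATH
```

If an empty root path should never be accepted, the short form is arguably the safer of the two.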
👍 Thanks, Isaak! I added one small nit, but it is ok to skip it.
Thanks a lot for the PR @isaak-willett! 🙇🏻‍♂️
From my side:
- Just a few nits on the docstrings, that you can hopefully just quickly apply. 😌
- Would test_datasets::TestS3Dataset cover enough to catch potential regressions caused by this refactoring?
Thanks @isaak-willett, the code LGTM in general other than a few minor comments. Mostly requesting changes for the test artifacts (please feel free to share on doc).
wicker/core/storage.py
Outdated
def __eq__(self, other: Any) -> bool:
    return super().__eq__(other) and type(self) == type(other) and self.root_path == other.root_path

def get_dataset_assets_path(self, dataset_id: DatasetID, s3_prefix: bool = True) -> str:
def _get_dataset_assets_path(self, dataset_id: DatasetID, prefix: Optional[str] = None) -> str:
The name prefix here suggests that something is about to be added rather than removed. Should we rename it to something like prefix_to_drop? A similar comment applies elsewhere.
How about I do it in a later PR? The naming is not good on the s3_prefix param either and I want to rename them all at once in a different PR.
Since we have just introduced the prefix name in this PR, and it is minimal effort to change it here as well, I suggest tackling it in this PR rather than introducing new debt.
The s3_prefix parameter is external to this repo, and I agree that changing it elsewhere might need more work and is better done in a separate PR.
@isaak-willett I also got a little bit confused, maybe we should call it prefix_remove or prefix_trim.
Changed to prefix_to_trim
wicker/core/storage.py
Outdated
Args:.
    root_path (str): File system loc of the root of the wicker file structure.
    store_concatenated_bytes_files_in_dataset (bool, optional): Whether to assume concat bytes files are stored.
hmm, do we mean "Whether to assume concatenated bytes files are stored in dataset folder"?
done
Co-authored-by: Marc Carré <[email protected]>
Thanks @isaak-willett, let's address the comment on the prefix naming for the newly introduced function _get_dataset_assets_path before merging.
Overview
Abstractifies the file pathing class in Wicker. Wicker has one common pathing structure, and that structure is not constrained to S3. This abstraction allows different file pathers to be added on top of it if needed while keeping the structure consistent.
Motivation
This set of PRs (#56, #57) aims to fix two problems/incorrect assumptions. The context on the problems is bulleted below:
- S3Dataset and S3PathFactory which can be seen as implementations of a base class but lack the base class. These are the only writing and access methods and there is no common class for interfacing.
What this implements
- S3PathFactory into an implementation class for S3 and a base class WickerPathFactory. This new path defines the path structure generically while the S3PathFactory only appends on the relevant S3 pieces. This gives us a common class to build off where we can keep the access pattern for paths identical across both for easy swapping of user code.
Testing
Compatibility
This PR is entirely backward compatible because it does not change any access patterns. The function signatures, names, and outputs are one-to-one, so no changes to user code are required if the upgrade is undertaken.
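The split described under "What this implements" can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in wicker/core/storage.py: the `<root>/<dataset>/assets` layout, the plain `dataset_name` string (the real methods take a `DatasetID`), and the example bucket are all hypothetical. Only the class names, the `_get_dataset_assets_path` / `get_dataset_assets_path` method names, and the `s3_prefix` and `prefix_to_trim` parameter names come from the discussion above.

```python
from typing import Optional


class WickerPathFactory:
    """Base class: owns the generic path structure under a root path."""

    def __init__(self, root_path: str) -> None:
        self.root_path = root_path.rstrip("/")

    def _get_dataset_assets_path(self, dataset_name: str, prefix_to_trim: Optional[str] = None) -> str:
        # Hypothetical layout: <root>/<dataset_name>/assets
        full_path = f"{self.root_path}/{dataset_name}/assets"
        if prefix_to_trim and full_path.startswith(prefix_to_trim):
            return full_path[len(prefix_to_trim):]
        return full_path


class S3PathFactory(WickerPathFactory):
    """S3 implementation: keeps the public signature and adds only S3 specifics."""

    def __init__(self, s3_root_path: str = "s3://example-bucket/wicker") -> None:
        super().__init__(s3_root_path)

    def get_dataset_assets_path(self, dataset_name: str, s3_prefix: bool = True) -> str:
        # The public boolean flag is translated into the generic trim argument.
        return self._get_dataset_assets_path(
            dataset_name, prefix_to_trim=None if s3_prefix else "s3://"
        )
```

Because the public method keeps its `s3_prefix` flag, existing callers would not need to change, which is the kind of compatibility the PR description claims.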