-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move to adaptor backend #298
base: main
Are you sure you want to change the base?
Move to adaptor backend #298
Conversation
pins/tests/_databackend/LICENSE
Outdated
@@ -0,0 +1,21 @@ | |||
MIT License |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically complying with MIT requires distributing the license
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this probably needs to be moved out of tests
because I'm not sure if that gets included in the wheel build (I think it doesn't).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests directory does get included in the wheel build! I'm not entirely sure where this file should be, but it might be easier to not vendor in this package, if possible 😄
pins/_adaptors.py
Outdated
from abc import abstractmethod | ||
from typing import TYPE_CHECKING, Any, ClassVar, TypeAlias, overload | ||
|
||
from typing_extensions import Self |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
I would prefer to not vendor in all of |
43ebacc
to
549679a
Compare
Okay, we are able to use databackend==0.0.3 now, rather than vendoring it all in 🎉 |
Various other type improvements
90355be
to
fe6092f
Compare
Cool, I've rebased away the commit with the vendored code and added the new dependency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm so sorry for the long wait 😓 -- this is looking really good! I think the 3 small things I thought could benefit from a little tweaking are..
- It seems like _Adaptor._d should not be typed as a class variable
_Adaptor.write_json()
is overriden in_DfAdaptor
in a surprising way (it returns a string), in order to work for data previews. I wonder if the_DFAdaptor
version could be renamed to.to_json()
to reflect its doing a different job.save_data()
is fed an object pulled from_Adaptor._d
but then re-creates the adaptor. Could we allowsave_data()
to take an adaptor, to prevent this round tripping?
Thanks again for all the work you put into this! If it's useful, I'm happy to pick this up and finish -- especially since so much time has passed, and I've picked up the context again / probably owe it to folks 😭
pins/_adaptors.py
Outdated
_DataFrame: TypeAlias = _PandasDataFrame | ||
|
||
|
||
class _AbstractPandasFrame(AbstractBackend): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the file _adaptors.py
starts with an underscore, it seems okay for the contents to not use an underscore (e.g. AbstractPandasFrame
).
(though also totally okay to punt this, since it's all internal; especially if other PRs are building on this one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree with that.
pins/_adaptors.py
Outdated
|
||
|
||
class _Adaptor: | ||
_d: ClassVar[Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the use of ClassVar
right here? It seems like _d
is not a class variable (it's set on the instance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah also agreed. I can't recall why I did that, it might have been some misguided thoughts about pyright behaviour.
pins/_adaptors.py
Outdated
self._d = data | ||
|
||
@overload | ||
def write_json(self, file: str) -> None: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior was modified to sometimes return a string, which seems to violate command-query separation (maybe to reflect the implementation in the pandas adaptor?). Can we revert so that the adaptor refactors, but does not extend the original behavior?
edit: see comment in _DFAdaptor.write_json()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, good idea.
pins/_adaptors.py
Outdated
def write_json(self, file: str) -> None: ... | ||
@overload | ||
def write_json(self, file: None) -> str: ... | ||
def write_json(self, file: str | None = None) -> str | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I think I understand -- this looks like an override of the original .write_json()
method, but its job seems different (it's used for the data preview). Its job looks more like the added .shape
or .columns
properties.
Can we do this?:
- rename
.write_json()
here to something else (maybe.to_json()
) - change the parent annotation for
write_json()
to always return None (not sometimes a str)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See what I've done in this commit:
pins/drivers.py
Outdated
@@ -135,6 +129,8 @@ def save_data( | |||
# as argument to board, and then type dispatchers for explicit cases | |||
# of saving / loading objects different ways. | |||
|
|||
adaptor = _create_adaptor(obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we allow obj
to be either _Adaptor | Any
(the typing is redundant, but maybe a helpful signal)? Then, if obj is not an adaptor, this line could create one.
I'm guessing keeping the original save_data
behavior is useful for testing, but it'd be nice not to have to roundtrip by calling it on save_data(_Adaptor._d, ...)
in this board method:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you've done that here:
I will:
- Write a test case for this
- Refactor to take advantage of this
I'll take a look at this later today - but I'm happy for you to pick it up if you prefer. Thank you for taking the time to revisit the project :) |
I can take a quick pass right now, since I've got the basic pins stuff in mind! |
Ah, we're both pushing to this PR -- I'll leave it to you, thanks for working on this! I noticed that a doctest test fails now, and seems to be related to a more recent release of fsspec. I'm guessing the ffspec github backend doesn't like the two If it's useful, we can always ignore that for now, and handle as a separate from this PR |
Sorry yeah unfortunately I think we both started working on it at the same time. Sounds good - I'm in favour of fixing it in another PR first so after I've addressed the issues here I'll work on that. |
To set us up for #263 and #254.
At @machow's suggestion here: #263 (comment)
I definitely agree this is definitely a nicer way of doing things.