Support for all capabilities existing in AutoExtract #14
Co-authored-by: Mikhail Korobov <[email protected]>
@kmike I have rewritten the documentation. It now presents the recommended approach by default. You can see the new version here: https://autoextract-poet.readthedocs.io/en/more_page_types/enrich.html
I'm happy to merge it in its current state, but it'd be good if someone else could check it, because I made some changes to @ivanprado's code.
Most importantly, I dropped most of the tutorials; they're good, but they might require discussion, so the idea is to get other things merged first.
Thank you @kmike for all these changes 👍 I'll re-review them and get the PR merged.
autoextract_poet/items.py
    @classmethod
    def from_dict(cls, item: Optional[Dict]):
        # XXX: why is None not preserved for pagination?
Actually, `None` is preserved for pagination. Maybe the use of `{}` as the default in the `get` call led to the confusion. I've updated the code to use the new procedure here: edd90b1
Thanks @ivanprado! `{}` is what got me confused indeed. In edd90b1 you updated it for article lists; would you mind fixing it here (for product lists), and for reviews as well?
Sure, done here: d997217
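For readers following the thread, here is a minimal sketch of the behaviour being discussed, not the project's actual code. It assumes stdlib dataclasses and two hypothetical classes, `PaginationLink` and `ProductList`, with illustrative field names; it only shows how the default passed to `dict.get` decides whether a missing nested value stays `None`.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PaginationLink:
    # Hypothetical nested item, used only for illustration.
    url: Optional[str] = None
    text: Optional[str] = None

    @classmethod
    def from_dict(cls, item: Optional[Dict]) -> Optional["PaginationLink"]:
        # A missing or null value stays None instead of becoming an empty object.
        if item is None:
            return None
        return cls(url=item.get("url"), text=item.get("text"))


@dataclass
class ProductList:
    # Hypothetical parent item with one nested pagination field.
    url: Optional[str] = None
    paginationNext: Optional[PaginationLink] = None

    @classmethod
    def from_dict(cls, item: Optional[Dict]) -> Optional["ProductList"]:
        if item is None:
            return None
        return cls(
            url=item.get("url"),
            # item.get("paginationNext") returns None when the key is absent,
            # so None is preserved. Passing {} as the default would instead
            # build a PaginationLink whose fields are all None.
            paginationNext=PaginationLink.from_dict(item.get("paginationNext")),
        )
```

With this shape, `ProductList.from_dict({"url": "https://example.com"}).paginationNext` is `None`, which lets callers distinguish "no pagination data" from "pagination data with empty fields".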
tests/test_items.py
    # XXX: should we make tests below to pick up types based on type annotations,
should we make tests below to pick up types based on type annotations instead of hardcoding all attributes?
@kmike I've implemented automatic type checking here: 30d499e. However, I wouldn't remove the tests you introduced, because they are stricter (they detect the absence of data where it is expected) and redundancy is a good thing in unit tests.
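As an illustration of what annotation-driven checking can look like, here is a rough sketch; it is not the code from 30d499e. It assumes stdlib dataclasses, and `check_types`, `Offer`, and `Product` are hypothetical names. The checker only understands `Optional[X]`, `List[X]`, and plain classes.

```python
from dataclasses import dataclass, field, is_dataclass
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


def check_types(obj) -> None:
    """Raise TypeError if any attribute of obj contradicts its annotation."""
    for name, hint in get_type_hints(type(obj)).items():
        _check_value(getattr(obj, name), hint, name)


def _check_value(value, hint, name: str) -> None:
    origin = get_origin(hint)
    args = get_args(hint)
    if origin is Union:
        # Optional[X] is Union[X, None]; this sketch handles only that form.
        if value is None:
            if type(None) not in args:
                raise TypeError(f"{name} must not be None")
            return
        non_none = [a for a in args if a is not type(None)]
        _check_value(value, non_none[0], name)
    elif origin is list:
        if not isinstance(value, list):
            raise TypeError(f"{name} should be a list")
        if args:
            for element in value:
                _check_value(element, args[0], name)
    else:
        # Other generics (e.g. Dict[str, str]) are checked by container type only.
        if not isinstance(value, origin or hint):
            raise TypeError(f"{name} should be {hint!r}, got {type(value)!r}")
        if is_dataclass(value):
            check_types(value)  # recurse into nested items


@dataclass
class Offer:  # hypothetical nested item
    price: Optional[str] = None


@dataclass
class Product:  # hypothetical top-level item
    name: Optional[str] = None
    offers: List[Offer] = field(default_factory=list)
```

A check like this passes for an item whose optional fields are all `None`, which is why hardcoded assertions on expected values remain useful alongside it.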
The list of available items or pages is now clearer. Based on https://stackoverflow.com/a/62613202/3887420
Thank you @kmike for the review and the changes. Especially for finding the definitions that I missed :-)
The current PR contains the definitions required for all page types that currently exist in the Zyte AutoExtract API.
It also contains initial documentation for the project. It can be seen at https://autoextract-poet.readthedocs.io/en/more_page_types/index.html