Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for all capabilities existing in AutoExtract #14

Merged
merged 27 commits into from
Aug 4, 2021

Conversation

ivanprado
Copy link
Contributor

@ivanprado ivanprado commented Jul 7, 2021

The current PR contains all the definitions required for all the current existing page types in Zyte AutoExtract API.

It also contains initial documentation for the project. It can be seen at https://autoextract-poet.readthedocs.io/en/more_page_types/index.html

CHANGELOG.rst Outdated Show resolved Hide resolved
LICENSE Outdated Show resolved Hide resolved
LICENSE Outdated Show resolved Hide resolved
docs/enrich.rst Outdated Show resolved Hide resolved
docs/enrich.rst Outdated Show resolved Hide resolved
docs/enrich.rst Outdated Show resolved Hide resolved
docs/enrich.rst Outdated Show resolved Hide resolved
docs/enrich.rst Outdated Show resolved Hide resolved
@ivanprado
Copy link
Contributor Author

@kmike I have rewritten the documentation. Now I'm proposing by default the recommended approach. You can see the new version here https://autoextract-poet.readthedocs.io/en/more_page_types/enrich.html

@ivanprado ivanprado requested a review from kmike July 14, 2021 11:30
@kmike kmike changed the base branch from unkonwn_atrribs_pass_through to master July 19, 2021 22:42
Copy link
Member

@kmike kmike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to merge it in a current state, but it'd be good if someone else can check it, because I made some changes to @ivanprado's code.

Most importantly, I dropped most of the tutorials; they're good, but they might require discussion, so the idea is to get other things merged first.

@ivanprado
Copy link
Contributor Author

Thank you @kmike for all these changes 👍 I'll re-review them and get the PR merged.


@classmethod
def from_dict(cls, item: Optional[Dict]):
# XXX: why is None not preserved for pagination?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, None is preserved for pagination. Maybe the use of {} in the get lead to confusion. I've updated the code to use the new procedure here: edd90b1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanprado! {} is what got me confused indeed. In edd90b1 you updated it for article lists; would you mind fixing it here (for product lists), and for reviews as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done here: d997217



# XXX: should we make tests below to pick up types based on type annotations,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we make tests below to pick up types based on type annotations instead of hardcoding all attributes?

@kmike I've implemented automatic type checking here 30d499e but I wouldn't remove the tests you introduced because they are more strict (detect the absence of data when expected) and redundancy is positive in unit tests.

@ivanprado
Copy link
Contributor Author

I've improved the API documentation by using autosummary instead of automodule. The list of available items is now more clear:
image

I think the PR is mature enough. I'm going to merge it.

@ivanprado ivanprado merged commit 794d7e7 into master Aug 4, 2021
@ivanprado
Copy link
Contributor Author

Thank you @kmike for the review and the changes. Especially for finding the definitions that I missed :-)

@ivanprado ivanprado deleted the more_page_types branch August 4, 2021 10:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants