Support for all capabilities existing in AutoExtract #14

Merged

ivanprado merged 27 commits into master from more_page_types

Aug 4, 2021

Contributor

ivanprado commented Jul 7, 2021


This PR contains the definitions required for all the page types that currently exist in the Zyte AutoExtract API.

It also contains initial documentation for the project. It can be seen at https://autoextract-poet.readthedocs.io/en/more_page_types/index.html

ivanprado requested review from kmike and sortafreel

July 7, 2021 11:31

ivanprado added 4 commits

July 12, 2021 13:04


          Support for all the existing capabilities

b455cc0


          Control what is exported in the modules using @export annotation

a0a4541


          Initial documentation with Sphinx

6145d07


          Show bases in the doc

c8472ac

ivanprado force-pushed the more_page_types branch from b83b42c to c8472ac Compare

July 12, 2021 12:06

kmike reviewed CHANGELOG.rst (outdated, resolved)

kmike reviewed LICENSE (outdated, resolved)

kmike reviewed LICENSE (outdated, resolved)

kmike reviewed docs/enrich.rst (outdated, resolved)

kmike reviewed docs/enrich.rst (outdated, resolved)

kmike reviewed docs/enrich.rst (outdated, resolved)

kmike reviewed docs/enrich.rst (outdated, resolved)

kmike reviewed docs/enrich.rst (outdated, resolved)

ivanprado and others added 10 commits

July 14, 2021 10:10


          Update LICENSE

d2708bb

Co-authored-by: Mikhail Korobov <[email protected]>


          Update LICENSE

51ec769

Co-authored-by: Mikhail Korobov <[email protected]>


          Update docs/enrich.rst

e0abefa

Co-authored-by: Mikhail Korobov <[email protected]>


          enrich documentation by extending the items

1e32e60


          Fixes in the doc

e2ce730


          Fixing CHANGELOG.rst

09705af


          ToScrapeProductPage introduction

57c6ab4


          Fix doc

1371b8c


          More doc fixes

2e45389


          Update author

28d787f

Contributor Author

ivanprado commented Jul 14, 2021

@kmike I have rewritten the documentation. It now presents the recommended approach by default. You can see the new version here: https://autoextract-poet.readthedocs.io/en/more_page_types/enrich.html

ivanprado requested a review from kmike

July 14, 2021 11:30

kmike added 3 commits

July 20, 2021 01:25


          add docs build folder to gitignore

f186619


          add "docs" to tox envlist

361bc7e


          Remove large chunks of documentation, to unblock merging.

d5a0740

Let's handle it separately.


          add missing type conversions

1424a32

kmike changed the base branch from unkonwn_atrribs_pass_through to master

July 19, 2021 22:42


          simplify from_dict methods

262c8b2

kmike reviewed


Member

kmike left a comment

I'm happy to merge it in its current state, but it'd be good if someone else could check it, because I made some changes to @ivanprado's code.

Most importantly, I dropped most of the tutorials; they're good, but they might require discussion, so the idea is to get other things merged first.

kmike approved these changes


BurnzZ mentioned this pull request

add support for Article List Extraction #15

Closed

Contributor Author

ivanprado commented Aug 3, 2021

Thank you @kmike for all these changes 👍 I'll re-review them and get the PR merged.


          Using _apply_types also in ArticleList

edd90b1

ivanprado commented


autoextract_poet/items.py Outdated

+                  @classmethod
+                  def from_dict(cls, item: Optional[Dict]):
+                      # XXX: why is None not preserved for pagination?

Contributor Author

ivanprado Aug 3, 2021

Actually, None is preserved for pagination. Maybe the use of {} in the get led to the confusion. I've updated the code to use the new procedure here: edd90b1

Member

kmike Aug 3, 2021

Thanks @ivanprado! {} is what got me confused indeed. In edd90b1 you updated it for article lists; would you mind fixing it here (for product lists), and for reviews as well?

Contributor Author

ivanprado Aug 4, 2021

Sure, done here: d997217
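For readers following this thread, one way to keep None for a missing nested item in a `from_dict` class method is sketched below. It is an illustration under assumptions, not the code in autoextract_poet/items.py: the `PaginationLink` and `ProductList` classes, their fields, and the use of dataclasses are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PaginationLink:
    """Hypothetical nested item."""
    url: Optional[str] = None
    text: Optional[str] = None

    @classmethod
    def from_dict(cls, item: Optional[Dict]) -> Optional["PaginationLink"]:
        # A missing or null value stays None instead of becoming an empty item.
        if item is None:
            return None
        return cls(url=item.get("url"), text=item.get("text"))


@dataclass
class ProductList:
    """Hypothetical top-level item holding an optional nested item."""
    url: Optional[str] = None
    paginationNext: Optional[PaginationLink] = None

    @classmethod
    def from_dict(cls, item: Optional[Dict]) -> Optional["ProductList"]:
        if item is None:
            return None
        return cls(
            url=item.get("url"),
            # dict.get without a default returns None for an absent key,
            # and PaginationLink.from_dict passes that None through.
            paginationNext=PaginationLink.from_dict(item.get("paginationNext")),
        )
```

Calling `item.get("paginationNext")` without a `{}` default makes it visible at the call site that an absent key yields None and not an empty `PaginationLink`.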

ivanprado added 2 commits

August 4, 2021 10:13


          Automatic check for typing compliance in item tests

30d499e


          get_args alternative for python 3.6 and 3.7

e554e12

ivanprado commented


tests/test_items.py Outdated



		# XXX: should we make tests below to pick up types based on type annotations,

Contributor Author

ivanprado Aug 4, 2021

Should we make the tests below pick up types based on type annotations instead of hardcoding all attributes?

@kmike I've implemented automatic type checking in 30d499e, but I wouldn't remove the tests you introduced: they are stricter (they detect missing data where it is expected), and redundancy is a good thing in unit tests.
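A check driven by type annotations could look roughly like the sketch below. It is not the implementation in 30d499e: the `matches` and `check_types` helpers and the `Offer` item are hypothetical, the code assumes Python 3.8+ for `get_args` and `get_origin`, and it only handles plain classes, `Optional[...]`, and `List[...]`.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


def matches(value, tp) -> bool:
    """Return True if value conforms to a simple type annotation."""
    origin = get_origin(tp)
    if origin is Union:  # also covers Optional[X], which is Union[X, None]
        return any(matches(value, arg) for arg in get_args(tp))
    if origin is list:
        (item_type,) = get_args(tp)
        return isinstance(value, list) and all(matches(v, item_type) for v in value)
    return isinstance(value, tp)


def check_types(item) -> None:
    """Assert that every annotated attribute of item matches its annotation."""
    for name, tp in get_type_hints(type(item)).items():
        value = getattr(item, name)
        assert matches(value, tp), f"{name}={value!r} does not match {tp}"


@dataclass
class Offer:
    """Hypothetical item used only to exercise the check."""
    price: Optional[str] = None
    images: List[str] = field(default_factory=list)


check_types(Offer(price="9.99", images=["https://example.com/a.png"]))
```

A check like this accepts None for every `Optional` attribute, so it cannot notice data that is missing where it was expected. That is the gap the hardcoded tests continue to cover.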

ivanprado added 5 commits

August 4, 2021 10:44


          Extending the use of _apply_types to all items

d997217


          Remove outdated comment

65e3c6c


          Minor doc amend

f261d2e


          Changelog updated

a3343d1


          Using autosummary for the documentation

fcf424f

The list of available items and pages is now clearer.

Based on https://stackoverflow.com/a/62613202/3887420
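As a rough illustration of what enabling autosummary involves, a Sphinx `conf.py` fragment might look like the one below. The extension names and the `autosummary_generate` option are standard Sphinx settings, but whether this project's configuration matches is an assumption.

```python
# docs/conf.py (sketch; the project's actual configuration may differ)

extensions = [
    "sphinx.ext.autodoc",      # pulls documentation from docstrings
    "sphinx.ext.autosummary",  # builds summary tables with links to each entry
]

# Generate the stub pages for autosummary entries automatically at build time.
autosummary_generate = True
```

An `autosummary` directive in the API reference page then lists each item or page in a table, which gives the overview that a single `automodule` directive does not.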

Contributor Author

ivanprado commented Aug 4, 2021

I've improved the API documentation by using autosummary instead of automodule. The list of available items is now clearer.

I think the PR is mature enough. I'm going to merge it.

ivanprado merged commit 794d7e7 into master

Contributor Author

ivanprado commented Aug 4, 2021

Thank you @kmike for the review and the changes. Especially for finding the definitions that I missed :-)

ivanprado deleted the more_page_types branch

August 4, 2021 10:47
