Add an option to guess page layout (try to preserve some of the formatting) #11

Merged: 54 commits, merged on Sep 25, 2018
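
The title only names the feature: guessing page layout so that some of the formatting survives text extraction. The sketch below illustrates the general idea using Python's standard-library `html.parser`. It is not this PR's implementation. The tag sets and the function name `guess_layout_text` are hypothetical, and the commits below show the real tag handling was extended over several iterations. Block-level tags request line breaks, `<script>`/`<style>`/`<head>` content is dropped, and pending breaks are merged so that nested blocks do not pile up blank lines.

```python
from html.parser import HTMLParser

# Hypothetical tag sets; the project's actual lists may differ.
NEWLINE_TAGS = {"br", "li", "tr", "dd", "dt"}
DOUBLE_NEWLINE_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6",
                       "ul", "ol", "table", "pre", "blockquote"}
SKIP_TAGS = {"script", "style", "head"}


class _LayoutParser(HTMLParser):
    """Collects text chunks and newline markers around block-level tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>/<head>

    def _mark(self, tag):
        if tag in DOUBLE_NEWLINE_TAGS:
            self.chunks.append("\n\n")
        elif tag in NEWLINE_TAGS:
            self.chunks.append("\n")

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        else:
            self._mark(tag)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        else:
            self._mark(tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            text = " ".join(data.split())  # collapse whitespace runs
            if text:
                self.chunks.append(text)


def guess_layout_text(html):
    """Extract text, keeping line breaks implied by block-level tags."""
    parser = _LayoutParser()
    parser.feed(html)
    parser.close()
    out = []
    pending = 0  # largest newline run requested since the last text chunk
    for chunk in parser.chunks:
        if chunk.startswith("\n"):
            pending = max(pending, len(chunk))
        else:
            if out:
                out.append("\n" * pending if pending else " ")
            out.append(chunk)
            pending = 0
    return "".join(out)
```

For example, `guess_layout_text("<h1>Title</h1><p>First paragraph.</p><ul><li>one</li><li>two</li></ul>")` yields `"Title\n\nFirst paragraph.\n\none\ntwo"`. Spacing between inline chunks is deliberately naive here, because every pair of text chunks is separated by one space. That is the kind of case the punctuation-related commits below address.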

Commits on Aug 24, 2018

  1. 0ae6d24
  2. 566dc9b

Commits on Aug 27, 2018

  1. 587e9a7
  2. 6c9d27e

Commits on Aug 28, 2018

  1. c22f3fa
  2. add tests guess_page_layout

    Kebniss committed Aug 28, 2018
    8a78fc5

Commits on Aug 29, 2018

  1. remove old test

    Kebniss committed Aug 29, 2018
    a783e31

Commits on Aug 30, 2018

  1. cb8dc1c
  2. fix tests

    Kebniss committed Aug 30, 2018
    fb599bc
  3. fixed tests

    Kebniss committed Aug 30, 2018
    90e37b7
  4. ae26d29

Commits on Aug 31, 2018

  1. bb33d4b

Commits on Sep 6, 2018

  1. 3069a73
  2. add test

    Kebniss committed Sep 6, 2018
    dd03201

Commits on Sep 7, 2018

  1. fix indentation

    Kebniss committed Sep 7, 2018
    0f2fb2b
  2. e8da507

Commits on Sep 8, 2018

  1. 0b9d139
  2. add new tags to handle

    Kebniss committed Sep 8, 2018
    ba7cdc0

Commits on Sep 10, 2018

  1. handle more tags

    Kebniss committed Sep 10, 2018
    952d895

Commits on Sep 11, 2018

  1. 9dafbf0
  2. b3229d6
  3. remove newline

    Kebniss committed Sep 11, 2018
    695b458
  4. add test html without text

    Kebniss committed Sep 11, 2018
    03259b9
  5. fix newline + space bug

    Kebniss committed Sep 11, 2018
    cba531f
  6. add bad punct test

    Kebniss committed Sep 11, 2018
    9811349
  7. add newline

    Kebniss committed Sep 11, 2018
    d47138c
  8. add tests on real webpages

    Kebniss committed Sep 11, 2018
    76f9028
  9. 05c7702
  10. remove pathlib import

    Kebniss committed Sep 11, 2018
    4505e24
  11. fix test

    Kebniss committed Sep 11, 2018
    a27e4c8
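
Several of the Sep 11 commits ("fix newline + space bug", "add bad punct test") concern whitespace around punctuation. This relates to the `guess_punct_space` behaviour named in a later commit. The sketch below shows the general idea of joining extracted text chunks without inserting a space before closing punctuation or after an opening bracket. It is not the library's code. The punctuation sets and the name `join_chunks` are hypothetical.

```python
# Hypothetical punctuation sets; the project's actual rules may differ.
NO_SPACE_BEFORE = set(",.:;!?)]}%")
NO_SPACE_AFTER = set("([{$")


def join_chunks(chunks):
    """Join text chunks with single spaces, except around punctuation."""
    out = ""
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue  # skip whitespace-only chunks
        needs_space = (
            out
            and chunk[0] not in NO_SPACE_BEFORE
            and out[-1] not in NO_SPACE_AFTER
        )
        if needs_space:
            out += " "
        out += chunk
    return out
```

For example, `join_chunks(["Hello", "world", "!"])` gives `"Hello world!"`. Rules this simple will misfire on some inputs, which matches a later commit marking one punctuation test as xfail.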

Commits on Sep 12, 2018

  1. remove space

    Kebniss committed Sep 12, 2018
    b926c8c

Commits on Sep 19, 2018

  1. handle list of selectors

    Kebniss committed Sep 19, 2018
    73f49ad
  2. 15d22e0

Commits on Sep 20, 2018

  1. 8f68b2c
  2. cf02b94
  3. update readme

    Kebniss committed Sep 20, 2018
    7aec8d2
  4. update history

    Kebniss committed Sep 20, 2018
    7653bf9
  5. update readme

    Kebniss committed Sep 20, 2018
    4300fe6
  6. 4772061
  7. change documentation

    Kebniss committed Sep 20, 2018
    05b979a

Commits on Sep 21, 2018

  1. DOC cleanup README

    kmike committed Sep 21, 2018
    ad95bff
  2. DOC cleanup function docstring

    kmike committed Sep 21, 2018
    59d2d54
  3. revert formatting change

    kmike committed Sep 21, 2018
    51947d4
  4. minor cleanup

    kmike committed Sep 21, 2018
    1370647

Commits on Sep 24, 2018

  1. add pytest files to gitignore

    kmike committed Sep 24, 2018
    ab3f776
  2. refactor _html_to_text function for readability:

    * prev is always a string now, never a 1-element list
    * unify newline and text handling between text and tail
    * another workaround for mutable variable in the outer scope (Context class)
    * append to a list instead of using a generator
    kmike committed Sep 24, 2018
    22a7fa1
  3. e161e92
  4. TST mark test as xfail, change desired output

    guess_punct_space doesn't provide good output in this case => xfail
    kmike committed Sep 24, 2018
    8b466f8
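
The Sep 24 refactor mentions a "Context class" as a workaround for a mutable variable in the outer scope. A nested function can rebind an outer local only with `nonlocal`, which exists in Python 3 but not Python 2. The later "remove PY3-only assert" commit suggests Python 2 was still supported. The usual workaround is to store the state as an attribute on a small holder object and mutate that attribute. The sketch below shows the pattern in isolation. The names `Context` and `prev` come from the commit message. The helper `collapse_newlines` and its logic are hypothetical and are not the code from the commit.

```python
class Context(object):
    """Mutable holder for state shared with a nested function.

    Rebinding an outer local from a nested function needs ``nonlocal``
    (Python 3 only); assigning to an attribute works in Python 2 and 3.
    """

    def __init__(self):
        self.prev = "\n"  # last emitted chunk; always a string


def collapse_newlines(chunks):
    """Concatenate chunks, dropping a newline that follows another newline."""
    ctx = Context()
    parts = []  # append to a list instead of using a generator

    def add(chunk):
        if chunk == "\n" and ctx.prev.endswith("\n"):
            return  # skip the duplicate (or leading) newline
        parts.append(chunk)
        ctx.prev = chunk  # attribute assignment, not rebinding an outer name

    for chunk in chunks:
        add(chunk)
    return "".join(parts)
```

For example, `collapse_newlines(["a", "\n", "\n", "b", "\n"])` returns `"a\nb\n"`. A one-element list (`prev = [value]`) achieves the same result, but the commit notes that `prev` is now always a string. An attribute on a named object makes that intent clearer.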

Commits on Sep 25, 2018

  1. 729e11a
  2. 2973ee0
  3. 607b04a
  4. typo fix in comment

    kmike committed Sep 25, 2018
    13394ba
  5. 7a1b57b
  6. remove PY3-only assert

    not worth it to use six.string_types
    kmike committed Sep 25, 2018
    732c87d