Fix unwanted joins for inline tags #2

Merged
merged 6 commits into from
May 29, 2017

Commits on May 26, 2017

  1. Add whitespace even for inline tags

    Thanks @codinguncut for the suggestion. Still needs testing.
    re.sub replicates XPath's normalize-space behaviour.
    See GH-1
    lopuhin committed May 26, 2017
    6135ba6
  2. Cache regexp

    Python 2 does not cache re.sub regexps,
    and it's faster even on Python 3
    lopuhin committed May 26, 2017
    43f1bd4
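
The two commits above describe collapsing whitespace with a regexp that is compiled once. A minimal sketch of that idea follows; the names `_WHITESPACE`, `normalize_whitespace`, and `join_chunks` are hypothetical and not taken from the pull request. Note that Python's `\s` matches more characters than XPath's definition of whitespace, so this only approximates `normalize-space`.

```python
import re

# Compiled once at import time, so repeated calls reuse the same pattern object.
_WHITESPACE = re.compile(r'\s+')


def normalize_whitespace(text):
    """Collapse each whitespace run to one space and trim both ends."""
    return _WHITESPACE.sub(' ', text).strip()


def join_chunks(chunks):
    """Join text chunks with a space (even across inline tags), then normalize."""
    return normalize_whitespace(' '.join(chunks))


print(join_chunks(['Hello', 'world', '  and\n', 'more ']))
```

Joining with a space first and normalizing afterwards means adjacent inline elements never run together, while duplicate spaces introduced by the join are cleaned up in a single pass.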

Commits on May 29, 2017

  1. guess_punct_space: remove whitespace before punct

    This is similar to webstruct.utils.smart_joins
    (https://github.com/scrapinghub/webstruct/blob/5a3f39e2ec78a04ca021a12dff58f66686d86251/webstruct/utils.py#L61),
    but is applied only on the tag boundaries.
    This mode is just a little bit slower than the default.
    lopuhin committed May 29, 2017
    f020f4b
  2. Slightly faster and cleaner default path

    It's fine to apply the whitespace-cleaning regexp at the end
    lopuhin committed May 29, 2017
    73bf2ac
  3. Cache method lookup, more readable loop conditions

    Thanks for the idea @kmike!
    lopuhin committed May 29, 2017
    e9cf9b8
  4. 1fb2ec4
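
The `guess_punct_space` commit describes dropping whitespace before punctuation at tag boundaries. The sketch below illustrates one way that could work and is not the code from this pull request: the function name `join_with_punct_guess` and the punctuation set `_NO_SPACE_BEFORE` are assumptions made for illustration. It also shows the cached method lookup mentioned in the later commit.

```python
import re

_WHITESPACE = re.compile(r'\s+')
# Hypothetical set of characters that should attach to the preceding text.
_NO_SPACE_BEFORE = frozenset(',.:;!?)')


def join_with_punct_guess(chunks):
    """Join chunks with spaces, except before leading punctuation."""
    parts = []
    append = parts.append  # cache the bound method lookup outside the loop
    seen_text = False
    for chunk in chunks:
        # Clean each chunk first so the boundary check sees its real first character.
        chunk = _WHITESPACE.sub(' ', chunk).strip()
        if not chunk:
            continue
        if seen_text and chunk[0] not in _NO_SPACE_BEFORE:
            append(' ')
        append(chunk)
        seen_text = True
    return ''.join(parts)
```

For example, `join_with_punct_guess(['Hello', ', world', '!'])` gives `'Hello, world!'`, while `join_with_punct_guess(['foo', 'bar'])` still gives `'foo bar'`. Unlike the default path, this variant has to inspect every boundary, which is consistent with the commit's note that the mode is slightly slower.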