Export in CDRv2 format #3

Merged
merged 3 commits into from
Mar 16, 2016

Commits on Mar 16, 2016

  1. Export in CDRv2 format

    Also remove export of found forms, and do not save pages
    from other domains.
    lopuhin committed Mar 16, 2016
    06ce8f3
  2. Extract text using string() xpath selector

    Following @kmike's suggestion. This gives cleaner output with fewer
    extra newlines.
    lopuhin committed Mar 16, 2016
    cb37620
  3. Always use CDR format, add extracted_metadata

    What was previously stored in PageItem and FormItem
    is now stored in extracted_metadata: is_page, depth, forms.
    lopuhin committed Mar 16, 2016
    5533d16
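
For readers without the diff at hand, here is a minimal sketch of what commits 2 and 3 describe. It is not the code from this PR. It assumes well-formed markup and uses the standard library's `itertext()` as a stand-in for the XPath `string()` function, since both give the concatenation of all descendant text nodes. Only `extracted_metadata` and its keys `is_page`, `depth` and `forms` are taken from the commit messages; the helper names and every other key are hypothetical.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all descendant text nodes in document order.

    This matches the XPath string-value of an element, which is what
    string() returns for it. Joining the nodes directly avoids the extra
    newlines that appear when text nodes are collected one by one.
    """
    return "".join(element.itertext())


def build_cdr_item(url, markup, depth, forms, is_page=True):
    """Build one export record as a plain dict (illustrative layout only)."""
    root = ET.fromstring(markup)  # assumption: markup is well-formed XML
    return {
        "url": url,                            # hypothetical key
        "raw_content": markup,                 # hypothetical key
        "extracted_text": string_value(root),  # hypothetical key
        # Data that, per commit 3, used to live in PageItem and FormItem:
        "extracted_metadata": {
            "is_page": is_page,
            "depth": depth,
            "forms": forms,
        },
    }


markup = (
    "<html><body><p>Hello, <b>world</b>!</p>"
    "<form action='/search'/></body></html>"
)
item = build_cdr_item(
    url="http://example.com/",
    markup=markup,
    depth=1,
    forms=[{"action": "/search"}],
)
print(item["extracted_text"])
print(item["extracted_metadata"])
```

Real pages are rarely well-formed XML, so a crawler would need an HTML-aware parser in place of `ET.fromstring`; the sketch only shows the shape of the data.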