Capturing unneeded elements #42

Open
Anangaya opened this issue Nov 9, 2020 · 4 comments

@Anangaya

Anangaya commented Nov 9, 2020

So far, this plugin usually captures some unneeded elements for me. For example, it works terribly with scribblehub. If there is even one comment in the comment section, the main content is completely ignored and only the comments are captured. When that happens, the epub starts with the string "Error: Parse Error:".

Even when the main content is captured, some unneeded elements are captured both before and after the necessary content. It would be nice if we could specify which elements are captured and which are not, preferably with an XPath expression for the needed elements.
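
To make the request concrete, here is a minimal sketch of what XPath-based include/exclude selection could look like. It is written in Python against a small, made-up, well-formed page rather than in the extension's own code. The function name `capture`, its parameters, and the sample markup are all assumptions for illustration, not an existing option of the plugin. Python's `xml.etree.ElementTree` only supports a subset of XPath, which is enough for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: wanted chapter text plus unwanted
# header, footer, and comment section.
PAGE = """<html><body>
<div id="header">Site navigation</div>
<div id="chapter"><p>First paragraph.</p><p>Second paragraph.</p><div class="footer">Unwanted footer</div></div>
<div id="comments"><p>A reader comment.</p></div>
</body></html>"""


def capture(page_source, include_xpath, exclude_child_xpath=None):
    """Return the text of every element matching include_xpath.

    exclude_child_xpath is a relative path (for example "div[@class='footer']")
    that is matched against the direct children of each node inside a
    captured element; matching children are dropped before text extraction.
    """
    root = ET.fromstring(page_source)
    captured = []
    for element in root.findall(include_xpath):
        if exclude_child_xpath:
            # ElementTree has no parent pointers, so visit every node in the
            # captured subtree and remove its matching direct children.
            for parent in list(element.iter()):
                for child in parent.findall(exclude_child_xpath):
                    parent.remove(child)
        pieces = (text.strip() for text in element.itertext())
        captured.append(" ".join(piece for piece in pieces if piece))
    return captured


if __name__ == "__main__":
    print(capture(PAGE, ".//div[@id='chapter']", "div[@class='footer']"))
```

With an include expression alone, the footer inside the chapter element would still be captured; the exclude expression is what drops it. Real web pages are rarely well-formed XML, so an actual implementation would need an HTML-aware parser or the browser's own XPath support.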

@alexadam
Owner

I can't reproduce it. Please send the link that's causing problems.

[Screenshot 2020-11-12 at 17 26 49]

@Anangaya
Author

@Anangaya
Author

OK, it seems the bug only exists in the Firefox extension. Chrome produced the epub as it's supposed to.

@Anangaya
Author

Anangaya commented Nov 23, 2020

A way to actually select the HTML element to capture would be nice. It would be great in cases where a single HTML element spans multiple webpages, in which case it's not possible to select all the text at once. Go to 24symbols.com and try a free book, for example. The save page option can capture a chapter almost perfectly, save for an unwanted footer at the end of each chapter (which is still great, because the content is actually inside an iframe and the footers can be removed easily afterwards). But the save selection method fails spectacularly in this case (the chapter spans multiple pages even though the entire chapter gets loaded in each page).

On a side note, I'd be really grateful if you could answer this question: how is 24symbols preventing us from accessing the page source of the books' webpages? (What it gives is a completely different page source.)

OK, here is a webpage I saved from 24symbols (with the SingleFile plugin):

Aftershock - A Stone Braide Chronicles Story by Bonnie S. Calhoun - Read book online (11_24_2020 12_25_01 PM).zip

The book is just something I used as a guinea pig; I still have no idea what it's about!
The page source can be viewed from this file, which is not the case when I try it directly on the site.
The entire chapter is there in the page source, but only part of it is visible on the webpage, so it's impossible to select it all with the Save Selection option.
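
As a rough illustration of why capturing by element would help here, the sketch below pulls all text out of one element of a saved page source, whether or not the browser would display it. It is only a sketch under stated assumptions: the markup, the `chapter` id, and the `display:none` hiding are invented for the example and are not taken from the real 24symbols pages, and `ElementTextExtractor` and `extract_text` are hypothetical names. It uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; skipped so depth tracking stays balanced.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical saved page: the whole chapter is in the source, but part of
# it would not be displayed (the hiding mechanism here is an assumption).
SAVED_PAGE = """<html><body>
<div id="chapter">
<p>Visible part of the chapter.</p>
<p style="display:none">Part that is not shown on this page.</p>
</div>
<div id="footer">Unwanted footer</div>
</body></html>"""


class ElementTextExtractor(HTMLParser):
    """Collect the text inside the first element with the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 means we are outside the target element
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        # Text is kept regardless of any style that would hide it on screen.
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(page_source, target_id):
    parser = ElementTextExtractor(target_id)
    parser.feed(page_source)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(extract_text(SAVED_PAGE, "chapter"))
```

Because the extraction works on the source rather than on what is selected on screen, the hidden paragraph is included and the footer outside the element is not.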
