Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix page breaks in reflowable Epub books #229

Open
wants to merge 5 commits into
base: develop
Choose a base branch
from

Conversation

olivierkorner
Copy link
Contributor

CSS properties page-break-inside/after/before with values always/auto/avoid are sometimes used in Epub books to force or prevent text to the next page. This is a print-only CSS property and as such it's rendered in the webview.

By adding a CSS preprocessor content filter in the SDK, all occurences of page-break-* in CSS stylesheets and style elements and attributes in HTML content are replaced by matching column-break-*, enforcing or preventing the page breaks accordingly in the reader.

readium/readium-shared-js#127

page-breaks-demo

@danielweck
Copy link
Member

That's a great use of ContentFilterChain, thanks @olivierkorner!

Do you think that the replacement of page-break-* with column-break-* will work in every conceivable native webviews? (right now: Android UIWebView and ChromiumView, iOS + OSX UIWebView and WKWebView, no Windows-specific MSEdgeIEView so far)

https://github.com/readium/readium-sdk/pull/229/files#diff-639376f74408a17718d6668606828a45R29

static REGEX_NS::regex CSSMatcher("page\\-break\\-(after|before|inside) *\\: *(always|avoid|left|right)", regexFlags);
...
static const std::string PageBreakReplacement = "-webkit-column-break-$1: $2; column-break-$&: $2";

@olivierkorner
Copy link
Contributor Author

@danielweck I haven't been able to find up-to-date information on the support of column breaks across webviews. It'll work with webkit-based webviews, but I can't say for sure for the rest.

@danielweck
Copy link
Member

Another comment: this renames page-break CSS statements to column-break in a indiscriminate manner, that is to say regardless of the actual rendering context within which the HTML document is rendered (e.g. scroll vs. column-paginated view modes).

Also, this only works with standalone stylesheet files, not inline CSS styles.

@olivierkorner
Copy link
Contributor Author

@danielweck It works with both standalone stylesheet files and inline CSS styles: it replaces page-break statements in CSS files and in <style> element and style attributes in HTML files.

You're right though, it does replace regardless of the context.
As far as I know, it's not possible to pass a context from the reader to the filter, is it?

I agree that having column breaks in a non-columnized view is not the most rigorous, but it's worth checking if it has any consequences on the rendering. I just tested on iOS: in scroll view mode, the column-break statements are just ignored.

I will do with more testing.

@danielweck
Copy link
Member

Thank you for the correction (stylesheet files AND inline CSS).
And yes, Content Filters handle raw data, they hhave no information about the document rendering context.

@olivierkorner
Copy link
Contributor Author

I tested on several platforms: works on iOS, OS X, Android and it would work with MSEdgeIEView as it supports webkit prefixed statements for -webkit-column-break-*.
I also checked if the substitution had any effect in scroll mode: on all these platforms, the column break statements are ignored.

@rkwright
Copy link
Member

Thanks for creating and testing this @olivierkorner

@danielweck I am OK with it, If you are OK with this, please go ahead and merge it.

@danielweck
Copy link
Member

Note: this PR incorrectly targets the master branch (should be develop).

// auto mediaType = item->MediaType();
// return (mediaType == "audio/mp4" || mediaType == "audio/mpeg" || mediaType == "video/mp4" || mediaType == "video/mpeg");
auto mediaType = item->MediaType();
return (mediaType == "application/xhtml+xml" || mediaType == "text/html" || mediaType == "text/css");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change intentional? To me it looks like an unintentional commit, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah thank you!
Yes, I'll fix it along with the target branch.

@danielweck
Copy link
Member

danielweck commented Jul 27, 2016

I added an inline code comment.

Also, for ePub3/ePub/initialization.cpp to compile with the new CSSPreprocessor C++ class in Android (or any target platform other than iOS/OSX readium-sdk/plaform/Apple/) we need to ensure that CSSPreprocessor.cpp is included in the build. If I am not mistaken, the Gradle build utilizes "wildcard" includes:
https://github.com/readium/readium-sdk/blob/develop/Platform/Android/epub3/Stable.mk#L231
$(wildcard $(EPUB3_PATH)/ePub/*.cpp)
and
https://github.com/readium/readium-sdk/blob/develop/Platform/Android/epub3/build_experimental.gradle#L8

android.sources {
        main {
            jni {
                source {
                    srcDirs = [
                            '../../../ePub3/ePub'

...which means Android should compile without any modifications in the build configuration. @olivierkorner is that your experience as well?


void CSSPreprocessor::Register()
{
FilterManager::Instance()->RegisterFilter("CSSPreprocessor", ValidationComplete, CSSFilterFactory);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ValidationComplete the most adequate Content Filter priority? I think so, but just double-checking (must be after decryption).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked: ValidationComplete is the lowest priority, the filter is applied last.

@danielweck
Copy link
Member

when merged, this will have to be documented here: https://github.com/readium/readium-shared-js/wiki/ContentModifications

@olivierkorner
Copy link
Contributor Author

@danielweck I compiled a clean SDKLauncher-Android: no need to modify the build configuration.
However, when I merged with the develop branch, I had a compile error in iOS and OS X. Maybe it should be fixed before I push my modifications.

@danielweck
Copy link
Member

I think the iOS / OSX compile error is because of one of my previous Android fixes. It's a simple code patch (misplaced preprocessor directive) in ObfuscatedFonts ContentFilter, related to OpenSSL.

@danielweck
Copy link
Member

Hello, can this PR be "updated" to target the develop branch?
Also, sync https://github.com/ArtBookMagazine/readium-sdk/tree/feature/page_breaks with the current tip of the develop branch, to see if everything compiles / runs fine.
Thanks!

@olivierkorner olivierkorner changed the base branch from master to develop August 25, 2016 09:09
@olivierkorner
Copy link
Contributor Author

@danielweck So much easier now that GitHub allows to change the base branch!

@danielweck
Copy link
Member

Yes, that's great! :)

@JayPanoz
Copy link

JayPanoz commented Mar 16, 2018

Sorry for being so late to the party, but I can give further details about that.

Intro

To put it simply, this has evolved significantly and is now part of the CSS Fragmentation Module. UA are even required to alias page-break-* properties to the new ones (break-*) for compatibility.

There’s an intent to implement from Chromium back to 2016.

I thought Webkit aliased as well but I’m apparently wrong.

Anyway, what’s important is that the spec doesn’t alias the super old -webkit-column-break-* properties, which is something I reported to the CSS multicol spec editors.

Which brings me to the iOS UIWebView setPagination API. Apparently, this has been an issue to Apple as well, because there is a paginationBreakingMode available, and it allows app implementers to choose between page-break-* and -webkit-column-break-*.

And, to quote Quirksmode:

This is an area of severe incompatibilities and immense amounts of keywords.

Which is consistent with my personal experience.

Bugs

Now to the fun parts…

Issue 1: at some point in time on iOS/MacOS, if a stylesheet had both, none would be taken into account – that’s a bug I reported myself ± 18 months ago.

Issue 2. fragmentation is not necessarily updated in some contexts (e.g. columns on html), and, on a reflow, for instance, Apple has to add <hr style="page-break-before: always"/> to make sure an element with page-break-inside: avoid works as expected (so they insert it before the element when the user changes the font-size, or change the size of the window on desktop for instance). I can confirm I had to force recalcs in ReadiumCSS because of nasty fragmentation update issues.

Issue 3. break-inside has super nasty bugs if the element is too long to fit into one column, e.g. vertical alignment can suddenly become completely off, and the element is laid out 40% from the top of the column. That might screw pagination as well, because you’ve got this extra 40% which is not visible in JS (offset, clientRect, etc.). Or lines of text being cut off.

Support

You won’t find those properties in CanIUse break feature, but you’ll find the mention to webkit-column-break-* support in CSS Multicol feature.

The biggest pain point is avoid is not supported for before and after, obviously.

But to sum things up, this probably should be fixed in browsers because bugs can arise pretty quickly, and from personal experience, actually supporting it can do more harm than good. I’ve been very very cautious when using those properties when authoring EPUB files because of the amount of rendering issues some can trigger.

[edit] There’s also a weird mention in the iBooks guidelines:

if you include page breaks to mark a chapter break, use page-break-after to create a break at the end of a chapter, not page-break-before to insert the break at the beginning of the chapter. This modification improves performance with the table of contents.

It won’t be unreasonable to think Webkit has issues with that, considering the CSS multicol implementation is quite terrible in all browsers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants