Add a start landmark (to the PDF Profile) #88

mwbenowitz · 2022-07-06T22:01:17Z

This adds a simple PDF profile to cover Manifests that consist only of PDF files (that is, every resource must have a media type of application/pdf).

This accounts for a structure being used in some projects that represents collections of PDF files as a single resource that can be read by users. This enables such things as representing a resource as a set of PDF files with one per section/chapter, while still allowing for a unified reading experience.

This does not alter the Manifest itself; it simply requires that conforming manifests meet the requirements laid out in the profile. The profile also specifies that a start parameter may be added to link href strings, so that manifests can specify start pages, for example to skip white space at the start of files.

This adds a simple PDF profile to cover Manifests that consist only of PDF files (media type of `application/pdf`). Beyond this, the profile carries no other requirements at this time, though several extensions are possible for properties of link objects:

- `tagged`: a boolean property that would indicate whether a PDF is semantically tagged to denote its structure. This can have implications for navigation and accessibility within PDF clients/renderers.
- `version`: a controlled string property that would define the specific version of the PDF (e.g. `1.3`, `1.5`, `2.0`, ...).
- `archival`: a boolean property that would work in conjunction with `version` to indicate whether a PDF is a [PDF-A](https://www.loc.gov/preservation/digital/formats/fdd/fdd000318.shtml) file.
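As a rough illustration of the single requirement described above, the following Python sketch lists links whose media type is not `application/pdf`. The function name `non_pdf_links` is hypothetical, and checking both `readingOrder` and `resources` is an assumption about what "all resources" covers; the profile text does not spell this out.

```python
PDF_MEDIA_TYPE = "application/pdf"


def non_pdf_links(manifest: dict, collections=("readingOrder", "resources")) -> list:
    """Return hrefs of links in the given collections whose type is not application/pdf.

    Which collections must be checked is an assumption, hence the parameter.
    """
    offenders = []
    for collection in collections:
        for link in manifest.get(collection, []):
            if link.get("type") != PDF_MEDIA_TYPE:
                offenders.append(link.get("href", "<missing href>"))
    return offenders


conforming = {
    "readingOrder": [
        {"href": "introduction.pdf", "type": "application/pdf"},
        {"href": "chapter1.pdf", "type": "application/pdf"},
    ]
}
mixed = {
    "readingOrder": [
        {"href": "chapter1.pdf", "type": "application/pdf"},
        {"href": "notes.html", "type": "text/html"},
    ]
}

print(non_pdf_links(conforming))  # empty list: every link is a PDF
print(non_pdf_links(mixed))       # lists notes.html
```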

This adds a new section that describes the link parameters that can be provided in `href` strings in conforming profiles. At present this is solely the `start=n` parameter, but further parameters can be added.

mickael-menu · 2022-07-07T07:50:22Z

profiles/pdf.md

+
+| Key   | Semantics | Type     | Values    | 
+| ----- | --------- | -------- | --------- | 
+| [start](#start) | Specifies the initial page of the PDF to display when displaying this resource  | Integer  | 1 to (page count of current resource)  | 


Allowing fragments in the reading order might introduce a lot of complexity in the toolkits. We'll need to talk about this. Would you be able to come to one of our weekly Zoom meetings (next one)?

That being said, you could use the fragment identifier `page` instead of `start`, as it is widely supported for PDF:

> The list of PDF-open parameters and the action they imply is:
>
> `page=<pagenum>` Open the specified (physical) page.
>
> https://datatracker.ietf.org/doc/html/rfc3778#section-3
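To make the `page=<pagenum>` suggestion concrete, here is a minimal Python sketch that splits an `href` into its resource and page number using only the standard library. The helper name `pdf_page_from_href` is hypothetical, and treating a missing or non-positive page as "no page" is an assumption, not something the RFC or RWPM prescribes.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urldefrag


def pdf_page_from_href(href: str) -> Tuple[str, Optional[int]]:
    """Split href into (resource, page).

    page is None when the fragment has no page=<n> with n >= 1.
    """
    resource, fragment = urldefrag(href)
    # PDF open parameters are key=value pairs, so parse_qs can read them.
    values = parse_qs(fragment).get("page", [])
    if values and values[0].isdigit() and int(values[0]) >= 1:
        return resource, int(values[0])
    return resource, None


print(pdf_page_from_href("chapter1.pdf#page=32"))
print(pdf_page_from_href("chapter1.pdf"))
```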

HadrienGardeur · 2022-07-07

Agree with @mickael-menu, we want to avoid fragments in the reading order.

This use case isn't restricted to PDF either and would be better addressed in the main spec IMO:

- if this matches a resource, specifying `start` as a `rel` in the `readingOrder` is enough
- if this matches a fragment of a resource, then a new Link Object in `links` would be the better option

mwbenowitz · 2022-07-07

Thanks for the feedback, would be happy to chat about this! I may not be able to make next week's, but will try to attend one soon. I'm curious about the fragment discussion and what we can do to support features like this without introducing too much complexity.

@HadrienGardeur can you explain your second point regarding representing fragments as a new object in `links`? I'm not sure I follow how this addresses the functionality we're looking for.

HadrienGardeur · 2022-07-11

Let's say that we have the following RWPM:

{
  "metadata": {
    "title": "Publication containing multiple PDF files",
    "conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
  },
  "readingOrder": [
    { "href": "introduction.pdf", "type": "application/pdf" },
    { "href": "chapter1.pdf", "type": "application/pdf" },
    { "href": "chapter2.pdf", "type": "application/pdf" }
  ]
}

If you want to only include a subset of each PDF in the full publication, that's not something that we support with RWPM.

But if, the first time that you open that publication, you'd like to jump straight to the first chapter, this could be supported using `start` in `readingOrder`:

{
  "metadata": {
    "title": "Publication containing multiple PDF files",
    "conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
  },
  "readingOrder": [
    { "href": "introduction.pdf", "type": "application/pdf" },
    { "rel": "start", "href": "chapter1.pdf", "type": "application/pdf" },
    { "href": "chapter2.pdf", "type": "application/pdf" }
  ]
}

If you need to point to a fragment instead of a resource, this could be handled using links:

{
  "metadata": {
    "title": "Publication containing multiple PDF files",
    "conformsTo": "http://readium.org/webpub-manifest/profiles/pdf"
  },
  "links": [
    { "rel": "start", "href": "introduction.pdf#page=5" }
  ],
  "readingOrder": [
    { "href": "introduction.pdf", "type": "application/pdf" },
    { "href": "chapter1.pdf", "type": "application/pdf" },
    { "href": "chapter2.pdf", "type": "application/pdf" }
  ]
}
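For readers wondering how a reading app might consume these two variants, here is a hedged Python sketch. The function `find_start` is hypothetical; looking in `links` before `readingOrder`, and falling back to the first reading-order item, are assumptions made for illustration and were not agreed in this thread.

```python
from typing import Optional


def find_start(manifest: dict) -> Optional[str]:
    """Return the href the publication should open at, or None if it is empty."""
    # Assumed precedence: an explicit start link wins over a start rel in readingOrder.
    for collection in ("links", "readingOrder"):
        for link in manifest.get(collection, []):
            rel = link.get("rel", [])
            # rel may be a single string or a list of strings.
            rels = [rel] if isinstance(rel, str) else rel
            if "start" in rels:
                return link["href"]
    # Assumed fallback: the first resource in the reading order.
    reading_order = manifest.get("readingOrder", [])
    return reading_order[0]["href"] if reading_order else None


reading_order = [
    {"href": "introduction.pdf", "type": "application/pdf"},
    {"rel": "start", "href": "chapter1.pdf", "type": "application/pdf"},
]
with_rel = {"readingOrder": reading_order}
with_link = {
    "links": [{"rel": ["start"], "href": "introduction.pdf#page=5"}],
    "readingOrder": reading_order,
}

print(find_start(with_rel))   # chapter1.pdf
print(find_start(with_link))  # introduction.pdf#page=5
```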

mickael-menu · 2022-07-13T16:16:43Z

We talked about this on today's call. Nothing set in stone yet, but what came up:

- It's generally a good idea; we need such a profile.
- A `start` relation to indicate the start of the publication would be very useful.
- We don't want to support fragments inside the reading order, so in your example with blank PDF pages, this would need to be addressed at the authoring stage.
- Having multiple PDF resources in the reading order is a useful use case. It might be tricky to implement in navigators, but there are workarounds, such as displaying an affordance at the end of the current PDF resource.
- A `pageCount` link property would be welcome if we have multiple PDF resources, to be able to compute a positions list without opening all the resources.
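The last point can be sketched in Python as follows. This assumes a hypothetical `pageCount` value stored under each link's `properties` (the exact location was not decided on the call) and treats one PDF page as one position, which is also an assumption.

```python
def page_offsets(reading_order: list) -> tuple:
    """Return ({href: pages before this resource}, total page count)."""
    offsets = {}
    total = 0
    for link in reading_order:
        offsets[link["href"]] = total
        # Hypothetical placement of the proposed pageCount property.
        total += link["properties"]["pageCount"]
    return offsets, total


def position_of(offsets: dict, href: str, page: int) -> int:
    """Map a 1-based page within a resource to a 1-based position in the publication."""
    return offsets[href] + page


reading_order = [
    {"href": "introduction.pdf", "type": "application/pdf", "properties": {"pageCount": 4}},
    {"href": "chapter1.pdf", "type": "application/pdf", "properties": {"pageCount": 20}},
    {"href": "chapter2.pdf", "type": "application/pdf", "properties": {"pageCount": 15}},
]

offsets, total = page_offsets(reading_order)
print(total)                                   # 39
print(position_of(offsets, "chapter1.pdf", 1))  # 5
```

No PDF file is opened here; the positions come entirely from the manifest, which is the benefit raised in the call.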

mwbenowitz · 2022-07-14T18:18:34Z

Thank you for the feedback! I was unable to join the call. I actually don't think I have the meeting link; is there a way I could get that? As for the specifics here:

- I think having `start` as a relation is a great idea. I hadn't considered that, but it would be useful for us as well.
- What about supporting fragments in the Table of Contents? We don't author these PDFs and would like to support this functionality. I understand that the reading order has a specific meaning that might not be amenable to incorporating fragments. I'd like to find some way we could represent that as an attribute or property.
- I think we've been able to handle the multiple PDF resource question on our end. If you'd be interested, I can probably put you in touch with the devs who did that work.
- `pageCount` is something that I think we would like to see. This was intended to be a starting point, and features like these are definitely the direction I'd like to see this go.

mickael-menu · 2022-07-18T08:15:01Z

We paused the weekly calls for the summer, but we'll be back at the end of August, I think. You can send a mail to [email protected] to request access to the Readium Slack workspace (mention this PR in the mail). The link and time are shared on the #general channel.

> What about supporting fragments in the Table of Contents? We don't author these PDFs and would like to support this functionality.

Only the `readingOrder` and `resources` cannot have fragments, but you can have them in `tableOfContents`, `links`, etc. It would look like this:

{
  "href": "chapter1.pdf#page=32",
  "type": "application/pdf"
}
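The rule above (no fragments in `readingOrder` or `resources`, fragments allowed elsewhere) could be checked with a small Python sketch like this one. The function name `links_with_fragments` is hypothetical and the check is only an illustration of the rule as stated in this comment.

```python
from urllib.parse import urldefrag

# Collections in which hrefs must not carry a fragment, per the comment above.
NO_FRAGMENT_COLLECTIONS = ("readingOrder", "resources")


def links_with_fragments(manifest: dict) -> list:
    """Return (collection, href) pairs that break the no-fragment rule."""
    offenders = []
    for collection in NO_FRAGMENT_COLLECTIONS:
        for link in manifest.get(collection, []):
            href = link.get("href", "")
            if urldefrag(href).fragment:
                offenders.append((collection, href))
    return offenders


valid = {
    "readingOrder": [{"href": "chapter1.pdf", "type": "application/pdf"}],
    "tableOfContents": [{"href": "chapter1.pdf#page=32", "type": "application/pdf"}],
}
invalid = {
    "readingOrder": [{"href": "chapter1.pdf#page=2", "type": "application/pdf"}],
}

print(links_with_fragments(valid))    # empty: the fragment is only in tableOfContents
print(links_with_fragments(invalid))  # reports the readingOrder entry
```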

> I think we've been able to handle the multiple PDF resource question on our end. If you'd be interested I can probably put you in touch with the devs who did that work

Sure, that would be very interesting, thanks. Which PDF engine(s) are you using?

kristojorg · 2022-12-30T12:11:31Z

Hey y'all, just checking in here on the progress of this. I have two pieces of input:

- Having the `start` rel seems like a good idea and, combined with a fragment in the `links`, should enable us to skip a first blank page, though it wouldn't allow us to do that for every resource in the reading order, only the first. As @mwbenowitz said, we don't author the PDFs and would like to find a way to describe a collection like this in the reading order. If you have any other suggestions there, we're all ears.
- I am curious how fragments are specified in the RWPM. Is there a standard set of fragments that are allowed? It seems they should be defined in a profile, either for an individual media type or for a RWPM as a whole. For example, the `t=3.2` fragment makes sense for audiobooks while the `page=4` fragment works for PDFs, but neither works for EPUB. What are your thoughts here?

mickael-menu · 2023-01-02T17:37:52Z

> If you have any other suggestions there, we're all ears.

RWPM doesn't support fragments in `readingOrder`. Supporting them would make several things really complicated, for example computing the positions list of an EPUB or navigating backwards.

To get the best compatibility, and since you have the info, you could process the PDFs by removing the blank pages before packaging them.

> I am curious how fragments are specified in the RWPM?

Fragment identifiers are not directly specified in the Link object, but we have a convention of using them in some cases as `#anchor` in `href`.

However, they are mentioned in the specification of the Locator object. As they are specific to each media type, `t=3.2` wouldn't be valid for a PDF:

> They're by nature media-specific and should always be understood in the context of the resource that the locator points to (by looking at href and type).

It also identifies which fragment identifier specs are recognized:

| Specification | Scope | Examples |
| ------------- | ----- | -------- |
| HTML | HTML or XHTML | `id` |
| Media Fragment URI 1.0 | Audio, Video and Images | `t=67`, `xywh=160,120,320,240` |
| PDF | PDF | `page=12`, `viewrect=50,50,640,480` |

In practice though, it really depends on what is implemented in each Navigator. For example, `viewrect` is currently not supported in the official PDF navigators.
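As an illustration of fragments being media-specific, the Python sketch below checks whether a fragment's keys are among those listed for a media type. The function names are hypothetical, and the split of `t` and `xywh` across audio, video and image types is an assumption (the Locator specification groups them together). HTML `id` fragments are bare names rather than `key=value` pairs, so they are left out of this sketch.

```python
from urllib.parse import parse_qs


def allowed_fragment_keys(media_type: str) -> set:
    """Fragment keys assumed to be meaningful for a media type."""
    if media_type == "application/pdf":
        return {"page", "viewrect"}
    if media_type.startswith("audio/"):
        return {"t"}
    if media_type.startswith("video/"):
        return {"t", "xywh"}
    if media_type.startswith("image/"):
        return {"xywh"}
    return set()


def fragment_is_recognized(media_type: str, fragment: str) -> bool:
    """True if every key in a key=value fragment is allowed for the media type."""
    keys = parse_qs(fragment, keep_blank_values=True).keys()
    allowed = allowed_fragment_keys(media_type)
    return bool(keys) and all(key in allowed for key in keys)


print(fragment_is_recognized("application/pdf", "page=4"))  # True
print(fragment_is_recognized("application/pdf", "t=3.2"))   # False
print(fragment_is_recognized("audio/mpeg", "t=3.2"))        # True
```

A key being recognized here says nothing about whether a given Navigator implements it, as noted above for `viewrect`.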

llemeurfr · 2023-04-22T18:12:50Z

I created a simpler PR to get something released soon, #97.
We may update this profile later if we agree to add content from @mwbenowitz's proposal.

llemeurfr · 2023-04-22T18:14:50Z

Note that I would be in favour of a generic start relationship, not only used in the PDF profile.

llemeurfr · 2023-05-01T15:18:41Z

Note also that this "start" is in fact a landmark. There is a mechanism defined in Web Publications for landmarks, based on the EPUB solution. It is like a TOC, and therefore can handle fragments.

mwbenowitz added 2 commits July 30, 2021 16:56

NOREF Add Link Parameter section

b2a5d64

This adds a new section that describes the link parameters that can be provided in `href` strings in conforming profiles. At present this is solely the `start=n` parameter, but further parameters can be added.

mickael-menu reviewed Jul 7, 2022

llemeurfr mentioned this pull request Apr 22, 2023

Add a (simple) PDF profile #97

Merged

llemeurfr changed the title ~~Add PDF Profile~~ Add a start landmark (to the PDF Profile) May 1, 2023


