Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade PDFJS #203

Closed
wants to merge 8 commits into from
Closed

Conversation

carlinmack
Copy link
Contributor

@carlinmack carlinmack commented May 8, 2024

❤️ Thank you for your contribution!

Close #170
Close https://github.com/zenodo/ops/issues/438
Close https://github.com/zenodo/ops/issues/278

Description

Please describe briefly your pull request.

Upgrade PDF JS from 1.x to 3.x. This corrects the issue with displaying characters incorrectly and self printing PDFs

This is a large PR so here is how the files are structured:

  • the JavaScript now lives in /static/js/pdfjs/ and is a subset of the the files from https://www.npmjs.com/package/pdfjs-dist/v/3.11.174
  • the CSS lives in assets/semantic-ui/css/viewer.css
  • the HTML lives in templates/semantic-ui/invenio_previewer/pdfjs.html
  • images have been moved from:
    • /assets/semantic-ui/js/invenio_previewer/pdfjs/web/images
    • /static/js/pdfjs/web/images/
      to:
    • invenio-previewer/invenio_previewer/assets/semantic-ui/css/invenio_previewer/pdfjs/images
  • locale data has been moved from:
    • /assets/semantic-ui/js/invenio_previewer/pdfjs/web/locale
    • /static/js/pdfjs/web/locale/
      to:
    • /static/js/pdfjs/
  • cmaps/ and standard_fonts/ have been added to static/js/pdfjs

The changes I have made are:

  • the HTML needs to be changed by us to hide buttons, add translation strings etc
  • same with the CSS for retheming
  • I have changed the JavaScript minorly and added CHANGED comments where I have made changes (for some reason the PDF url is hardcoded by default)

The files to be reviewed are:

image

Before and after (demonstrating that the character rendering issues have been fixed)

image

To do:

  • They recommend re-theming, so we should make it blue like before
  • Disable self printing PDFs
    • seems to be disabled by default
  • Remove unwanted functionality like editing
  • Add fullscreen functionality
  • Translation strings need to be re-added
  • Remove JS and script calls from pdfjs.html

Shelve:

  • Would be a lot better if this used code from the npm module directly
    • I have tried but getting the same problems as before
  • Try upgrade to v4
    • I haven't tried but I think it's not worth delaying the PR over
  • Writing documentation or notes about how this can be upgraded
    • If someone can suggest where, I will write up the situation

Checklist

Ticks in all boxes and 🟢 on all GitHub actions status checks are required to merge:

Frontend

Reminder

By using GitHub, you have already agreed to the GitHub’s Terms of Service including that:

  1. You license your contribution under the same terms as the current repository’s license.
  2. You agree that you have the right to license your contribution under the current repository’s license.

@carlinmack carlinmack marked this pull request as ready for review May 10, 2024 14:15
@carlinmack carlinmack changed the title init: upgrade pdfjs to 3.11.174 Upgrade PDFJS May 10, 2024
@ntarocco
Copy link
Contributor

ntarocco commented May 16, 2024

Thanks a lot for the work, this is not an easy one :)

About your comments:

Would be a lot better if this used code from the npm module directly: I have tried but getting the same problems as before

I think we should really explore this option, to avoid adding all the built file in the repo. I guess you have already tried it, but what was the issue when doing the following?

  1. in the webpack.py, include all the JS/CSS that you need, example:
"pdfjs_worker_min_js": "./node_modules/pdfjs-dist/build/pdf.worker.min.mjs",
"pdfjs_min_js": "./node_modules/pdfjs-dist/build/pdf.min.mjs",
  1. include them when rendering:
def preview(file):
    ...
    return render_template(
        "invenio_previewer/pdfjs.html",
        file=file,
        js_bundles=current_previewer.js_bundles + ["pdfjs_min_js.js", "pdfjs_worker_min_js.js"],
        css_bundles=...,
    )

This should expose a pdfjsLib JS object globally, in window.
3. In your JS, use it:

const document = pdfjsLib.getDocument(pdfUrl);
...

This should avoid building PDF.js with webpack.

And about:

Try upgrade to v4: I haven't tried but I think it's not worth delaying the PR over

Why did you start with v3, and not directly with v4?

@ntarocco
Copy link
Contributor

Recap of our discussion on how to proceed. The main point is that we would like to remove all hardcoded static assets from the GH repo for the various obvious reasons.
To achieve that, we should explore adding a new hook in the Webpack config that allows an Invenio module to define a set of files/dirs that will be copied from the assets folder to the static folder when building. This has been already done with TinyMCE here, but it is hardcoded.

In invenio-assets, we can inject Python configuration via the config.js file, which is generated on the file just before invoking the webpack build here.

This is an example of a config.json:

{
  "aliases": {
    ...
    "@scss/invenio_previewer": "scss/invenio_previewer"
  },
  "build": {
    "assetsPath": "/Users/nico/.pyenv/versions/3.9.16/envs/cdsvideos39/var/instance/static/dist",
    "assetsURL": "/static/dist/",
    "context": "/Users/nico/.pyenv/versions/3.9.16/envs/cdsvideos39/var/instance/assets",
    "debug": false,
    "staticPath": "/Users/nico/.pyenv/versions/3.9.16/envs/cdsvideos39/var/instance/static",
    "staticURL": "/static/"
  },
  "entry": {
    "adminlte": "./js/invenio_theme/admin.js",
    "base": "./js/invenio_theme/base.js",    
    ....
    "theme": "./scss/invenio_theme/theme.scss",
    "theme-admin": "./scss/invenio_theme/admin.scss"
  }
}

You can see an example how the various config are used in the Webpack config, for example here.
If we need to add a new section, we will probably have to enhance the WebpackBundle class, and the various subclasses, and add the new params.
The params are injected in the webpack.py in each module, for example here.

What we want to achieve is to make the CopyWebpackPlugin parametrizable via the config in the webpack.py, probably introducing a new dict object similarly to:

...
    copy: [{
       from: <path>,
       to: <path>
    }]

(naming and final object structure maybe different)
And then that new config is injected and used in the Webpack config. Take into account that multiple modules might declare a copy section, so they should be appended to have a final global list.
Note: the <path should be relative to <virtualenv>/var/instance. The user should not be allowed to copy from/to other locations.

Possible tasks:

  1. Implement the new hook and test it
  2. Move the hardcoded TinyMCE stuff probably to invenio-theme (or where it makes sense)
  3. Add the copy of PDF.js assets

@max-moser
Copy link
Contributor

max-moser commented Sep 24, 2024

I've started an implementation of copy.{to,from} instructions based on (my understanding of) the comment by @ntarocco

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bump pdfjs-dist to >3
3 participants