Feature request: Pandoc integration #2491

maggie44 · 2021-01-15T23:19:26Z

I was thinking of Pandoc integration as an optional module. It would add some efficiencies to the various exports by keeping the assets separate, as discussed in #2412 (and potentially resolve some other outstanding issues), but also provide a bunch of additional options, such as EPUB (#1949), Word doc, video export support (#883; #2412) and more.

Here are a few shortcuts to try it out:

Pandoc is here: https://pandoc.org
It is in most repositories, so apt-get install pandoc or brew install pandoc should do the trick (if installing in a Docker container, you may need to install build-essential and/or curl).
An example Markdown file I have tested with:

test.md

# Test file
Test MD File.

[![Build Status](https://cdn.vox-cdn.com/thumbor/zEZJzZFEXm23z-Iw9ESls2jYFYA=/89x0:1511x800/1600x900/cdn.vox-cdn.com/uploads/chorus_image/image/55717463/google_ai_photography_street_view_2.0.jpg)](https://travis-ci.org/joemccann/dillinger)
Dillinger is a cloud-enabled, mobile-ready, offline-storage, AngularJS powered HTML5 Markdown editor.

  - Type some Markdown
  - Convert some Markdown

![](https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4)

# New Features!

  - sdfsdf
  - sdfsdvldkvnc
 
You can also:
  - send

Execute the command:

pandoc test.md -o example2.html --extract-media ./assets
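
For anyone scripting this rather than typing it, the same call could be sketched in Python. This is only a sketch: the helper names `build_pandoc_command` and `convert` are made up for illustration, it assumes `pandoc` is already on the PATH, and it uses nothing beyond the `-o` and `--extract-media` options from the command above.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source: str, output: str, media_dir: str) -> list[str]:
    """Build the argument list for the pandoc call shown above (hypothetical helper)."""
    return ["pandoc", source, "-o", output, "--extract-media", media_dir]


def convert(source: str, output: str, media_dir: str = "./assets") -> Path:
    """Run pandoc; raises if pandoc is missing or exits with a non-zero status."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # An argument list (no shell) avoids quoting problems with odd file names.
    subprocess.run(build_pandoc_command(source, output, media_dir), check=True)
    return Path(output)
```

Calling `convert("test.md", "example2.html")` should be equivalent to the command line above.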

More info relating to this was originally discussed in #2412.


maggie44 · 2021-01-15T23:42:22Z

@ssddanbrown in response to the last comment over in #2412, indeed, these ostensibly simple things often get more complex very quickly.

In terms of workflow, after giving it some thought, perhaps an integration similar to WKHTMLTOPDF. The user installs Pandoc manually, following the Pandoc docs for their environment (apt-get install pandoc on Ubuntu, for example), then adds a PANDOC=True variable to the .env file, so that BookStack doesn't have any responsibility for the Pandoc install.

When PANDOC=True there could be some new options in the export dropdown menu: EPUB and HTML Archive (or something more logically named than HTML Archive).

The same content being pulled for the current export features would then hopefully be passed to Pandoc locally on the system, and the output returned for download.

Using the same approach as WKHTMLTOPDF makes it less mission critical to maintain and allows for some dev experimentation. Similarly, I'd only add EPUB and HTML Archive rather than replace the current PDF and HTML export processes, as I'm certainly not confident enough in it to recommend that off the bat.
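
To make the opt-in idea concrete, here is a rough sketch in Python. It is not BookStack code (BookStack itself is PHP): the `PANDOC` flag is the variable proposed above, and the format identifiers and function names are placeholders chosen for illustration.

```python
import os
import shutil

# Export formats the thread mentions as already existing; identifiers are illustrative.
BASE_FORMATS = ["pdf", "html"]
# Extra formats proposed above, offered only when Pandoc is enabled.
PANDOC_FORMATS = ["epub", "html-archive"]


def pandoc_enabled(env) -> bool:
    """True when the proposed PANDOC flag is set to a truthy value."""
    return str(env.get("PANDOC", "")).strip().lower() in {"1", "true", "yes"}


def export_formats(env, pandoc_path=None) -> list[str]:
    """List the options to show in the export dropdown.

    The extra formats appear only if the flag is set AND a pandoc
    binary was actually found, so a stale flag cannot expose broken options.
    """
    formats = list(BASE_FORMATS)
    if pandoc_enabled(env) and pandoc_path:
        formats += PANDOC_FORMATS
    return formats
```

In practice this could be called as `export_formats(os.environ, shutil.which("pandoc"))`, which leaves the install entirely to the user.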

I realise a lot of this is preaching to the choir, but seems you have plenty of tickets and things on your plate, so figure the more thought/detail given to a feature request and the use case considered before making the request the better.

Big thanks for the work on this, it is going to become quite a central part of our EdTech COVID response work.

maggie44 · 2021-01-24T17:19:07Z

After further thought, how about simplifying this down to allowing the original Markdown that BookStack uses to be exported? If included in the API, this would allow us to use third-party processing of exported data (like Pandoc) without the extra support burden.

ssddanbrown · 2021-01-24T19:57:01Z

Hi @Maggie0002 ,
If you're using the Markdown editor to edit pages, the pages API should already provide the stored Markdown content (pages.show endpoint).

maggie44 · 2021-01-24T22:58:31Z

> Hi @Maggie0002 ,
> If you're using the Markdown editor to edit pages, The pages API should already provide the stored markdown content (pages.show endpoint).

Whoops, sorry, I thought it defaulted to Markdown. I meant an API endpoint to export the WYSIWYG content as is, rather than converting it first to HTML or PDF. I don't see that in the API docs.

ssddanbrown · 2021-01-25T22:17:57Z

That (pages => read) endpoint should give you the HTML that's used when viewing a page. This is pretty much the same as the HTML loaded in the WYSIWYG editor but with a pass to remove some potentially dangerous elements.
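
A minimal sketch of reading a page through that endpoint, using only the Python standard library. The `/api/pages/{id}` path, the `Token id:secret` header format and the `markdown` / `html` field names are assumptions here and should be checked against the API docs on your own instance; the function names are made up for illustration.

```python
import json
import urllib.request


def build_page_request(base_url: str, page_id: int, token_id: str, token_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for a single page."""
    url = f"{base_url.rstrip('/')}/api/pages/{page_id}"  # assumed path
    headers = {"Authorization": f"Token {token_id}:{token_secret}"}  # assumed format
    return urllib.request.Request(url, headers=headers)


def fetch_page(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def page_content(page: dict) -> tuple[str, str]:
    """Prefer stored Markdown when present, else fall back to the page HTML."""
    markdown = page.get("markdown") or ""
    if markdown:
        return "markdown", markdown
    return "html", page.get("html", "")
```

Pages written in the WYSIWYG editor would be expected to take the `html` branch, which is the content described above.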

maggie44 · 2021-01-26T00:53:28Z

Helpful, and interesting, thanks. My understanding, then, is that the only difference is that the export -> HTML function takes the same HTML seen in the pages -> read endpoint and passes it to a processor that embeds pictures etc. into a single HTML file. The endpoint output comes without headers, which presumably is what the HTML processor takes care of (among other things).

Will experiment with that endpoint and report back anything useful.

maggie44 · 2021-01-26T02:24:11Z

> Helpful, and interesting, thanks. My understanding then is the difference is just that the export -> html function takes that same html seen in the pages -> read endpoint, passes it to a processor that converts pictures etc into an embedded html file. But without headers, which presumably is what the html processor takes care of (among other things).
>
> Will experiment with that endpoint and report back anything useful.

Didn't get very far. It turns out the HTML the API pipes out is missing headings, CSS and all the formatting, so it would be a lot of work to go from there to something usable.

Is there a way to access the HTML used by the exporter, but with the original HREF to the images and/or video rather than the embedded images? In theory it would then be fairly simple to mirror that page and get it together with its exported content. Wget, for example, has a --mirror option I could experiment with as a lightweight solution.
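
As a rough illustration of the mirroring idea, here is a Python sketch that lists the media URLs a page's HTML points at, so they could be downloaded next to it. It is a stand-in for what a tool like `wget --mirror` would do, not a description of how wget works; the class and function names are invented for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MediaCollector(HTMLParser):
    """Collect src URLs of images and videos, skipping embedded data: URIs."""

    MEDIA_TAGS = {"img", "video", "source"}

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.MEDIA_TAGS:
            return
        for name, value in attrs:
            if name == "src" and value and not value.startswith("data:"):
                # Resolve relative links against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def media_urls(html: str, base_url: str) -> list[str]:
    """Return absolute media URLs in document order."""
    collector = MediaCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.urls
```

Each URL returned could then be fetched into a local assets folder, with the links in the HTML rewritten to match.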

ssddanbrown · 2021-01-26T23:12:32Z

> Is there a way to access the HTML used by the exporter but with the original HREF to the images and/or video rather than the embedded images?

No way to get that directly, although the main content HTML is what you'd get out of the API; the export just wraps it up in a template with some extra styles. The export uses this template, with these export styles.
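
The wrapping step described here can be approximated outside BookStack. The Python sketch below is an assumption-laden stand-in: the template is a bare-bones placeholder rather than BookStack's actual export template, and `wrap_for_export` is a hypothetical name.

```python
import html
from string import Template

# Placeholder page shell; the real export template and styles live in BookStack.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
<style>$styles</style>
</head>
<body>
$content
</body>
</html>
""")


def wrap_for_export(title: str, content_html: str, styles: str = "") -> str:
    """Wrap API-provided content HTML in a standalone document.

    The title is escaped; the content is inserted untouched, so image and
    video links keep pointing at their original URLs.
    """
    return PAGE_TEMPLATE.substitute(
        title=html.escape(title), styles=styles, content=content_html
    )
```

Because the content is not modified, the result keeps the original media references instead of embedding them, which is the variant asked about above.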

maggie44 · 2021-05-27T19:57:36Z

Having given it some more thought, how would you feel about Pandoc as an optional exporter, similar to how wkhtmltopdf is currently integrated? This wrapper is proving useful: https://github.com/ueberdosis/pandoc

Would also help resolve some other issues that I don't think we will find a way around:

linuxserver/docker-bookstack#80
#2459

ssddanbrown · 2021-05-31T14:54:33Z

Hi @Maggie0002,
Sorry for my lack of response.

To be honest, I'd not be very keen. Supporting both of the existing PDF export options has already proved a lot more challenging than hoped and has consumed a lot of my time through the various requests & issues it has generated. The range of conversion formats that pandoc would open up would worry me, and I think it's optimistic to expect it to solve more issues than it creates as an alternative PDF generator, especially since I believe pandoc will use WKHTMLtoPDF by default anyway for HTML to PDF conversions.

ssddanbrown added the 🔨 Feature Request label Jan 15, 2021

maggie44 mentioned this issue Feb 1, 2021: Add Pandoc to API #2524 (closed)

maggie44 closed this as completed May 31, 2021
