Markdown to PDF workflow automation #391

Merged 2 commits on Jul 24, 2024
63 changes: 63 additions & 0 deletions .github/workflows/markdown-to-pdf.yml
@@ -0,0 +1,63 @@
## Code

name: Markdown to PDF

on:
  push:
    branches:
      - main
    paths:
      - '1_1_vulns/translations/**'
  pull_request:
    branches:
      - main
    paths:
      - '1_1_vulns/translations/**'

env:
  LANGUAGES: '["de", "it", "pt", "hi", "zh"]' # Add or remove language codes as needed

jobs:
  convert-markdown-to-pdf:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20' # Using Node.js version 20

      - name: Configure locale
        run: |
          sudo locale-gen en_US.UTF-8
          echo "LC_ALL=en_US.UTF-8" >> $GITHUB_ENV
          echo "LANG=en_US.UTF-8" >> $GITHUB_ENV
          echo "LANGUAGE=en_US.UTF-8" >> $GITHUB_ENV

      - name: Install necessary fonts
        run: |
          sudo apt-get update
          sudo apt-get install -y fonts-noto fonts-noto-cjk fonts-noto-color-emoji fonts-indic fonts-arphic-ukai fonts-arphic-uming fonts-ipafont-mincho fonts-ipafont-gothic fonts-unfonts-core

      - name: Install md-to-pdf
        run: npm install -g md-to-pdf

      - name: Run markdown_to_pdf.sh for each language
        run: |
          for lang in $(echo $LANGUAGES | jq -r '.[]'); do
            ./markdown_to_pdf.sh --language $lang
          done
        working-directory: ./markdown-to-pdf

      - name: Get current date and time
        id: date
        run: echo "date=$(date '+%Y-%m-%d-%H-%M-%S')" >> $GITHUB_ENV

      - name: Upload generated PDFs as artifact
        uses: actions/upload-artifact@v4
        with:
          name: pdf-translations-zipfile-${{ env.date }}
          path: ./markdown-to-pdf/generated/*.pdf
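
The per-language loop in the workflow can be tried locally before pushing. The following is a minimal sketch, not part of this PR: `list_languages` is a hypothetical helper that splits a flat array of language codes without `jq` (the workflow itself uses `jq`), and the loop only prints the commands instead of running them.

```shell
# Hypothetical helper: split a flat JSON array of language codes,
# such as '["de", "it"]', into one code per line.
# Assumption: the array holds only simple quoted codes; use jq for anything else.
list_languages() {
  printf '%s' "$1" | tr -d '[]" ' | tr ',' '\n'
}

LANGUAGES='["de", "it", "pt", "hi", "zh"]'

# Dry run: print the command for each language instead of executing it.
for lang in $(list_languages "$LANGUAGES"); do
  echo "./markdown_to_pdf.sh --language $lang"
done
```

Replacing the `echo` with the command itself, run from the `markdown-to-pdf` directory, would mirror what the workflow step does.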
20 changes: 19 additions & 1 deletion 1_1_vulns/translations/de/LLM00_Introduction.md
@@ -1,3 +1,21 @@
<div class="frontpage">
<div class="smalllogo">
<img src="/img/OWASP-title-logo.svg"></img>
</div>
<div class="doctitle">
OWASP Top 10 für LLM-Applikationen
</div>
<div class="docversion">
VERSION 1.1
</div>
<div class="docdate">
<b>Veröffentlicht am</b>: 10. Juni 2024
</div>
<div class="doclink">
https://llmtop10.com
</div>
</div>

## Einleitung

### Die Entstehung der Liste
@@ -95,4 +113,4 @@ Dieses Diagramm kann als visueller Leitfaden verwendet werden, um zu verstehen,

![Abb_1](images/fig_5_2.jpg)

##### Abbildung 1: OWASP Top 10 für LLM-Applikationen visualisiert
##### Abbildung 1: OWASP Top 10 für LLM-Applikationen visualisiert
19 changes: 18 additions & 1 deletion 1_1_vulns/translations/hi/LLM00_Introduction.md
@@ -1,4 +1,21 @@

<div class="frontpage">
<div class="smalllogo">
<img src="/img/OWASP-title-logo.svg"></img>
</div>
<div class="doctitle">
OWASP टॉप 10 फॉर LLM एप्लिकेशंस
</div>
<div class="docversion">
संस्करण 1.1
</div>
<div class="docdate">
<b>प्रकाशित:</b> 16 अक्टूबर, 2023
</div>
<div class="doclink">
https://llmtop10.com
</div>
</div>

## परिचय

### सूची की उत्पत्ति
18 changes: 18 additions & 0 deletions 1_1_vulns/translations/it/LLM00_Introduction.md
@@ -1,3 +1,21 @@
<div class="frontpage">
<div class="smalllogo">
<img src="/img/OWASP-title-logo.svg"></img>
</div>
<div class="doctitle">
OWASP Top 10 per le applicazioni LLM
</div>
<div class="docversion">
Versione 1.1
</div>
<div class="docdate">
<b>Pubblicato</b>: 16 Ottobre 2023
</div>
<div class="doclink">
https://llmtop10.com
</div>
</div>

## Introduzione

L'introduzione sul mercato di massa dei chatbot pre-addestrati a fine 2022 ha innescato un'ondata di frenetico interesse per i modelli di linguaggio a grandi dimensioni (LLM). Le aziende, desiderose di sfruttare il potenziale degli LLM, li stanno integrando rapidamente nei loro sistemi e nelle offerte destinate ai clienti. Tuttavia, la velocità con cui gli LLM vengono adottati ha superato il tempo necessario per stabilire protocolli di sicurezza esaustivi, lasciando molte applicazioni vulnerabili a seri problemi di sicurezza.
1 change: 0 additions & 1 deletion 1_1_vulns/translations/pt/LLM00_Introduction.md
@@ -1,4 +1,3 @@

<div class="frontpage">
<div class="smalllogo">
<img src="/img/OWASP-title-logo.svg"></img>
18 changes: 18 additions & 0 deletions 1_1_vulns/translations/zh/LLM00_Introduction.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,21 @@
<div class="frontpage">
<div class="smalllogo">
<img src="/img/OWASP-title-logo.svg"></img>
</div>
<div class="doctitle">
OWASP 大语言模型人工智能应用Top 10 安全威胁
</div>
<div class="docversion">
版 1.1
</div>
<div class="docdate">
<b>发布日期</b>:2023 年 10 月 16 日
</div>
<div class="doclink">
https://llmtop10.com
</div>
</div>

## 介绍
2022 年底,随着ChatGPT进入大众市场,人们对大型语言模型 (LLM) 的关注尤为浓厚。渴望利用大语言模型潜力的企业正在迅速将其整合到其运营和面向客户的产品中。然而,大语言模型的采用速度已经超过了全面安全协议的建立速度,导致许多应用程序容易受到高风险问题的影响。很明显,大语言模型还没有统一的资源来解决这些安全问题。很多开发者对于与LLM相关的安全风险不够了解,所以相关资源很分散。而OWASP正好能够协助推进这项技术的更安全应用。

91 changes: 91 additions & 0 deletions markdown-to-pdf/README.md
@@ -0,0 +1,91 @@
# OWASP Top Ten for LLMs - Markdown to PDF

The contents of this directory are used to generate the PDFs of translated versions of the OWASP Top Ten for LLMs using the md-to-pdf npm package.

## How to Contribute Translations

To contribute translations to the OWASP Top Ten for LLMs project, please follow these steps:

1. Fork this repository on GitHub by clicking on the "Fork" button at the top right corner of the repository page.

2. In your fork, create a subdirectory named with the two-letter ISO 639-1 language code in the `1_1_vulns/translations` directory. This subdirectory should contain all the Markdown files of the translation, in the same format as the other languages.

3. Copy the Markdown files of the English version to your new directory and start translating. Follow the instructions in `_template.md` to keep the styling consistent; the Markdown to PDF generator relies on this consistency. (There is no need to copy the `_template.md` file itself.)

4. Translate as accurately as possible and avoid deviating from the original meaning of the Top Ten for LLMs.

5. In the `LLM00_Introduction.md` file, there is a section **About this translation**. You can add your name as a translator in this section.

6. Once the translation is complete, open a descriptive pull request to this repository to get it merged in.

7. There is no need to generate the PDF using the process in this document, but if you want to validate that your Markdown is in the correct format (and possibly add some styling if it needs tweaking), follow the instructions below.

8. If you are validating an existing translation, open an issue and tag the original translator to discuss the proposed change. Once the original translator and the reviewer agree, open a pull request to this repository. (Remember to add your name to the **About this translation** section.)

9. Keep a summary of the discussion around translations in the GitHub issue, even if the conversation took place in the [OWASP Slack Channel](https://owasp.slack.com/archives/C063W2E791U).


## How to generate a Translated PDF

### Requirements
1. To generate PDFs from the Markdown files, you need the [md-to-pdf](https://www.npmjs.com/package/md-to-pdf) npm package installed globally. If npm is installed on your machine, run:
```shell
npm i -g md-to-pdf
```

2. You will require the translated Markdown files described above.

3. You will also need a CSS style file for the language in the `styles` directory. For languages that use Latin characters, you can copy the Portuguese file `topten-pt.css` as a starting point.


### Descriptions of contents

- ``markdown-to-pdf/generated`` directory: This directory is where the PDFs are stored once they are generated. After the Markdown files are converted to PDF format, the resulting PDF files are placed in this directory for easy access and distribution.

- ``markdown-to-pdf/img`` directory: This directory is used to store all the images that will be included in the PDF files. When converting Markdown to PDF, any referenced images are typically embedded in the PDF document. The images are stored in this directory so that they can be easily referenced and included during the conversion process.

- ``markdown-to-pdf/styles`` directory: The styles directory contains custom CSS files for each language. When converting Markdown to PDF, the Markdown is first converted to HTML, and then the HTML is "printed" using Puppeteer to generate the PDF. The custom CSS files in the styles directory ensure that the PDFs have consistent styling and alignment, closely resembling the original Markdown files. Each language may have its own CSS file to handle language-specific formatting requirements.

- ``markdown-to-pdf/frontmatter.md``: This file serves as the configuration for Puppeteer, the tool used to generate the PDFs. It specifies how the PDFs should be generated and, importantly, defines the header and footer for each page of the PDF. The header and footer typically contain information such as page numbers, document title, and other relevant details. **It is crucial to note that on line 57 of frontmatter.md, the title is translated and needs to be changed before generating a PDF.** This ensures that the PDFs have the correct translated title.

- ``markdown-to-pdf/markdown_to_pdf.sh``: This script runs the conversion from Markdown to PDF. It concatenates the Markdown files of a translation and converts the result to PDF using the md-to-pdf npm package. See the Usage section below for how to run it.


### Usage

To generate PDFs from the markdown files, follow these steps:

1. Modify line 57 of `frontmatter.md` to show the correct title of the OWASP Top Ten in the appropriate language.

2. Validate that the ISO code directory for the language exists in the `../1_1_vulns/translations` directory and that the corresponding CSS file for the language exists in the `styles` directory.

3. Run the following command to generate the PDF:

```shell
./markdown_to_pdf.sh --language <language_iso_code>
```

Example

```shell
./markdown_to_pdf.sh --language pt
```

The generated PDF is saved in the `generated` directory, with the ISO code as the filename (for example, `pt.pdf`). An existing file with the same name is overwritten.

4. Validate that the contents of the file look similar to that of the main English file.
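
The check in step 2 can be scripted. The following is a minimal sketch, assuming it is run against the `markdown-to-pdf` directory; `preflight` is a hypothetical helper that is not part of this repository, and the paths it tests are the ones `markdown_to_pdf.sh` derives from the language code.

```shell
# Hypothetical helper (not part of the repository): verify the inputs from step 2.
# $1 = language code, $2 = path to the markdown-to-pdf directory (defaults to ".").
preflight() {
  local lang="$1"
  local base="${2:-.}"
  local src="$base/../1_1_vulns/translations/$lang"
  local css="$base/styles/topten-$lang.css"

  # The translation directory must exist
  [ -d "$src" ] || { echo "Missing translation directory: $src" >&2; return 1; }
  # The language-specific stylesheet must exist
  [ -f "$css" ] || { echo "Missing stylesheet: $css" >&2; return 1; }

  echo "OK: $lang"
}
```

For example, `preflight pt && ./markdown_to_pdf.sh --language pt` would only start the conversion when both inputs are present.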


### Options


- **Keep Markdown** If you add the ``--keep-markdown`` flag at the end, the script will not delete the temporary Markdown file generated from all the concatenated ones. The temporary file is located in ``./generated/tmp``. For example:
```shell
./markdown_to_pdf.sh --language pt --keep-markdown
```



## License

This project is licensed under the terms of the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
99 changes: 99 additions & 0 deletions markdown-to-pdf/frontmatter.md

Large diffs are not rendered by default.

Empty file.
3 changes: 3 additions & 0 deletions markdown-to-pdf/img/OWASP-title-logo.svg
Binary file added markdown-to-pdf/img/diagram.png
Binary file added markdown-to-pdf/img/header-background.png
Binary file added markdown-to-pdf/img/logo.png
Binary file added markdown-to-pdf/img/title-background.png
109 changes: 109 additions & 0 deletions markdown-to-pdf/markdown_to_pdf.sh
@@ -0,0 +1,109 @@
#!/bin/bash

# Ensure UTF-8 encoding in the environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

# Check that the --language option and its value are provided
if [ -z "$2" ] || [ "$1" != "--language" ]; then
  echo "Usage: $0 --language <language> [--keep-markdown]"
  exit 1
fi
language="$2"

# Define directories and files
current_directory=$(pwd)
directory="$current_directory/../1_1_vulns/translations/$language"
dir_name=$(basename "$directory")
generated_folder="$current_directory/generated"
tmp_folder="$generated_folder/tmp"
output_file="$tmp_folder/${dir_name}.md"
temp_pdf_file="$tmp_folder/${dir_name}.pdf"
pdf_file="$generated_folder/${dir_name}.pdf"
frontmatter="$current_directory/frontmatter.md"
stylesheet="$current_directory/styles/topten-$language.css"
intro_file="${directory}/LLM00_Introduction.md"

# Extract the document title from the introduction file, if it exists
if [[ -f "$intro_file" ]]; then
  # Use awk to handle multi-line patterns and extract the title, ensuring UTF-8 handling
  header_title=$(awk '/<div class="doctitle">/,/<\/div>/{ if ($0 ~ /<\/div>/) { print p; p=""; next } if ($0 ~ /<div class="doctitle">/) next; p=p $0 }' "$intro_file" | xargs)
  echo "Extracted header title: $header_title"
else
  # Not fatal: the script continues, but the document title will be empty
  echo "Warning: '$intro_file' does not exist; the document title will be empty." >&2
fi

# Check that the translation directory exists
if [ ! -d "$directory" ]; then
  echo "Error: '$directory' is not a directory."
  exit 1
fi

# Check that the stylesheet for this language exists
if [ ! -f "$stylesheet" ]; then
  echo "Error: '$stylesheet' does not exist."
  exit 1
fi

# Create the 'generated' and 'generated/tmp' directories if they don't exist
mkdir -p "$tmp_folder"

# Delete the final PDF, the temporary PDF, and the temporary Markdown file if they already exist
if [ -f "$pdf_file" ]; then
  echo "Deleting existing PDF file: $pdf_file"
  rm "$pdf_file"
fi
if [ -f "$temp_pdf_file" ]; then
  echo "Deleting existing temporary PDF file: $temp_pdf_file"
  rm "$temp_pdf_file"
fi
if [ -f "$output_file" ]; then
  echo "Deleting existing temporary Markdown file: $output_file"
  rm "$output_file"
fi

# Start with a clean output file
> "$output_file"

# Add the frontmatter if it exists
if [ -f "$frontmatter" ]; then
  cat "$frontmatter" >> "$output_file"
  echo "" >> "$output_file" # Adds a newline after the frontmatter
fi

# Concatenate the Markdown files in alphabetical order.
# A glob expands in sorted order and, unlike $(find ...), is safe for paths containing spaces.
for file in "$directory"/*.md; do
  # Skip the unexpanded pattern (no matches) and the frontmatter
  if [[ -f "$file" && "$file" != "$frontmatter" ]]; then
    cat "$file" >> "$output_file"
    echo "" >> "$output_file" # Adds a newline between files
  fi
done

echo "Combined markdown files into $output_file"

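# Convert the combined Markdown file to PDF; stop if the conversion fails
if ! md-to-pdf --basedir "$current_directory" --stylesheet "$stylesheet" --document-title "$header_title" "$output_file"; then
  echo "Error: md-to-pdf failed to convert $output_file"
  exit 1
fi
mv "$temp_pdf_file" "$pdf_file"

# Remove the temporary Markdown file unless --keep-markdown was given
if [ -f "$output_file" ] && [ "$3" != "--keep-markdown" ]; then
  echo "Deleting temporary Markdown file: $output_file"
  rm "$output_file"
fi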

if [ -f "$pdf_file" ]; then
  echo -e "\033[32m###############################################################################################################\033[0m"
  echo -e "\033[32m########################################### Success!! ##################################################\033[0m"
  echo -e "\033[32m###############################################################################################################\033[0m"
  echo "PDF file generated: $pdf_file"
  echo -e "\033[32m###############################################################################################################\033[0m"
  echo -e "\033[32m###############################################################################################################\033[0m"
fi
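
The title extraction in the script can be tried on its own. The following is a minimal sketch that wraps the same `awk` program in a function and runs it on a made-up sample file; the function name `extract_doctitle` and the sample content are illustrative and not part of this PR.

```shell
# Sketch: the awk program from markdown_to_pdf.sh, wrapped in a function.
# It prints the text between <div class="doctitle"> and the next </div>,
# and xargs trims the surrounding whitespace.
extract_doctitle() {
  awk '/<div class="doctitle">/,/<\/div>/{ if ($0 ~ /<\/div>/) { print p; p=""; next } if ($0 ~ /<div class="doctitle">/) next; p=p $0 }' "$1" | xargs
}

# Made-up sample input, shaped like the front page blocks in the translations
sample=$(mktemp)
cat > "$sample" <<'EOF'
<div class="frontpage">
  <div class="doctitle">
    OWASP Top 10 per le applicazioni LLM
  </div>
  <div class="docversion">
    Versione 1.1
  </div>
</div>
EOF

extract_doctitle "$sample"   # prints: OWASP Top 10 per le applicazioni LLM
rm -f "$sample"
```

One caveat of this approach: `xargs` interprets quote characters, so a title containing an unmatched apostrophe or double quote makes it fail rather than print the title.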