added markdown to pdf folder
talesh committed Jul 24, 2024
1 parent 0fb393b commit 91daa0c
Showing 16 changed files with 1,492 additions and 0 deletions.
91 changes: 91 additions & 0 deletions markdown-to-pdf/README.md
@@ -0,0 +1,91 @@
# OWASP Top Ten for LLMs - Markdown to PDF

The contents of this directory are used to generate the PDFs of translated versions of the OWASP Top Ten for LLMs using the md-to-pdf npm package.

## How to Contribute Translations

To contribute translations to the OWASP Top Ten for LLMs project, please follow these steps:

1. Fork this repository on GitHub by clicking on the "Fork" button at the top right corner of the repository page.

2. In your fork, create a subdirectory in the `1_1_vulns/translations` directory (the location `markdown_to_pdf.sh` reads from), named with the two-letter ISO code of the language (for example, `pt`). This subdirectory should contain all the Markdown files of the translation, in the same layout as the existing languages.

3. Copy the Markdown files of the English version to your new directory and start translating. Follow the instructions in `_template.md` to keep the styling consistent; the Markdown to PDF generator relies on this consistency. (There is no need to copy the `_template.md` file itself.)

4. Aim to replicate the translation as accurately as possible and avoid deviating from the original meaning of the Top Ten for LLMs.

5. In the `LLM00_Introduction.md` file, there is a section **About this translation**. You can add your name as a translator in this section.

6. Once the translation is complete, open a descriptive pull request to this repository to get it merged in.

7. There is no need to generate the PDF using the process in this document, but if you want to validate that your Markdown is in the correct format (and possibly add some styling if it needs tweaking), follow the instructions below.

8. If you are reviewing an existing translation, open an issue and tag the original translator to propose the change. Once the original translator and the reviewer agree, open a pull request to this repository. (Remember to add your name to the **About this translation** section.)

9. Keep a summary of any translation discussion in the GitHub issue, even if the conversation took place in the [OWASP Slack channel](https://owasp.slack.com/archives/C063W2E791U).
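
Steps 2 and 3 can be sketched as a small shell helper. This is only an illustration: `scaffold_translation` is a hypothetical name, and the locations of the English files and of the translations directory are passed in as arguments because they are assumptions about the repository layout, not something this README specifies.

```shell
#!/bin/bash
# Hypothetical helper (not part of this repository): copy the English
# Markdown files into a new translation directory named after a
# two-letter language code, leaving _template.md behind.
scaffold_translation() {
  local source_dir="$1" translations_dir="$2" lang="$3"

  # Accept only two lowercase letters, e.g. "pt" or "de".
  if [[ ! "$lang" =~ ^[a-z]{2}$ ]]; then
    echo "Error: '$lang' is not a two-letter language code." >&2
    return 1
  fi

  local target="$translations_dir/$lang"
  mkdir -p "$target" || return 1

  local file
  for file in "$source_dir"/*.md; do
    # If the glob matched nothing, it stays literal and is skipped here.
    [ -f "$file" ] || continue
    # The template only documents the format; it is not translated.
    if [ "$(basename "$file")" = "_template.md" ]; then
      continue
    fi
    cp "$file" "$target/" || return 1
  done

  echo "$target"
}

# Example call (paths are assumptions):
#   scaffold_translation 1_1_vulns 1_1_vulns/translations pt
```

The function prints the directory it created, so the result can be passed to an editor or to `ls` to confirm the files were copied.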


## How to Generate a Translated PDF

### Requirements
1. To generate PDFs from the Markdown files, you need the [md-to-pdf](https://www.npmjs.com/package/md-to-pdf) npm package installed globally. If npm is installed on your machine, run:
```shell
npm i -g md-to-pdf
```

2. You will require the translated Markdown files described above.

3. You will also need a CSS style file for the language in the `styles` directory. For languages based on Latin characters, you can copy the Portuguese file `topten-pt.css` as a starting point.
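
The requirements above can be checked before running the generator. The sketch below is one way to do it; `require_command` and `ensure_stylesheet` are hypothetical helper names, and the `topten-<language>.css` naming follows the pattern that `markdown_to_pdf.sh` looks for.

```shell
#!/bin/bash
# Hypothetical pre-flight helpers (not part of this repository).

# Fail with a message if a command is not available on PATH.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "Error: '$1' was not found on PATH." >&2
    return 1
  fi
}

# Make sure styles/topten-<lang>.css exists; if it does not, start it
# as a copy of the Portuguese stylesheet, as suggested above.
ensure_stylesheet() {
  local styles_dir="$1" lang="$2"
  local target="$styles_dir/topten-$lang.css"
  local template="$styles_dir/topten-pt.css"

  if [ -f "$target" ]; then
    echo "$target"
    return 0
  fi
  if [ ! -f "$template" ]; then
    echo "Error: '$template' does not exist." >&2
    return 1
  fi
  cp "$template" "$target" && echo "$target"
}

# Example calls:
#   require_command md-to-pdf
#   ensure_stylesheet styles de
```

Copying the Portuguese file is only a starting point; languages with other scripts or text directions will likely need their own adjustments.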


### Descriptions of contents

- ``markdown-to-pdf/generated`` directory: This directory is where the PDFs are stored once they are generated. After the Markdown files are converted to PDF format, the resulting PDF files are placed in this directory for easy access and distribution.

- ``markdown-to-pdf/img`` directory: This directory is used to store all the images that will be included in the PDF files. When converting Markdown to PDF, any referenced images are typically embedded in the PDF document. The images are stored in this directory so that they can be easily referenced and included during the conversion process.

- ``markdown-to-pdf/styles`` directory: The styles directory contains custom CSS files for each language. When converting Markdown to PDF, the Markdown is first converted to HTML, and then the HTML is "printed" using Puppeteer to generate the PDF. The custom CSS files in the styles directory ensure that the PDFs have consistent styling and alignment, closely resembling the original Markdown files. Each language may have its own CSS file to handle language-specific formatting requirements.

- ``markdown-to-pdf/frontmatter.md``: This file holds the configuration for Puppeteer, the tool used to generate the PDFs. It specifies how the PDFs are generated and defines the header and footer for each page, which typically contain information such as page numbers and the document title. **The title on line 57 of `frontmatter.md` is language-specific and must be changed to the translated title before generating a PDF.**

- ``markdown-to-pdf/markdown_to_pdf.sh``: This script runs the conversion. It concatenates `frontmatter.md` and the translated Markdown files into a single temporary Markdown file, then converts that file to PDF with the md-to-pdf npm package. See the Usage section for how to run it.


### Usage

To generate PDFs from the markdown files, follow these steps:

1. Modify line 57 of `frontmatter.md` to show the correct title of the OWASP Top Ten in the appropriate language.

2. Validate that the ISO code directory for the language exists in the `../1_1_vulns/translations` directory and that the corresponding CSS file for the language exists in the `styles` directory.

3. Run the following command to generate the PDF:

```shell
./markdown_to_pdf.sh --language <language_iso_code>
```

Example

```shell
./markdown_to_pdf.sh --language pt
```

The generated PDF is saved in the `generated` directory with the ISO code as the filename (for example, `generated/pt.pdf`). If a file already exists, it is overwritten.

4. Validate that the contents of the file look similar to that of the main English file.
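
Before comparing the result with the English file by eye, a quick sanity check can confirm that the output is a non-empty PDF. This is a minimal sketch: `looks_like_pdf` is a hypothetical helper, not part of `markdown_to_pdf.sh`, and it relies only on the fact that PDF files begin with the bytes `%PDF-`.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the file exists, is non-empty,
# and starts with the PDF magic bytes "%PDF-".
looks_like_pdf() {
  local file="$1"
  [ -s "$file" ] || return 1
  # tr strips NUL bytes so the command substitution stays quiet on
  # arbitrary binary input.
  [ "$(head -c 5 "$file" | tr -d '\0')" = "%PDF-" ]
}

# Example call:
#   looks_like_pdf generated/pt.pdf && echo "generated/pt.pdf looks like a PDF"
```

This only catches empty or truncated output; layout and content still need the manual comparison described in step 4.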


### Options


- **Keep Markdown**: If you add the ``--keep-markdown`` flag at the end, the script does not delete the temporary Markdown file generated from the concatenated ones. The temporary file is located in ``./generated/tmp``. For example:
```shell
./markdown_to_pdf.sh --language pt --keep-markdown
```
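
The flag has to come last because `markdown_to_pdf.sh` reads its arguments by position (`--language` first, its value second, `--keep-markdown` third). If order-independent flags were wanted, the parsing could look roughly like the sketch below. `parse_args` is a hypothetical function, not what the script currently does.

```shell
#!/bin/bash
# Hypothetical sketch of order-independent argument parsing.
# Sets two variables: language and keep_markdown.
parse_args() {
  language=""
  keep_markdown=false

  while [ "$#" -gt 0 ]; do
    case "$1" in
      --language)
        if [ -z "${2:-}" ]; then
          echo "Error: --language needs a value." >&2
          return 1
        fi
        language="$2"
        shift 2
        ;;
      --keep-markdown)
        keep_markdown=true
        shift
        ;;
      *)
        echo "Usage: --language <language> [--keep-markdown]" >&2
        return 1
        ;;
    esac
  done

  # --language is mandatory.
  if [ -z "$language" ]; then
    echo "Usage: --language <language> [--keep-markdown]" >&2
    return 1
  fi
}

# Example call:
#   parse_args --keep-markdown --language pt
#   echo "$language $keep_markdown"
```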



## License

This project is licensed under the terms of the [Creative Commons Attribution-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-sa/4.0/).
99 changes: 99 additions & 0 deletions markdown-to-pdf/frontmatter.md

Large diffs are not rendered by default.

Empty file.
3 changes: 3 additions & 0 deletions markdown-to-pdf/img/OWASP-title-logo.svg
Binary file added markdown-to-pdf/img/diagram.png
Binary file added markdown-to-pdf/img/header-background.png
Binary file added markdown-to-pdf/img/logo.png
Binary file added markdown-to-pdf/img/title-background.png
109 changes: 109 additions & 0 deletions markdown-to-pdf/markdown_to_pdf.sh
@@ -0,0 +1,109 @@
#!/bin/bash

# Ensure UTF-8 encoding in the environment
export LC_ALL=C.UTF-8
export LANG=C.UTF-8

# Check that the --language flag and its value are provided
if [ -z "$2" ] || [ "$1" != "--language" ]; then
    echo "Usage: $0 --language <language> [--keep-markdown]"
    exit 1
fi
language="$2"

# Define directories and files
current_directory=$(pwd)
directory="$current_directory/../1_1_vulns/translations/$language"
dir_name=$(basename "$directory")
generated_folder="$current_directory/generated"
tmp_folder="$generated_folder/tmp"
output_file="$tmp_folder/${dir_name}.md"
temp_pdf_file="$tmp_folder/${dir_name}.pdf"
pdf_file="$generated_folder/${dir_name}.pdf"
frontmatter="$current_directory/frontmatter.md"
stylesheet="$current_directory/styles/topten-$language.css"
intro_file="${directory}/LLM00_Introduction.md"

# Extract the document title from the introduction file, if it exists
if [[ -f "$intro_file" ]]; then
    # Use awk to handle multi-line patterns and extract the title, ensuring UTF-8 handling
    header_title=$(awk '/<div class="doctitle">/,/<\/div>/{ if ($0 ~ /<\/div>/) { print p; p=""; next } if ($0 ~ /<div class="doctitle">/) next; p=p $0 }' "$intro_file" | xargs)
    echo "Extracted header title: $header_title"
else
    echo "Error: '$intro_file' does not exist." >&2
fi

# Check if the provided argument is a directory
if [ ! -d "$directory" ]; then
    echo "Error: '$directory' is not a directory."
    exit 1
fi

# Check if the provided stylesheet exists
if [ ! -f "$stylesheet" ]; then
    echo "Error: '$stylesheet' does not exist."
    exit 1
fi

# Create the 'generated' and 'generated/tmp' directories if they don't exist
mkdir -p "$tmp_folder"

# Delete the PDF, temporary PDF, and temporary Markdown files if they already exist
if [ -f "$pdf_file" ]; then
    echo "Deleting existing PDF file: $pdf_file"
    rm "$pdf_file"
fi
if [ -f "$temp_pdf_file" ]; then
    echo "Deleting existing temporary PDF file: $temp_pdf_file"
    rm "$temp_pdf_file"
fi
if [ -f "$output_file" ]; then
    echo "Deleting existing temporary Markdown file: $output_file"
    rm "$output_file"
fi

# Start with a clean output file
> "$output_file"

# Add the frontmatter if it exists
if [ -f "$frontmatter" ]; then
    cat "$frontmatter" >> "$output_file"
    echo "" >> "$output_file" # Adds a newline after the frontmatter
fi

# Sort markdown files alphabetically and concatenate them
# (read line by line so paths containing spaces are not split)
find "$directory" -maxdepth 1 -name '*.md' | sort | while IFS= read -r file; do
    # Skip the frontmatter
    if [[ "$file" != "$frontmatter" ]]; then
        cat "$file" >> "$output_file"
        echo "" >> "$output_file" # Adds a newline between files
    fi
done

echo "Combined markdown files into $output_file"

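# Convert the combined Markdown file to PDF; stop if the conversion fails
if ! md-to-pdf --basedir "$current_directory" --stylesheet "$stylesheet" --document-title "$header_title" "$output_file"; then
    echo "Error: md-to-pdf failed to convert '$output_file'." >&2
    exit 1
fi
mv "$temp_pdf_file" "$pdf_file"

# Remove the temporary Markdown file unless --keep-markdown was given
if [ -f "$output_file" ] && [ "$3" != "--keep-markdown" ]; then
    echo "Deleting temporary Markdown file: $output_file"
    rm "$output_file"
fi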

if [ -f "$pdf_file" ]; then
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo -e "\033[32m########################################### Success!! ##################################################\033[0m"
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo "PDF file generated: $pdf_file"
    echo -e "\033[32m###############################################################################################################\033[0m"
    echo -e "\033[32m###############################################################################################################\033[0m"
fi
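
A note on the title extraction in the script above: the awk program joins the lines between `<div class="doctitle">` and `</div>` without a separator, and piping through `xargs` can fail when the title contains an apostrophe, because `xargs` treats quotes specially. A possible alternative is sketched below. `extract_title` is a hypothetical function, and like the original it assumes the opening and closing tags each sit on their own line.

```shell
#!/bin/bash
# Hypothetical alternative to the awk | xargs pipeline: collect the
# lines inside <div class="doctitle"> ... </div>, trim each one, and
# join them with single spaces.
extract_title() {
  awk '
    # Opening tag line: start collecting on the next line.
    /<div class="doctitle">/ { inside = 1; next }
    # Closing tag line: print the collected title and stop reading.
    inside && /<\/div>/ { print title; exit }
    inside {
      gsub(/^[ \t]+|[ \t]+$/, "")   # trim leading and trailing blanks
      if ($0 != "") title = (title == "" ? $0 : title " " $0)
    }
  ' "$1"
}

# Example call:
#   header_title=$(extract_title "$intro_file")
```

Because the trimming happens inside awk, no second process is needed and quote characters in the title pass through unchanged.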
169 changes: 169 additions & 0 deletions markdown-to-pdf/styles/base.css
@@ -0,0 +1,169 @@
@import url('https://fonts.googleapis.com/css2?family=Roboto+Condensed:ital,wght@0,100..900;1,100..900&display=swap');
@import url('https://fonts.googleapis.com/css2?family=Roboto:ital,wght@0,100;0,300;0,400;0,500;0,700;0,900;1,100;1,300;1,400;1,500;1,700;1,900&display=swap');


.smalllogo {
text-align: center;
margin-top: 100px;
margin-bottom: 100px;
}

.doctitle{
font-size: 80px;
line-height: 1;
padding-top: 100px;
letter-spacing: -0.05em;
font-weight: 900;
text-align: center;
}
.docversion{
font-size: 20px;
font-weight: 500;
line-height: 1;
letter-spacing: 0.20em;
text-transform: uppercase;
color: #030C16;
text-align: center;
padding-top: 40px;

}
.doclink {
border-radius: 6px;
background-color: #1D7BD7;
color: white;
text-transform: uppercase;
font-size: 17px;
font-family: 'Roboto Condensed', sans-serif;
font-weight: 900;
padding: 5px 10px;
margin: 0 280px;
text-align: center;
}
.docdate{
font-size: 13px;
font-style: italic;
line-height: 1;
padding-bottom: 250px;
white-space: nowrap;
color: #030C16;
text-align: center;
padding-top: 20px;

}
.frontpage {
background: url('/img/title-background.png') no-repeat;
background-position: center;
background-size: 100% 137%;
height: 950px;
overflow: hidden;
}


.published {
font-weight: bold;
font-style: italic;
}
body {
background-color: #fff;
font-family: 'Roboto', sans-serif;
font-weight: 400;
font-size: 13.33px;
line-height: 20px;
color: #030C16;
margin: 0;
padding: 0;
overflow: visible;
}

h2 {
width: 100vw;
height: 100px;
background: url('/img/header-background.png') no-repeat center center;
background-size: cover;
color: #fff;
margin-bottom: 20px;
margin-left: 0;
margin-right: 0;
margin-top: 0;
text-align: center;
vertical-align: middle;
font-family: 'Roboto', sans-serif;
font-weight: 700;
font-size: 32px;
page-break-after: avoid;
letter-spacing: -0.02em;
line-height: 100px;
page-break-before: always;
overflow: visible;
}

h3 {
font-family: 'Roboto', sans-serif;
font-weight: 700;
font-size: 21.33px;
line-height: 17.33px;
page-break-after: avoid;
margin-top: 20px;
margin-left: 100px;
margin-right: 100px;
color: #030C16;
}

h4 {
font-family: 'Roboto Condensed', sans-serif;
font-weight: 700;
font-size: 17px;
line-height: 14px;
page-break-after: avoid;
margin-top: 20px;
margin-left: 100px;
margin-right: 100px;
color: #030C16;
}

a {
color: #030C16; /* Default color */
font-weight: 500;
text-decoration: underline;
}

p,
ul,
ol {
margin-right: 100px;
margin-left: 100px;
margin-bottom: 20px;
}

p,
li {
page-break-inside: avoid;
}

li {
margin-left: 5px;
}

ul ul,
ul ol,
ol ul,
ol ol {
margin-left: 0;
}

@page {
size: a4 portrait;
margin-top: 22mm;
}

@media print {
h2 {
width: 10%;
margin-top: -5mm !important;
}
}