Translation Guide

This document describes how translations work for the Nextflow training materials.

Warning

All translations are generated and maintained by AI. Do not submit manual translations - they will be overwritten the next time the translation workflow runs. Instead, improve the translation prompts to fix issues permanently.

Overview

Translations are managed through a combination of:

  1. LLM prompts that define translation rules and glossaries
  2. GitHub Actions that regenerate translations when triggered
  3. Human review to catch errors and improve prompts

The key insight: to fix a translation, fix the prompt - not the translated file.

How to Improve Existing Translations

Found a translation error or want to suggest an improvement? Here's how to fix it the right way.

flowchart TD
    A[Find translation error] --> B{Type of issue?}
    B -->|Wrong term| C[Update glossary in<br>docs/LANG/llm-prompt.md<br>or glossary.yml]
    B -->|Wrong style/tone| D[Update grammar rules in<br>docs/LANG/llm-prompt.md]
    B -->|Structural issue| E[Update rules in<br>_scripts/general-llm-prompt.md]
    C --> F[Open PR with prompt change]
    D --> F
    E --> F
    F --> G[Merge prompt PR to master]
    G --> H[Trigger translation workflow<br>via Actions → Translate → Run workflow]
    H --> I[Review translation PR]
    I --> J{Correct?}
    J -->|Yes| K[Merge translation PR]
    J -->|No| B

Note

The translation workflow can only run on branches in the main repository, not on forks. If you'd like to contribute translation improvements and need write access, please open an issue to request it.

The Right Way: Update the Prompt

The only sustainable way to fix translations is to improve the LLM prompts:

  1. For language-specific issues (terminology, tone, grammar):

    • Edit docs/<lang>/llm-prompt.md
    • Add glossary terms, clarify rules, provide examples
  2. For structural issues (code blocks, formatting, links):

    • Edit _scripts/general-llm-prompt.md
    • Add rules with before/after examples
  3. Merge the prompt change to master

  4. Trigger the translation workflow via GitHub Actions:

    • Go to Actions → Translate → Run workflow
    • Select language and mode (normal)
    • The workflow detects that prompts changed and re-translates all files for that language
  5. Review the translation PR created by the workflow

Why Not Edit Translations Directly?

  • Direct edits are overwritten when English content changes
  • There's no way to track why a translation differs from the AI output
  • Future maintainers won't know which changes were intentional
  • The same error will reappear in new content

Reporting Issues

If you find errors but can't fix the prompts yourself:

  1. Open a GitHub issue
  2. Include: language, file path, current text, expected text
  3. Explain why the current translation is wrong
  4. A maintainer will update the prompt and re-run

Understanding the Translation Prompts

The translation system uses two types of prompts:

General Prompt (_scripts/general-llm-prompt.md)

Contains rules that apply to ALL languages:

  • What to translate vs keep in English
  • Code block handling (never translate syntax, always translate comments)
  • Formatting preservation (links, anchors, admonitions)
  • Technical term lists (Nextflow keywords, operators, directives)
  • Validation requirements

Language-Specific Prompts (docs/<lang>/llm-prompt.md)

Contains rules specific to each language:

  • Grammar and tone (formal/informal)
  • Terms that get translated (with exact translations)
  • Admonition titles
  • Common expressions
  • Language-specific mistakes to avoid

Prompt Precedence

When prompts conflict, language-specific rules override general rules. This allows languages to customize behavior (e.g., French keeps "workflow" in English even though it could theoretically be translated).
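The precedence rule can be pictured as a dictionary merge in which language-specific entries win. This is a minimal sketch for intuition only: the real prompts are Markdown files read by the LLM, not dictionaries, and the function name, keys, and values below are hypothetical.

```python
# Illustrative only: the actual prompts are Markdown text, not dicts.
# All names, keys, and values here are hypothetical.

def effective_rules(general: dict[str, str], language: dict[str, str]) -> dict[str, str]:
    """Merge rule sets so that language-specific rules override general ones."""
    merged = dict(general)   # start from the shared rules (copy, do not mutate)
    merged.update(language)  # language-specific entries take precedence
    return merged


general_rules = {"workflow": "translate", "code_syntax": "keep in English"}
french_rules = {"workflow": "keep in English"}

rules = effective_rules(general_rules, french_rules)
print(rules["workflow"])     # the French rule wins
print(rules["code_syntax"])  # the general rule still applies where nothing overrides it
```

Copying the general rules before updating keeps the shared rule set unchanged, so the same base can be reused for every language.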

Contributing Prompt Improvements

If you're a native speaker and want to improve translation quality:

  1. Read the current prompt for your language in docs/<lang>/llm-prompt.md
  2. Identify the issue type:
    • Wrong term → Add/update entry in "Terms to Translate"
    • Wrong tone → Clarify in "Grammar & Tone" section
    • Repeated error → Add to "Common Mistakes" section
  3. Provide examples showing wrong vs correct translations
  4. Test your change by running the translation workflow
  5. Submit a PR with both prompt change and sample output

Good prompt improvements include:

  • Adding missing glossary terms
  • Clarifying ambiguous rules with examples
  • Adding common mistakes that keep occurring
  • Regional spelling/grammar preferences

How Translation Updates Work

Translation runs are triggered manually via the GitHub Actions workflow (Actions → Translate → Run workflow). They are typically run ahead of a new release of the training materials, but can be run at any time. The workflow detects what has changed since the last translation run and updates accordingly:

  1. English source files changed → Outdated translations are updated
  2. Language-specific prompt changed → That language's translations are re-translated
  3. General prompt changed → All languages are re-translated

flowchart TD
    A[Maintainer triggers<br>translation workflow] --> B[Detect baseline from<br>last translation PR]
    B --> C{What changed<br>since baseline?}
    C -->|English content| D[Find outdated translations]
    C -->|Language prompt| E[Mark ALL translations<br>for that language outdated]
    C -->|General prompt| F[Mark ALL languages outdated]
    D --> G{Any outdated?}
    G -->|No| H[Done]
    G -->|Yes| I[AI updates changed sections<br>using incremental diffs]
    E --> J[AI re-translates<br>affected files]
    F --> J
    I --> K[Post-process and verify]
    J --> K
    K --> L[Create PR with<br>updated translations]
    L --> M[Human review]
    M --> N{Approved?}
    N -->|Yes| O[Merge PR promptly]
    N -->|No| P[Update llm-prompt.md<br>and re-run workflow]
    P --> A

Key Points

  • Translation runs are triggered manually, not automatically on push
  • Runs are typically done ahead of a new release, but can be triggered at any time
  • The AI makes minimal changes, updating only sections that changed in English
  • When prompts change, all translations for the affected language(s) are re-translated
  • Translations preserve line-by-line structure for easy diff review
  • Each run creates a single PR containing updates for all affected languages
  • The system uses the last merged translation PR's commit SHA as the baseline for change detection
  • Translation PRs should be merged promptly to avoid them stacking up or becoming outdated
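
The three triggers described in this section can be sketched as a small classification function. This is an illustration under stated assumptions, not the workflow's actual code: `classify_changes`, its return labels, and the idea of passing in a plain list of changed paths are hypothetical. Only the file locations are taken from this guide.

```python
from pathlib import PurePosixPath

GENERAL_PROMPT = "_scripts/general-llm-prompt.md"


def classify_changes(changed_files: list[str], languages: list[str]) -> dict[str, str]:
    """Return 'full', 'incremental', or 'none' for each language (hypothetical labels).

    'full' means re-translate everything for that language; 'incremental'
    means update only translations whose English source changed.
    """
    plan = {lang: "none" for lang in languages}
    for path in changed_files:
        parts = PurePosixPath(path).parts
        if path == GENERAL_PROMPT:
            # General prompt changed: all languages are re-translated.
            return {lang: "full" for lang in languages}
        if len(parts) == 3 and parts[0] == "docs" and parts[2] == "llm-prompt.md":
            # Language-specific prompt changed: only that language is affected.
            if parts[1] in plan:
                plan[parts[1]] = "full"
        elif parts[:3] == ("docs", "en", "docs"):
            # English content changed: outdated translations need updating,
            # unless the language is already marked for full re-translation.
            for lang in languages:
                if plan[lang] == "none":
                    plan[lang] = "incremental"
    return plan


plan = classify_changes(
    ["docs/en/docs/nf4_science/index.md", "docs/es/llm-prompt.md"],
    ["es", "pt"],
)
print(plan)
```

In this example Spanish is marked for full re-translation because its prompt changed, while Portuguese only needs the incremental update triggered by the English edit.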

Excluding Directories from Translation

To prevent specific directories under docs/en/docs/ from being translated, add them to _scripts/translate/translate_config.yml:

exclude_dirs:
  - example_docs_set
  - archive

Excluded directories are skipped during sync (both new and outdated detection) and are not flagged as orphaned. Pages in excluded directories still appear on translated sites, showing the English content with a "Translation in Progress" admonition.
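
As a rough illustration of how such an exclusion list could be applied, the sketch below checks whether a page path falls inside an excluded directory. It assumes that entries in `exclude_dirs` are interpreted relative to docs/en/docs/ and matched on whole path components. The function name `is_excluded` is hypothetical, and the list is hardcoded here rather than read from translate_config.yml.

```python
from pathlib import PurePosixPath

# Mirrors the exclude_dirs example above; in practice the values would be
# read from translate_config.yml (hardcoded here to stay dependency-free).
EXCLUDE_DIRS = ("example_docs_set", "archive")


def is_excluded(relative_path: str, exclude_dirs: tuple[str, ...] = EXCLUDE_DIRS) -> bool:
    """True if the path (relative to docs/en/docs/) is inside an excluded directory."""
    path = PurePosixPath(relative_path)
    # is_relative_to compares whole path components, so "archived/" does not
    # match an "archive" entry.
    return any(path.is_relative_to(excluded) for excluded in exclude_dirs)


print(is_excluded("archive/old_page.md"))
print(is_excluded("archived/page.md"))
print(is_excluded("nf4_science/index.md"))
```

Matching on path components rather than string prefixes avoids accidentally excluding directories whose names merely start with an excluded name.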


Reviewing Translation PRs

When reviewing a translation PR, follow these guidelines:

What to Check

  1. Technical accuracy

    • Are Nextflow concepts correctly explained?
    • Are code examples unchanged (only comments translated)?
    • Are technical terms used consistently with the glossary?
  2. Formatting preservation

    • Are code blocks intact and properly formatted?
    • Are admonitions (note, tip, warning) correctly structured?
    • Are heading anchors preserved ({ #anchor-name })?
    • Are links working (URLs unchanged, only link text translated)?
  3. Language quality

    • Is the tone appropriate (formal/informal per language)?
    • Are translations natural and readable?
    • Are there any obvious errors or awkward phrasings?
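
Some of the formatting checks above lend themselves to a quick script. The sketch below compares heading anchors between an English page and its translation. It is a reviewer's aid written for this guide, not part of the translation tooling: the function names are hypothetical, and the pattern assumes anchors appear as `{ #anchor-name }` at the end of a heading line. It does not skip fenced code blocks, so a shell comment that looks like a heading with an anchor would also be picked up.

```python
import re

# Matches heading lines that end in an anchor such as "{ #anchor-name }".
ANCHOR_RE = re.compile(r"^#+[ \t]+.*\{[ \t]*#([\w-]+)[ \t]*\}[ \t]*$", re.MULTILINE)


def heading_anchors(markdown: str) -> list[str]:
    """Return heading anchors in document order."""
    return ANCHOR_RE.findall(markdown)


def anchors_preserved(english: str, translated: str) -> bool:
    """True if the translation keeps the same anchors in the same order."""
    return heading_anchors(english) == heading_anchors(translated)


english = "# Hello { #hello }\n\nText\n\n## First steps { #first-steps }\n"
good = "# Olá { #hello }\n\nTexto\n\n## Primeiros passos { #first-steps }\n"
bad = "# Olá { #ola }\n\nTexto\n\n## Primeiros passos { #first-steps }\n"

print(anchors_preserved(english, good))
print(anchors_preserved(english, bad))
```

If a check like this fails, the fix still belongs in the prompts, as described under How to Handle Issues.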

How to Handle Issues

Caution

Do NOT suggest changes directly to translation PRs. Direct edits will be overwritten the next time the translation workflow runs. Instead, update the translation prompts and re-run the translation.

When you find an issue during review:

flowchart TD
    A[Find translation error] --> B{Type of issue?}
    B -->|Wrong term| C[Update glossary in<br>docs/LANG/llm-prompt.md<br>or glossary.yml]
    B -->|Wrong style/tone| D[Update grammar rules in<br>docs/LANG/llm-prompt.md]
    B -->|Structural issue| E[Update rules in<br>_scripts/general-llm-prompt.md]
    C --> F[Close translation PR]
    D --> F
    E --> F
    F --> G[Merge prompt fix to master]
    G --> H[Re-trigger translation workflow]
    H --> I[Review new translation PR]

Workflow for Fixing Issues

  1. Close the translation PR (the translations will be regenerated)

  2. Merge the prompt fix to master:

    • Language-specific issues → edit docs/<lang>/llm-prompt.md
    • General formatting issues → edit _scripts/general-llm-prompt.md
  3. Re-trigger the translation workflow:

    • Go to Actions → Translate → Run workflow
    • Select the language and normal mode
    • The workflow creates a new PR with updated translations
  4. Review the new translation PR to verify the fix

Example: Fixing a Wrong Term

If "workflow" is incorrectly translated as "flujo" instead of "flujo de trabajo" in Spanish:

  1. Close the current translation PR

  2. Edit docs/es/llm-prompt.md and add or update the glossary entry:

    | English  | Spanish                        |
    | -------- | ------------------------------ |
    | workflow | flujo de trabajo (NOT "flujo") |
  3. Merge the prompt change to master

  4. Re-trigger the translation workflow

  5. Verify the fix in the new translation PR

Approving PRs

Once you've verified the translation quality:

  1. Check that CI passes (build succeeds)
  2. Approve the PR
  3. Merge (squash merge recommended)

How to Add a Missing Course

If a language exists but is missing content (e.g., Portuguese has hello_nextflow/ but not nf4_science/):

Using GitHub Actions (Recommended)

  1. Go to Actions → Translate → Run workflow
  2. Select language (e.g., pt)
  3. Select mode: normal
  4. The workflow detects missing translations and creates a PR

Using the CLI (Requires API Key)

For maintainers with ANTHROPIC_API_KEY access:

cd _scripts

# Translate one file at a time
uv run python -m translate translate nf4_science/index.md --lang pt

# Or sync all (update outdated + add missing + remove orphaned)
uv run python -m translate sync pt

After Translation

  1. Review the generated translations
  2. Check if any prompt updates are needed
  3. Submit a PR

How to Add a New Language

Step 1: Create Language Structure

Use the GitHub Actions workflow or CLI:

cd _scripts
uv run python docs.py new-lang <lang-code>

This creates:

  • docs/<lang>/mkdocs.yml - MkDocs config (inherits from English)
  • docs/<lang>/llm-prompt.md - Translation prompt (requires customization)
  • docs/<lang>/ui-strings.yml - UI strings (copied from English, requires translation)
  • docs/<lang>/docs/ - Directory for translated content

Step 2: Customize the Translation Prompt

Edit docs/<lang>/llm-prompt.md to define:

  1. Grammar preferences

    • Formal or informal tone
    • Regional spelling conventions
    • Specific grammar rules
  2. Glossary

    • Terms to keep in English
    • Terms to translate (with exact translations)
    • Common mistakes to avoid
  3. Admonition titles

    • Translations for Note, Tip, Warning, Exercise, Solution

See existing language prompts for examples (e.g., docs/pt/llm-prompt.md).

Step 3: Create the Post-Processing Glossary

Create docs/<lang>/glossary.yml with deterministic translations that are enforced by post-processing (independent of LLM output):

  • translation_notice: AI translation notice text and homepage admonition
  • tab_labels: canonical Before/After translations (must be consistent for MkDocs tab sync)
  • admonition_titles: default titles for bare admonitions (note, tip, warning, etc.)
  • admonition_title_glossary: translations for common titled admonitions (Command output, Directory contents, etc.)

See docs/pt/glossary.yml for an example. These translations are applied deterministically during post-processing and override whatever the LLM produces, ensuring consistency across all files.
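
To make "applied deterministically" concrete, here is a minimal sketch of one way default titles could be enforced on bare admonitions. It is an assumption about the general approach, not a copy of postprocess.py: the function name, the regular expression, and the Portuguese titles are hypothetical, and a real run would load the values from glossary.yml.

```python
import re

# Hypothetical glossary content; a real glossary.yml would be loaded from disk.
ADMONITION_TITLES = {"note": "Nota", "tip": "Dica", "warning": "Aviso"}

# A bare admonition is a line such as "!!! note" with no quoted title after the type.
BARE_ADMONITION_RE = re.compile(
    r"^([ \t]*(?:!!!|\?\?\?\+?)[ \t]+)(\w+)[ \t]*$", re.MULTILINE
)


def apply_admonition_titles(text: str, titles: dict[str, str]) -> str:
    """Give bare admonitions a fixed translated title, whatever the LLM produced."""

    def add_title(match: re.Match) -> str:
        prefix, kind = match.group(1), match.group(2)
        title = titles.get(kind)
        if title is None:
            return match.group(0)  # unknown admonition type: leave the line as-is
        return f'{prefix}{kind} "{title}"'

    return BARE_ADMONITION_RE.sub(add_title, text)


source = '!!! note\n\n    Texto.\n\n!!! warning "Cuidado"\n\n    Mais texto.\n'
print(apply_admonition_titles(source, ADMONITION_TITLES))
```

Because the replacement is a plain lookup, the same input always yields the same output, which is the property that makes post-processing a reliable backstop for the LLM.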

Step 4: Register the Language

  1. Add the language code to docs/language_names.yml
  2. Run uv run docs.py sync-language-picker to update the language switcher

Step 5: Generate Initial Translations

Use GitHub Actions:

  1. Go to Actions → Translate → Run workflow
  2. Select the new language
  3. Select mode: normal

Step 6: Review and Iterate

  1. Review the PR with generated translations
  2. Update llm-prompt.md to fix any issues
  3. Re-run translations as needed
  4. Merge when satisfied

Directory Structure

docs/
├── en/                     # English (source)
│   ├── mkdocs.yml          # Main config
│   ├── overrides/          # Theme customization
│   └── docs/               # English content
├── pt/                     # Portuguese
│   ├── mkdocs.yml          # Inherits from en
│   ├── llm-prompt.md       # Translation rules (LLM guidance)
│   ├── glossary.yml        # Post-processing glossary (deterministic fixes)
│   ├── ui-strings.yml      # Translated UI strings
│   └── docs/               # Translated content
├── es/                     # Spanish
│   └── ...
└── ...

_scripts/
├── pyproject.toml          # Package metadata and dependencies
├── translate/              # Translation CLI package (python -m translate)
│   ├── __init__.py         # Package marker
│   ├── __main__.py         # Entry point
│   ├── config.py           # Constants and configuration
│   ├── translate_config.yml # Translation scope config (exclude dirs, etc.)
│   ├── models.py           # Data structures
│   ├── paths.py            # Path utilities
│   ├── prompts.py          # Prompt loading
│   ├── git_utils.py        # Git operations
│   ├── api.py              # Claude API calls
│   ├── postprocess.py      # Translation post-processing and glossary enforcement
│   ├── verify.py           # Translation verification
│   ├── progress.py         # Progress tracking
│   ├── core.py             # Translation orchestration
│   └── cli.py              # CLI commands
├── general-llm-prompt.md   # Shared translation rules
└── docs.py                 # Build/serve CLI

CLI Reference (For Maintainers)

Note

The CLI requires ANTHROPIC_API_KEY for translation commands. Community contributors should use GitHub Actions instead.

Run all commands from the _scripts/ directory:

cd _scripts

Translation Commands

# Sync all translations (update outdated + add missing + remove orphaned)
uv run python -m translate sync <lang>

# Sync with lower parallelism (default: 50 concurrent translations)
uv run python -m translate sync <lang> --parallel 10

# Sync with filter pattern
uv run python -m translate sync <lang> --include hello_nextflow

# Translate a single file
uv run python -m translate translate <path> --lang <lang>

Preview Commands (No API key required)

# Serve docs locally
uv run docs.py serve <lang>

# Build docs
uv run docs.py build-lang <lang>
