2 changes: 1 addition & 1 deletion .github/workflows/docker-publish.yml
Original file line number Diff line number Diff line change
@@ -73,7 +73,7 @@ jobs:
# https://github.com/docker/build-push-action
- name: Build and push Docker image
id: build-and-push
- uses: docker/build-push-action@0565240e2d4ab88bba5387d719585280857ece09 # v5.0.0
+ uses: docker/build-push-action@263435318d21b8e681c14492fe198d362a7d2c83 # v6.18.0
with:
context: .
push: ${{ github.event_name != 'pull_request' }}
2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -101,3 +101,5 @@ __pypackages__/

# Added by cargo
/target

docs/installation-troubleshooting/_generated_index.md
92 changes: 92 additions & 0 deletions docs/installation-troubleshooting/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
# Installation Troubleshooting

This directory contains concise, platform-specific troubleshooting notes focused on reproducible diagnostics and resolutions.

Files:

- `macos.md` — macOS troubleshooting index and links to per-entry files.
- `macos-poppler-build.md` — Poppler build failure entry.
- `macos-poppler-runtime.md` — Poppler runtime binaries entry.
- `macos-tesseract.md` — Tesseract / pytesseract entry.
- `linux.md` — placeholder for Linux troubleshooting entries.
- `windows.md` — placeholder for Windows troubleshooting entries.

Contribution guidance:

When adding a troubleshooting entry, include the following fields to make the entry reproducible and searchable:

- Title: short, descriptive title.
- Symptom: exact error message or observable behavior.
- Environment: OS, architecture (x86_64 / arm64), Python version, Pixi version (if relevant), and any other relevant software versions.
- Reproduction steps: exact commands or minimal steps to reproduce the issue.
- Resolution: exact commands and a brief explanation of why the resolution works.
- References: links to upstream docs, issues, or relevant resources.


Format example (minimal template with YAML frontmatter for new troubleshooting entries):

```yaml
---
title: "Brief title"
date: "YYYY-MM-DD" # Use YYYY-MM-DD (e.g., "2025-08-28")
verified_on: null # Set to validation date (e.g., "2025-08-29") after verification
os: macos # single value or a list: ["macos", "linux", "windows"]
arch: any # one of: "x86_64", "arm64", "any"
severity: low # one of: "low", "medium", "high"
status: active # "active" or "deprecated"
tags: [technology, error-type] # e.g., [poppler, pdf2image]
reproducible: true
---
```

- Symptom: copy-paste of error/output
- Environment: e.g. macOS 13.5, arm64, Python 3.12, pixi 0.1
- Reproduction steps:
  1. command1
  2. command2
- Resolution:
  - command(s)
  - short explanation
- References: link(s)

Notes:

- Keep entries concise and diagnostic-first; avoid duplicating general installation instructions already present in `README.rst` or Pixi docs.
- Use per-OS files for clarity and searchability. If an entry is cross-platform, note that in the Environment field and add it to each relevant file.

Maintenance guidance:

- Each entry should include `date` and, when possible, `verified_on` to indicate when it was validated. Maintainers should mark `status: deprecated` if an entry becomes obsolete and point to the canonical install docs.
- Automated checks should verify frontmatter presence and basic fields.
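
As an illustration of what such a check could look like, here is a minimal sketch. It is not the repository's tool: the `REQUIRED_FIELDS` tuple (taken from the minimal frontmatter template) and both function names are assumptions made for this example, and the parser only handles flat `key: value` lines rather than full YAML.

```python
from typing import Dict, List, Optional

# Assumed set of required keys, based on the minimal frontmatter template.
REQUIRED_FIELDS = ("title", "date", "os", "arch", "status", "tags")


def parse_frontmatter(text: str) -> Optional[Dict[str, str]]:
    """Return the flat key/value pairs between the leading '---' delimiters.

    Returns None when the file has no opening or no closing delimiter.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return None  # closing delimiter never found


def missing_fields(text: str) -> List[str]:
    """List required keys that are absent or empty in the entry's frontmatter."""
    fields = parse_frontmatter(text)
    if fields is None:
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]


if __name__ == "__main__":
    entry = '---\ntitle: "Brief title"\ndate: "2025-08-28"\nos: macos\n---\nSymptom\n'
    print(missing_fields(entry))
```

A real check would more likely use a YAML parser so that lists and quoted values are interpreted properly; the string-based approach here only keeps the sketch dependency-free.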

## Docs tooling

The repository provides a small local tool to validate troubleshooting entries and generate a consolidated index. This tool is optional and non-blocking by default.

Run the validator and index generator inside the project's Pixi environment:

```bash
pixi run python3 tools/generate_troubleshooting_index.py
```

Use `--strict` to make the tool exit non-zero when required frontmatter is missing (useful for CI once the team decides to enforce metadata):

```bash
pixi run python3 tools/generate_troubleshooting_index.py --strict
```

Minimal frontmatter template for an entry (keep entries short and link to canonical README installation instructions):

```yaml
---
title: "Short descriptive title"
date: "2025-08-28"
os: macos
arch: any
status: active
tags: [poppler, pdf2image]
---
```

The tool writes `docs/installation-troubleshooting/_generated_index.md`. You do not need to commit the generated file; it can be regenerated on demand. For now, the script will only warn about missing metadata unless `--strict` is used.

10 changes: 10 additions & 0 deletions docs/installation-troubleshooting/linux.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
# Linux troubleshooting (landing)

This page points to Linux-specific troubleshooting files. Create per-issue markdown files in this directory and follow the frontmatter template in `index.md`.

Examples to consider:

- package manager issues (apt/yum) for Poppler/Tesseract
- Python wheel compatibility (GLIBC/C++ ABI) problems

Add new entries as focused, reproducible files; avoid duplicating canonical installation docs.
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
---
title: "Pixi warning: conda-packages overridden by PyPI"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: any
severity: low
status: active
tags: [pixi, conda, pip, environment]
reproducible: true
---

Symptom
-------

When running `pixi shell -e pdev`, you may see a warning such as:

```
WARN These conda-packages will be overridden by pypi:
yarg
```

Environment
-----------

- Using Pixi to create or enter a project environment that mixes Conda and PyPI package sources.

Reproduction steps
------------------

1. Run `pixi shell -e pdev` (or the equivalent Pixi environment creation command).
2. Observe the warning during environment resolution.

Resolution
----------

This warning indicates that a package available from both Conda and PyPI will be installed from PyPI instead of the Conda channel. Usually this is safe, but if you rely on the Conda-built binary (for example, for performance or compatibility), align the package source by:

- Pinning the package to the Conda channel in the Pixi manifest (`pixi.toml`, or the `[tool.pixi]` tables in `pyproject.toml`).
- Removing the package from the project's PyPI dependencies if Conda should be authoritative.

Notes
-----

Most users can safely ignore this warning unless they encounter runtime issues tied to a specific build of the package.
77 changes: 77 additions & 0 deletions docs/installation-troubleshooting/macos-poppler-build.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
---
title: "Poppler build failure: missing macOS SDK headers"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: medium
status: active
tags: [poppler, build, xcode]
reproducible: true
---

Symptom
-------

Compilation fails with errors like:

```
fatal error: 'ctime' file not found
```

or other C/C++ header errors when installing packages that build native extensions (for example, Poppler utilities or bindings).

Environment
-----------

- macOS host (any recent release)
- Building Python packages that include C/C++ extensions

Reproduction steps
------------------

1. Create or update the Pixi environment that installs packages requiring native builds (or run `pip install` for such packages).
2. Observe the build failure during the compilation step.

Resolution
----------

1. Install or update the Xcode Command Line Tools:

```bash
xcode-select --install
```

2. If the system points to the wrong developer directory, reset it:

```bash
sudo xcode-select --reset
```

3. If prompted, accept the Xcode license:

```bash
sudo xcodebuild -license
```

4. If the build still cannot find headers, export a valid SDK path for the build process and retry:

```bash
export SDKROOT="$(xcrun --sdk macosx --show-sdk-path)"
# then re-run the install or build
```
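
The SDK lookup from step 4 can also be scripted, for example as a preflight check before a build. This is a minimal sketch under stated assumptions: the function name `macos_sdk_path` is hypothetical, and the only external command used is the same `xcrun --sdk macosx --show-sdk-path` call shown above. On hosts without `xcrun` (for example, Linux) it returns `None` instead of raising.

```python
import shutil
import subprocess
from typing import Optional


def macos_sdk_path() -> Optional[str]:
    """Return the active macOS SDK path, or None if it cannot be determined."""
    if shutil.which("xcrun") is None:
        return None  # Command Line Tools are missing, or this is not macOS
    result = subprocess.run(
        ["xcrun", "--sdk", "macosx", "--show-sdk-path"],
        capture_output=True,
        text=True,
        check=False,
    )
    path = result.stdout.strip()
    return path if result.returncode == 0 and path else None


if __name__ == "__main__":
    sdk = macos_sdk_path()
    if sdk is None:
        print("No macOS SDK found; try `xcode-select --install`.")
    else:
        print(f"SDKROOT candidate: {sdk}")
```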

Notes
-----

These steps provide the compiler and headers required to build C/C++ extensions. Prefer installing Poppler via the OS package manager (Homebrew) when possible to avoid local compilation.

References
----------

- https://developer.apple.com

Verified on
----------

- Apple M1 Pro (2021), macOS 15.6.1 (verification date: 2025-08-28)
62 changes: 62 additions & 0 deletions docs/installation-troubleshooting/macos-poppler-runtime.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
---
title: "Poppler runtime missing binaries"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: medium
status: active
tags: [poppler, pdf2image, runtime]
reproducible: true
---

Symptom
-------

Runtime errors indicate Poppler utilities are missing, for example:

```
FileNotFoundError: [Errno 2] No such file or directory: 'pdftoppm'
```

Environment
-----------

- macOS host where Python calls `pdf2image` or other Poppler-dependent tooling

Reproduction steps
------------------

1. Run a PDF conversion that relies on Poppler (e.g., code that uses `pdf2image`).
2. Observe the FileNotFoundError or similar runtime error.

Resolution
----------

Install Poppler via Homebrew:

```bash
brew install poppler
```

Verify the binaries are on PATH:

```bash
which pdftoppm
which pdftotext
```
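
The same check can be done from Python, which is useful for failing early with a clear message before a conversion starts. A minimal sketch, assuming the two tool names above are the ones your code needs; `POPPLER_TOOLS` and `missing_tools` are names invented for this example.

```python
import shutil
from typing import Iterable, List

# Tool names taken from the `which` commands above.
POPPLER_TOOLS = ("pdftoppm", "pdftotext")


def missing_tools(tools: Iterable[str] = POPPLER_TOOLS) -> List[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing Poppler binaries: " + ", ".join(missing))
    else:
        print("All Poppler binaries found on PATH.")
```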

Notes
-----

`pdf2image` requires host-installed Poppler binaries. Pixi manages Python packages but not host-level system packages; install the latter with Homebrew, apt, or your OS package manager.

References
----------

- https://poppler.freedesktop.org/

Verified on
----------

- Apple M1 Pro (2021), macOS 15.6.1 (verification date: 2025-08-28)
59 changes: 59 additions & 0 deletions docs/installation-troubleshooting/macos-syntaxwarning.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
---
title: "SyntaxWarning: invalid escape sequence in crawl4ai/prompts.py"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: low
status: active
tags: [python, warning, syntax]
reproducible: true
---

Symptom
-------

You may see a warning like:

```
SyntaxWarning: invalid escape sequence '\`'
```

Environment
-----------

- A Python file in the repository (example: `crawl4ai/prompts.py`) contains a string with a backslash followed by a character that doesn't form a valid escape sequence.

Reproduction steps
------------------

1. Run code that imports or executes the affected module (or run linting/tests).
2. Observe the SyntaxWarning printed at import time or during execution.

Resolution
----------

Update the affected string to use a valid escape or a raw string. For example, either escape the backslash by doubling it:

```py
# before
text = "This contains a backslash and a backtick: \`"

# after
text = "This contains a backslash and a backtick: \\`"
```

or mark the string as raw (if that semantics is acceptable):

```py
text = r"This contains a backslash and a backtick: \`"
```
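
To confirm that a fix removes the warning without changing the string, the source can be compiled with warnings recorded. The sketch below is illustrative only: `escape_warnings` is a hypothetical helper, and the snippets are stand-ins rather than the actual contents of `crawl4ai/prompts.py`. Both warning categories are accepted because Python versions before 3.12 report the issue as a `DeprecationWarning`.

```python
import warnings
from typing import List


def escape_warnings(source: str) -> List[str]:
    """Compile source text and return messages of escape-related warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<example>", "exec")
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (SyntaxWarning, DeprecationWarning))
    ]


# Raw strings hold the source text exactly as it would appear in a file.
BEFORE = r'text = "backtick: \`"'
ESCAPED = r'text = "backtick: \\`"'
RAW = r'text = r"backtick: \`"'

if __name__ == "__main__":
    for label, snippet in (("before", BEFORE), ("escaped", ESCAPED), ("raw", RAW)):
        print(label, escape_warnings(snippet))
    # Both fixes produce the same two characters: a backslash and a backtick.
    print("\\`" == r"\`")
```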

Notes
-----

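This is a low-severity issue: the string still evaluates as before, so runtime behavior is unaffected, but fixing it keeps console output clean and avoids noisy CI logs. Python 3.12 and later report it as a `SyntaxWarning` (earlier versions used a `DeprecationWarning`), and the Python documentation states that it will eventually become a `SyntaxError`.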

References
----------

- Python string literal docs: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals