2 changes: 2 additions & 0 deletions .gitignore
@@ -101,3 +101,5 @@ __pypackages__/

# Added by cargo
/target

docs/installation-troubleshooting/_generated_index.md
92 changes: 92 additions & 0 deletions docs/installation-troubleshooting/index.md
@@ -0,0 +1,92 @@
# Installation Troubleshooting

This directory contains concise, platform-specific troubleshooting notes focused on reproducible diagnostics and resolutions.

Files:

- `macos.md` — macOS troubleshooting index and links to per-entry files.
- `macos-poppler-build.md` — Poppler build failure entry.
- `macos-poppler-runtime.md` — Poppler runtime binaries entry.
- `macos-tesseract.md` — Tesseract / pytesseract entry.
- `linux.md` — placeholder for Linux troubleshooting entries.
- `windows.md` — placeholder for Windows troubleshooting entries.

Contribution guidance

When adding a troubleshooting entry, include the following fields to make the entry reproducible and searchable:

- Title: short, descriptive title.
- Symptom: exact error message or observable behavior.
- Environment: OS, architecture (x86_64 / arm64), Python version, Pixi version (if relevant), and any other relevant software versions.
- Reproduction steps: exact commands or minimal steps to reproduce the issue.
- Resolution: exact commands and a brief explanation of why the resolution works.
- References: links to upstream docs, issues, or relevant resources.
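
As a convenience for filling in the Environment field, the sketch below collects the OS, architecture, and Python version with the standard library. The function name `environment_summary` is hypothetical and not part of this repository; the Pixi version is not covered and would need to be added by hand.

```python
import platform


def environment_summary() -> str:
    """Return a one-line summary such as 'macOS 13.5, arm64, Python 3.12.1'."""
    os_name = platform.system()  # "Darwin", "Linux", or "Windows"
    if os_name == "Darwin":
        # mac_ver() returns (release, versioninfo, machine); keep the release.
        os_label = f"macOS {platform.mac_ver()[0]}"
    else:
        os_label = f"{os_name} {platform.release()}"
    return f"{os_label}, {platform.machine()}, Python {platform.python_version()}"


if __name__ == "__main__":
    print(environment_summary())
```

Paste the printed line into the entry and append any other relevant tool versions.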


Format example (minimal template with YAML frontmatter for new troubleshooting entries):

```yaml
---
title: "Brief title"
date: "YYYY-MM-DD" # Use YYYY-MM-DD (e.g., "2025-08-28")
verified_on: null # Set to validation date (e.g., "2025-08-29") after verification
os: macos # single value or a list: ["macos", "linux", "windows"]
arch: any # one of: "x86_64", "arm64", "any"
severity: low # one of: "low", "medium", "high"
status: active # "active" or "deprecated"
tags: [technology, error-type] # e.g., [poppler, pdf2image]
reproducible: true
---
```

Symptom: copy-paste of error/output

Environment: e.g. macOS 13.5, arm64, Python 3.12, pixi 0.1

Reproduction steps:

1. command1
2. command2

Resolution:

- command(s)
- short explanation

References: link(s)

Notes:

- Keep entries concise and diagnostic-first; avoid duplicating general installation instructions already present in `README.rst` or Pixi docs.
- Use per-OS files for clarity and searchability. If an entry is cross-platform, note that in the Environment field and add it to each relevant file.

Maintenance guidance:

- Each entry should include `date` and, when possible, `verified_on` to indicate when it was validated. Maintainers should mark `status: deprecated` if an entry becomes obsolete and point to the canonical install docs.
- Automated checks should verify frontmatter presence and basic fields.
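
A minimal sketch of such a check, assuming the required fields are the six in the minimal template (`title`, `date`, `os`, `arch`, `status`, `tags`). It is not the repository's tool; `REQUIRED_FIELDS`, `parse_frontmatter`, and `missing_fields` are hypothetical names, and the parser only handles flat `key: value` lines rather than full YAML.

```python
from typing import Dict, List, Optional

# Assumption: these are the fields the checks treat as required.
REQUIRED_FIELDS = ("title", "date", "os", "arch", "status", "tags")


def parse_frontmatter(text: str) -> Optional[Dict[str, str]]:
    """Return flat key/value pairs from a leading '---' block, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None  # the file does not start with frontmatter
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter found
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return None  # opening delimiter without a closing one


def missing_fields(text: str) -> List[str]:
    """List required fields that are absent or empty in the entry."""
    fields = parse_frontmatter(text)
    if fields is None:
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not fields.get(name)]
```

An empty list means the entry carries every required field; a file with no frontmatter reports all of them as missing.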

## Docs tooling

The repository provides a small local tool to validate troubleshooting entries and generate a consolidated index. This tool is optional and non-blocking by default.

Run the validator and index generator inside the project's Pixi environment:

```bash
pixi run python3 tools/generate_troubleshooting_index.py
```

Use `--strict` to make the tool exit non-zero when required frontmatter is missing (useful for CI once the team decides to enforce metadata):

```bash
pixi run python3 tools/generate_troubleshooting_index.py --strict
```

Minimal frontmatter template for an entry (keep entries short and link to canonical README installation instructions):

```yaml
---
title: "Short descriptive title"
date: "2025-08-28"
os: macos
arch: any
status: active
tags: [poppler, pdf2image]
---
```

The tool writes `docs/installation-troubleshooting/_generated_index.md`. You do not need to commit the generated file; it can be regenerated on demand. For now, the script will only warn about missing metadata unless `--strict` is used.
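
The warn-by-default, fail-with-`--strict` behaviour described above could look roughly like the sketch below. This is an illustration under assumptions, not the actual contents of `tools/generate_troubleshooting_index.py`; the `report` function and the shape of `missing_by_file` are hypothetical.

```python
import sys
from typing import Dict, List


def report(missing_by_file: Dict[str, List[str]], strict: bool) -> int:
    """Warn about files with missing fields and return a process exit status."""
    problems = {path: names for path, names in missing_by_file.items() if names}
    for path, names in sorted(problems.items()):
        print(f"WARN {path}: missing {', '.join(names)}", file=sys.stderr)
    # Non-strict runs always succeed; strict runs fail if anything is missing.
    return 1 if (strict and problems) else 0
```

A caller would pass the result to `sys.exit`, so that CI only fails once `--strict` is enabled.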

10 changes: 10 additions & 0 deletions docs/installation-troubleshooting/linux.md
@@ -0,0 +1,10 @@
# Linux troubleshooting (landing)

This page points to Linux-specific troubleshooting files. Create per-issue markdown files in this directory and follow the frontmatter template in `index.md`.

Examples to consider:

- package manager issues (apt/yum) for Poppler/Tesseract
- Python wheel compatibility (GLIBC/C++ ABI) problems

Add new entries as focused, reproducible files; avoid duplicating canonical installation docs.
@@ -0,0 +1,45 @@
---
title: "Pixi warning: conda-packages overridden by PyPI"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: any
severity: low
status: active
tags: [pixi, conda, pip, environment]
reproducible: true
---

Symptom
-------

When running `pixi shell -e pdev`, you may see a warning such as:

```
WARN These conda-packages will be overridden by pypi:
yarg
```

Environment
-----------

- Using Pixi to create or enter a project environment that mixes Conda and PyPI package sources.

Reproduction steps
------------------

1. Run `pixi shell -e pdev` (or the equivalent Pixi environment creation command).
2. Observe the warning during environment resolution.

Resolution
----------

This warning indicates that a package available from both Conda and PyPI will be installed from PyPI instead of the Conda channel. Usually this is safe, but if you rely on the Conda-built binary (for example, for performance or compatibility), align the package source by:

- Pinning the package to the Conda channel in the Pixi manifest (`pixi.toml` or `pyproject.toml`).
- Removing the package from the project's PyPI dependencies if Conda should be authoritative.

Notes
-----

Most users can safely ignore this warning unless they encounter runtime issues tied to a specific build of the package.
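
If it matters which source actually provided the package, the installed metadata can be inspected from inside the environment. The sketch below reads the `INSTALLER` file that installers normally write next to a distribution's metadata; the helper name `installer_of` is hypothetical, and the exact value recorded varies by tool, so treat the output as a hint rather than proof.

```python
from importlib import metadata
from typing import Optional


def installer_of(package: str) -> Optional[str]:
    """Return the recorded installer for a package, or None if not installed."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    text = dist.read_text("INSTALLER")  # None when the file is absent
    return text.strip() if text else "unknown"


if __name__ == "__main__":
    # "yarg" is the package named in the example warning above.
    print(installer_of("yarg"))
```

Run it with the environment's interpreter, for example through `pixi run python3`.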
77 changes: 77 additions & 0 deletions docs/installation-troubleshooting/macos-poppler-build.md
@@ -0,0 +1,77 @@
---
title: "Poppler build failure: missing macOS SDK headers"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: medium
status: active
tags: [poppler, build, xcode]
reproducible: true
---

Symptom
-------

Compilation fails with errors like:

```
fatal error: 'ctime' file not found
```

or other C/C++ header errors when installing packages that build native extensions (for example, Poppler utilities or bindings).

Environment
-----------

- macOS host (any recent release)
- Building Python packages that include C/C++ extensions

Reproduction steps
------------------

1. Create or update the Pixi environment that installs packages requiring native builds (or run `pip install` for such packages).
2. Observe the build failure during the compilation step.

Resolution
----------

1. Install or update the Xcode Command Line Tools:

```bash
xcode-select --install
```

2. If the system points to the wrong developer directory, reset it:

```bash
sudo xcode-select --reset
```

3. If prompted, accept the Xcode license:

```bash
sudo xcodebuild -license
```

4. If the build still cannot find headers, export a valid SDK path for the build process and retry:

```bash
export SDKROOT="$(xcrun --sdk macosx --show-sdk-path)"
# then re-run the install or build
```

Notes
-----

These steps provide the compiler and headers required to build C/C++ extensions. Prefer installing Poppler via the OS package manager (Homebrew) when possible to avoid local compilation.

References
----------

- https://developer.apple.com

Verified on
----------

- Apple M1 Pro (2021), macOS 15.6.1 (verification date: 2025-08-28)
62 changes: 62 additions & 0 deletions docs/installation-troubleshooting/macos-poppler-runtime.md
@@ -0,0 +1,62 @@
---
title: "Poppler runtime missing binaries"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: medium
status: active
tags: [poppler, pdf2image, runtime]
reproducible: true
---

Symptom
-------

Runtime errors indicate Poppler utilities are missing, for example:

```
FileNotFoundError: [Errno 2] No such file or directory: 'pdftoppm'
```

Environment
-----------

- macOS host where Python calls `pdf2image` or other Poppler-dependent tooling

Reproduction steps
------------------

1. Run a PDF conversion that relies on Poppler (e.g., code that uses `pdf2image`).
2. Observe the FileNotFoundError or similar runtime error.

Resolution
----------

Install Poppler via Homebrew:

```bash
brew install poppler
```

Verify the binaries are on PATH:

```bash
which pdftoppm
which pdftotext
```
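
The same check can be made from Python, which is useful when the interpreter runs with a different `PATH` than the interactive shell. This is a small sketch using only the standard library; the function name `missing_binaries` is hypothetical.

```python
import shutil
from typing import Iterable, List


def missing_binaries(names: Iterable[str] = ("pdftoppm", "pdftotext")) -> List[str]:
    """Return the subset of executables that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_binaries()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("Poppler utilities found")
```

Run it with the same interpreter that raises the `FileNotFoundError`, so that both see the same `PATH`.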

Notes
-----

`pdf2image` calls the Poppler command-line binaries, so they must be installed and on `PATH`. Pixi manages the project environment's Conda and PyPI packages but not host-level system packages; install the latter with Homebrew, apt, or your OS package manager.

References
----------

- https://poppler.freedesktop.org/

Verified on
----------

- Apple M1 Pro (2021), macOS 15.6.1 (verification date: 2025-08-28)
59 changes: 59 additions & 0 deletions docs/installation-troubleshooting/macos-syntaxwarning.md
@@ -0,0 +1,59 @@
---
title: "SyntaxWarning: invalid escape sequence in crawl4ai/prompts.py"
date: "2025-08-28"
verified_on: "2025-08-28"
os: macos
arch: arm64
severity: low
status: active
tags: [python, warning, syntax]
reproducible: true
---

Symptom
-------

You may see a warning like:

```
SyntaxWarning: invalid escape sequence '\`'
```

Environment
-----------

- A Python file in the repository (example: `crawl4ai/prompts.py`) contains a string with a backslash followed by a character that doesn't form a valid escape sequence.

Reproduction steps
------------------

1. Run code that imports or executes the affected module (or run linting/tests).
2. Observe the SyntaxWarning printed at import time or during execution.

Resolution
----------

Update the affected string to use a valid escape or a raw string. For example, either escape the backslash by doubling it:

```py
# before: "\`" is not a valid escape sequence, so Python warns
text = "This contains a backslash and a backtick: \`"

# after: "\\" is an escaped backslash; the string value is unchanged
text = "This contains a backslash and a backtick: \\`"
```

or mark the string as raw (if those semantics are acceptable):

```py
text = r"This contains a backslash and a backtick: \`"
```
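
To confirm that a candidate fix no longer triggers the warning, the snippet can be compiled with warnings recorded. This is a sketch with a hypothetical helper name, `escape_warnings`; depending on the Python version, the recorded category may be `DeprecationWarning` rather than `SyntaxWarning`.

```python
import warnings
from typing import List


def escape_warnings(source: str) -> List[str]:
    """Compile source text and return the messages of any warnings raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning category
        compile(source, "<snippet>", "exec")
    return [str(item.message) for item in caught]


# Raw strings hold the snippets so this example does not warn itself.
BEFORE = r'text = "\`"'
ESCAPED = r'text = "\\`"'
RAW = r'text = r"\`"'

if __name__ == "__main__":
    for label, snippet in (("before", BEFORE), ("escaped", ESCAPED), ("raw", RAW)):
        print(label, escape_warnings(snippet))
```

Only the `before` snippet should report an invalid escape sequence.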

Notes
-----

This is a low-severity, linter-level issue; it won't usually break runtime behavior but fixing it keeps console output clean and avoids noisy CI logs.

References
----------

- Python string literal docs: https://docs.python.org/3/reference/lexical_analysis.html#string-and-bytes-literals