PDF Redaction Text Recovery & Display Tool

This repository contains a Python utility for extracting selectable (but visually redacted) text from PDF files and presenting it in a clear, human-readable format while preserving pagination and layout as closely as possible.

The tool is intended for document analysis, archival review, research, and verification of redaction practices. It does not bypass encryption or security controls; it only extracts text that remains present in the PDF content stream.

Note: not all files can be unredacted. This tool only works on poorly redacted files. If the output shows blank spaces where the redactions were, the file has been properly redacted.

What This Tool Does

Many PDFs are “redacted” by placing opaque black rectangles over text without actually removing the underlying text objects. In such cases, the text remains selectable and copy-pastable.

This tool:

- Extracts that underlying text using positional information
- Reconstructs lines to avoid word overlap and run-on text
- Preserves original page size and pagination
- Produces display-friendly output in one of two modes

Output Modes

1) Side-by-Side (Recommended)

Each output page is double-width:

- Left: Original PDF page (unchanged)
- Right: Rebuilt, unredacted text positioned to match the original layout

This mode is ideal for:

- Review and comparison
- Presentations or exhibits
- Auditing redaction practices
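
As an illustration of the double-width layout, here is a minimal sketch and not the code in redact_extract.py. The function names are hypothetical, and for brevity it reads words with PyMuPDF's own get_text("words") rather than pdfplumber. The font-size estimate is a rough assumption.

```python
from typing import Tuple


def side_by_side_geometry(width: float, height: float) -> Tuple[float, float, float]:
    """Return (output_width, output_height, right_x_offset) for a double-width page."""
    return 2 * width, height, width


def build_side_by_side(src_path: str, out_path: str) -> None:
    """Write a double-width copy: original page on the left, rebuilt text on the right."""
    import fitz  # PyMuPDF, imported lazily so the geometry helper works without it

    src = fitz.open(src_path)
    out = fitz.open()  # new, empty PDF
    for pno in range(src.page_count):
        rect = src[pno].rect
        out_w, out_h, x_off = side_by_side_geometry(rect.width, rect.height)
        page = out.new_page(width=out_w, height=out_h)
        # Left half: embed the original page unchanged.
        page.show_pdf_page(fitz.Rect(0, 0, rect.width, rect.height), src, pno)
        # Right half: redraw each word at its original position, shifted right.
        for x0, y0, x1, y1, text, *_ in src[pno].get_text("words"):
            # Assumption: 0.8 * box height is a usable stand-in for the font size.
            size = max(0.8 * (y1 - y0), 1.0)
            page.insert_text((x_off + x0, y1), text, fontsize=size)
    out.save(out_path)
```

For a US Letter page (612 x 792 points), the output page would be 1224 x 792 with the rebuilt text shifted right by 612 points.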

2) White-Text Overlay

The extracted text is drawn in white directly on top of the original PDF.

Where black redaction bars are present, the white text shows up against them, so the hidden text often becomes readable without the tool having to detect or modify the bars.

This mode is useful for:

- Visual inspection
- Demonstrating improper redactions
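
The overlay idea can be sketched as follows. This is an assumption-laden illustration, not the tool's implementation: overlay_anchor and draw_white_overlay are hypothetical names, the 0.25 descent fraction is a guess at where the baseline sits inside a word box, and the sketch assumes pdfplumber and PyMuPDF coordinates line up (unrotated pages with no crop offset).

```python
from typing import Mapping, Tuple


def overlay_anchor(word: Mapping, descent: float = 0.25) -> Tuple[float, float, float]:
    """Return (x, baseline_y, font_size) for a pdfplumber-style word dict.

    Assumption: box height approximates the font size, and the baseline sits
    `descent` of that height above the bottom of the box.
    """
    size = word["bottom"] - word["top"]
    return word["x0"], word["bottom"] - descent * size, size


def draw_white_overlay(src_path: str, out_path: str) -> None:
    """Draw every extracted word in white on top of the original pages."""
    import fitz  # PyMuPDF
    import pdfplumber

    doc = fitz.open(src_path)
    with pdfplumber.open(src_path) as pdf:
        for pno, plumber_page in enumerate(pdf.pages):
            for word in plumber_page.extract_words():
                x, y, size = overlay_anchor(word)
                if size > 0:
                    # White text is invisible on white paper but shows on black bars.
                    doc[pno].insert_text((x, y), word["text"], fontsize=size, color=(1, 1, 1))
    doc.save(out_path)
```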

How It Works

- pdfplumber extracts words along with their bounding boxes
- Words are grouped into lines based on vertical proximity
- Horizontal spacing is reconstructed from word gaps
- PyMuPDF (pymupdf) is used to:
  - Embed original pages
  - Draw rebuilt text with precise positioning
  - Generate side-by-side or overlay output
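
The grouping and spacing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names are hypothetical, the parameter names mirror the --line-tol, --space-unit, and --min-spaces options, and their default values and exact semantics here are guesses. Words are dicts with the text, x0, x1, and top keys that pdfplumber's extract_words() returns.

```python
from typing import Dict, List

Word = Dict[str, object]  # expects "text", "x0", "x1", "top"


def group_into_lines(words: List[Word], line_tol: float = 2.0) -> List[List[Word]]:
    """Cluster words whose top coordinate is within line_tol of the line's first word."""
    lines: List[List[Word]] = []
    for w in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if lines and abs(w["top"] - lines[-1][0]["top"]) <= line_tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    # Within each line, order words left to right.
    return [sorted(line, key=lambda w: w["x0"]) for line in lines]


def rebuild_line(line: List[Word], space_unit: float = 3.0, min_spaces: int = 1) -> str:
    """Join words, turning each horizontal gap into a proportional run of spaces."""
    parts = [str(line[0]["text"])]
    for prev, cur in zip(line, line[1:]):
        gap = cur["x0"] - prev["x1"]
        # At least min_spaces, so overlapping or touching words never run together.
        n_spaces = max(min_spaces, round(gap / space_unit))
        parts.append(" " * n_spaces + str(cur["text"]))
    return "".join(parts)
```

With these assumed defaults, a 3-point gap becomes one space and a 12-point gap becomes four, which keeps column-like layouts roughly aligned in the rebuilt text.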

No OCR is performed.

Installation

uv sync

Use

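Pass the PDF to process as the positional input_pdf argument. Running the script with no arguments prints the usage summary:

uv run redact_extract.py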

usage: redact_extract.py [-h] [-o OUTPUT] [--mode {side_by_side,overlay_white}] [--line-tol LINE_TOL] [--space-unit SPACE_UNIT]
                         [--min-spaces MIN_SPACES]
                         input_pdf
redact_extract.py: error: the following arguments are required: input_pdf
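
For readers who want to see how these options fit together, the usage summary corresponds to an argparse parser roughly like the sketch below. Only the option names and the two mode choices come from the usage text. The --output long name, the argument types, the default mode, and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage summary; types and defaults are guesses."""
    parser = argparse.ArgumentParser(prog="redact_extract.py")
    parser.add_argument("input_pdf", help="PDF file to process")
    parser.add_argument("-o", "--output", help="path of the generated PDF")
    parser.add_argument(
        "--mode",
        choices=["side_by_side", "overlay_white"],
        default="side_by_side",  # assumed default, since this mode is the recommended one
    )
    parser.add_argument("--line-tol", type=float, help="vertical tolerance for grouping words into lines")
    parser.add_argument("--space-unit", type=float, help="horizontal gap treated as one space")
    parser.add_argument("--min-spaces", type=int, help="minimum spaces inserted between words")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```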

Statistics

Track what text was actually recovered from under redaction bars with the --stats flag:

python redact_extract.py example.pdf --stats

Output:

🔍 Unredaction Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Redaction boxes found:   42
Words recovered:         387
Characters recovered:    2,156
Recovery rate:           12.3% of text was hidden
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total extracted:         3,429 words
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Export stats to JSON:

python redact_extract.py example.pdf --stats-json stats.json

The tool detects black-filled rectangles (redaction boxes) and measures which extracted words were hidden underneath them.
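
A minimal sketch of that measurement, assuming a word counts as hidden when at least half of its area lies under a black box. The function names, the 0.5 threshold, and the result keys are illustrative and not the tool's real schema. With pdfplumber, candidate boxes could come from page.rects, filtered on each rectangle's non_stroking_color.

```python
from typing import Dict, List, Mapping, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, top, x1, bottom)


def is_black(color, tol: float = 0.05) -> bool:
    """True for near-black gray, RGB, or CMYK fill colors (components in 0..1)."""
    if color is None:
        return False
    if isinstance(color, (int, float)):
        color = (color,)
    c = tuple(color)
    if len(c) in (1, 3):  # gray or RGB: every component close to 0
        return all(v <= tol for v in c)
    if len(c) == 4:  # CMYK: black channel close to 1
        return c[3] >= 1 - tol
    return False


def covered_fraction(word: Mapping, box: Box) -> float:
    """Fraction of the word's bounding-box area that lies inside box."""
    bx0, btop, bx1, bbottom = box
    w = min(word["x1"], bx1) - max(word["x0"], bx0)
    h = min(word["bottom"], bbottom) - max(word["top"], btop)
    area = (word["x1"] - word["x0"]) * (word["bottom"] - word["top"])
    if w <= 0 or h <= 0 or area <= 0:
        return 0.0
    return (w * h) / area


def recovery_stats(words: Sequence[Mapping], boxes: List[Box], min_cover: float = 0.5) -> Dict:
    """Count the words (and their characters) that sit under any redaction box."""
    hidden = [w for w in words if any(covered_fraction(w, b) >= min_cover for b in boxes)]
    return {
        "redaction_boxes": len(boxes),
        "words_recovered": len(hidden),
        "characters_recovered": sum(len(w["text"]) for w in hidden),
        "total_words": len(words),
        "hidden_pct": round(100 * len(hidden) / len(words), 1) if words else 0.0,
    }
```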

Repository: leedrake5/unredact
