scrubfile

Scrub PII from PDFs, images, and DOCX files. Local-only. One command.

$ pip install scrubfile

Why scrubfile?

🔒

100% Local

No cloud APIs. No data leaves your machine. Zero network calls after model download.
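To check the no-network claim on your own machine, one option is to make outbound connections fail while a redaction runs. The sketch below is a generic Python technique and is not part of scrubfile. `no_network` and `NetworkBlockedError` are hypothetical names. The guard only catches connections opened through Python's `socket` module. It does not catch DNS lookups or native code that opens its own sockets.

```python
import contextlib
import socket


class NetworkBlockedError(RuntimeError):
    """Raised when code tries to open a connection inside no_network() (hypothetical helper)."""


@contextlib.contextmanager
def no_network():
    """Temporarily make socket.socket.connect/connect_ex raise instead of connecting."""
    original_connect = socket.socket.connect
    original_connect_ex = socket.socket.connect_ex

    def refuse(self, address):
        # Any outbound connection attempt through Python's socket module ends up here.
        raise NetworkBlockedError(f"outbound connection attempted: {address!r}")

    socket.socket.connect = refuse
    socket.socket.connect_ex = refuse
    try:
        yield
    finally:
        # Restore the real methods even if the wrapped code raised.
        socket.socket.connect = original_connect
        socket.socket.connect_ex = original_connect_ex
```

Once the model is downloaded, you can wrap a call such as `redact("document.pdf", auto=True)` in `with no_network():`. If the call completes normally, no connection went through Python's `socket` module. A `NetworkBlockedError` would point to an unexpected connection.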

📄

Multi-Format

PDF, PNG, JPG, TIFF, BMP, DOCX. One tool handles them all.

🧠

Auto-Detect PII

Names, SSNs, emails, phone numbers, addresses, credit card numbers, and more: 20+ entity types detected via Presidio and spaCy.
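Presidio combines pattern recognizers with spaCy's named-entity recognition. The pattern recognizers are regular expressions, optionally backed by checksums and context words. The NER side handles things like names and locations. The sketch below shows only the pattern half, using three deliberately simple regexes. It illustrates the technique and is not scrubfile's or Presidio's actual recognizer set. The entity labels just borrow Presidio's naming. `Finding`, `PATTERNS`, and `detect_pii` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    entity_type: str
    start: int  # offset of the first matched character
    end: int    # offset one past the last matched character
    text: str


# Illustrative patterns only; real recognizers add validation (e.g. checksums),
# context words, and NER models for names and locations.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "PHONE_NUMBER": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-. ]\d{4}\b"),
}


def detect_pii(text):
    """Return every pattern match in `text`, ordered by position."""
    findings = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(Finding(entity_type, match.start(), match.end(), match.group()))
    return sorted(findings, key=lambda f: f.start)
```

On the text "Contact Jane at jane.doe@example.com or (555) 123-4567.", the sketch finds the email and the phone number. It does not flag "Jane". Catching names requires the NER side, which regexes cannot replace.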

⚙️

Permanent Redaction

Text is removed from the PDF content stream, not hidden under a visual overlay. The data is gone.
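The difference between an overlay and true removal shows up even on a plain string. This is a toy model. The helpers `overlay_spans` and `redact_spans` are hypothetical, and the sketch does not touch real PDF content streams. The overlay only records boxes to draw, so the text underneath can still be extracted. Removal rewrites the characters themselves. Overlapping spans are merged before masking.

```python
def overlay_spans(text, spans):
    """Cosmetic 'redaction': the text is untouched; only boxes to draw are recorded."""
    return {"text": text, "boxes": list(spans)}


def redact_spans(text, spans, mask="#"):
    """Replace each (start, end) span with mask characters; the original text is discarded."""
    # Merge overlapping or touching spans so each character is masked once.
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    pieces = []
    cursor = 0
    for start, end in merged:
        pieces.append(text[cursor:start])
        pieces.append(mask * (end - start))  # one mask char per removed char keeps offsets stable
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

Masking one character per removed character keeps offsets and layout stable. When length doesn't matter, an alternative is to replace each span with a single token such as `[REDACTED]`.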

🤖

MCP Server

AI agents such as Claude and Cursor can redact documents directly via the Model Context Protocol (MCP).
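MCP carries tool invocations as JSON-RPC 2.0 messages. A client calls a server tool with a `tools/call` request that names the tool and passes its arguments. The sketch below builds such a request by hand purely to show its shape. In practice, Claude or Cursor sends these messages for you. The tool name `redact` and the `path`/`auto` argument keys are hypothetical stand-ins, because this page does not list the tools scrubfile's server exposes. A real client discovers them with `tools/list`.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Serialize an MCP 'tools/call' request (JSON-RPC 2.0)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


# Hypothetical tool name and argument keys, for illustration only.
example = build_tool_call(1, "redact", {"path": "document.pdf", "auto": True})
```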

📦

JSON + CLI + API

Machine-readable output. Python API. CLI with rich output. Built for automation.

Simple by design

# Redact specific terms
$ scrubfile document.pdf -r "John Doe" -r "123-45-6789"

# Auto-detect all PII
$ scrubfile document.pdf --auto

# Python API
from scrubfile import redact
result = redact("document.pdf", auto=True)
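The Python call above extends naturally to a batch job. This is a minimal sketch, assuming `redact(path, auto=True)` behaves as in the example. It also assumes the listed formats map to the extensions in `SUPPORTED_SUFFIXES`; the `.jpeg` and `.tif` spellings in particular are assumptions. `redact_folder` is a hypothetical helper. Its return value is whatever `redact` returns, keyed by file name. The redaction function is injectable, so the loop can be exercised without scrubfile installed.

```python
from pathlib import Path

# Assumed extensions for the formats listed above (PDF, PNG, JPG, TIFF, BMP, DOCX).
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".docx"}


def redact_folder(folder, redact_fn=None):
    """Auto-redact every supported file directly inside `folder` (not recursive).

    `redact_fn` defaults to scrubfile's `redact`, called as in the example above;
    pass any callable with the same shape to test the loop in isolation.
    """
    if redact_fn is None:
        from scrubfile import redact as redact_fn  # imported lazily, only when needed
    results = {}
    # sorted() materializes the listing first, so files created during the loop are not revisited.
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() in SUPPORTED_SUFFIXES:
            results[path.name] = redact_fn(str(path), auto=True)
    return results
```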

How it compares

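| Feature           | scrubfile         | Adobe Acrobat | Google DLP   | Presidio  |
|-------------------|-------------------|---------------|--------------|-----------|
| Local-only        | Yes               | Yes           | No           | Yes       |
| Multi-format      | PDF, images, DOCX | PDF only      | Text/images  | Text only |
| CLI               | Yes               | No            | No           | No        |
| Auto-detect PII   | Yes               | No            | Yes          | Yes       |
| Agent-ready (MCP) | Yes               | No            | No           | No        |
| Price             | Free              | $240/yr       | Pay per call | Free      |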