Redact
Permanently redact sensitive data — not just cover it up
Pattern-based content-stream redaction for SSNs, emails, phone numbers, credit cards, and custom regex. Removes the underlying text, not a black overlay. Built for HIPAA, GDPR, CCPA, and FOIA workflows.
Download Complimentary Trial
The compliance distinction
True redaction is destruction. A black box is concealment.
Most "redaction" tools paint a black rectangle on top of the text. The visible page looks redacted, but the PDF's content stream (the sequence of Tj and TJ operators that draw glyphs) still contains the original characters in recoverable form. Anyone with Acrobat, a hex editor, or a free Python library can recover them in seconds. This is the failure mode behind many public news stories about a leaked court filing, a redacted SEC document, or a mishandled FOIA release.
True redaction removes the matched text from the content stream itself. There is no underlying character data left to recover. The replacement — either a black rectangle that occupies the original text's footprint, or a fixed string like [REDACTED] — is what gets drawn in its place. PDF Batch Editor performs true redaction by default; it does not ship a "highlighter" mode.
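The difference can be shown on a toy content stream. The sketch below is an illustration only, not PDF Batch Editor's implementation: the stream, the rectangle coordinates, and the helper names `overlay` and `true_redact` are all assumptions, and real PDFs often encode strings through font encodings or split them across TJ arrays, which a plain byte regex would not handle.

```python
import re
import zlib

# A toy page content stream: select a font, position the cursor, draw one string.
original = b"BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET\n"

SSN = re.compile(rb"\d{3}-\d{2}-\d{4}")

def overlay(stream: bytes) -> bytes:
    """Fake redaction: append a filled black rectangle, leave the text operand alone."""
    return stream + b"0 0 0 rg 100 696 80 14 re f\n"

def true_redact(stream: bytes, replacement: bytes = b"[REDACTED]") -> bytes:
    """Real redaction: rewrite the string operand so the digits no longer exist."""
    return SSN.sub(replacement, stream)

# Streams are usually Flate-compressed on disk; one zlib call undoes that.
stored = zlib.compress(overlay(original))
recovered = zlib.decompress(stored)

print(SSN.findall(recovered))              # [b'123-45-6789']
print(SSN.findall(true_redact(original)))  # []
```

The overlay version adds drawing operators but never touches the `Tj` operand, which is why the number survives compression and decompression intact.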
This is the distinction every compliance auditor checks for. HIPAA, GDPR, CCPA, FOIA — none of them treat a visual overlay as sufficient. They check the file. So should the tool that produces it.
Built-in patterns
Detectors for the data formats that appear in every batch
Every built-in detector is a tunable regex. Edit it, disable it, copy it as a starting point for your own.
Social Security Numbers
Matches the standard XXX-XX-XXXX format, the space-separated XXX XX XXXX form, and the unhyphenated nine-digit form — bounded so it doesn't catch fragments of larger numbers. Tune the regex if your data uses non-standard variants.
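As a rough sketch of what a bounded SSN detector can look like, the pattern below covers the three forms described above. It is a hypothetical regex written for illustration, not the one the product ships, and the name `find_ssns` is an assumption.

```python
import re

# Hypothetical pattern, not the product's shipped regex.
# (?<!\d) and (?!\d) keep the match from starting or ending inside a longer digit run.
# \1 forces the second separator to equal the first: both "-", both " ", or both absent.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}([- ]?)\d{2}\1\d{4}(?!\d)")

def find_ssns(text: str) -> list[str]:
    """Return every SSN-shaped substring in the text."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]

sample = "A 123-45-6789, B 123 45 6789, C 123456789, order 1234567890"
print(find_ssns(sample))  # ['123-45-6789', '123 45 6789', '123456789']
```

The ten-digit order number is skipped because of the digit boundaries. Mixed separators such as `123-45 6789` are also skipped by this sketch; dropping the backreference would loosen that.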
Email addresses
RFC-aligned regex catching standard local-part and domain combinations, including subdomains and country-code TLDs. Matches typical real-world addresses with + tags and dots in the local part; skips stray @ symbols in code that have no valid local part and domain around them.
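A simplified email pattern in the same spirit might look like the sketch below. It is an assumption made for illustration, deliberately narrower than the full RFC grammar, and not the product's built-in regex.

```python
import re

# Hypothetical, simplified pattern: real address syntax allows more than this.
EMAIL_PATTERN = re.compile(
    r"[A-Za-z0-9._%+-]+"      # local part, including dots and + tags
    r"@"
    r"(?:[A-Za-z0-9-]+\.)+"   # one or more domain labels, each followed by a dot
    r"[A-Za-z]{2,}"           # final label (TLD), letters only
)

def find_emails(text: str) -> list[str]:
    """Return every email-shaped substring in the text."""
    return EMAIL_PATTERN.findall(text)

sample = "Contact jane.doe+billing@mail.example.co.uk; the decorator @cache is not an address."
print(find_emails(sample))  # ['jane.doe+billing@mail.example.co.uk']
```

A bare `@cache` is skipped because nothing precedes the `@`, and a token like `git@github` would be skipped because the domain has no dot.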
Phone numbers
North American (NANP) ten-digit format with or without parentheses, hyphens, or dots. International E.164 format with country-code prefix. Edit the pattern if your dataset has formats the default doesn't cover.
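The two phone formats can be sketched as two separate patterns. Both regexes below are assumptions for illustration rather than the product's defaults; the E.164 pattern relies on the standard's limit of 15 digits after the plus sign.

```python
import re

# Hypothetical NANP pattern: optional parentheses around the area code,
# then hyphen, dot, or whitespace separators (or none).
# The lookbehind also rejects a preceding "+", so E.164 numbers are not double-counted.
NANP = re.compile(r"(?<![\d+])(?:\(\d{3}\)\s?|\d{3}[-.\s]?)\d{3}[-.\s]?\d{4}(?!\d)")

# Hypothetical E.164 pattern: "+", a non-zero leading digit, up to 15 digits in total.
E164 = re.compile(r"(?<!\w)\+[1-9]\d{1,14}(?!\d)")

sample = "Call (555) 123-4567, 555.123.4567, 5551234567 or +442071838750."

print(NANP.findall(sample))  # ['(555) 123-4567', '555.123.4567', '5551234567']
print(E164.findall(sample))  # ['+442071838750']
```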
Credit card numbers
Sixteen-digit Visa, Mastercard, and Discover formats with or without separators; fifteen-digit American Express. Combine with a custom regex for less-common card formats your processor accepts.
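Digit-count patterns alone tend to flag order numbers and tracking codes. One common way to cut those false positives is a Luhn checksum on each candidate. The sketch below assumes that approach; the source does not say whether PDF Batch Editor applies a checksum, and the patterns here are generic 16-digit and Amex shapes rather than brand-specific prefixes.

```python
import re

# Hypothetical shapes: any 16 digits in groups of four, and 15-digit Amex (34/37 prefix).
CARD_16 = re.compile(r"(?<!\d)(?:\d{4}[- ]?){3}\d{4}(?!\d)")
AMEX_15 = re.compile(r"(?<!\d)3[47]\d{2}[- ]?\d{6}[- ]?\d{5}(?!\d)")

def luhn_ok(candidate: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, test mod 10."""
    digits = [int(c) for c in candidate if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_cards(text: str) -> list[str]:
    """Return card-shaped matches that also pass the Luhn check."""
    hits = [m.group(0) for p in (CARD_16, AMEX_15) for m in p.finditer(text)]
    return [h for h in hits if luhn_ok(h)]

sample = "Visa 4111 1111 1111 1111, Amex 378282246310005, typo 4111 1111 1111 1112"
print(find_cards(sample))  # ['4111 1111 1111 1111', '378282246310005']
```

The two numbers that pass are widely published test card numbers; the third has the right shape but fails the checksum.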
Custom regex
Add unlimited patterns for organization-specific formats — medical record numbers (MRN-\d{8}), employee IDs (E\d{6}), internal account codes, classification markings ((SECRET|TOP SECRET)//[A-Z]+). Each pattern has its own label, regex, and enable toggle.
Pattern management
Build the pattern set you need for a workflow, then disable rather than delete the patterns you don't need on a particular run — the next run will probably want them again. Patterns are stored as a labeled list with one regex per row.
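A labeled list with one regex per row and an enable toggle maps naturally onto a small data structure. The sketch below uses the example regexes quoted above; the `Pattern` class and `active_matches` function are hypothetical names, and the product's own storage format is not described in the source.

```python
import re
from dataclasses import dataclass

@dataclass
class Pattern:
    label: str
    regex: str
    enabled: bool = True  # disable for a run instead of deleting the row

PATTERNS = [
    Pattern("Medical record number", r"MRN-\d{8}"),
    Pattern("Employee ID", r"E\d{6}", enabled=False),  # switched off for this run
    Pattern("Classification marking", r"(SECRET|TOP SECRET)//[A-Z]+"),
]

def active_matches(text: str, patterns: list[Pattern]) -> dict[str, list[str]]:
    """Run only the enabled patterns and group full matches by label."""
    results = {}
    for p in patterns:
        if not p.enabled:
            continue
        results[p.label] = [m.group(0) for m in re.finditer(p.regex, text)]
    return results

sample = "Chart MRN-00412233, badge E204518, marked TOP SECRET//NOFORN"
print(active_matches(sample, PATTERNS))
```

Flipping `enabled` back to `True` on the employee ID row restores it for the next run without retyping the regex.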
Beyond the visible page
Metadata, annotations, and the layers most tools forget
Sensitive content does not only live on the rendered page. A PDF carries a document Info dictionary (/Title, /Author, /Subject, /Keywords, /Creator, /Producer) that often contains the author's name, the original file path, and the application that produced it. It carries an XMP metadata packet with timestamps, revision history, and reader-supplied tags. Annotations — comments, sticky notes, redaction markups themselves — have their own contents fields. Form fields preserve every value the user typed. OCR'd scans contain a hidden text layer behind the page image.
PDF Batch Editor's redaction operation removes matched text from page content streams — the visible page text and any hidden OCR layer that sits in the same stream. For full document sanitization before public release, the recommended workflow chains operations together: redact first, then optimize (which strips unused objects from the file), then apply the security/encryption pass to control how the released file can be used downstream. The Batch Pipeline module saves this entire chain as a one-click workflow.
The point: a clean visible page is necessary, not sufficient. A real release-ready file requires checking every place text and identity can hide.
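A quick first-pass check for some of those hiding places can be sketched with a raw byte scan. Everything below is an assumption for illustration: the `MARKERS` table and `leftover_markers` function are hypothetical, and a byte scan will miss anything stored inside compressed object streams, so a release check should rely on a real PDF parser instead.

```python
import re

# Hypothetical checklist of byte markers for the locations named above.
MARKERS = {
    "Info: /Author": rb"/Author\b",
    "Info: /Title": rb"/Title\b",
    "Info: /Producer": rb"/Producer\b",
    "XMP packet": rb"<\?xpacket begin",
    "Annotations": rb"/Annots\b",
    "Form fields": rb"/AcroForm\b",
}

def leftover_markers(pdf_bytes: bytes) -> list[str]:
    """List which markers appear in the raw bytes (uncompressed objects only)."""
    return [name for name, pat in MARKERS.items() if re.search(pat, pdf_bytes)]

# Synthetic fragment standing in for part of a PDF file.
fragment = (
    b"1 0 obj << /Author (J. Smith) /Producer (Scanner 2.1) >> endobj\n"
    b"<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>"
)
print(leftover_markers(fragment))  # ['Info: /Author', 'Info: /Producer', 'XMP packet']
```

A hit means something is present and worth reviewing; an empty list does not prove the file is clean.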
Compliance frameworks
HIPAA, GDPR, CCPA, FOIA — what auditors actually check
HIPAA (US healthcare)
The Privacy Rule's de-identification standard (45 CFR § 164.514) includes a Safe Harbor method that requires removing 18 categories of identifiers: names, dates more specific than a year, geographic subdivisions smaller than a state, MRNs, account numbers, biometric identifiers, and more. Auditors check the disclosed file, not a screenshot of it. True content-stream removal is the baseline.
GDPR (EU personal data)
Article 17 ("right to erasure") and Article 32 (security of processing) both require that personal data, when removed, is actually removed, not concealed under an overlay that a recipient can lift. Anonymization that is technically reversible is not anonymization under Recital 26.
CCPA / CPRA (California)
The California Privacy Rights Act treats reversibly-concealed data the same as disclosed data for most purposes. If a recipient can extract the underlying text from a "redacted" file with standard tools, the disclosure has happened. Use true content-stream redaction or accept the disclosure.
FOIA (US government)
FOIA exemptions (b)(6) and (b)(7)(C) protect personal-privacy information, which agencies redact before release. DOJ's FOIA processing guidelines reference techniques that destroy underlying data, not techniques that overlay it. There is a long public history of agency releases where reviewers thought they had redacted, but the underlying text was still in the file. Don't add to that list.
The summaries above are educational and describe widely-published statutory and regulatory provisions. They are not legal advice. Compliance decisions about specific document sets, data subjects, or disclosure contexts require counsel familiar with your jurisdiction.
Preview, execute, document
Match preview and per-file audit log
Redaction is destructive. The workflow is built around verifying matches before they are committed. The preview pass scans every selected file and returns a per-pattern match count across the batch — exactly how many SSNs, emails, and phone numbers will be touched if you commit. You can tune a pattern that is producing false positives, disable it for this run, and only execute once the preview matches what you expected.
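The preview pass amounts to counting matches without changing anything. A minimal sketch, assuming the text of each file has already been extracted into a string (extraction itself is out of scope here) and using a hypothetical `preview` function with simplified example patterns:

```python
import re
from collections import Counter

def preview(batch: dict[str, str], patterns: dict[str, str]) -> Counter:
    """Count matches per pattern label across every file; nothing is modified."""
    counts = Counter()
    for text in batch.values():
        for label, regex in patterns.items():
            counts[label] += len(re.findall(regex, text))
    return counts

# Simplified example patterns, not the product's built-ins.
patterns = {
    "SSN": r"\b\d{3}-\d{2}-\d{4}\b",
    "Email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
}
batch = {
    "a.pdf": "123-45-6789 and 987-65-4321",
    "b.pdf": "jo@example.com, 111-22-3333",
}
print(preview(batch, patterns))  # Counter({'SSN': 3, 'Email': 1})
```

If a count looks too high, the pattern can be tightened and the preview rerun before any file is touched.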
When you execute, every redaction is recorded in the operation log: file name, pattern label, match count, output path, timestamp. This log is your chain-of-custody record. Auditors who want to know what was redacted from a 200-file production set get a clean per-file answer instead of a forensic reconstruction.
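The five fields named above fit a flat record that can be written as CSV. This sketch is an assumption about shape only: the `LogEntry` class, the CSV format, and the UTC ISO 8601 timestamp are illustrative choices, and the product's actual log format is not specified in the source.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields
from datetime import datetime, timezone

@dataclass
class LogEntry:
    file_name: str
    pattern_label: str
    match_count: int
    output_path: str
    timestamp: str

def make_entry(file_name: str, pattern_label: str, match_count: int, output_path: str) -> LogEntry:
    """Stamp the entry with the current UTC time at one-second resolution."""
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return LogEntry(file_name, pattern_label, match_count, output_path, now)

def write_log(entries: list[LogEntry], handle) -> None:
    """Write a header row followed by one CSV row per entry."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(LogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))

buffer = io.StringIO()
write_log([make_entry("a.pdf", "SSN", 3, "out/a.pdf")], buffer)
print(buffer.getvalue())
```

One row per file and pattern keeps the log easy to filter when an auditor asks about a single document.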
Output modes follow the same conventions as the rest of the app: write to a new folder (with optional subfolder structure preservation), add a suffix to filenames, or overwrite in place. For sensitive workflows, prefer "save to folder" so the originals remain available as an unmodified reference until the audit closes.
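The three output modes differ only in how the destination path is derived from the source path. A minimal sketch, with a hypothetical `output_path` function, hypothetical mode names, and a `_redacted` default suffix chosen for illustration:

```python
from pathlib import PurePosixPath

def output_path(source, mode, *, input_root=None, out_dir=None,
                suffix="_redacted", keep_subfolders=True):
    """Compute where the redacted copy of `source` should be written."""
    src = PurePosixPath(source)
    if mode == "overwrite":
        return src  # same path: the original is replaced
    if mode == "suffix":
        return src.with_name(f"{src.stem}{suffix}{src.suffix}")
    if mode == "folder":
        if out_dir is None:
            raise ValueError("folder mode needs out_dir")
        if keep_subfolders and input_root is not None:
            relative = src.relative_to(PurePosixPath(input_root))  # keep hr/2023/...
        else:
            relative = PurePosixPath(src.name)  # flatten into out_dir
        return PurePosixPath(out_dir) / relative
    raise ValueError(f"unknown mode: {mode}")

print(output_path("in/hr/2023/a.pdf", "folder", input_root="in", out_dir="out"))  # out/hr/2023/a.pdf
print(output_path("in/hr/2023/a.pdf", "suffix"))                                   # in/hr/2023/a_redacted.pdf
```

Only the overwrite mode destroys the original, which is why the folder mode is the safer default for sensitive batches.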
Use Cases
When sensitive data must be permanently removed
Pre-Audit Cleanup
A compliance officer scrubs SSNs from 500 employee records before an external audit. The built-in SSN pattern catches the hyphenated, space-separated, and unhyphenated forms. Preview confirms 1,247 matches across the set; execute writes redacted copies to a separate output folder; the operation log goes into the audit binder.
FOIA Response
A government agency processes a Freedom of Information request. Personal email addresses, phone numbers, and SSNs must be removed from 200 documents before public release. Custom patterns add the agency's internal employee ID format. One execute, one log file, ready for the response packet.
Healthcare De-Identification
A hospital prepares clinical records for a research collaboration. The 18 HIPAA identifiers map onto built-in and custom patterns — MRN, DOB, full address, phone, name — and apply across 5,000 charts in a single pipeline run.
Frequently Asked Questions
What is the difference between true redaction and a black-box overlay?
An overlay is a black rectangle painted on top of the visible text. The underlying characters are still in the PDF's content stream, so anyone can copy-paste them, search for them, or extract them with a free PDF library. True redaction removes the matched text from the content stream entirely. There is nothing left to recover. PDF Batch Editor performs true redaction; tools that ship a "black highlighter" fail any compliance audit that checks the actual file contents.
Which patterns are built in?
Social Security Numbers (with and without hyphens), email addresses, US and international phone numbers, and credit card numbers. Each pattern is a tunable regex you can edit, disable, or extend. You can also add unlimited custom patterns for organization-specific data — medical record numbers, employee IDs, internal account codes, classification markings.
Can I review every match before redacting?
Yes. The preview pass scans every selected file and returns a per-pattern match count across the batch — exactly how many SSNs, emails, and phone numbers will be touched if you commit. Tune a pattern that is producing false positives, disable it for this run, and only execute once the preview matches what you expected.
Black box or replacement text?
Both are supported. Black-box style replaces the matched run with a black rectangle that fills the original text's footprint. Replacement-text style writes a fixed string in the same location; common choices are [REDACTED], xxx, or a category code like [PHI]. In either case, the original text is removed from the content stream; the choice is purely about what is rendered in its place.
What about metadata, annotations, and hidden text layers?
Sensitive content can hide outside the visible page — in the document Info dictionary, the XMP metadata stream, form-field values, annotation contents, and hidden OCR layers. PDF Batch Editor's redaction removes matched text from page content streams; for full document sanitization, chain redaction with Optimize (which strips unused objects) and Security (which controls access to the released file) using the Batch Pipeline.
Does redaction satisfy HIPAA, GDPR, CCPA, and FOIA?
True content-stream redaction is the technical baseline for compliance with all four — none of them accept overlay or highlighter techniques as adequate. The remaining work (deciding what to redact, documenting it in an audit log, training the people who run the tool) is a process question, not a tool question. PDF Batch Editor handles the technical removal correctly and produces a per-file operation log you can attach to your audit trail.
Protect sensitive data at scale
Permanent, irreversible redaction across every PDF. Complimentary 14-day trial.