Find & Replace
Batch find and replace text across hundreds of PDFs
Content-stream editing with .NET regex, capture groups, per-pair page ranges, and CSV-driven multi-replace. Preserves fonts, layout, and digital signatures on the pages you do not touch. Built-in on-device OCR now handles scanned and image-only PDFs in the same pass — no separate OCR step — and forensic-grade de-identification removes the original text permanently from the file bytes.
Download Complimentary Trial
Under the hood
Why content-stream find and replace is different from search-and-replace in a viewer
A PDF reader's Find dialog walks the document, highlights matches, and asks the user to retype each one. That works for one file. It does not work for two hundred. PDF Batch Editor instead opens each PDF's content stream — the sequence of text-showing operators (Tj, TJ, ', ") that put glyphs on the page — locates every match, and rewrites the operators with new strings.
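To make the idea concrete, here is a minimal sketch of rewriting the string operand of a Tj operator in a tiny, uncompressed content stream. It is illustrative only and is not PDF Batch Editor's implementation: the sample stream, the `replace_in_tj` helper, and the simplifying assumption that strings contain no escaped or nested parentheses are all hypothetical. Real content streams are usually compressed and use font-specific encodings.

```python
import re

# A tiny, uncompressed content stream fragment (made up for illustration).
stream = b"BT /F1 12 Tf 72 720 Td (Invoice 2025) Tj 0 -14 Td (Due: 2025-12-31) Tj ET"

# A simple literal string operand followed by the Tj operator.
# Assumption: no escaped or nested parentheses inside the string.
TJ_LITERAL = re.compile(rb"\(([^()\\]*)\)\s*Tj")

def replace_in_tj(content: bytes, find: bytes, repl: bytes):
    """Rewrite matching Tj operands; return (new stream, operands changed)."""
    changed = 0

    def rewrite(m):
        nonlocal changed
        text = m.group(1)
        if find not in text:
            return m.group(0)  # untouched operators stay byte-identical
        changed += 1
        return b"(" + text.replace(find, repl) + b") Tj"

    return TJ_LITERAL.sub(rewrite, content), changed

new_stream, n = replace_in_tj(stream, b"2025", b"2026")
print(n)
print(new_stream.decode("ascii"))
```

The font selection (`/F1 12 Tf`) and positioning (`Td`) operators are never touched, which is why a replacement written this way keeps the original font and placement.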
The result inherits the original font, size, and color. The replacement is written into the same content stream as the original, so the file is still a valid, signable, searchable PDF when you are done. Text-fitting modes adjust the replacement string when its rendered width differs from the original — Adaptive combines font scaling and horizontal compression for the best visual fit, Preserve Width keeps the original bounding box and adjusts character spacing, Fit to Page rescales the font, and None writes at the original size and lets the layout shift.
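The arithmetic behind width fitting can be sketched as follows. The app's actual fitting algorithm is not documented here, so both functions are assumptions: one scales the font size down until the replacement spans the original width, the other computes a per-glyph character spacing (the PDF text-state parameter Tc) that makes the run span the original width. The function names and the sample widths are hypothetical.

```python
def fit_font_size(font_size: float, original_width: float, natural_width: float) -> float:
    """Font size at which the replacement spans the original width.

    natural_width is the replacement's width at the current font size.
    The text is only ever scaled down, never up (an assumption).
    """
    if natural_width <= original_width:
        return font_size
    return font_size * original_width / natural_width


def char_spacing_to_preserve_width(original_width: float, natural_width: float,
                                   glyph_count: int) -> float:
    """Extra spacing per glyph (Tc) so the run advances exactly original_width.

    Assumes the spacing is added after every glyph, including the last one.
    A negative result tightens the text; a positive one loosens it.
    """
    if glyph_count <= 0:
        raise ValueError("glyph_count must be positive")
    return (original_width - natural_width) / glyph_count


# Original run: 60 pt wide. Replacement: 15 glyphs, 75 pt wide at 12 pt.
print(fit_font_size(12.0, 60.0, 75.0))
print(char_spacing_to_preserve_width(60.0, 75.0, 15))
```

In this example the replacement either drops to about 9.6 pt or keeps 12 pt with each glyph pulled 1 pt closer to its neighbor; a combined mode could split the difference between the two.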
When the file already carries a digital signature, PDF Batch Editor uses an incremental update by default: changes are appended to the file after the existing %%EOF, leaving the original revision intact. Signatures attached to pages you did not modify stay valid. Signatures attached to a page you did modify are invalidated — which is exactly what the PDF specification requires.
De-identification
Forensic-grade removal — the original text is gone, not just hidden
For routine editing (updating dates, fixing typos, rebranding boilerplate) incremental save is exactly right: it keeps signatures valid on pages you didn't touch, and the original text living in a prior revision of the file is harmless. For de-identification, that prior revision is the problem. A scanned name, a patient identifier, an account number: if any of it survives in the file bytes, a forensic tool, hex editor, or PDF revision-history walker can pull it back out. Text that remains recoverable in the file is the root cause of many public PDF redaction failures, from court filings to FOIA disclosures.
PDF Batch Editor's find & replace edits the text-showing operators inside the content stream directly — the replaced characters are genuinely gone from the page content, not hidden behind a black rectangle or a white overlay. When the output mode includes the post-replace optimizer pass (default for de-identification workflows), the file is then collapsed to a single clean revision and every superseded original is dropped. After this step the original text cannot be recovered from the file bytes by any forensic tool, content-stream extractor, hex editor, or PDF revision-history utility — the bytes simply do not exist in the file.
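Verifying this is straightforward: a single, clean revision contains exactly one %%EOF marker. Two or more means prior revisions, possibly with the original text, are still present. You can also open one of the cleaned outputs in any hex editor and search for the original value; it won't be there. Because content streams are usually compressed, a raw byte search on its own is not conclusive, so run the output through a PDF text-extraction tool as well.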
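The marker check can be scripted. The sketch below counts %%EOF markers and does a raw byte search; the helper names and the in-memory sample files are hypothetical. Two caveats apply: a raw search cannot see inside compressed streams, and linearized ("fast web view") files can legitimately carry more than one marker, so treat the count as a quick screen, not as proof.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Number of %%EOF markers in the raw file bytes."""
    return pdf_bytes.count(b"%%EOF")


def raw_bytes_contain(pdf_bytes: bytes, value: str) -> bool:
    """True if the value appears uncompressed anywhere in the file bytes."""
    return value.encode("latin-1") in pdf_bytes


# Stand-ins for real files: one revision vs. an incremental update on top of it.
original = b"%PDF-1.7\n(Jane Doe) Tj\nstartxref\n0\n%%EOF\n"
incremental = original + b"(PT-0042) Tj\nstartxref\n40\n%%EOF\n"
clean = b"%PDF-1.7\n(PT-0042) Tj\nstartxref\n0\n%%EOF\n"

for name, data in [("incremental", incremental), ("clean", clean)]:
    print(name, count_eof_markers(data), raw_bytes_contain(data, "Jane Doe"))
```

For a file on disk, pass `pathlib.Path("output.pdf").read_bytes()` to either function.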
This is the foundation of the app's HIPAA-conscious workflow for medical document processing — clinics use it to swap patient names with internal codes (or strip them entirely) before sending charts to third-party processors, with the assurance that what they're sending genuinely no longer contains the PHI.
Scanned & image-only PDFs
Built-in OCR — find and replace inside scans, no separate step
Earlier versions could only touch the content stream — the actual text operators drawn on the page. A flatbed scan has none: it is a picture of a document, and there was nothing to find. You had to run a separate OCR tool first. That is no longer the case. PDF Batch Editor now classifies every file as you load it — born-digital, scanned, or mixed — and routes each one to the right engine automatically.
When a page is a scan, a built-in OCR engine rasterizes it, recognizes the text in the image, and locates every occurrence of your search term — the same terms and CSV pairs you would use on a digital file. Because the text only exists as pixels, the replacement is not a content-stream edit: PDF Batch Editor burns the matched region out of the page image and draws your replacement text on top. The original characters are gone from the pixels — not whited out, not boxed over — which makes this the right tool for de-identifying scanned charts, statements, and forms, not just relabeling them.
A mixed document — a born-digital contract with a scanned signature page, say — is handled in one run: digital pages go through content-stream replacement, scanned pages through OCR-and-burn. You define the pairs once and PDF Batch Editor picks the correct mechanism per page.
OCR runs entirely on your machine — no page image is ever uploaded to a cloud service, which matters when the scans are medical records or signed agreements. Recognition quality depends on scan resolution; for faint or low-DPI source material you can raise the rasterization DPI. Pages that are not a single full-page image are flagged for review rather than altered unsafely, so a stray logo or stamp never gets silently painted over.
Pattern matching
Regex, case sensitivity, and whole-word matching
Each pair has independent regex, case-sensitivity, and whole-word toggles. The regex engine is .NET System.Text.RegularExpressions — full support for character classes, quantifiers, anchors, lookaheads, lookbehinds, and capture groups. Replacement strings reference captured groups with $1, $2, etc.
A few patterns that come up in real document sets:
Reformat US dates from MM/DD/YYYY to ISO YYYY-MM-DD
Find: \b(\d{2})/(\d{2})/(\d{4})\b
Replace: $3-$1-$2
Mask all but the last four digits of a 16-digit account number
Find: \b\d{12}(\d{4})\b
Replace: ************$1
Update a versioned product name across an entire library
Find: \bAcme Suite v\d+(\.\d+)*\b
Replace: Acme Suite 2026
Strip a draft watermark line from every page header
Find: ^DRAFT — do not distribute$
Replace: (empty)
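These patterns can be tried outside the app before running a batch. The sketch below uses Python's re module, whose syntax agrees with .NET for these particular patterns; the main difference is that .NET writes group references as $1 while Python writes \1. The sample strings are made up, and whether the app evaluates ^ and $ per line is an assumption to confirm in the live preview.

```python
import re

# .NET writes group references as $1; Python's re module writes them as \1.
iso_date = re.sub(r"\b(\d{2})/(\d{2})/(\d{4})\b", r"\3-\1-\2",
                  "Signed 03/15/2025 in Austin")

masked = re.sub(r"\b\d{12}(\d{4})\b", "*" * 12 + r"\1",
                "Acct 1234567890123456")

renamed = re.sub(r"\bAcme Suite v\d+(\.\d+)*\b", "Acme Suite 2026",
                 "Requires Acme Suite v3.4.1 or later")

# ^ and $ match at line boundaries only in multiline mode
# (re.MULTILINE here, RegexOptions.Multiline in .NET).
# \u2014 is the dash character used in the DRAFT pattern.
header = "DRAFT \u2014 do not distribute\nSection 1"
stripped = re.sub("^DRAFT \u2014 do not distribute$", "", header, flags=re.MULTILINE)

print(iso_date)   # Signed 2025-03-15 in Austin
print(masked)     # Acct ************3456
print(renamed)    # Requires Acme Suite 2026 or later
print(repr(stripped))
```

Note that the masking pattern only fires on runs of exactly 16 digits; a 15- or 17-digit number is left alone because of the word boundaries.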
Whole-word matching uses an IsLetterOrDigit boundary check on the characters before and after the match. Searching Smith with whole-word on matches Smith, but skips Smithfield and Smithy. Case-sensitive matching is on by default; flip it off to make Acme, ACME, and acme equivalent.
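The boundary check described above can be approximated in a few lines. This is a sketch, not the app's code: Python's `str.isalnum` stands in for .NET's `Char.IsLetterOrDigit` (the two agree on ordinary letters and digits but are not guaranteed to be identical for every Unicode character), and the `whole_word_matches` helper is hypothetical.

```python
def whole_word_matches(text: str, term: str) -> list:
    """Start offsets where term appears with no letter or digit on either side."""
    hits = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        before_ok = start == 0 or not text[start - 1].isalnum()
        after_ok = end == len(text) or not text[end].isalnum()
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(term, start + 1)
    return hits


sample = "Smith met Smithfield and Smithy; Mr. Smith, again."
print(whole_word_matches(sample, "Smith"))  # [0, 37]
```

One consequence of a letter-or-digit check is that an underscore counts as a boundary, so `Smith_2` would match here, whereas the regex `\b` anchor treats the underscore as a word character.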
Multi-replace
CSV-driven substitutions for hundreds of pairs
Annual updates are rarely one substitution. They are dates, addresses, contact names, internal codes, version strings — sometimes hundreds of pairs across a contract library. Build the change set in Excel or Google Sheets, save as CSV, and import:
find,replace
Q4 2025,Q1 2026
2025,2026
123 Old St.,500 New Ave.
"Acme Holdings","Acme Holdings, Inc."
v3.4,v3.5
Each row becomes one pair, and pairs are applied in order. Pairs chain through temp files internally, so a later pair sees the output of every earlier one. Order therefore matters: put a specific pair ahead of any general pair that would rewrite part of its search text. Mix imported pairs with hand-edited pairs, expand any row after import to override its options individually, and export the current pair list back to CSV for version control or sharing with colleagues.
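A short sketch of ordered, chained substitution shows why row order matters. It works on a plain string instead of a PDF, and the `load_pairs` and `apply_pairs` helpers are hypothetical stand-ins for the app's import and execute steps.

```python
import csv
import io

CSV_TEXT = '''find,replace
Q4 2025,Q1 2026
2025,2026
"Acme Holdings","Acme Holdings, Inc."
'''


def load_pairs(csv_text: str) -> list:
    """Read (find, replace) pairs from two-column CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["find"], row["replace"]) for row in reader]


def apply_pairs(text: str, pairs: list) -> str:
    """Apply each pair in order; later pairs see the output of earlier ones."""
    for find, replace in pairs:
        text = text.replace(find, replace)
    return text


pairs = load_pairs(CSV_TEXT)
source = "Acme Holdings report, Q4 2025 (fiscal 2025)"

print(apply_pairs(source, pairs))
# Swapping the first two rows lets the general pair run first,
# after which "Q4 2025" no longer exists to be matched.
print(apply_pairs(source, [pairs[1], pairs[0], pairs[2]]))
```

The first call yields "Q1 2026 (fiscal 2026)" as intended; the second leaves "Q4 2026" behind because the year was rewritten before the quarter pair ran. The quoted CSV field keeps the comma in "Acme Holdings, Inc." inside a single column.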
Per-pair controls
Page ranges, signatures, formatting, and conditional skipping
Each find/replace pair has its own independent settings, accessible by expanding the pair row.
Page range
Restrict replacement to specific pages with comma-separated values and hyphenated ranges — e.g. 1-3, 5, 8-10. Pages outside the range are written through unchanged. Useful when a contract template re-uses the same boilerplate but the substitution should only land in the cover page or the signature page.
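The range syntax is simple enough to parse in a few lines. The sketch below is one plausible reading of it, not the app's parser: the `parse_page_range` helper is hypothetical, a blank spec is taken to mean every page, and page numbers beyond the end of the document are silently dropped (an assumption).

```python
def parse_page_range(spec: str, page_count: int) -> list:
    """Expand a spec such as '1-3, 5, 8-10' into a sorted list of 1-based pages."""
    if not spec.strip():
        return list(range(1, page_count + 1))  # blank means the whole document

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = (int(p) for p in part.split("-", 1))
        else:
            first = last = int(part)
        if first > last:
            raise ValueError(f"reversed range: {part!r}")
        # Keep only pages that exist in this document.
        pages.update(p for p in range(first, last + 1) if 1 <= p <= page_count)
    return sorted(pages)


print(parse_page_range("1-3, 5, 8-10", page_count=9))  # [1, 2, 3, 5, 8, 9]
print(parse_page_range("", page_count=3))              # [1, 2, 3]
```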
Incremental save
Default on. Appends changes to the file after the existing %%EOF rather than rewriting the document, which is what protects digital signatures attached to unmodified pages. For de-identification (HIPAA, PII removal), keep incremental save on and let the app's post-replace optimizer pass collapse the result to a single clean revision — this drops every prior revision so the original text cannot be recovered, while preserving embedded subset fonts faithfully. See the De-identification section above for the full pattern.
Skip if no match
When set, files where this pair found no matches are not written to the output folder — no empty copy, no suffix collision. Combined with multi-pair processing, this lets you target a small substitution across a large library and only produce output for the files actually affected.
Bold, underline, strikethrough, color, highlight
Visual markup applied to the replacement text. Useful when the find-and-replace pass is itself a review step — replace each instance of "to be confirmed" with the same string but underlined and red, so legal can spot every remaining placeholder at a glance.
Edge cases
Scanned PDFs, signed pages, encoding, and form fields
Scanned and image-only PDFs. These are handled automatically — no separate OCR step. PDF Batch Editor classifies each file on load; when a page is a scan, the built-in OCR engine rasterizes it, recognizes the text, locates your search term in the image, and burns the matched pixels out, drawing the replacement on top. See the Scanned & image-only PDFs section above for how this works. Recognition depends on scan quality, so raise the rasterization DPI for faint or low-resolution source material; pages that are not a single full-page image are flagged for review rather than altered.
Signed PDFs. Incremental save (default on) preserves the prior revision, so signatures over pages you do not modify continue to verify. Signatures attached to a modified page are invalidated by design — that is the entire point of a signature. If you need to keep all signatures intact, restrict the pair's page range so it does not touch a signed page.
Custom encodings and Identity-H fonts. Most PDFs carry a ToUnicode CMap that maps the character codes in the content stream (glyph IDs, for Identity-H fonts) back to readable characters. PDF Batch Editor extracts text through this CMap, so well-formed PDFs work transparently. PDFs generated by older or hand-rolled tools sometimes ship without a CMap or with a broken one; in those cases the live preview's match count may under-report even though the underlying content stream still contains your search text. If a file's preview shows zero matches but you can clearly read the text on the page, the PDF likely lacks a usable text-extraction layer.
AcroForm fields. Find/Replace targets the visible page content stream. AcroForm field values live in a separate dictionary; once a form widget's appearance has been generated and embedded into the page (the usual case after a form is filled), the visible text is matched and replaced like any other content. Empty, unfilled forms are best handled by the dedicated batch form-fill module.
Use Cases
Real-world batch text replacement
Legal & Contracts
A paralegal needs to replace a party name across 47 case documents. Load the entire matter folder, define one pair, run. Two minutes instead of four hours, and incremental save keeps the witness affidavits' existing signatures intact.
Annual Policy Updates
HR updates 300 policy documents every January — new dates, addresses, benefit numbers. The whole change set lives in a CSV; one import, one execute, the entire library is current. Skip-if-no-match means policies that don't reference the changed values are not touched.
Template Customization
A sales team maintains 50 proposal templates. When the company renames a product or moves a tagline, a single regex pair updates every template at once — capture groups preserve any version suffix the original copy carried.
Frequently Asked Questions
Does this work on scanned PDFs?
Yes — there is no separate OCR step anymore. PDF Batch Editor classifies every document on load. Born-digital PDFs are edited through the content stream; for scanned or image-only pages, built-in on-device OCR locates your search term inside the page image, burns the matched pixels out, and draws the replacement on top. Nothing is uploaded to a server. Pages that are not a single full-page image are flagged for review rather than altered unsafely. For faint or low-resolution scans, raise the rasterization DPI to improve recognition.
Can I use capture groups like $1 and $2 in the replacement?
Yes. Enable the Replace regex toggle on the pair, and you can reference $1, $2, etc. in the replacement string. Combined with the Find regex toggle, this lets you reformat structured strings — capture a date in (\d{2})/(\d{2})/(\d{4}) and rewrite it as $3-$1-$2. The find pattern uses .NET Regex syntax.
Will replacing text invalidate digital signatures?
Only on the pages that change. PDF Batch Editor uses incremental updates (enabled by default per pair), which append modifications to the file rather than rewriting the whole document. Signatures attached to pages you do not touch stay valid; signatures attached to a modified page are invalidated — the correct PDF spec behavior.
How do I limit replacement to specific pages?
Each pair has a Page range field that accepts comma-separated pages and hyphenated ranges, e.g. 1-5, 8, 12-15. Pages outside the range are not modified. Leave the field blank to apply across the entire document.
What does Whole word matching actually do?
It checks that each match is bounded by a non-letter, non-digit character (or by the start/end of the document). Searching Smith with whole-word on matches Smith, but skips Smithfield and Smithy. The check uses .NET's IsLetterOrDigit, which recognizes letters and digits in Latin and other Unicode scripts.
Can I import find/replace pairs from a spreadsheet?
Yes. Save the spreadsheet as a CSV with two columns — find,replace — and import. Each row becomes one pair. Per-pair options (regex, case sensitivity, page range, formatting) inherit from the defaults you set; expand any pair after import to override its options individually.
When I replace text, is the original really gone — or could a forensic tool still recover it?
For de-identification, it's really gone. The replace operation edits the text-showing operators inside the page content stream directly, so the replaced characters are not just visually hidden — they are removed from the page content itself. The app then finalizes the output through a clean-rewrite pass that drops every prior file revision, leaving a single-revision PDF (exactly one %%EOF marker) with the original text no longer present in the bytes. You can verify by opening the output in a hex editor or running it through a PDF text-extraction tool — the original value will not appear. This is the foundation of the HIPAA-conscious workflow medical clinics use to strip PHI before sending charts to third parties.
Stop editing PDFs one at a time
Replace text across your entire PDF library in seconds. Complimentary 14-day trial.