Compress PDFs by knowing where the size actually is

Image downsampling at a target DPI, font subsetting to the glyphs actually used, unused-object removal, and stream recompression. Web, Balanced, Print, and Custom profiles with before/after measurement.

Download Complimentary Trial

200 reports · 60% smaller · Ready for email

Where the bytes in a PDF actually go

"Compressing a PDF" is a vague action because PDFs aren't a single thing. A 30 MB document gets to 30 MB through some combination of high-DPI embedded images, fully-embedded font glyph tables, accumulated unused objects from prior revisions, redundant resource dictionaries, and uncompressed content streams. Optimization that doesn't know which of those surfaces is doing the damage produces unimpressive results — you save 5%, you keep the bloat.

PDF Batch Editor's optimizer addresses each surface independently. The next four sections walk through what each technique does and when it pays off. The compression profiles (Web, Balanced, Print, Custom) are presets that combine these techniques at different intensities; the Custom profile lets you control each independently.

Image downsampling and recompression

In most documents that aren't pure text, embedded raster images dominate the file size. A scan saved at 600 DPI has twice as many pixels per inch in each dimension as a 300 DPI scan. That is four times the data for an output that looks the same on screen and prints indistinguishably from 300 DPI on most office printers. Downsampling computes the effective DPI of each embedded image relative to its rendered size on the page, and resamples it to a target DPI when the original is higher.

Targets per profile:

Web        96 DPI    Aggressive — for screen viewing and email
Balanced   150 DPI   Sweet spot — readable on screen, prints well at small sizes
Print      300 DPI   Conservative — print-shop quality
Custom     any       Set per-image-type and per-format
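As a rough illustration of the effective-DPI check, here is a minimal Python sketch. It assumes the default PDF unit of 72 points per inch; the function names and the profile table are hypothetical stand-ins for illustration, not PDF Batch Editor's actual API.

```python
# Hypothetical profile table mirroring the targets listed above.
PROFILE_TARGET_DPI = {"web": 96, "balanced": 150, "print": 300}

POINTS_PER_INCH = 72  # default PDF user-space unit


def effective_dpi(pixel_width, pixel_height, rendered_width_pt, rendered_height_pt):
    """Effective DPI of an image given its pixel size and rendered size in points."""
    dpi_x = pixel_width / (rendered_width_pt / POINTS_PER_INCH)
    dpi_y = pixel_height / (rendered_height_pt / POINTS_PER_INCH)
    # Assumption: use the lower axis so a stretched image is not over-reduced.
    return min(dpi_x, dpi_y)


def downsample_plan(pixel_width, pixel_height, rendered_width_pt, rendered_height_pt, profile):
    """Return the pixel dimensions the image should have under the given profile."""
    target = PROFILE_TARGET_DPI[profile]
    current = effective_dpi(pixel_width, pixel_height, rendered_width_pt, rendered_height_pt)
    if current <= target:
        return (pixel_width, pixel_height)  # already at or below target: never upsample
    scale = target / current
    return (max(1, round(pixel_width * scale)), max(1, round(pixel_height * scale)))
```

For a full-page scan of 5100 x 6600 pixels placed on a US Letter page (612 x 792 points), the effective resolution is 600 DPI. Under the Print target the plan halves each dimension, which leaves one quarter of the original pixel count.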

After downsampling, images are recompressed with format-appropriate codecs — JPEG for photographic content, lossless (Flate or JBIG2) for line art and screenshots. The Custom profile lets you choose codec per image type, set quality factors for JPEG independently, and exclude image classes (e.g. don't downsample any image used as a logo) from the pass.

Font subsetting to the glyphs actually used

When a PDF embeds a font, it often embeds the entire font program: every glyph in that face, whether or not the document uses it. Each additional weight or style is a separate embedded font. A standard Latin font has roughly 250 glyphs at one weight. A Pan-European font has 800. A CJK font has tens of thousands. Embedding the full glyph table for a font your document barely uses is dead weight carried in every copy of the file.

Subsetting analyzes the content streams to find which glyphs are actually rendered, and writes a new font program containing only those. A document that uses one weight of one font with 200 unique characters can drop a 400 KB embedded font down to 30 KB. Across a 200-document batch with three or four fonts each, that is meaningful aggregate savings — and it changes nothing about how the document looks.
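The idea can be sketched with a toy model in Python. This is an assumption-laden illustration: the glyph table here is a plain dictionary from character to outline bytes, whereas a real font program also carries metrics, hinting, and encoding tables, and real content streams reference glyphs through character codes rather than plain text. All names are hypothetical.

```python
def used_characters(page_texts):
    """Collect the distinct characters that appear across all page text."""
    used = set()
    for text in page_texts:
        used.update(text)
    return used


def subset_font(glyph_table, page_texts):
    """Return a new glyph table holding only the glyphs the pages reference.

    glyph_table maps a character to its outline data (bytes). This is a toy
    stand-in for a real font program, not an actual font format.
    """
    used = used_characters(page_texts)
    return {ch: data for ch, data in glyph_table.items() if ch in used}


def table_size(glyph_table):
    """Total bytes of outline data in the table."""
    return sum(len(data) for data in glyph_table.values())


# Toy font: 250 glyphs of 100 bytes each, starting at the space character.
full_font = {chr(32 + i): bytes(100) for i in range(250)}
pages = ["Invoice 2024", "Total due"]
subset = subset_font(full_font, pages)
```

The subset keeps only the characters found on the pages, so its size scales with what the document uses rather than with what the font offers.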

Unused objects, duplicate resources, and stream recompression

Every time a PDF is edited and saved with an incremental update, the previous revision's objects stay in the file; they are simply no longer referenced. A document that's been through ten revisions accumulates ten generations of unused content. The optimizer drops the dead objects and rebuilds the cross-reference table to match.
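Finding objects that are no longer referenced is essentially a reachability walk. The Python sketch below is a simplified model, not the product's implementation: it assumes the file has already been parsed into a hypothetical dictionary from object ID to the IDs that object references, and that everything live is reachable from a single root (in a real PDF, the document catalog named by the trailer).

```python
def reachable_objects(objects, root_id):
    """Return the set of object IDs reachable from root_id.

    objects maps an object ID to the list of object IDs it references.
    """
    seen = set()
    stack = [root_id]
    while stack:
        obj_id = stack.pop()
        if obj_id in seen or obj_id not in objects:
            continue  # already visited, or a dangling reference
        seen.add(obj_id)
        stack.extend(objects[obj_id])
    return seen


def drop_unreferenced(objects, root_id):
    """Keep only the objects that are still reachable from the root."""
    live = reachable_objects(objects, root_id)
    return {obj_id: refs for obj_id, refs in objects.items() if obj_id in live}


# Objects 5 and 6 belong to a superseded revision: nothing live points at them.
parsed = {1: [2], 2: [3, 4], 3: [], 4: [2], 5: [6], 6: []}
cleaned = drop_unreferenced(parsed, root_id=1)
```

The `seen` set also guards against reference cycles, such as a page pointing back to its parent node.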

Resources can also duplicate. A logo embedded into 50 forms by a poorly-implemented form generator may end up as 50 copies of the same image. The optimizer detects byte-identical resources and consolidates references to a single copy.
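Byte-identical detection is commonly done by hashing each resource and keeping the first copy per digest. The following Python sketch shows that approach under the assumption that resources are available as a hypothetical name-to-bytes mapping; it illustrates the technique and is not a description of the optimizer's internals.

```python
import hashlib


def consolidate_duplicates(resources):
    """Collapse byte-identical resources to a single stored copy.

    resources maps a resource name to its raw bytes. Returns (unique, remap):
    unique holds one copy per distinct payload, and remap tells each original
    name which surviving name to reference instead.
    """
    first_seen = {}  # digest -> name of the first resource with that payload
    unique = {}
    remap = {}
    for name, data in resources.items():
        digest = hashlib.sha256(data).hexdigest()
        canonical = first_seen.setdefault(digest, name)
        if canonical == name:
            unique[name] = data  # first time this payload appears
        remap[name] = canonical
    return unique, remap


logo = b"\x89PNG fake logo payload"
resources = {"logo_form1": logo, "logo_form2": logo, "chart": b"different bytes"}
unique, remap = consolidate_duplicates(resources)
```

Hashing only catches exact duplicates. Two visually identical logos saved with different compression settings would have different bytes and would both be kept.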

Finally, content streams that were saved uncompressed (or with weak compression) are recompressed with Flate (the FlateDecode filter) at a higher compression level. Object streams group related dictionary objects into a single compressed stream, reducing per-object overhead. None of these techniques changes what the document looks like.
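Flate streams use the zlib format, so the recompression step can be sketched with Python's standard `zlib` module. The helper name and the keep-the-original fallback are assumptions made for illustration.

```python
import zlib


def recompress_stream(raw_stream, level=9):
    """Deflate-compress raw stream bytes at the given level.

    Returns (data, compressed). If compression does not make the stream
    smaller, the original bytes are returned with compressed=False.
    """
    packed = zlib.compress(raw_stream, level)
    if len(packed) < len(raw_stream):
        return packed, True
    return raw_stream, False


# Page content streams are repetitive text operators, which deflate handles well.
content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n" * 200
packed, was_compressed = recompress_stream(content)
tiny, tiny_compressed = recompress_stream(b"q")
```

Because deflate is lossless, decompressing the result gives back the exact original bytes, which is why this step cannot alter the rendered page.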

Before/after measurement and the operation log

Optimization without measurement is hand-waving. Every batch produces a per-file before/after report: original size, optimized size, percentage reduction, and the technique that saved the most bytes (so you know whether the win came from images, fonts, or structural cleanup). The total batch savings are summed at the bottom: "200 files, 4.7 GB → 1.9 GB, 60% reduction."
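The arithmetic behind such a report is simple. The Python sketch below assumes a hypothetical input shape (name, original bytes, optimized bytes, and bytes saved per technique) and is not the product's report format.

```python
def percent_reduction(original_bytes, optimized_bytes):
    """Percentage by which the file shrank; 0.0 for an empty original."""
    if original_bytes == 0:
        return 0.0
    return 100.0 * (original_bytes - optimized_bytes) / original_bytes


def batch_report(files):
    """Build per-file rows and a batch total.

    files is a list of (name, original_bytes, optimized_bytes, savings), where
    savings maps a technique name to the bytes it saved for that file.
    """
    rows = []
    total_before = 0
    total_after = 0
    for name, before, after, savings in files:
        dominant = max(savings, key=savings.get) if savings else "none"
        rows.append((name, before, after, round(percent_reduction(before, after), 1), dominant))
        total_before += before
        total_after += after
    total = (total_before, total_after, round(percent_reduction(total_before, total_after), 1))
    return rows, total


files = [
    ("a.pdf", 10_000_000, 4_000_000, {"images": 5_000_000, "fonts": 800_000, "structure": 200_000}),
    ("b.pdf", 200_000, 198_000, {"structure": 2_000}),
]
rows, total = batch_report(files)
```

Applied to the headline example, going from 4.7 GB to 1.9 GB is a reduction of about 59.6%, which rounds to the quoted 60%.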

Files that were already efficient (a 200 KB text-only PDF with subset fonts and no images) pass through with near-zero change rather than being reprocessed for no gain. The operation log records every file processed so the audit trail matches the actual work done.

When file size matters

Email Distribution

A marketing team sends 200 product catalogs by email. Each PDF is 15 MB — over the inbox limit. Web profile downsamples images to 96 DPI, subsets fonts, and brings each file to 3 MB. Same content, deliverable.

Public-Facing Web Distribution

A government agency publishes 500 annual reports. Linearization makes the first page render before the rest of the file finishes downloading; image downsampling at 150 DPI makes the bandwidth bill smaller; font subsetting trims the second-largest contributor.

Cold Archive Storage

A company archives 10,000 scanned documents annually. Balanced profile across the lot reclaims 60% of object storage cost without sacrificing readability for future retrieval — and the operation log makes the savings auditable to finance.

Frequently Asked Questions

Where does the size in a PDF actually come from?

In most documents that aren't pure text, embedded images dominate — high-DPI scans, screenshots saved as PNG, decorative photos at full sensor resolution. Embedded fonts are the next-largest contributor, followed by unused objects from prior revisions, redundant resource dictionaries, and uncompressed content streams. The optimizer addresses each of these surfaces.

What does image downsampling actually do?

Each embedded raster image is checked against a target effective DPI. If an image works out to, say, 600 DPI at the size it is drawn on the page but the profile targets 150 DPI, the pixels above 150 DPI are doing nothing visible at that target; they just take up space. Downsampling discards those pixels and recompresses the result. The Web profile targets 96 DPI, Balanced 150 DPI, Print 300 DPI.

What is font subsetting?

When a PDF embeds a font, it often embeds the entire glyph table for that face: every character the font contains, even the ones the document never uses. Subsetting analyzes which glyphs actually appear on the page and embeds only those. A document that uses one weight of one font with 200 unique characters can drop a 400 KB embedded font down to 30 KB.

Will optimization break PDF/A or signed documents?

Optimization rewrites the file structurally, which invalidates any signature attached to it and can break PDF/A conformance flags depending on the techniques applied. For signed or PDF/A documents, run optimization first as part of the production pipeline (before signing or before PDF/A conversion), not afterward. The Batch Pipeline module makes that ordering explicit.

What is linearization (Fast Web View)?

Linearization rearranges the file so a PDF viewer can render the first page before downloading the rest of the document. It costs a small amount of file size but dramatically improves perceived load time when the file is served over HTTP. Recommended for any PDF that will be hosted on the public web.

Shrink your PDF library by 60%

Image downsampling, font subsetting, structural cleanup — with measurement so you know what worked. Complimentary 14-day trial.

Windows & macOS · No credit card required · All features included