Why Your PDF Tool Should Run in the Browser, Not the Cloud

What actually leaks when you upload a PDF to a random online tool, what modern in-browser PDF libraries can genuinely do, and a checklist for before you upload anything sensitive.

The pitch of every “Smallpdf-alike” site is the same: drag your file in, get a processed file back, move on with your day. For a random CV or a brochure for a local business, fine. For a signed contract, a tax return, medical records, or anything with a client’s name on it — that upload is worth thinking about.

This isn’t about being paranoid. It’s about understanding what you’re actually doing when you hand a file to a service you found via a Google ad.

What leaks when you upload a PDF

Most people’s mental model is: “I upload the file, the server processes it, I download the result, the file is gone.” That model is optimistic in at least four ways.

The file itself

The moment you upload, the file lands on someone’s infrastructure. Where it goes from there depends entirely on policies you didn’t read. It might be:

  • Stored for 24 hours (common)
  • Stored indefinitely unless you explicitly delete it (less common but real)
  • Processed by a pipeline that includes humans reviewing output for quality control
  • Indexed for training a document AI (increasingly common, often buried in ToS)

Free tiers on these services are usually subsidized by something. Sometimes it’s ads. Sometimes it’s your data.

Metadata embedded in the PDF itself

PDF files carry a surprising amount of metadata, both in the document information (Info) dictionary and in an optional XMP metadata stream. A contract you exported from Word will typically include:

  • Author name — pulled from the OS user account that created the document
  • Organization — set in Office/system settings, often your company name
  • Creation timestamp — when the document was first created
  • Modification timestamp — when it was last saved
  • Software — “Microsoft Word for Microsoft 365”, “Adobe Acrobat 23.x”, etc.
  • Original filename — sometimes embedded if the document was converted from another format

You might have scrubbed the document body carefully. The metadata tells a different story.
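
To get a rough feel for what a file gives away, you can scan its raw bytes for Info dictionary entries. The sketch below is a heuristic, not a PDF parser: findInfoEntries and bytesToBinaryString are hypothetical helpers, and the scan only sees entries stored as plain-text literal strings. It will miss hex or UTF-16 strings, compressed object streams, encrypted files, and XMP packets.

```typescript
// Decode bytes one-to-one into characters so the raw PDF syntax can be searched as text.
function bytesToBinaryString(bytes: Uint8Array): string {
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Heuristic scan for "/Key (literal string)" pairs using standard Info dictionary keys.
// Literal strings containing unescaped nested parentheses are not handled.
function findInfoEntries(bytes: Uint8Array): Record<string, string> {
  const text = bytesToBinaryString(bytes);
  const pattern =
    /\/(Title|Author|Subject|Creator|Producer|CreationDate|ModDate)\s*\(((?:\\.|[^\\()])*)\)/g;
  const found: Record<string, string> = {};
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found[match[1]] = match[2]; // later occurrences overwrite earlier ones
  }
  return found;
}
```

An empty result does not mean the file is clean. It only means nothing was stored in the simplest possible form.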

The “redacted” content problem

PDF redaction done wrong — drawing a black rectangle on top of text instead of actually removing the underlying content — is a well-documented failure mode. The redacted text remains in the file’s content stream, selectable and searchable. When you upload such a file to any processing service, they receive the full unredacted content, regardless of what it looks like visually.

This isn’t hypothetical. Court filings, government documents, and corporate disclosures have all had this happen publicly.
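
A toy example makes the failure concrete. The content stream below is hand-written and uncompressed, and extractShownText is a hypothetical helper. Real page streams are usually Flate-compressed and use font-specific encodings, so treat this as an illustration of the principle rather than an extraction tool.

```typescript
// A toy page content stream: the text is drawn first,
// then a filled black rectangle is painted over the same area.
const toyContentStream = [
  "BT /F1 12 Tf 72 700 Td (Settlement amount: 1,250,000) Tj ET", // show text
  "0 0 0 rg", // set the fill colour to black
  "70 695 230 16 re f", // define a rectangle over the text and fill it
].join("\n");

// Collect every literal string passed to the Tj (show text) operator.
function extractShownText(stream: string): string[] {
  const pattern = /\(((?:\\.|[^\\()])*)\)\s*Tj/g;
  const shown: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(stream)) !== null) {
    shown.push(match[1]);
  }
  return shown;
}

// The rectangle hides the text on screen, but the string is still in the stream.
const recovered = extractShownText(toyContentStream);
console.log(recovered);
```

Proper redaction removes the text-showing operators themselves. Painting over them only changes what a viewer displays.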

Filename and request metadata

Even if the processing service discards your file immediately, their server logs will typically still contain your IP address (and with it your approximate geographic location), the upload timestamp, and often the filename. If the filename is acme-corp-acquisition-term-sheet-2026.pdf, that’s already information.

What in-browser PDF tools can actually do

The pessimistic take used to be that browser-based PDF tools were toys — limited to simple operations, poor compression, no real editing. That’s changed materially in the last three years.

pdf-lib

pdf-lib is a pure JavaScript library that runs entirely in the browser. It can:

  • Merge and split documents
  • Add, remove, and reorder pages
  • Add text, images, and shapes to existing pages
  • Fill PDF form fields
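  • Set and remove passwords (user password / owner password, RC4 and AES encryption), though only through a fork or add-on that provides encryption; to our knowledge the upstream pdf-lib package does not write encrypted files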
  • Embed fonts
  • Read and modify document metadata

For the overwhelming majority of “I need to do something to this PDF” tasks, pdf-lib covers it. The output is a standard PDF 1.x file. Our PDF Merger, PDF Splitter, and PDF Password Protect tools are all built on this.
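
Much of the work in a splitter is unglamorous glue around the library. As a minimal sketch (not taken from our tools’ source), here is one way to turn a user-typed range like “1-3, 5” into zero-based page indices. parsePageRanges is a hypothetical helper, and the validation rules are assumptions.

```typescript
// Hypothetical helper: turn "1-3, 5" into zero-based page indices [0, 1, 2, 4].
function parsePageRanges(spec: string, pageCount: number): number[] {
  const indices: number[] = [];
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim();
    if (part === "") continue; // tolerate stray commas
    const match = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range: "${part}"`);
    }
    const start = parseInt(match[1], 10);
    const end = match[2] !== undefined ? parseInt(match[2], 10) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part}"`);
    }
    for (let page = start; page <= end; page++) {
      indices.push(page - 1); // users count from 1, page indices start at 0
    }
  }
  return indices;
}
```

The resulting array has the shape that pdf-lib’s copyPages(sourceDoc, indices) expects, so the split itself stays a few lines long.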

PDF.js

Mozilla’s PDF.js is primarily a renderer — it’s what powers Firefox’s built-in PDF viewer. It can parse and render existing PDFs to canvas, which makes it useful for preview and text extraction, but it’s not a creation or editing library.

WebAssembly-compiled Ghostscript and MuPDF

For compression specifically — particularly reducing file size via downsampling embedded images, re-compressing streams, and removing duplicate objects — the most capable option is a WebAssembly build of a C library like Ghostscript or MuPDF. These are large (~10–20 MB WASM bundles) but they bring near-native-quality compression to the browser.

Our PDF Compressor uses this approach: the WASM module loads once, processes your file locally, and the compressed PDF is offered for download without any server involvement.
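
“Loads once” matters with a bundle this size. A common way to get that behaviour is to cache the promise returned by the loader, sketched below. loadOnce is a hypothetical helper and this is not a description of our actual loader; the load function stands in for whatever fetches and instantiates the WASM module.

```typescript
// Wrap an expensive async loader so it runs at most once.
// Every later call reuses the same promise, even while the first load is still in flight.
function loadOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}
```

One design caveat: a rejected promise stays cached too, so a real tool would probably clear the cache on failure to allow a retry.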

What in-browser tools can’t do well yet

It’s worth being honest about the gaps:

OCR on scanned documents. Turning a scanned image PDF into searchable text requires a trained OCR model. Tesseract.js runs in the browser but is slow on large documents and often produces weaker results than server-side services such as Google Cloud Vision or Amazon Textract. If you need high-quality OCR, you probably need a backend; the question is whether you trust the one you’re sending your document to.

Digital signature validation. Verifying a PDF signature requires checking against a certificate authority chain and often requires OCSP/CRL lookups. This is network-dependent by design and not something you can fully validate offline.

Extreme compression. The theoretical maximum compression requires analyzing content deeply — detecting duplicate images, optimizing font subsets, stripping unreferenced objects — in ways that heavy server-side tools do better than current WASM ports.

A practical checklist before you upload any PDF

Use this when you’re about to upload something and you’re not sure whether it matters.

Stop. Ask yourself:

  • Does this document contain names of real people (clients, employees, patients)?
  • Does it contain financial information (salary data, invoices, bank details, tax numbers)?
  • Does it contain legal information (contracts, NDAs, court filings, settlement terms)?
  • Does it contain medical information?
  • Does it contain company-confidential information (roadmaps, M&A activity, source code, internal policies)?
  • Is there embedded metadata you wouldn’t want disclosed (author name, organization, timestamps)?
  • Was any content “redacted” by drawing over it rather than removing it from the file?

If you answered yes to any of those:

  • Use a tool that processes in the browser with no upload (browser DevTools can confirm: Network tab should show no outbound request carrying your file data)
  • Or use a local tool: Preview on macOS, LibreOffice, Adobe Acrobat on your own machine
  • Or use a trusted internal corporate tool that your IT/legal teams have vetted
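
If you’d rather not eyeball the Network tab, most browsers can export the recorded traffic as a HAR file, and a few lines of code can flag suspicious requests. The sketch below is a rough heuristic under stated assumptions: findPossibleUploads is a hypothetical helper, HarEntry covers only the HAR fields it reads, and the 50% size threshold is arbitrary.

```typescript
// Minimal subset of the HAR 1.2 fields this check reads.
interface HarEntry {
  request: { method: string; url: string; bodySize: number };
}

// Flag requests whose body is large enough to plausibly contain the file.
// Entries with bodySize -1 (size unknown) are not flagged, and an upload that is
// compressed or split into small chunks can slip under the threshold.
function findPossibleUploads(
  entries: HarEntry[],
  fileSizeBytes: number,
  ratio: number = 0.5
): HarEntry[] {
  const bodyMethods = ["POST", "PUT", "PATCH"];
  return entries.filter((entry) => {
    const { method, bodySize } = entry.request;
    return (
      bodyMethods.indexOf(method.toUpperCase()) !== -1 &&
      bodySize >= fileSizeBytes * ratio
    );
  });
}
```

The entries come from JSON.parse(harText).log.entries. An empty result is encouraging but not proof, so it is still worth scanning the request list by hand for anything unexpected.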

If you still need to use a cloud service:

  • Check whether they offer a privacy-preserving or end-to-end encrypted option
  • Check whether they have a current SOC 2 Type II report, and which jurisdiction they operate under
  • Check the data retention policy explicitly — not the marketing page, the privacy policy
  • Check whether file processing is manual, automated, or mixed
  • For very sensitive documents, consider whether you need legal review before any cloud exposure

Why this problem is structural, not just a “choose a reputable service” problem

Even a reputable service with good intentions is a target. A PDF processing service that retains files, even for 24 hours, becomes a concentrated store of potentially sensitive documents, which is exactly what attackers look for. That’s not a criticism of any specific company; it’s a structural reality. The more uploads a service holds, the more a single breach exposes.

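A browser-based tool that never receives your file has no server-side attack surface for your file. There is no database to breach, no storage bucket to misconfigure, no employee with access to your document. The file never leaves your browser’s memory. You still have to trust the code the page serves, but that is something you can check from your own machine.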

This is the core reason in-browser PDF tools matter beyond convenience. It’s not that cloud tools are bad. It’s that the trust model is completely different — and for anything sensitive, the browser-local trust model is strictly simpler.

Tools mentioned in this article

  • PDF Compressor — Optimize PDF file size by re-serializing and stripping unused metadata.
  • PDF Merger — Merge multiple PDF files into a single document with drag-and-drop reordering.
  • PDF Splitter — Extract specific pages or page ranges from a PDF into a new document.
  • PDF Password Protect — Add password protection to PDF documents for secure sharing.