PDF Sanitize
Deep-clean PDFs by removing JavaScript, metadata, embedded files, PieceInfo, and AcroForm elements. Toggle each category independently. No upload required.
How to Use
Sanitize your PDF in four steps:
- Select your PDF — Drag and drop the file or click the dropzone to browse. The tool reads the file locally, without uploading it, and immediately scans for hidden elements: JavaScript actions, metadata fields, PieceInfo dictionaries, embedded file attachments, and AcroForm definitions.
- Review the scan results — Each category displays the number of items found. Categories with zero items are automatically disabled. The tool auto-selects all categories where items were detected, so you can sanitize everything with one click.
- Toggle categories — Deselect any category you want to preserve. For example, you might keep metadata but remove JavaScript and embedded files. Each checkbox controls an independent sanitization pass.
- Click "Sanitize PDF" — The tool processes the document and shows a before-and-after file size comparison along with a summary of what was removed. Download the sanitized PDF when ready.
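The scan-and-toggle behavior in steps 2 and 3 can be sketched as follows. This is a minimal Python illustration and not the tool's actual code, which runs in the browser with pdf-lib. The category keys, `CategoryState`, `plan_selection`, and `selected_categories` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass

# Hypothetical category keys; the tool's real identifiers are not documented here.
CATEGORIES = ("javascript", "metadata", "pieceinfo", "embedded_files", "acroform")


@dataclass
class CategoryState:
    count: int      # items found by the scan
    enabled: bool   # False when nothing was found (checkbox disabled)
    selected: bool  # True when the category will be sanitized


def plan_selection(scan_counts, keep=()):
    """Derive checkbox states from scan counts.

    Categories with zero items are disabled. The rest are selected by
    default unless the user deselects them (listed in `keep`).
    """
    states = {}
    for name in CATEGORIES:
        count = scan_counts.get(name, 0)
        enabled = count > 0
        states[name] = CategoryState(count, enabled, enabled and name not in keep)
    return states


def selected_categories(states):
    """Return the categories that will be removed, in display order."""
    return [name for name in CATEGORIES if states[name].selected]
```

For instance, a scan that finds JavaScript, metadata, and one attachment, with metadata deselected by the user, would leave only the JavaScript and embedded-file passes selected.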
The entire operation runs in your browser using pdf-lib. Your PDF is never uploaded to any server, making this tool safe for classified, legally privileged, or otherwise sensitive documents.
About This Tool
PDF sanitization is the process of stripping non-visual, potentially dangerous, or privacy-compromising elements from a PDF document while leaving the visible page content untouched. Unlike simple metadata editing or annotation removal, sanitization targets the deeper structural components of the PDF format that most users never see but that can carry significant security and privacy risks.
JavaScript in PDFs is commonly stored in three locations within the document structure. The /OpenAction entry in the document catalog triggers an action when the PDF is opened, which makes it a common vector for malicious PDF exploits. The /AA (Additional Actions) dictionary on the catalog or on individual pages fires scripts on events such as page open, page close, or print. The /Names/JavaScript name tree contains named document-level scripts that can be invoked by form fields or other actions. Removing all three eliminates the document-level and page-level entry points for executable code; scripts attached to form fields are covered by the AcroForm category.
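To make the three locations concrete, here is a rough Python sketch that removes them from a catalog modelled as a plain nested dict. It is an assumption-laden illustration, not the tool's pdf-lib implementation: the dict layout, the flat "/Pages" list (a real PDF uses a page tree), and the `strip_javascript` function are all hypothetical. It also removes /OpenAction unconditionally, even though in a real file that entry may hold a harmless destination rather than a script.

```python
import copy


def strip_javascript(catalog):
    """Return (cleaned_copy, removed_labels) for a catalog-like dict."""
    cleaned = copy.deepcopy(catalog)  # leave the caller's structure untouched
    removed = []

    # 1. /OpenAction on the catalog (runs when the document is opened).
    if cleaned.pop("/OpenAction", None) is not None:
        removed.append("/OpenAction")

    # 2. /AA (Additional Actions) on the catalog and on each page.
    if cleaned.pop("/AA", None) is not None:
        removed.append("/AA (catalog)")
    for index, page in enumerate(cleaned.get("/Pages", [])):
        if page.pop("/AA", None) is not None:
            removed.append(f"/AA (page {index + 1})")

    # 3. The /Names/JavaScript name tree of named scripts.
    names = cleaned.get("/Names", {})
    if names.pop("/JavaScript", None) is not None:
        removed.append("/Names/JavaScript")

    return cleaned, removed
```

Other entries under /Names, such as /EmbeddedFiles, are deliberately left alone here, which mirrors the idea that each category is an independent pass.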
Metadata in PDFs exists in two forms. The Info Dictionary stores standard fields — Title, Author, Subject, Keywords, Creator (the application that created the original document), and Producer (the application that converted it to PDF). The XMP Metadata stream is an XML-based metadata format (defined by ISO 16684) that can contain far more information: GPS coordinates, edit history, software versions, and custom properties. Government agencies and law firms routinely sanitize both metadata formats before public release to prevent information leakage.
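Because XMP is plain XML, its contents are easy to inspect once the stream has been extracted. The sketch below uses Python's standard `xml.etree.ElementTree` to list the property names present in an XMP packet. It assumes the packet is already available as a string without the surrounding `<?xpacket?>` wrapper; `list_xmp_properties` is a hypothetical helper, not part of the tool.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_xmp_properties(xmp_packet):
    """Return sorted property names ({namespace}local) found in an XMP packet."""
    root = ET.fromstring(xmp_packet)
    found = set()
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        # Simple properties may be written as XML attributes
        # (rdf:about and other RDF attributes are not metadata fields)...
        for attr in description.attrib:
            if not attr.startswith(f"{{{RDF_NS}}}"):
                found.add(attr)
        # ...or as child elements of rdf:Description.
        for child in description:
            found.add(child.tag)
    return sorted(found)
```

A listing like this is a quick way to see which fields beyond the Info Dictionary a document carries before deciding whether to strip the metadata category.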
PieceInfo is a private data mechanism defined in the PDF specification (ISO 32000-2:2020, Section 14.5). Applications like Adobe Illustrator store the complete .ai source file inside PieceInfo, allowing round-trip editing. InDesign embeds layout data, and Photoshop stores layer information. This data can double or triple the file size and exposes your original design assets to anyone who knows where to look. Removing PieceInfo strips this proprietary payload without affecting the rendered appearance.
Embedded files are stored in the /Names/EmbeddedFiles name tree. PDFs can carry any file type as an attachment — spreadsheets, executables, archives, or additional PDFs. While legitimate uses exist (PDF portfolios, email attachments saved as PDF), embedded files can also serve as a vector for delivering malware or exfiltrating data. Sanitization removes these attachments entirely.
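A name tree stores its entries in leaf nodes as a flat array of alternating keys and values, with intermediate nodes pointing to children through /Kids. The following Python sketch walks such a tree, modelled as nested dicts, to list attachment names before removing the tree. The dict model and the function names `iter_name_tree` and `remove_embedded_files` are assumptions made for illustration; the tool itself works on pdf-lib objects.

```python
def iter_name_tree(node):
    """Yield (name, value) pairs from a name-tree node modelled as a dict."""
    pairs = node.get("/Names", [])
    # Leaf entries are a flat array: [key1, value1, key2, value2, ...]
    for i in range(0, len(pairs) - 1, 2):
        yield pairs[i], pairs[i + 1]
    # Intermediate nodes delegate to their children.
    for kid in node.get("/Kids", []):
        yield from iter_name_tree(kid)


def remove_embedded_files(catalog):
    """List attachment names, then drop /Names/EmbeddedFiles in place."""
    names = catalog.get("/Names", {})
    tree = names.pop("/EmbeddedFiles", None)
    removed = [name for name, _ in iter_name_tree(tree)] if tree else []
    if not names:
        # Nothing else lives under /Names, so remove the empty dictionary too.
        catalog.pop("/Names", None)
    return removed
```

Listing the names first is what allows a summary of removed attachments to be reported after the pass.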
AcroForm is the PDF interactive forms standard. The /AcroForm dictionary in the document catalog defines the form field hierarchy, default values, calculation order, validation scripts, and submit actions. Removing AcroForm strips all form interactivity — text inputs, checkboxes, dropdowns, radio buttons, and signature fields — along with any associated JavaScript validation or calculation scripts. The visual appearance of filled-in fields is lost (unlike flattening, which preserves it), making this option appropriate when you need to eliminate all interactive elements regardless of their visual state.
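As an illustration of the field hierarchy, the sketch below counts terminal fields by type in an /AcroForm dictionary modelled as nested Python dicts. It relies on two simplifying assumptions: the field type /FT is inherited from the parent when a child omits it, and a /Kids entry is treated as a child field only if it has a /T (partial name) entry, otherwise as a widget annotation. `FIELD_TYPES` and `count_fields` are hypothetical names, and the model ignores field flags, so checkboxes, radio buttons, and push buttons all count as "button".

```python
from collections import Counter

# PDF field type names mapped to readable labels.
FIELD_TYPES = {"/Tx": "text", "/Btn": "button", "/Ch": "choice", "/Sig": "signature"}


def count_fields(fields, inherited_type=None, counts=None):
    """Count terminal form fields by type, following the /Kids hierarchy."""
    counts = Counter() if counts is None else counts
    for field in fields:
        field_type = field.get("/FT", inherited_type)
        # Kids with a /T entry are child fields; kids without one are widgets.
        child_fields = [kid for kid in field.get("/Kids", []) if "/T" in kid]
        if child_fields:
            count_fields(child_fields, field_type, counts)
        else:
            counts[FIELD_TYPES.get(field_type, "unknown")] += 1
    return counts
```

A count of this kind is one plausible way to produce the per-category item number shown in the scan results, though the tool's actual counting rule is not documented here.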
Why Use This Tool
PDF sanitization addresses critical needs across security, privacy, compliance, and file optimization workflows:
- Security hardening — Malicious PDFs are a top attack vector in phishing campaigns. CVE databases list hundreds of PDF JavaScript exploits targeting Adobe Acrobat, Foxit Reader, and browser-based viewers. Stripping JavaScript, embedded files, and form actions neutralizes these attack surfaces. Organizations that receive PDFs from untrusted sources use sanitization as a standard preprocessing step before allowing documents into their network.
- Privacy and FOIA compliance — Metadata can reveal author names, organizational details, software versions, file paths, and edit timestamps that should not appear in public-facing documents. The U.S. National Security Agency published a guide specifically on removing hidden data from PDFs before public release. XMP metadata can contain GPS coordinates, camera data, and revision history that constitute a privacy risk.
- Legal document preparation — Before filing documents with courts, submitting evidence, or distributing contracts, attorneys sanitize PDFs to remove internal metadata, review comments stored as embedded objects, and any JavaScript that could interfere with court document management systems. A 2023 American Bar Association survey found that 15% of attorneys had inadvertently disclosed privileged information through unsanitized PDF metadata.
- File size reduction — PieceInfo from design applications can account for 50-80% of a PDF's file size. An Illustrator-exported PDF that appears to be a simple logo might contain the entire .ai source file embedded in PieceInfo. Removing this data can reduce file sizes dramatically; a 12 MB PDF might shrink to 2 MB after PieceInfo removal. Embedded file attachments also contribute unnecessary bulk.
- Archival preparation — Long-term archival standards like PDF/A (ISO 19005) prohibit JavaScript, restrict embedded files, and limit metadata to specific schemas. Sanitizing a PDF before archival conversion removes elements that would cause PDF/A validation failures.
- Cross-platform compatibility — JavaScript actions, AcroForm scripts, and embedded files may behave differently or fail entirely across different PDF viewers. Mobile PDF readers, browser-based viewers, and lightweight desktop applications often cannot execute JavaScript or process embedded files. Sanitizing helps the document render consistently across viewers by removing these viewer-dependent features.
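The file size example above (a 12 MB PDF shrinking to 2 MB) works out to a reduction of roughly 83%. A before-and-after comparison of this kind can be computed as in the short Python sketch below; `size_reduction` is a hypothetical helper, and the byte counts are illustrative rather than measured.

```python
def size_reduction(before_bytes, after_bytes):
    """Return (bytes_saved, percent_saved) for a before/after size comparison."""
    if before_bytes <= 0:
        raise ValueError("before_bytes must be positive")
    saved = before_bytes - after_bytes
    return saved, round(100 * saved / before_bytes, 1)


# Illustrative figures only: 12 MB before, 2 MB after.
saved, percent = size_reduction(12_000_000, 2_000_000)
print(f"Saved {saved / 1_000_000:.0f} MB ({percent}%)")
```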
Processing PDFs locally in your browser is essential for sanitization workflows. The documents most likely to need sanitization — security-sensitive files, legally privileged materials, classified documents, and PDFs from untrusted sources — are precisely the documents you should never upload to a cloud service. This tool guarantees that your document data stays on your device throughout the entire sanitization process, with no network requests and no third-party access.