Extract Images from PDF
Extract all embedded images from any PDF at their original resolution, with automatic format detection, deduplication, and ZIP download. Runs entirely in your browser.
How to Use
Extract embedded images from your PDF in three steps:
- Upload your PDF — Drag and drop the file onto the upload area or click to browse. The tool loads the PDF entirely in your browser using pdfjs-dist (Mozilla's PDF rendering engine) and begins scanning every page for embedded image objects.
- Review extracted images — A gallery grid displays every unique image found in the document. Each card shows a preview thumbnail, the image dimensions in pixels, the detected format (JPEG or PNG), the file size, and the page number where the image appears. Duplicate images — such as repeated logos in headers — are automatically detected and shown only once.
- Download images — Click the download button on any individual image card to save it, or use the "Download all as ZIP" button to get every extracted image in a single archive. JPEG images are saved in their original encoding without re-compression, preserving exact quality. Non-JPEG images are exported as lossless PNG.
The extraction process walks the PDF operator list for each page, identifying paintImageXObject and inline image operations. Image data is retrieved from the page's object store at full resolution; no downscaling occurs during extraction. Your PDF is never sent to any server.
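The operator-list walk described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it assumes the operator list has the two-parallel-array shape (fnArray, argsArray) that pdf.js returns from page.getOperatorList(), and that the first argument of a paintImageXObject operation is the object id. The operator codes are passed in as parameters instead of being hard-coded, and the names findImageOps, ImageOps, and ImageRef are hypothetical.

```typescript
// Assumed shape of a pdf.js operator list: parallel arrays of operator
// codes and their arguments (arguments may be null for some operators).
interface OperatorList {
  fnArray: number[];
  argsArray: (unknown[] | null)[];
}

// Operator codes are supplied by the caller; in a real page they would
// come from the OPS table exported by pdfjs-dist.
interface ImageOps {
  paintImageXObject: number;
  paintInlineImageXObject: number;
}

interface ImageRef {
  kind: "xobject" | "inline";
  index: number;       // position of the operation in the operator list
  name: string | null; // object id for XObjects, null for inline images
}

// Scan the operator list once and record every image-painting operation.
function findImageOps(list: OperatorList, ops: ImageOps): ImageRef[] {
  const refs: ImageRef[] = [];
  for (let i = 0; i < list.fnArray.length; i++) {
    const fn = list.fnArray[i];
    const args = list.argsArray[i] ?? [];
    if (fn === ops.paintImageXObject) {
      // Assumption: the first argument is the id used to look the image
      // up in the page's object store.
      const first = args[0];
      refs.push({ kind: "xobject", index: i, name: typeof first === "string" ? first : null });
    } else if (fn === ops.paintInlineImageXObject) {
      refs.push({ kind: "inline", index: i, name: null });
    }
  }
  return refs;
}
```

Each returned XObject name would then be resolved against the page's object store to obtain the pixel data, while inline images carry their data in the operation's own arguments.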
About This Tool
Images in PDF documents are stored as XObject resources — specifically, Image XObjects defined by the /Subtype /Image entry in their dictionary. Each Image XObject contains the raw pixel data (or compressed stream), the image dimensions, the color space, bits per component, and optional decode parameters. These objects are referenced by name in a page's resource dictionary and drawn onto the page via the Do operator in the content stream.
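As a worked example of how the dictionary entries relate to the data, the size of the fully decoded sample data follows from the width, height, bits per component, and number of color components, with each row padded to a whole byte. The sketch below is illustrative only; the ImageXObjectInfo type and decodedByteLength function are hypothetical names, not part of any library.

```typescript
// The dictionary entries that determine how much decoded data to expect.
interface ImageXObjectInfo {
  width: number;            // /Width, in samples
  height: number;           // /Height, in samples
  bitsPerComponent: number; // /BitsPerComponent: 1, 2, 4, 8, or 16
  components: number;       // 1 for DeviceGray, 3 for DeviceRGB, 4 for DeviceCMYK
}

// Size in bytes of the image data after all stream filters are undone.
function decodedByteLength(info: ImageXObjectInfo): number {
  const bitsPerRow = info.width * info.components * info.bitsPerComponent;
  // Every row of samples starts on a byte boundary, so round each row up.
  const bytesPerRow = Math.ceil(bitsPerRow / 8);
  return bytesPerRow * info.height;
}
```

For instance, a 100 x 50 DeviceRGB image at 8 bits per component decodes to 100 * 3 * 50 = 15,000 bytes, while a 10 x 2 bilevel image needs 2 bytes per row (10 bits rounded up), or 4 bytes in total.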
The PDF specification supports several image compression filters, each suited to different content types. DCTDecode stores JPEG-compressed data and is the most common filter for photographs and complex images. When a PDF viewer encounters a DCTDecode stream, it decodes the JPEG data to produce pixel values. Extracting these images ideally preserves the raw JPEG bytes without re-encoding, avoiding any generational quality loss. FlateDecode applies zlib/deflate compression to raw pixel data and is typically used for screenshots, diagrams, and images with sharp edges where lossless compression is preferred. JBIG2Decode and CCITTFaxDecode are specialized for black-and-white content such as scanned text pages, achieving high compression ratios on bilevel images. JPXDecode encodes images using JPEG 2000, which offers both lossy and lossless modes and generally compresses more efficiently than baseline JPEG at comparable quality.
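One way to turn the filter descriptions above into an export decision is sketched below. The rule it encodes (copy the bytes only when DCTDecode is the sole filter, otherwise decode and write PNG) is an assumption made for illustration, as are the names planExport and looksLikeJpeg; the tool's own logic may differ.

```typescript
type ExportPlan = "copy-jpeg-bytes" | "decode-then-png";

// /Filter may be absent, a single name, or an array of names applied in order.
function planExport(filter: string | string[] | undefined): ExportPlan {
  const filters = filter === undefined ? [] : Array.isArray(filter) ? filter : [filter];
  // Only when DCTDecode is the sole filter is the stream body already JPEG
  // data that can be written out untouched.
  return filters.length === 1 && filters[0] === "DCTDecode"
    ? "copy-jpeg-bytes"
    : "decode-then-png";
}

// Sanity check before saving bytes with a .jpg extension: JPEG data starts
// with the SOI marker (FF D8) followed by the first byte of another marker (FF).
function looksLikeJpeg(bytes: Uint8Array): boolean {
  return bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff;
}
```

A stream filtered with both FlateDecode and DCTDecode falls into the decode path here for simplicity, although undoing only the outer FlateDecode layer would also recover the original JPEG bytes.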
Color spaces add another layer of complexity to PDF image extraction. Images may use /DeviceRGB (screen-oriented, 3 channels), /DeviceCMYK (print-oriented, 4 channels), /DeviceGray (single channel), or /ICCBased color spaces with embedded profiles. The pdfjs-dist renderer normalizes all color spaces to RGBA during decoding, which this tool leverages to produce universally compatible output images regardless of the original color model. Images that use spot colors or duotone separations (the /Separation and /DeviceN color spaces) are likewise converted to standard RGB.
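To make the idea of normalizing to RGBA concrete, here is a deliberately naive CMYK conversion. It is not the conversion pdfjs-dist performs and it ignores ICC profiles entirely; it only illustrates how four ink channels collapse into three light channels plus an opaque alpha. The function name cmykToRgba is hypothetical.

```typescript
// Naive, non-color-managed conversion: 4 bytes per pixel in (C, M, Y, K),
// 4 bytes per pixel out (R, G, B, A). 0 means no ink, 255 means full ink.
function cmykToRgba(cmyk: Uint8Array): Uint8ClampedArray {
  if (cmyk.length % 4 !== 0) {
    throw new Error("CMYK data must contain 4 bytes per pixel");
  }
  const rgba = new Uint8ClampedArray(cmyk.length);
  for (let i = 0; i < cmyk.length; i += 4) {
    const c = cmyk[i] / 255;
    const m = cmyk[i + 1] / 255;
    const y = cmyk[i + 2] / 255;
    const k = cmyk[i + 3] / 255;
    rgba[i] = Math.round(255 * (1 - c) * (1 - k));     // red is reduced by cyan
    rgba[i + 1] = Math.round(255 * (1 - m) * (1 - k)); // green is reduced by magenta
    rgba[i + 2] = Math.round(255 * (1 - y) * (1 - k)); // blue is reduced by yellow
    rgba[i + 3] = 255;                                 // fully opaque
  }
  return rgba;
}
```

With this formula, a pixel with no ink becomes white, full black ink becomes black, and full cyan alone becomes (0, 255, 255). A color-managed conversion will generally give visibly different results for saturated print colors.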
Inline images are defined between BI and EI operators directly within a page's content stream rather than as separate XObject resources. They are a less common but valid way to embed small images, typically tiny icons, bullet points, or background patterns. The PDF specification recommends inline images only for small images (4 KB or less), but some PDF generators use them for larger content. This tool extracts inline images alongside standard XObjects, ensuring complete coverage.
Image masks in PDF documents serve a different purpose from displayable images. A soft mask (/SMask) defines per-pixel transparency for another image, functioning like an alpha channel. A stencil mask (/ImageMask true) is a 1-bit image used to paint the current fill color through a pattern of holes. This tool focuses on extracting displayable image content — the visual images you see on the page — rather than auxiliary mask objects, though masks that produce visible content are included in the output.
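The alpha-channel role of a soft mask can be illustrated with a small sketch. It assumes the simplest case only: an 8-bit RGB image and an 8-bit grayscale mask with identical pixel dimensions (in real PDFs the mask may have a different size and would need resampling first). The function name applySoftMask is hypothetical.

```typescript
// Combine 3-byte-per-pixel RGB data with a 1-byte-per-pixel soft mask
// into 4-byte-per-pixel RGBA. Mask value 0 is fully transparent, 255 opaque.
function applySoftMask(rgb: Uint8Array, mask: Uint8Array): Uint8ClampedArray {
  const pixels = mask.length;
  if (rgb.length !== pixels * 3) {
    throw new Error("rgb and mask must describe the same number of pixels");
  }
  const rgba = new Uint8ClampedArray(pixels * 4);
  for (let p = 0; p < pixels; p++) {
    rgba[p * 4] = rgb[p * 3];
    rgba[p * 4 + 1] = rgb[p * 3 + 1];
    rgba[p * 4 + 2] = rgb[p * 3 + 2];
    rgba[p * 4 + 3] = mask[p]; // the mask sample becomes the alpha value
  }
  return rgba;
}
```

The result can be written as a PNG with transparency, which is one reason lossless PNG suits non-JPEG output.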
Deduplication is essential for practical image extraction. PDF documents routinely share image XObjects across multiple pages — a company logo appearing in every page header is stored once and referenced by every page's resource dictionary. Without deduplication, extracting a 50-page corporate document would yield 50 copies of the same logo. This tool computes a lightweight fingerprint from each image's byte length, leading bytes, and trailing bytes to identify duplicates without performing expensive full-buffer comparisons. The fingerprint approach catches identical images referenced under different names across pages while keeping extraction fast.
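The fingerprint idea can be sketched as follows. The 16-byte sample size, the string key format, and the names fingerprint and dedupe are assumptions chosen for illustration; the source only states that byte length, leading bytes, and trailing bytes are combined.

```typescript
// Render bytes as lowercase hex so they can be joined into a string key.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return out;
}

// Cheap fingerprint: total length plus the first and last `sample` bytes.
function fingerprint(data: Uint8Array, sample: number = 16): string {
  const head = data.subarray(0, sample);
  const tail = data.subarray(Math.max(0, data.length - sample));
  return data.length + ":" + toHex(head) + ":" + toHex(tail);
}

// Keep only the first image seen for each fingerprint, preserving order.
function dedupe<T extends { data: Uint8Array }>(images: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const img of images) {
    const key = fingerprint(img.data);
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(img);
    }
  }
  return unique;
}
```

The trade-off is that the middle of the buffer is never inspected: two different images with the same length and matching leading and trailing bytes would share a fingerprint and be treated as duplicates. Hashing the full buffer would remove that risk at the cost of reading every byte.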
Why Use This Tool
Extracting images from PDF documents serves a wide range of professional and personal workflows:
- Recovering original photos — When photographs are embedded in reports, presentations, or ebooks distributed as PDF, extracting them recovers the images at their original stored resolution. This is far superior to taking screenshots, which rasterize at screen DPI and lose quality.
- Design asset recovery — Designers frequently receive brand guidelines, style guides, or marketing collateral as PDFs containing logos, icons, and product photos. Extracting these assets in their original format saves time compared to recreating them or requesting source files.
- Archival and cataloging — Museums, libraries, and archives that digitize collections as PDF documents can extract individual images for cataloging in digital asset management systems, where each image needs its own metadata record and thumbnail.
- Legal discovery — In litigation workflows, embedded photographs, charts, and exhibits within PDF evidence bundles may need to be isolated for separate analysis, annotation, or inclusion in court filings.
- Academic research — Researchers extracting charts, graphs, and figures from journal articles for inclusion in literature reviews, presentations, or thesis documents need individual image files rather than full-page screenshots.
- Scanned document processing — OCR workflows sometimes require extracting the raw scanned page images from a PDF to run through specialized OCR engines that accept image input rather than PDF input.
Processing PDFs locally in your browser eliminates the privacy risk inherent in server-based extraction tools. PDF documents frequently contain proprietary product images, confidential financial charts, personal photographs, medical imaging, or attorney-client privileged exhibits. Cloud-based extractors require uploading the entire PDF to a third-party server, creating data exposure risk and potential compliance violations under GDPR, HIPAA, and similar regulations. This tool guarantees your document and its images never leave your device: all decoding, extraction, and ZIP packaging occurs entirely in browser memory.