DevToolKit

PDF to Excel Converter

Extract tables from PDF documents and convert to Excel XLSX spreadsheets. Automatic column detection, frozen headers, and auto-filter — all processed locally in your browser.

How to Use

Convert PDF tables to Excel spreadsheets in three steps:

  1. Upload your PDF — Drag and drop your file into the dropzone or click to browse. The tool reads the document locally in your browser to determine the page count. No data is transmitted to any server at any point during the process.
  2. Set a page range (optional) — To extract tables from specific pages, enter a range like 1-5 or individual pages like 1,3,7. Leave the field empty to process all pages. This is useful for large reports where tables appear only on certain pages.
  3. Click "Convert to Excel" and download — The tool processes each page using PDF.js to extract text coordinates, clusters text into rows, detects column boundaries from whitespace gaps, and then assembles each page's table into a separate sheet in the XLSX file. A progress indicator tracks page-by-page conversion. Once complete, the stats bar shows sheet count, total rows, and file size. Click "Download .xlsx" to save the spreadsheet.

Each sheet in the output file includes a frozen header row and auto-filter, so you can immediately sort and analyze the data in Excel, Google Sheets, or LibreOffice Calc without any additional formatting.
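
The page-range syntax from step 2 can be sketched as a small parser. This is only an illustration, not the tool's actual code: the function name `parsePageRange` is hypothetical, and mixing ranges with single pages in one entry (such as 1-3,7) is an assumption that goes beyond the two examples given above.

```typescript
// Hypothetical helper: turns "1-5" or "1,3,7" into sorted, unique 1-based page numbers.
function parsePageRange(input: string, pageCount: number): number[] {
  const trimmed = input.trim();
  // An empty field means "process all pages".
  if (trimmed === "") {
    return Array.from({ length: pageCount }, (_, i) => i + 1);
  }
  const pages = new Set<number>();
  for (const part of trimmed.split(",")) {
    // Accept either a single page ("7") or an inclusive range ("1-5").
    const match = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(part);
    if (!match) {
      throw new Error(`Invalid page range segment: "${part}"`);
    }
    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${part.trim()}"`);
    }
    for (let p = start; p <= end; p++) {
      pages.add(p);
    }
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Under these assumptions, `parsePageRange("1-3,7", 10)` returns `[1, 2, 3, 7]`, and an empty string returns every page from 1 to the page count.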

About This Tool

Extracting tabular data from PDF documents is a common challenge because the PDF format was designed for fixed-layout visual reproduction, not for structured data exchange. Unlike spreadsheets that store data in rows and columns with defined relationships, a PDF table is rendered as individual text fragments positioned at precise coordinates on the page. The concept of a "table" exists only visually for the human reader: outside of optional accessibility tagging, which many files lack, the page content carries no table data structure.

This tool reconstructs tabular structure from positional clues. It uses Mozilla's PDF.js library to parse each page's content stream and extract every text item along with its 6-element affine transformation matrix. The matrix encodes the item's position (x, y), scale, and rotation on the page. From these coordinates, the algorithm performs two key steps: row clustering and column detection.
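
As a rough illustration of what the 6-element matrix carries, the sketch below decomposes a matrix `[a, b, c, d, e, f]` into position, scale, and rotation using standard affine-matrix arithmetic. The names `Matrix6`, `Placement`, and `decomposeTransform` are hypothetical and not taken from the tool. In PDF.js the matrix is typically available as the `transform` array on each text item, but the exact fields the tool reads are an assumption here.

```typescript
// [a, b, c, d, e, f] as in the PDF affine matrix: x' = a*x + c*y + e, y' = b*x + d*y + f.
type Matrix6 = [number, number, number, number, number, number];

interface Placement {
  x: number;           // horizontal position in points
  y: number;           // vertical position in points (PDF y grows upward from the page bottom)
  scaleX: number;      // length of the transformed x basis vector
  scaleY: number;      // length of the transformed y basis vector
  rotationDeg: number; // counter-clockwise rotation of the x basis vector, in degrees
}

// Hypothetical helper: pulls position, scale, and rotation out of the matrix.
function decomposeTransform([a, b, c, d, e, f]: Matrix6): Placement {
  return {
    x: e,
    y: f,
    scaleX: Math.hypot(a, b),
    scaleY: Math.hypot(c, d),
    rotationDeg: (Math.atan2(b, a) * 180) / Math.PI,
  };
}
```

For an unrotated 12-point item placed at (72, 700), the matrix `[12, 0, 0, 12, 72, 700]` yields x = 72, y = 700, a scale of 12 on both axes, and a rotation of 0 degrees. Only x and y are needed for the row and column steps; scale and rotation are useful mainly for spotting rotated text.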

Row clustering groups text items whose Y-coordinates fall within a proximity threshold of approximately 3 points (about 1 mm). Items on the same visual line are grouped together regardless of how the PDF generator split them into separate text operations. Within each row, items are sorted by X-coordinate to establish left-to-right reading order. Column detection then analyzes the X-positions of all items across all rows, identifying significant horizontal gaps (15 points or more) that separate columns. These gap-based boundaries define the grid structure into which each text item is placed.
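
The two steps above can be sketched as follows. This is a minimal interpretation, not the tool's source: the `TextItem` shape and the functions `clusterRows`, `detectColumnBoundaries`, and `toGrid` are hypothetical names. Two details are assumptions: each row is anchored at the Y-coordinate of its first item, and gaps are measured between the horizontal spans (x to x + width) covered by any item on the page. The 3-point and 15-point thresholds come from the description above.

```typescript
interface TextItem { str: string; x: number; y: number; width: number; }

const ROW_TOLERANCE = 3; // points, from the description above
const COLUMN_GAP = 15;   // points, from the description above

// Group items into rows by Y proximity, then order each row left to right.
function clusterRows(items: TextItem[], tolerance: number = ROW_TOLERANCE): TextItem[][] {
  // PDF y grows upward, so descending y gives top-to-bottom order.
  const sorted = items.slice().sort((p, q) => q.y - p.y);
  const rows: TextItem[][] = [];
  let anchorY = 0;
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current !== undefined && Math.abs(anchorY - item.y) <= tolerance) {
      current.push(item);
    } else {
      rows.push([item]);
      anchorY = item.y; // assumption: the first item in a row sets its reference y
    }
  }
  for (const row of rows) row.sort((p, q) => p.x - q.x);
  return rows;
}

// Find x positions that split columns: midpoints of gaps no item on the page covers.
function detectColumnBoundaries(rows: TextItem[][], minGap: number = COLUMN_GAP): number[] {
  const spans: Array<[number, number]> = [];
  for (const row of rows) {
    for (const item of row) spans.push([item.x, item.x + item.width]);
  }
  spans.sort((p, q) => p[0] - q[0]);
  const boundaries: number[] = [];
  let coveredTo = -Infinity;
  for (const [start, end] of spans) {
    if (coveredTo !== -Infinity && start - coveredTo >= minGap) {
      boundaries.push((coveredTo + start) / 2);
    }
    coveredTo = Math.max(coveredTo, end);
  }
  return boundaries;
}

// Place each item into the column whose boundaries enclose its x position.
function toGrid(rows: TextItem[][], boundaries: number[]): string[][] {
  return rows.map((row) => {
    const cells: string[] = [];
    for (let i = 0; i <= boundaries.length; i++) cells.push("");
    for (const item of row) {
      let col = 0;
      for (const b of boundaries) {
        if (item.x >= b) col++;
      }
      const existing = cells[col] || "";
      cells[col] = existing === "" ? item.str : existing + " " + item.str;
    }
    return cells;
  });
}
```

Measuring gaps across all rows at once, rather than row by row, means a cell that is empty in one row does not shift the columns of that row. The trade-off is that a single wide item spanning two columns can hide the gap between them.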

The output uses the XLSX format (Office Open XML Spreadsheet), generated client-side by SheetJS. Each page that contains detected tabular data becomes a separate worksheet named "Page N". The first row of each sheet is frozen and auto-filter is enabled, matching the conventions that spreadsheet users expect for data analysis. This "stream" parsing approach -- detecting tables from text alignment rather than from visible cell borders -- works with both bordered and borderless tables, which is particularly important since many PDF generators omit drawn gridlines even for visually obvious tables.
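
To make the per-sheet conventions concrete, the sketch below derives the sheet name, the number of frozen rows, and the A1-style range an auto-filter would cover from a page's grid. The names `columnLetters` and `sheetLayout` are hypothetical, and how the tool hands these values to SheetJS is not described in the text, so this shows only the arithmetic.

```typescript
// Convert a 0-based column index to spreadsheet letters: 0 -> "A", 25 -> "Z", 26 -> "AA".
function columnLetters(index: number): string {
  let n = index + 1;
  let letters = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    letters = String.fromCharCode(65 + rem) + letters;
    n = Math.floor((n - 1) / 26);
  }
  return letters;
}

interface SheetLayout {
  name: string;             // "Page N", as described above
  frozenRows: number;       // the header row stays visible while scrolling
  filterRef: string | null; // A1-style range covered by the auto-filter, or null if empty
}

// Hypothetical helper: describe one worksheet for a page's extracted grid.
function sheetLayout(pageNumber: number, grid: string[][]): SheetLayout {
  let maxCols = 0;
  for (const row of grid) maxCols = Math.max(maxCols, row.length);
  const hasData = grid.length > 0 && maxCols > 0;
  return {
    name: "Page " + pageNumber,
    frozenRows: hasData ? 1 : 0,
    filterRef: hasData ? "A1:" + columnLetters(maxCols - 1) + grid.length : null,
  };
}
```

For a grid of two rows and three columns extracted from page 3, this gives the name "Page 3", one frozen row, and the filter range A1:C2.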

Limitations of coordinate-based table extraction include multi-line cell content (which may split into separate rows), merged cells (which lack explicit structural markers in the PDF), and inconsistent column alignment across pages. Documents with complex nested tables, rotated text, or overlapping form fields may produce imperfect results. For raw CSV output without Excel formatting, the Extract Tables from PDF tool provides a lighter-weight alternative. For plain text extraction without table structure, use PDF to Text.

Why Use This Tool

Converting PDF tables to Excel spreadsheets serves a wide range of professional, academic, and personal workflows:

  • Financial analysis — Bank statements, quarterly reports, and investor filings frequently arrive as PDF. Converting them to Excel enables pivot tables, formulas, and charts without manual data entry. A single quarterly report might contain 50-100 rows of financial data across multiple pages that would take an hour to retype.
  • Government and regulatory data — Census tables, tax schedules, procurement records, and statistical reports published by government agencies are overwhelmingly distributed as PDF. Researchers and analysts need this data in spreadsheet form for aggregation, trend analysis, and cross-referencing with other datasets.
  • Supply chain and procurement — Vendor price lists, purchase orders, and inventory reports circulate as PDF between organizations that use different software systems. Converting to Excel enables comparison formulas, VLOOKUP matching, and automated reorder calculations.
  • Academic research — Published papers include data tables that researchers need to reanalyze, combine with their own data, or verify statistically. Extracting these tables into a spreadsheet eliminates transcription errors that could compromise research integrity. Use the Word Counter to check extracted text metrics.
  • Real estate and insurance — Comparative market analyses, appraisal reports, and claims data arrive as PDF. Converting to Excel enables sorting by value, filtering by criteria, and calculating averages that inform pricing and coverage decisions.
  • Healthcare administration — Patient registries, drug formularies, and billing code tables distributed as PDF need spreadsheet conversion for database import, audit compliance, and reimbursement calculations.

Privacy is critical when working with documents that contain financial, medical, or proprietary information. This tool processes everything locally in your browser -- the PDF never leaves your device, and no data is transmitted to any server. For additional PDF data workflows, try Extract Tables from PDF for CSV export, PDF to JSON for structured coordinate data, or PDF to Text for plain text extraction.

FAQ

How does table detection work?
The tool extracts text items with their coordinates from each PDF page using PDF.js. It clusters items into rows by Y-coordinate proximity, then detects column boundaries by analyzing whitespace gaps between X positions.

Does it work with scanned PDFs?
No. This tool extracts embedded text from born-digital PDFs. Scanned PDFs contain only images of text and require OCR, which this tool does not perform.

Can I convert specific pages?
Yes. Enter a page range like '1-5' or '1,3,7' to extract tables from specific pages only. Each page with detected tables becomes a separate sheet in the Excel file.

Are my PDF files uploaded to a server?
No. All processing runs entirely in your browser using PDF.js and SheetJS. Your documents never leave your device.

What Excel features does the output include?
The generated XLSX file includes a frozen header row and auto-filter on each sheet, making it ready for sorting and analysis in Excel, Google Sheets, or LibreOffice Calc.