PDF OCR — Extract Text from Scanned PDFs

Turn image-based or scanned PDFs into searchable, selectable text. Powered by Tesseract.js — runs entirely in your browser with no file uploads.

How to Use PDF OCR

To extract text from a scanned PDF, upload the file using the upload area. Select the document's language: English is the default, and French, German, Spanish, Italian, Portuguese, Chinese (Simplified), and Arabic are also supported. Then choose an output format. Searchable PDF creates a new PDF that keeps the original page images and adds an invisible text layer, so the document can be searched and its text copied. Text file outputs a plain .txt file containing all extracted text.

Adjust the render quality slider if needed — higher values improve OCR accuracy at the cost of processing time. Click Run OCR. The tool renders each page using PDF.js, passes it to Tesseract.js for recognition, and shows a live text preview as each page is processed. When all pages are done, download your searchable PDF or text file.
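The page-by-page flow described above can be sketched as a small loop. This is only an illustration, not the tool's actual source: the PageRenderer and Recognizer interfaces, the runOcr function, and the onPage callback are hypothetical names. In the real tool the rendering role would be filled by PDF.js and the recognition role by Tesseract.js.

```typescript
// Hypothetical interfaces (assumptions for illustration only). In the real
// tool, rendering is done by PDF.js and recognition by Tesseract.js.
interface PageRenderer {
  pageCount: number;
  // Renders one page (1-based) at the given scale and returns an image handle.
  renderPage(pageNumber: number, scale: number): Promise<unknown>;
}

interface Recognizer {
  // Returns the text recognised in a rendered page image.
  recognize(image: unknown): Promise<string>;
}

type PageCallback = (pageNumber: number, pageCount: number, text: string) => void;

// Processes pages one at a time so a preview can be updated after each page.
async function runOcr(
  renderer: PageRenderer,
  recognizer: Recognizer,
  scale: number,
  onPage?: PageCallback,
): Promise<string[]> {
  const pages: string[] = [];
  for (let n = 1; n <= renderer.pageCount; n++) {
    const image = await renderer.renderPage(n, scale);
    const text = await recognizer.recognize(image);
    pages.push(text);
    if (onPage) onPage(n, renderer.pageCount, text); // e.g. update the live preview
  }
  return pages;
}
```

Passing the renderer and recognizer in as parameters keeps the loop independent of any particular library version, and the returned array holds one text string per page, ready to be written to a .txt file or a text layer.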

Frequently Asked Questions

What is OCR and when do I need it?

OCR (Optical Character Recognition) converts images of text into actual, machine-readable text. You need it when your PDF was created by scanning a paper document or by taking a photo — in those cases the PDF contains only images, not selectable text.

How accurate is the OCR?

Accuracy depends on scan quality and language. Clean, high-resolution scans of printed text in English or major European languages typically achieve 95–99% accuracy. Handwriting, low-resolution scans, unusual fonts, or lesser-supported languages will have lower accuracy.
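To make a percentage like this concrete, one common way to score OCR output is character accuracy: one minus the edit distance between the OCR text and a known-correct reference, divided by the reference length. The page does not say which metric its figures use, so treat the sketch below as an assumption. The function names editDistance and characterAccuracy are illustrative.

```typescript
// Levenshtein edit distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
function editDistance(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1 (an assumed definition, see lead-in).
function characterAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  const errors = editDistance(reference, ocrOutput);
  return Math.max(0, 1 - errors / reference.length);
}
```

For example, reading "hello world" as "hel1o world" is one error in eleven characters, or about 91% accuracy. Under this definition, 95% accuracy on a 2,000-character page still means roughly 100 characters to correct.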

What languages are supported?

The tool currently supports English, French, German, Spanish, Italian, Portuguese, Chinese (Simplified), and Arabic. The language model is downloaded automatically when you select a language — allow a few seconds for the first download.
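Tesseract identifies each language model by a short code, so a tool like this presumably maps the language menu onto those codes. The codes below are the standard Tesseract ones for the eight listed languages. The constant and function names (TESSERACT_LANG_CODES, toLangCode) are hypothetical and not taken from the tool.

```typescript
// Standard Tesseract language codes for the languages listed above.
const TESSERACT_LANG_CODES = {
  English: "eng",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Italian: "ita",
  Portuguese: "por",
  "Chinese (Simplified)": "chi_sim",
  Arabic: "ara",
} as const;

type SupportedLanguage = keyof typeof TESSERACT_LANG_CODES;

// Type guard: true only for names that are own keys of the table.
function isSupportedLanguage(name: string): name is SupportedLanguage {
  return Object.prototype.hasOwnProperty.call(TESSERACT_LANG_CODES, name);
}

// Returns the Tesseract code for a menu label, or throws for unknown labels.
function toLangCode(name: string): string {
  if (!isSupportedLanguage(name)) {
    throw new Error("Unsupported OCR language: " + name);
  }
  return TESSERACT_LANG_CODES[name];
}
```

The resulting code (for example "deu" for German) is the value that would be handed to Tesseract.js when a worker is created, which is the point at which the language model is downloaded.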

Is my PDF uploaded to a server?

No. All processing — rendering, OCR, and PDF creation — runs locally in your browser using PDF.js, Tesseract.js, and pdf-lib. Your file never leaves your device.

Why is OCR slow?

OCR is computationally intensive. Each page is rendered at high resolution and then analysed by Tesseract's neural-network recognition engine. A 10-page document typically takes 30–90 seconds, or roughly 3–9 seconds per page. For faster results on clean documents, use the 1× render quality setting.
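A rough way to see why render quality affects speed: the number of pixels to render and recognise grows with the square of the scale factor. The sketch below assumes the quality setting maps directly onto a render scale where 1× means one pixel per PDF point (1/72 inch). The page does not confirm this mapping, and renderSize is an illustrative name.

```typescript
// PDF page dimensions are expressed in points; 72 points = 1 inch.
const PDF_POINTS_PER_INCH = 72;

interface RenderSize {
  widthPx: number;
  heightPx: number;
  dpi: number;        // effective resolution of the rendered image
  megapixels: number; // total pixels to render and recognise, in millions
}

// Assumption: scale 1 renders one pixel per PDF point.
function renderSize(widthPt: number, heightPt: number, scale: number): RenderSize {
  const widthPx = Math.ceil(widthPt * scale);
  const heightPx = Math.ceil(heightPt * scale);
  return {
    widthPx,
    heightPx,
    dpi: PDF_POINTS_PER_INCH * scale,
    megapixels: (widthPx * heightPx) / 1e6,
  };
}

// US Letter is 612 x 792 points.
const atOneX = renderSize(612, 792, 1); // 612 x 792 px at 72 dpi
const atTwoX = renderSize(612, 792, 2); // 1224 x 1584 px at 144 dpi
```

Under these assumptions, going from 1× to 2× quadruples the pixel count per page, which is consistent with higher quality settings taking noticeably longer.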