Monthly Archives: May 2013

Force Adobe Acrobat OCR on a Mac if a page contains “renderable text”

Acrobat refuses to run OCR on a page that already contains renderable text, and this appears to be a difficult problem because there seems to be no straightforward way to remove that text. If you open the PDF in Preview and print it as PDF, or even as PostScript, the renderable text still remains.

The solution is to export the PDF to an image format such as TIFF. TIFF is better than JPEG because you don't get JPEG's compression artifacts, and it supports multi-page documents. PNG also avoids artifacts, but it cannot hold multiple pages in one file.

When exporting to TIFF, make sure you select a high resolution, e.g., 600 DPI, to preserve quality. The default of only 150 DPI is too low.
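To get a feel for what the resolution setting means, here is a small back-of-the-envelope calculation in Python. It is only a sketch, not part of the Preview workflow itself. It assumes the page size is given in PDF points (1 pt = 1/72 inch) and uses US Letter (612 x 792 pt) as an example; the byte counts are for uncompressed 8-bit grayscale and 24-bit RGB, so real TIFF files will usually be smaller thanks to compression.

```python
# Back-of-the-envelope estimate of a rasterized page at a given DPI.
# Assumption: page sizes are in PDF points (1 pt = 1/72 inch).
# Byte counts are uncompressed; actual TIFF files are usually smaller.

POINTS_PER_INCH = 72


def raster_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page rendered at the given DPI."""
    return (round(width_pt / POINTS_PER_INCH * dpi),
            round(height_pt / POINTS_PER_INCH * dpi))


def uncompressed_bytes(width_px: int, height_px: int, bytes_per_pixel: int) -> int:
    """Raw image size: 1 byte/pixel for 8-bit gray, 3 for 24-bit RGB."""
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    letter = (612, 792)  # US Letter in points, used here as an example page
    for dpi in (150, 600):
        w, h = raster_size(*letter, dpi)
        gray_mb = uncompressed_bytes(w, h, 1) / 1e6
        rgb_mb = uncompressed_bytes(w, h, 3) / 1e6
        print(f"{dpi} dpi: {w} x {h} px, {gray_mb:.1f} MB gray, {rgb_mb:.1f} MB RGB")
```

Going from 150 to 600 DPI multiplies each page dimension by 4 and the pixel count by 16 (1275 x 1650 versus 5100 x 6600 pixels for a Letter page). That extra detail is what the OCR engine works with, and it is also why 600 DPI exports take noticeably more disk space.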

Then open the resulting TIFF in Preview once more and export it as PDF.

The resulting PDF contains only page images and no renderable text, so it should no longer cause problems for OCR in Acrobat.