PDF OCR Batch-processing with ScanSnap

The ScanSnap S1500M (as well as the S1500 and the earlier S510 and S510M models) comes with ABBYY FineReader for OCR support. This is great for creating searchable PDFs, and especially handy on a Mac with Spotlight. The problem is that in the default setup, each scan is OCRed right after the scan, and depending on the age of your machine (my G5 is getting a little long in the tooth) it can take quite a while. When you’re in the process of scanning many hundreds of pages of paper documents, you don’t want to have to wait for the computer to do its OCR; you’d rather feed it all the documents and let it do the OCR while you’re doing something else.

Fortunately, this is possible. Reading all the way through the handbook as well as the ABBYY online help, I found out that you can scan to PDF only and then convert the PDFs with ABBYY FineReader afterwards. Here’s how to do it:

  1. Start ScanSnap Manager and set it up for scan to PDF, but without OCR.
  2. Do all your scans and save the files to a folder.
  3. Locate the “Scan to Searchable PDF” application in the “ABBYY FineReader for ScanSnap 4.0” folder in your Applications folder. Drag it to the Dock.
  4. Start the application and set the OCR preferences using the “Preferences…” menu option. For best results and smallest file sizes, choose the high quality setting and the same format as used for the scan.
  5. Now drag your previously scanned PDFs (up to 100 at a time) onto the “Scan to Searchable PDF” Dock icon.
  6. Each file will be processed separately and saved in the source location under a new filename. If the original file was named “filename.pdf”, the new processed file will be named “filename processed by FineReader.pdf”.
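
For folders holding more than 100 scans, the dragging in step 5 could in principle be scripted. The following is only a minimal sketch under assumptions I have not verified: that the macOS `open -a` command can find the application under the name “Scan to Searchable PDF”, and that handing it files this way behaves like dropping them on the Dock icon. The function names (`is_original`, `batches`, `open_command`, `submit_batch`) are made up for illustration.

```python
import subprocess
from pathlib import Path

APP_NAME = "Scan to Searchable PDF"  # assumption: name under which `open -a` finds the app
SUFFIX = " processed by FineReader"  # naming pattern described in step 6
BATCH_SIZE = 100                     # limit mentioned in step 5


def is_original(path):
    """True for a PDF that is not itself FineReader output."""
    path = Path(path)
    return path.suffix.lower() == ".pdf" and not path.stem.endswith(SUFFIX)


def batches(items, size=BATCH_SIZE):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def open_command(files, app=APP_NAME):
    """Build the macOS `open` command that hands the files to the app."""
    return ["open", "-a", app] + [str(f) for f in files]


def submit_batch(folder, index=0):
    """Send one batch of original scans from `folder` to the app (macOS only)."""
    originals = sorted(p for p in Path(folder).iterdir() if is_original(p))
    chunks = batches(originals)
    if index < len(chunks):
        subprocess.run(open_command(chunks[index]), check=True)
    return len(chunks)  # total number of batches, so the caller knows what is left
```

Calling `submit_batch("/path/to/scans", 0)` would send the first 100 files. The sketch deliberately sends one batch per call, since I don't know how the application reacts when more files arrive before it has finished the previous ones.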

Regarding the OCR preferences: I scan old account statements in B&W, so I choose B&W for the format. Medium quality results in dithered text output (probably because it uses JPEG compression). It seems that OCR performed from within ScanSnap also uses high quality and the same format as the scan, because I get very similar results and file sizes using this approach.
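
To compare file sizes across settings without clicking through the Finder, something like the sketch below could help. It relies only on the naming pattern from step 6; the helper names (`processed_path`, `size_ratio`, `report`) are hypothetical, and the output is just the processed size divided by the original size.

```python
from pathlib import Path

SUFFIX = " processed by FineReader"  # naming pattern described in step 6


def processed_path(original):
    """Path where the processed copy of `original` is expected to be saved."""
    original = Path(original)
    return original.with_name(original.stem + SUFFIX + original.suffix)


def size_ratio(original_bytes, processed_bytes):
    """Processed size divided by original size; 1.0 means unchanged."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return processed_bytes / original_bytes


def report(folder):
    """Print one line per original scan that already has a processed copy."""
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.stem.endswith(SUFFIX):
            continue  # skip FineReader output itself
        out = processed_path(pdf)
        if out.exists():
            ratio = size_ratio(pdf.stat().st_size, out.stat().st_size)
            print(f"{pdf.name}: {ratio:.2f}x")
```

Running `report("/path/to/scans")` after trying a setting on a handful of files gives a quick feel for whether that setting inflates or shrinks the PDFs.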

Performing OCR on many documents will take a while, so it’s best run overnight. You could spend the time shredding the scanned documents or scanning more documents. Since ScanSnap Manager does not use ABBYY while scanning, both processes can run in parallel. It would be nice if ScanSnap Manager had this batch processing functionality out of the box, but the workaround presented here works well. Last but not least, I should mention that the same approach also works with the “Scan to Word” or “Scan to Excel” functionality of ABBYY, but I don’t have much use for this feature.
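
After an unattended overnight run, it helps to know which scans still lack a processed copy before shredding anything. Here is a rough sketch, again assuming only the “processed by FineReader” naming pattern described above; `pending` and `pending_in_folder` are illustrative names, not part of any ScanSnap or ABBYY tool.

```python
from pathlib import Path

SUFFIX = " processed by FineReader"  # naming pattern described in step 6


def pending(names):
    """From the file names in one folder, return original PDFs with no processed copy."""
    present = set(names)
    result = []
    for name in sorted(present):
        p = Path(name)
        if p.suffix.lower() != ".pdf" or p.stem.endswith(SUFFIX):
            continue  # not a PDF, or already FineReader output
        if p.stem + SUFFIX + p.suffix not in present:
            result.append(name)
    return result


def pending_in_folder(folder):
    """Same check, reading the file names from a folder on disk."""
    return pending(f.name for f in Path(folder).iterdir() if f.is_file())
```

An empty list from `pending_in_folder("/path/to/scans")` suggests every scan has a counterpart. It does not check that the OCR itself succeeded, so opening a few of the results is still a good idea.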

7 Responses to PDF OCR Batch-processing with ScanSnap

  1. Hey thanks for this post. I’ve done batch ScanSnap posts using Acrobat but hadn’t seen one using FineReader so this is great. Blogged about it!

  2. Update:
    The same approach does not work well for color scans. When I scan color at 300 dpi and then do the OCR with the high quality color option (for documents), the file size grows approximately 2-3 fold. For color OCR, there are two options: (1) documents, and (2) photo. Interestingly, picking the high quality photo option for OCR creates high quality output, even of text, at only a fraction of the file size. The medium and low quality text settings result in small file sizes and very poor quality. The medium quality text setting is really quite useless. The medium quality photo setting is similar to medium quality text. ==> My choice is the “high quality photo” setting for color text OCR.

  3. Hey Tobi – though I understood your first post perfectly, I’m a bit confused by your update – perhaps the options have changed in the latest version of ScanSnap Manager (3.1.11 is what I have)? Any chance you have an update based on your own trials? Thanks!

  4. Thank you for taking the time to post this solution.

    I tried to drag a non-OCR PDF to the scan2pdf.exe file on my PC located in the ScanSnap directory, but nothing happened. I take it this workaround only works on Mac.

    Any PC users have a solution?

    • Bill, I don’t think you can drag onto an icon on Windows but you can open the OCR app first and then open the non-OCR’d documents from within.

      Also note that ABBYY will only OCR documents scanned with the ScanSnap.

  5. I have the ABBYY FineReader for ScanSnap 4.0 open, and I don’t see any option to select or open a file.

    The General Options tab features language settings and the Scan to Searchable PDF tab has only the settings you described in the post.

    I can’t drop non-OCR’d documents onto the ABBYY window, either. I’m striking out on replicating this on PC.

