OCR software that will scan 1000 docs automatically?

Archived from groups: alt.comp.periphs.scanner

 

I have around 1000 PDFs in a series of subfolders. The PDFs are a mix
of text and image-only PDFs.

I would like an OCR program that
a) can be set to automatically look at every PDF in a folder,
including subfolders, to determine whether the PDF is image-only;

b) does OCR on image-only PDFs only; and

c) overwrites the image-only PDF with an image-on-text PDF
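
Requirements (a) to (c) above can be sketched as a small batch script. This is only an outline under stated assumptions: `is_image_only` and `run_ocr` are hypothetical callables that you would supply (a detector for PDFs with no text and a wrapper around whatever OCR engine you end up using); neither refers to a real product. The sketch shows the folder walk and a cautious overwrite: the OCR result goes to a temporary file first, and the original is replaced only if OCR finished without raising.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable, Iterator, List


def iter_pdfs(root: Path) -> Iterator[Path]:
    """Yield every .pdf file under root, including subfolders."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".pdf"):
                yield Path(dirpath) / name


def ocr_folder(
    root,
    is_image_only: Callable[[Path], bool],   # hypothetical detector, supplied by you
    run_ocr: Callable[[Path, Path], None],   # hypothetical OCR step: (source, destination)
) -> List[Path]:
    """OCR only the image-only PDFs under root and overwrite them in place."""
    converted = []
    for pdf in iter_pdfs(Path(root)):
        if not is_image_only(pdf):
            continue  # requirement (b): leave PDFs that already contain text alone
        # Write the OCR output next to the original, then swap it in.
        fd, tmp_name = tempfile.mkstemp(suffix=".pdf", dir=pdf.parent)
        os.close(fd)
        tmp = Path(tmp_name)
        try:
            run_ocr(pdf, tmp)
            os.replace(tmp, pdf)  # requirement (c): overwrite the original
        except Exception:
            tmp.unlink(missing_ok=True)  # keep the original if OCR failed
            raise
        converted.append(pdf)
    return converted
```

Since the script overwrites files, it would be sensible to run it on a copy of the folder tree first.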

Can anyone recommend an OCR program that can do this?

I am running Win XP on a medium-spec machine.

Thanks in advance
Matt

<mattb02@gmail.com> wrote in message
news:1121887988.984426.173380@o13g2000cwo.googlegroups.com...
>I have around 1000 PDFs in a series of subfolders. The PDFs are a mix
> of text and image-only PDFs.
>
> I would like an OCR program that
> a) can be set to automatically look at every PDF in a folder,
> including subfolders, to determine whether the PDF is image-only;
>
> b) does OCR on image-only PDFs only; and
>
> c) overwrites the image-only PDF with an image-on-text PDF
>
> Can anyone recommend an OCR program that can do this?
>
> I am running Win XP on a medium-spec machine.
>
> Thanks in advance
> Matt
>
There seem to be five types of PDF files:

PDF (Normal):
The PDF file can be viewed and searched in a PDF viewer and edited in a PDF
editor.

PDF Edited:
The PDF file can be viewed, searched and edited.

PDF with image on text:
The PDF file is viewable only and cannot be modified in a PDF editor. The
original images are exported, but there is a hidden text layer behind each
image, so the text can be searched.

PDF with image substitutes:
As for PDF (Normal), but words containing reject and suspect characters have
image overlays, so these uncertain words display as they were in the
original document. The PDF file can be viewed, searched and edited.

PDF, image only:
The original images are exported. The PDF file is viewable only and cannot
be modified in a PDF editor and text cannot be searched.
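
The practical difference between the last type and the others is whether any text can be extracted at all, which suggests a simple test for telling them apart. The following is a minimal sketch, not a definitive method: it assumes the third-party Python library pypdf is installed, and the function names `has_no_text_layer` and `page_texts_from_pdf` are made up for illustration. A scan with a very poor or partial text layer could still be misclassified.

```python
from typing import Iterable, List


def has_no_text_layer(page_texts: Iterable[str]) -> bool:
    """Return True when no page yields any non-whitespace text.

    A PDF with zero pages also counts as having no text.
    """
    return not any(text and text.strip() for text in page_texts)


def page_texts_from_pdf(path: str) -> List[str]:
    """Extract the text of each page (assumption: pypdf is installed)."""
    from pypdf import PdfReader  # imported here so the check above works without it

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]
```

With these, `has_no_text_layer(page_texts_from_pdf("scan.pdf"))` would be expected to return True for an image-only PDF and False for any of the searchable types; the file name here is only a placeholder.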

Since you already have PDFs, you may only be able to change their current
configuration with the only software that can edit PDFs, the Adobe Acrobat
family of products. Acrobat Professional is the top of the line; Acrobat
Standard has fewer features.

http://www.adobe.com/products/acrobat/main.html

There are several other programs that can create PDFs, but not many that can
edit a PDF after it is created.

I do not think that there is a low-cost solution for editing PDFs.

--
CSM1
http://www.carlmcmillan.com
--

www.bookscanning.com offers conversion services such as PDF to Word and
Word to PDF.

Thomas
