Problems about the scanning speed and OCR function

kwokhin

Honorable
Oct 25, 2013
Hello everyone!
I’m a student and I want to scan my books for convenience.
I want a flatbed scanner because I don’t want to destroy the binding of my books.
But here is the question.
Why does it take soooo long to scan a color A4 document?!
I haven’t bought a scanner yet, but while searching through the products I found that, for example,
the Canon LiDE 110 takes approx. 16 sec (A4 / 300 dpi).
That is very time-consuming!
Sheet-fed scanners usually take 2-3 sec.
Is it true that flatbed scanners take more than 10 s to scan an A4 document at 300 dpi?
Does the computer spec affect the speed?
I went to electronics stores and found that multi-purpose printers with scanning and faxing functions cost only about US$50!
What dpi is appropriate for scanning documents with photos? Is 900 dpi enough?
Does a higher dpi give the OCR software a better chance of recognizing the text?
Or is it useless to increase the dpi beyond a certain point because the words cannot get any clearer?

My criteria:
Budget : <US$80
Flatbed scanner (The slimmer the better)
Faster scanning speed
No need for film scanning
Which brand do you prefer?

Besides, has anyone tried any OCR software? I want to be able to edit my texts after scanning. Is the recognized text accurate? Is it reliable? Of course, checking is important. I am afraid that the software won’t be able to ‘recognize’ my textbook.

Sorry for my ignorance; I am very new to this :( Thank you very much! Your help is much appreciated!
 
Solution
When scanning book pages which contain pictures, and at a size that's unlikely to need enlarging, 200 dpi is plenty.
Only use 300 dpi when scanning a real photograph for 1:1 reproduction.
If you intend to enlarge it, increase dpi accordingly.

That's why scanners have a much higher dpi capability (up to 1200 dpi without interpolation): it is there in case you wish to scan a small photo but print it much larger than its actual size.
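
To make "increase dpi accordingly" concrete, here is a minimal sketch of the arithmetic. It assumes the scan dpi simply scales with the enlargement factor; the function name and the example sizes are made up for illustration.

```python
def required_scan_dpi(print_dpi: float, original_size: float, print_size: float) -> float:
    """Scan resolution needed so the enlarged print still reaches print_dpi.

    original_size and print_size describe the same edge (e.g. the width)
    in the same unit, so only their ratio matters.
    """
    if original_size <= 0 or print_size <= 0:
        raise ValueError("sizes must be positive")
    enlargement = print_size / original_size
    return print_dpi * enlargement


# Hypothetical example: a 4-inch-wide photo printed 16 inches wide at 300 dpi.
print(required_scan_dpi(300, 4, 16))  # prints 1200.0
```

With no enlargement (same size in and out) the function just returns the print dpi, which matches the 1:1 case above.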

kwokhin

Honorable
Oct 25, 2013


Thank you very much!
Is it true that flatbed scanners take more than 10 s to scan an A4 document at 300 dpi?
Does the computer spec affect the speed?
 

RedStripedCat

Honorable
Feb 14, 2014
@kwokhin:

Check this link for helpful scanner reviews: www.pcmag.com/reviews/scanners. It should help you understand what you can expect for your budget.

When you talk about scan time, you have to distinguish between the time the flatbed scanner itself needs (movement of the scan head) and the time the driver/software needs to process the image. Your computer specs have no impact on the speed of the scan head, but they do affect how fast the driver/software processes the image. That is why scanner manufacturers usually give only the time for the scan head. Most of the time they omit the time the scan head needs to move back to the home position before the next scan can start, which makes the scanner appear faster than it actually is. The time for this return movement differs between scanners, and often you won't find out until you test the scanner yourself. HP is an exception in this regard: they often give the times needed to complete certain scan tasks. Obviously these times depend on the performance of the computer on which they were measured. You could also contact the support of scanner manufacturers and ask how long it takes to complete the task you intend to use the scanner for. Some manufacturers are more helpful than others in answering such questions. It's worth a try.
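
As a rough illustration of why the advertised figure understates the real time per page, here is a small sketch that adds up the separate parts. Only the 16 s scan time comes from this thread (the LiDE 110 figure quoted above); the return, processing and page-turning times are placeholder guesses, and the function names are my own.

```python
def seconds_per_page(scan_s: float, return_s: float,
                     processing_s: float, handling_s: float) -> float:
    """Total time per page: head movement, head return, software, page turning."""
    return scan_s + return_s + processing_s + handling_s


def book_time_minutes(pages: int, scan_s: float, return_s: float,
                      processing_s: float, handling_s: float) -> float:
    """Estimated time for a whole book, in minutes."""
    return pages * seconds_per_page(scan_s, return_s, processing_s, handling_s) / 60


# 16 s is the advertised scan time; 4 s, 5 s and 10 s are assumed values.
print(book_time_minutes(300, scan_s=16, return_s=4, processing_s=5, handling_s=10))  # prints 175.0
```

Plug in the times you measure on your own scanner and computer; the point is only that the advertised number is one term of the sum.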

The higher the dpi, the slower the scan (both the scan head movement and the software processing). The scan head movement times manufacturers publish are usually measured at 200 or 300 dpi. If you scan at 600 dpi, the time can be twice (or even more than twice) the time needed at 300 dpi.
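
One reason the software side slows down is the sheer amount of data. The sketch below computes the pixel dimensions and the uncompressed size of an A4 page at a given dpi, assuming 24-bit colour (3 bytes per pixel). It says nothing about how a particular scanner's timing scales; it only shows how quickly the data volume grows.

```python
A4_MM = (210.0, 297.0)   # A4 width and height in millimetres
MM_PER_INCH = 25.4


def a4_pixels(dpi: int) -> tuple:
    """Pixel width and height of an A4 page scanned at the given dpi."""
    return tuple(round(mm / MM_PER_INCH * dpi) for mm in A4_MM)


def uncompressed_bytes(dpi: int, bytes_per_pixel: int = 3) -> int:
    """Raw image size, assuming 24-bit colour unless told otherwise."""
    width, height = a4_pixels(dpi)
    return width * height * bytes_per_pixel


for dpi in (200, 300, 600):
    width, height = a4_pixels(dpi)
    megabytes = uncompressed_bytes(dpi) / 1e6
    print(f"{dpi} dpi: {width} x {height} px, about {megabytes:.0f} MB uncompressed")
```

Doubling the dpi from 300 to 600 doubles both dimensions, so the driver has roughly four times as many pixels to move and process.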

When I contacted a scanner manufacturer's support about how to maximize scan speed, I was advised to use a current Intel i7 CPU because they provide more performance per cycle than AMD CPUs. On Tomshardware you can compare CPU performance, e.g. for programmes such as Abbyy Finereader or Adobe Acrobat (both are used for document scanning), to get an idea of the difference between CPUs.

On dpi: Which dpi is best depends on the optical quality of the scanner. But for book scans 600 dpi should be more than enough. For standard print 300 dpi is enough. Only if you have very small print and you need the finest details of pictures or graphics from your books might you need to go higher. OCR will not improve (according to what I read all over the web) if you go higher than 600 dpi for standard-sized print.

Also have a look at book scanning with cameras: www.diybookscanner.org. In the forum there you will also find information on flatbed book scanners.

The time needed by the scanner driver/software to process the image is heavily affected by the image enhancement features you choose. Some features are very CPU intensive. For example the time for the scan at 300 dpi from a book with my CanoScan 9000F Mark II is almost 4 times longer if I choose "Reduce gutter shadow" as compared to no image enhancement. With "Reduce moiré" it almost doubles. For book scans with a normal flatbed scanner (i.e. not a flatbed book scanner) you might need "Reduce gutter shadow" if you don't like the shadows at the spine of the book (remember the black and white photocopies of books where such shadow is very visible at the area of the spine – that is what "Reduce gutter shadow" removes or tries to remove). If you have graphs and pictures in your books you might want to use "Reduce moiré" (sometimes also called Descreen, an effect of this will be to reduce the file size).

I would not recommend a "slim" scanner because those scanners use CIS for their optics. For book scanning (if you put both sides of your book on the flatbed) CCD is better because it has a greater depth of field. If you put the whole book on the flatbed, the part of the page close to the spine will be further away from the glass than the rest of the page. With CIS this part will be out of focus, and your OCR will have more trouble recognizing that text than when it is sharp, as with CCD scanners (you might know this blurring from photocopiers).

On OCR software: Flatbed scanners usually come with integrated or separate OCR software. At your budget I would rather buy a better scanner than spend the same money on extra OCR software. Extra OCR software is worth the money only if you need the exact layout of the book in your word processor. The OCR bundled with flatbed scanners is usually good enough. Anyway, you will never get 100% accurate OCR results; you always need to proofread if very high accuracy is important to you. You will find helpful discussions about OCR on the forum at www.diybookscanner.org.

To improve your speed you might want to separate the physical process of scanning from the post-processing (i.e. image enhancement such as auto-crop, OCR, etc.). To do this, scan to TIFF (this format allows multipage document images) and later process the TIFF with whatever you need (e.g. OCR). This way you will not have to wait for your computer to finish processing one page before it is ready to acquire the next image from the scanner.
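
If you want to script the second phase, the idea could look roughly like the sketch below: first scan everything into a folder, then run the slow step over the saved files afterwards. This is only an outline using the Python standard library. `postprocess_scans` and the `process_page` callback are hypothetical names; the callback stands in for whatever OCR or enhancement tool you actually use.

```python
from pathlib import Path
from typing import Callable, Dict


def postprocess_scans(scan_dir: str,
                      process_page: Callable[[Path], str]) -> Dict[str, str]:
    """Run process_page over every TIFF saved during the scanning session.

    process_page is a placeholder for the slow step (OCR, auto-crop, ...).
    It receives the path of one TIFF file and returns its result as text.
    """
    tiffs = sorted(
        p for p in Path(scan_dir).iterdir()
        if p.suffix.lower() in {".tif", ".tiff"}
    )
    results = {}
    for tiff in tiffs:
        # The scanner is no longer involved here, so this loop can run
        # unattended after all pages have been acquired.
        results[tiff.name] = process_page(tiff)
    return results
```

You would call it once the scanning session is over, for example with a folder such as `scans/` and a function that hands each file to your OCR program. Both of those are assumptions for the example, not something your scanner software provides.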

I hope this was helpful. Unfortunately, there is no solution for book scanning that is quick, easy and cheap at the same time. Many people have looked into this (see for example www.diybookscanner.org) and it seems nobody has yet found a solution that is all three in one.

Some libraries have special (very expensive) book scanners. You might get quicker and better results if you take your books to such a library and scan them with their professional book scanner.