GOCR (GNU Optical Character Recognition)
Here you can find OS/2 executables of GOCR Version 0.3.5 if you want to play
with it.
There are two packages available: one compiled with GCC 3.0.2 and one compiled
with WATCOM.
The GCC 3.0.2 version is slightly faster on my machine.
OCR quality has improved compared with version 0.3.4.
The project homepage of GOCR is: http://jocr.sourceforge.net
Usage:
- Type gocr -h for usage.
- Example one-liner for a scan2text.cmd script:
scanimage --device=epson --mode=Gray --resolution=300 | gocr - > textfile.txt
- Another example:
scanimage --device=epson --mode=Gray --resolution=300 1>out.pnm 2>out.error && gocr out.pnm > ocr.txt
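In both examples gocr reads a PNM image: from standard input when given "-", or from out.pnm. Before starting a long OCR run it can help to check what the scanner actually delivered. The following is a minimal sketch, assuming that --mode=Gray yields a binary PGM ("P5") image. parse_pgm_header and struct pgm_info are hypothetical helpers, not part of gocr or SANE.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: the three numbers in a PGM header. */
struct pgm_info {
    int width;
    int height;
    int maxval;
};

/* Skip whitespace and '#' comments, then read one decimal number.
   Returns 0 on success, -1 if no digits were found. */
static int read_pnm_number(const unsigned char *buf, size_t len,
                           size_t *pos, int *out)
{
    int value = 0;
    int digits = 0;

    for (;;) {
        while (*pos < len && isspace(buf[*pos]))
            (*pos)++;
        if (*pos < len && buf[*pos] == '#') {
            /* a comment runs to the end of the line */
            while (*pos < len && buf[*pos] != '\n')
                (*pos)++;
            continue;
        }
        break;
    }
    while (*pos < len && isdigit(buf[*pos])) {
        value = value * 10 + (buf[*pos] - '0');
        (*pos)++;
        digits++;
    }
    if (digits == 0)
        return -1;
    *out = value;
    return 0;
}

/* Parse the header of a binary PGM ("P5") image held in memory.
   Returns the offset of the first pixel byte, or -1 if the buffer
   does not start with a complete P5 header. */
long parse_pgm_header(const unsigned char *buf, size_t len,
                      struct pgm_info *info)
{
    size_t pos = 2;  /* just past the "P5" magic number */

    assert(buf != NULL && info != NULL);
    if (len < 2 || buf[0] != 'P' || buf[1] != '5')
        return -1;
    if (read_pnm_number(buf, len, &pos, &info->width) != 0 ||
        read_pnm_number(buf, len, &pos, &info->height) != 0 ||
        read_pnm_number(buf, len, &pos, &info->maxval) != 0)
        return -1;
    /* a single whitespace byte separates maxval from the pixel data */
    if (pos >= len || !isspace(buf[pos]))
        return -1;
    return (long)(pos + 1);
}
```

Reading the first few dozen bytes of out.pnm into a buffer and passing them in yields the width, height and maximum gray value; the returned offset is where the pixel data begins.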
Hints:
- If the image is complex or the letters are small, gocr is quite slow (expect it to take several minutes!).
- I suggest that you make your first tests with small scans.
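Why small scans help: the number of pixels gocr has to examine grows with the scanned area and with the square of the resolution. The rough estimate below is a sketch only. scan_pixels is a hypothetical helper, dimensions are in millimetres (25.4 mm per inch), and dpi stands for the value given to scanimage --resolution.

```c
#include <assert.h>

/* Hypothetical helper: pixel count of a scanned area.
   width_mm and height_mm are in millimetres (25.4 mm per inch);
   dpi is the scan resolution in dots per inch. */
long scan_pixels(double width_mm, double height_mm, int dpi)
{
    long cols;
    long rows;

    assert(width_mm >= 0.0 && height_mm >= 0.0 && dpi > 0);
    /* millimetres -> pixels, rounded to the nearest whole pixel */
    cols = (long)(width_mm * dpi / 25.4 + 0.5);
    rows = (long)(height_mm * dpi / 25.4 + 0.5);
    return cols * rows;
}
```

At 300 dpi a 50 mm x 50 mm test area comes to 591 x 591 = 349,281 pixels, while a full A4 page (210 mm x 297 mm) is 2480 x 3508 = 8,699,840 pixels, roughly 25 times as many.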
How to compile:
- with GCC 3.0.2
- get and install os2unix
os2unix -all
delete make.bat
sh configure
make
Compiling with emx-gcc requires two more steps:
copy src\libPgm2asc.a src\Pgm2asc.a
make
- with WATCOM
- get and install the Open Watcom compiler
- copy time.h, containing:
#if !defined (_TIMEVAL)
#define _TIMEVAL
struct timeval
{
long tv_sec;
long tv_usec;
};
#endif
to x:\WATCOM\h\sys
- Open the Watcom IDE and create a new project file named gocr.wpj
in ...\gocr-0.3.5\src\ and choose "OS/2 - 32-bit executable" as the target.
- Add all the .h and .c files in ...\gocr-0.3.5\src\ to the project's
source file list (gocr.exe window).
- Actions -> make all
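The time.h shim in the WATCOM steps only declares struct timeval, with whole seconds and microseconds. Code that times a run subtracts two such values and has to borrow a second whenever the microsecond difference is negative. Here is a minimal sketch of that arithmetic. The names shim_timeval and elapsed_ms are hypothetical, not taken from the gocr sources; the struct is renamed only so the sketch compiles next to system headers that already define struct timeval.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as struct timeval in the time.h shim, under a hypothetical
   name so it cannot clash with a system definition of struct timeval. */
#if !defined(SHIM_TIMEVAL)
#define SHIM_TIMEVAL
struct shim_timeval {
    long tv_sec;   /* whole seconds */
    long tv_usec;  /* microseconds, 0 .. 999999 */
};
#endif

/* Milliseconds from *start to *end; assumes end is not earlier than start.
   Returning milliseconds keeps a 32-bit long from overflowing for runs
   of up to about 24 days. */
long elapsed_ms(const struct shim_timeval *start,
                const struct shim_timeval *end)
{
    long sec;
    long usec;

    assert(start != NULL && end != NULL);
    sec = end->tv_sec - start->tv_sec;
    usec = end->tv_usec - start->tv_usec;
    if (usec < 0) {        /* borrow one second from the seconds field */
        sec -= 1;
        usec += 1000000L;
    }
    return sec * 1000L + usec / 1000L;
}
```

For example, from {10, 999999} to {11, 1000} only 1001 microseconds pass, so the borrow step yields 1 ms rather than a negative microsecond count.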
Known limitations of the WATCOM version:
- It doesn't report elapsed time, because
gettimeofday() does not exist in the Watcom libraries.
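One way an elapsed-time report could be approximated without gettimeofday() is ISO C clock(), which every hosted C library, Watcom's included, must provide. This is not what the WATCOM build of gocr does; it is a sketch under stated assumptions. clock() counts processor time used by the program rather than wall-clock time, so it only approximates elapsed time for a single CPU-bound run, and the code assumes clock_t is an integer type. The names clock_timeval and shim_timeval are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Same two fields as struct timeval in the time.h shim (hypothetical name). */
#if !defined(SHIM_TIMEVAL)
#define SHIM_TIMEVAL
struct shim_timeval {
    long tv_sec;
    long tv_usec;
};
#endif

/* Hypothetical stand-in for gettimeofday() built on ISO C clock().
   Only differences between two readings are meaningful, because
   clock() measures processor time, not the time of day.
   Assumes clock_t is an integer type, as in common C libraries.
   Returns 0 on success, -1 if clock() is unavailable. */
int clock_timeval(struct shim_timeval *tv)
{
    clock_t now;

    assert(tv != NULL);
    now = clock();
    if (now == (clock_t)-1)
        return -1;
    tv->tv_sec = (long)(now / CLOCKS_PER_SEC);
    /* leftover ticks -> microseconds, in double to avoid overflow */
    tv->tv_usec = (long)((double)(now % CLOCKS_PER_SEC) * 1000000.0
                         / (double)CLOCKS_PER_SEC);
    return 0;
}
```

Taking one reading before and one after the OCR step and subtracting them would give an approximate run time; the effective resolution is limited by CLOCKS_PER_SEC of the C library in use.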
Franz Bakan, 4. April 2006