2
54
N. Liolios et al. / Pattern Recognition 35 (2002) 253}264
other possible ways. Finally, a signi"cant amount of salt
and pepper noise may be present in addition to hand-
written notes on the margins or even company seals at
various places on the page.
A system that can be adapted for commercial use
should be able to deal with all these problems while
minimizing user intervention.
methods. The most plausible methods proposed in the
literature are summarized next:
Use of special symbols or structures in the form: These
symbols are initially located on the new form and then
compared to a prototype to determine the degree of
rotation or translation [6,7]. This approach requires the
form to be redesigned and therefore it may not be suit-
able for processing the existing types of forms.
A form presented for processing to such a system has,
"
rst, to be identi"ed. This step is necessary if the system
Detection of vertical and horizontal lines from which the
skew and shift can be determined [7]: The projection
pro"le is used to determine the location of the lines that
correspond to the highest peaks in the pro"les. One
obvious drawback of this approach is the necessity for
long lines in the form so as to guarantee dominant high
peaks in the projections. Another de"ciency is the necess-
ity to correct rotation before identi"cation, since a ro-
tated form has a very low pro"le when projected on the
vertical axes (horizontal projection). As a result of this
limitation, a generic method of document skew detection
has to be used.
Special patterns of line crossings can be used, the loca-
tion of which when detected can determine the amount of
skew and translation to be corrected [6]. The patterns or
line crossings can be used as a feature vector for form
identi"cation as well. The obvious drawback is that the
system does rely on the form having these line crossings.
Such a system cannot always be trained with a new type
of form that is not designed with these line crossings. The
types of line crossings and their locations on the blank
form template have to be speci"ed interactively during
training.
should be able to handle several di!erent types of forms.
It is also desirable to have the capability to train it on
new types of forms, as the form may change or get
replaced over time.
A new blank form that the system is trained on is called
a form template, or prototype. Along with the template,
the system has to store information about the location of
the areas where the user supplied information is to be
found, so that it can be properly extracted.
After the identi"cation, the best matching template is
known but the text from the "elds (user "lled informa-
tion) cannot be extracted yet, since the form may have
a varying degree of skew and it is most likely shifted
during either copying or scanning.
Skew and shift have to be determined and the form has
to be rotated and shifted in opposite directions so that an
exact match to one of the stored templates is obtained.
For small rotation angles (up to 13), it is more e$cient to
rotate the template instead of the form. This is not the
best choice, however, for large rotation angles for two
reasons:
(
a) The "eld de"nitions are more easily handled as
rectangular areas de"ned by their upper-left and
bottom-right points.
b) If they are left rectangular they will most likely cut
into the preprinted parts of the form which remains
skewed.
A variety of skew detection methods have been pro-
posed in the literature; they include Projection Pro"le
[8], Hough Transform [4], Nearest}Neighbor Cluster-
ing [3], Bounding Box Detection [6] or A combination
of the above methods [9].
All these approaches (except nearest-neighbor cluster-
ing) are usually able to deal with rather small skew angles
of ($153). It is noteworthy to mention, however, that
this range ($153) of angles is not a limitation to OCR
systems. Assuming reasonable care during scanning, this
type of skew error is seldom introduced. The Hough
transform method is rather computationally intensive
and is usually avoided. Some variations, however, which
require reduced amount of computation have at times
been introduced [5].
(
After identi"cation, correction for rotation and shift,
the original "eld locations, as de"ned initially by the user
on the blank form prototype, can then be used to extract
the corresponding information from the incoming form.
The order in which form identi"cation, skew and shift
detection is performed varies widely in the methods pro-
posed in the literature. The order proposed here is rather
unique to the method presented in this paper.
Several generic algorithms have been presented in the
past that can be used to detect skew [3,4] and shift [5}7],
but all of them are either expensive, as far as the compu-
tation time is concerned, or they do not work for every
type of form. A system that is based on these algorithms
would not be generic enough for any type of form and it
may su!er in response time so as to restrict its use to
o!-line batch processing.
The nearest-neighbor clustering method does not have
any of the above limitations. It is rather fast but its
accuracy is within the neighborhood of $1.53. This type
of limitation cannot always be tolerated, especially when
the data has to be extracted from speci"c locations on
a preprinted form (i.e., an application form) before they
are forwarded for processing by the OCR system.
The method we propose here, does not have any of the
limitations described above for either the skew detection
or the form identi"cation part. It is much more generic
Systems in existence today that we know of, rely on
either one, or a combination of form identi"cation