
Pattern Recognition 35 (2002) 253–264

On the generalization of the form identification and skew detection problem

N. Liolios*, N. Fakotakis, G. Kokkinakis

Department of Electric and Computer Engineering, Wire Communications Laboratory, University of Patras, Patra 26500, Greece

Received 11 June 1999; accepted 5 January 2001

Abstract

A new method is proposed to solve the document identification and skew detection problem. It can be applied to a widely used subclass of documents which resemble in style an application form. Unlike other approaches, we make no assumptions about the nature and/or style of the printed form. An attempt is made to solve the problem in the most general sense. The method presented here does not rely on any special features such as patterns of line crossings, or dominant lines, or even special symbols found only on specially designed forms. The Power Spectral Density of the horizontal projection profile of the form is used as a shift invariant feature vector. The Karhunen–Loeve transform is employed to de-correlate and reduce the length of the feature vectors in the training set. Training is done in such a way that no rotations of the unknown form are necessary during recognition. The eigenvectors of the covariance matrix of the power spectral densities for the training set, along with learning vector quantization, were used for training, and the Euclidean distance, for recognition. A limitation related to the amount of skew that the system can handle is alleviated with the use of a known skew detection method. © 2001 Pattern Recognition Society. Published by Elsevier

Keywords: Form identification; Skew detection; Shift detection; Power spectrum; Karhunen–Loeve transformation; Learning vector quantization

1. Introduction

Office automation is steadily decreasing the number of documents that contain handwritten parts, but companies are still processing a vast amount of documents manually. Even the companies that are in the process of becoming fully office-automated have a need for a system that will convert the old documents into a suitable electronic format [1].

Methods and tools have been developed in the past [2] that to a satisfactory degree solve the problem of printed character recognition and to a less than satisfactory degree the problem of handwritten character recognition. The problem of optical character recognition inherits added complexity when the document to be processed is a preprinted form (i.e., an application form), with fields that are initially blank and are then filled by the customer. The user-supplied information in the predefined areas (form fields) can be in either handwritten or machine typed form. To make things worse, the document might be contaminated with noise, and it may be of poor resolution (i.e., fax transmission); it may also have been shifted and skewed because of incorrect positioning during either copying or scanning. The form may also have been stretched in a non-uniform way, due to a non-linear motor speed of a copy machine, or deformed in several

This work is supported by the European Commission, Telematics Applications Program, project LE-1 1802 ACCeSS.

* Corresponding author. Tel.: +30-61-991-722; fax: +30-61-991-855.

E-mail address: nliolios@teilam.gr (N. Liolios).

PII: S0031-3203(01)00030-9

N. Liolios et al. / Pattern Recognition 35 (2002) 253–264

other possible ways. Finally, a significant amount of salt and pepper noise may be present in addition to handwritten notes on the margins or even company seals at various places on the page.

A system that can be adapted for commercial use should be able to deal with all these problems while minimizing user intervention.

A form presented for processing to such a system has, first, to be identified. This step is necessary if the system should be able to handle several different types of forms. It is also desirable to have the capability to train it on new types of forms, as the form may change or get replaced over time.

A new blank form that the system is trained on is called a form template, or prototype. Along with the template, the system has to store information about the location of the areas where the user supplied information is to be found, so that it can be properly extracted.

After the identification, the best matching template is known but the text from the fields (user filled information) cannot be extracted yet, since the form may have a varying degree of skew and it is most likely shifted during either copying or scanning.

Skew and shift have to be determined and the form has to be rotated and shifted in opposite directions so that an exact match to one of the stored templates is obtained. For small rotation angles (up to 1°), it is more efficient to rotate the template instead of the form. This is not the best choice, however, for large rotation angles for two reasons:

(a) The field definitions are more easily handled as rectangular areas defined by their upper-left and bottom-right points.

(b) If they are left rectangular they will most likely cut into the preprinted parts of the form which remains skewed.

After identification, correction for rotation and shift, the original field locations, as defined initially by the user on the blank form prototype, can then be used to extract the corresponding information from the incoming form.

The order in which form identification, skew and shift detection is performed varies widely in the methods proposed in the literature. The order proposed here is rather unique to the method presented in this paper.

Several generic algorithms have been presented in the past that can be used to detect skew [3,4] and shift [5–7], but all of them are either expensive, as far as the computation time is concerned, or they do not work for every type of form. A system that is based on these algorithms would not be generic enough for any type of form and it may suffer in response time so as to restrict its use to off-line batch processing.

Systems in existence today that we know of, rely on either one, or a combination of form identification methods. The most plausible methods proposed in the literature are summarized next:

Use of special symbols or structures in the form: These symbols are initially located on the new form and then compared to a prototype to determine the degree of rotation or translation [6,7]. This approach requires the form to be redesigned and therefore it may not be suitable for processing the existing types of forms.

Detection of vertical and horizontal lines from which the skew and shift can be determined [7]: The projection profile is used to determine the location of the lines that correspond to the highest peaks in the profiles. One obvious drawback of this approach is the necessity for long lines in the form so as to guarantee dominant high peaks in the projections. Another deficiency is the necessity to correct rotation before identification, since a rotated form has a very low profile when projected on the vertical axes (horizontal projection). As a result of this limitation, a generic method of document skew detection has to be used.

Special patterns of line crossings can be used, the location of which when detected can determine the amount of skew and translation to be corrected [6]. The patterns or line crossings can be used as a feature vector for form identification as well. The obvious drawback is that the system does rely on the form having these line crossings. Such a system cannot always be trained with a new type of form that is not designed with these line crossings. The types of line crossings and their locations on the blank form template have to be specified interactively during training.

A variety of skew detection methods have been proposed in the literature; they include Projection Profile [8], Hough Transform [4], Nearest-Neighbor Clustering [3], Bounding Box Detection [6] or a combination of the above methods [9].

All these approaches (except nearest-neighbor clustering) are usually able to deal with rather small skew angles of (±15°). It is noteworthy to mention, however, that this range (±15°) of angles is not a limitation to OCR systems. Assuming reasonable care during scanning, this type of skew error is seldom introduced. The Hough transform method is rather computationally intensive and is usually avoided. Some variations, however, which require reduced amount of computation have at times been introduced [5].

The nearest-neighbor clustering method does not have any of the above limitations. It is rather fast but its accuracy is within the neighborhood of ±1.5°. This type of limitation cannot always be tolerated, especially when the data has to be extracted from specific locations on a preprinted form (i.e., an application form) before they are forwarded for processing by the OCR system.

The method we propose here, does not have any of the limitations described above for either the skew detection or the form identification part. It is much more generic


and can be used for any type of pre-printed form. Our method does not rely on any special features like patterns of line crossings or dominant lines or even special symbols found only on specifically designed forms. We use the Power Spectral Density (PSD) of the form's horizontal projection profile, as a shift invariant feature vector. The Karhunen–Loeve (KL) transform is employed next to de-correlate and reduce the length of the feature vectors in the training set. Training is done in such a way that skew correction is not necessary before identification. This is accomplished by rotating each form used in training to a set of predefined angles, while extracting the feature vectors of the form for each rotation angle. Since the range of expected angles (due to scanning) is small, a good compromise is found between accuracy and the amount of data for training. Learning Vector Quantization (LVQ) was used to train the system on all the KL-transformed PSDs in the training set. For recognition, the Euclidean distance alone was sufficient. The system is very fast in both form identification and skew detection, since almost all the computation overhead takes place in the training phase.

Fig. 1. Registering a new type of form.

This paper is organized as follows. In Section 2 we describe the system. In Section 3 we present and analyze our experimental results, and, finally, in Section 4 we draw some conclusions and discuss our plans for future work.

2. System description

The preprinted form identification, skew and shift detection part is usually regarded as a preprocessor to a handwritten text recognition system. The method and algorithms described in this paper were used to build a working forms-processing tool capable of handling almost any type of form. The system can be viewed as a set of operations (each one of which is implemented as a separate module). During the training phase a set of field definitions is created for each blank form prototype. It is followed by manual straightening and positioning of all the forms in the train set from which the feature vectors are extracted, for all predetermined angles, for each of the forms, thus forming the vectors in the train set. The recognition part consists of the Form Identification, as well as the Skew and Shift detection operations. What follows in this section is a more in depth analysis of the operations in the training and recognition phases mentioned above.

2.1. Field definition

A set of field definitions is associated with each blank form prototype; it specifies the locations of the blank fields of interest (the areas where the handwritten text is expected). A graphical user interface allows the user to scan a form, display it and define the rectangular areas (see Fig. 1). These rectangular area definitions (fields) are the parts of the form that have to be extracted and forwarded to the handwritten text recognition part of the system. The field definitions for all the types of forms the system is trained with are kept in a field definition database.

2.2. Feature extraction

To solve the problem of identifying the form without making use of any special symbols, line crossing patterns or dominant lines, we decided to use the PSD of the form's projection profile, in conjunction with the KL transform for vector size reduction.

Assume that a set of user filled forms is available as binary duo-tone images. Furthermore, assume that these P images $u^{(p)}$ are of maximum size N×M pixels. The pth form can be regarded as a real image $u^{(p)}$ such that its elements are described below.

$$u_{ij} = \begin{cases} 1, & \text{black pixels},\\ 0, & \text{white pixels}. \end{cases}$$

The 2D image can also be considered as a matrix of N rows by M columns:

$$u = \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1M} \\ u_{21} & u_{22} & \cdots & u_{2M} \\ \vdots & \vdots & \ddots & \vdots \\ u_{N1} & u_{N2} & \cdots & u_{NM} \end{bmatrix}$$


The horizontal projection profile is therefore

$$H(u) = \left[ \sum_{j=1}^{M} u_{1j},\ \sum_{j=1}^{M} u_{2j},\ \ldots,\ \sum_{j=1}^{M} u_{Nj} \right]$$

and the vertical one

$$V(u) = \left[ \sum_{i=1}^{N} u_{i1},\ \sum_{i=1}^{N} u_{i2},\ \ldots,\ \sum_{i=1}^{N} u_{iM} \right]$$

We used the Welch method of averaged periodograms (see Refs. [10,11]) to obtain the power spectral density of the horizontal projection. The algorithm as it applies in this case can be summarized in the following steps:

1. Divide the horizontal projection profile vector V(u) into a number of overlapping sections and remove any linear trends from each section.
2. Apply a window function to each successive detrended section.
3. Transform each section with an FFT.
4. Form the periodogram of each section by log-scaling the magnitude squared of each transform.
5. Average the periodograms of the overlapping sections to form W(f), the power spectrum of the projection profile vector V(u).

It is easily shown that W(f) is a shift invariant representation of V(u) when either the number of overlapping windows is large or the signal is either folded or replicated to form a large number of periods.

In our particular case, the signal (projection) is expected to start and end at level 0 (due to document margins). If the shift amount is smaller than the margins, it can be shown, both in theory and in practice (see Fig. 4 below), that even a small number of windows and even only one period of the signal results in a relatively small error.

Since the exact spectral bands that differentiate two types of forms are not known in advance and the system must be able to handle any type of form, we had to calculate a rather large spectrum. Experimentally we have determined that 1024 frequencies is a good compromise between accuracy and speed.

2.3. Feature compaction

The optimality of our method is derived from the use of the Karhunen–Loeve transform. The basis of the transform is the eigenvector set of the covariance matrix of all the PSDs of the horizontal projection profiles of the images in the train set. It can also be viewed as a statistical representation of the variance of the patterns the system has to learn. The KL transform coefficients of a PSD are the feature vector (KL space representation) of a printed form's image.

A vector of size 1025 (PSD) is rather large to be used effectively as a feature vector in the formation of a code-book. This is the main reason the KL transform is applied to all the vectors in the train set to obtain a transformed feature vector of reduced length (128) as described below.

The mean spectrum vector $m_S$ can be defined as

$$m_S = E\{S\} = \frac{1}{M}\sum_{k=1}^{M} S_k$$

and the covariance matrix of the spectra in the train set is

$$C_x = E\{(S - m_S)(S - m_S)^{T}\}.$$

Therefore

$$C_x = \frac{1}{M}\sum_{k=1}^{M} S_k S_k^{T} - m_S m_S^{T}$$

where T indicates vector transposition.

The covariance matrix, $C_x$, gives the mean, over all the spectra in the train set, of all the N×N interfrequency correlations, and it actually describes statistically how the spectra vary.

The covariance matrix $C_x$ has $N^2$ eigenvectors as the columns of $\Phi$ defined by the equation:

$$C_x \Phi = \Phi \Lambda$$

where the only non-zero elements of $\Lambda$ are the eigenvalues on its diagonal. The eigenvectors are the directions of maximum variance in the $N^2$ spectra space.

If a new matrix A is formed from the columns of $\Phi$, having as first column the column of $\Phi$ that corresponds to the largest eigenvalue of $\Lambda$, and as last column the column of $\Phi$ that corresponds to the smallest eigenvalue, then the equation

$$y = A(S - m_S)$$

defines the Karhunen–Loeve transformed vector y for the spectrum vector S. The inverse KLT is the solution for S from the equation above:

$$S = A^{T} y + m_S.$$

Note that $A^{-1} = A^{T}$ since A is orthonormal. If a new matrix $A_K$ is formed from the K eigenvectors of A which correspond to the largest eigenvalues of $\Lambda$, then an estimate of S can be obtained:

$$\hat{S} = A_K^{T} y + m_S$$

with a mean reconstruction error of

$$e = \sum_{j=1}^{N^2} \lambda_j - \sum_{j=1}^{K} \lambda_j = \sum_{j=K+1}^{N^2} \lambda_j$$

where the $\lambda_j$'s are the eigenvalues on the diagonal of $\Lambda$.


It is obvious from the equation above that the reconstruction error is 0 when $A_K = A$ because $K = N^2$. Furthermore, because the $\lambda_j$'s decrease monotonically, the error can be minimized by choosing the first K eigenvectors of A. The graph of Fig. 2 summarizes the mean reconstruction error, derived experimentally, for all the forms in the NIST database versus the choice for K. We have chosen to use K = 128 which results in an average reconstruction error of 24.56%. This is a rather large reconstruction error and one would expect that recognition could be dramatically impaired. This is definitely not the case. It is the nature of the KL transform to reconstruct the parts which highly differentiate the vector (large covariance) with the first few eigenvectors that correspond to the largest eigenvalues. The least differentiated ones (larger correlation) are reconstructed from the remaining eigenvectors which correspond to smaller eigenvalues.

To state it as it applies in this case, this means that the frequency bands that make a given form different from the others are kept, while the most common frequencies are dropped. This not only does not impair recognition but it contributes to the formation of better differentiated classes in the code-book, which in turn increases recognition accuracy.

Fig. 3 shows the horizontal projections of a blank form for 0° (top), −1.8° (middle) and +1.8° (bottom) of rotation. It turns out that there is enough variance in the projections to successfully identify the degree of rotation once the form itself has been identified.

One way to identify a possibly shifted form from its projection is to store every possible shifted projection for every discrete rotation angle as a prototype for training. Alternatively the projection of the incoming form could be shifted in all possible ways and each one of them would have to be tested to find the best match. Both of these methods were tested and the recognition results were very satisfactory. Both methods however required a significant amount of computation and disk I/O time that renders them unusable.

It is therefore evident that a shift invariant transformation is required for the projection. We decided to use the power spectrum of the horizontal projection based on the Welch method of averaged periodograms. Fig. 4 shows the extreme case of how pronounced the horizontal projection differences are on a shifted form. They are the projections of a blank form for 0 shift (top), 200 pixels left shift (middle) and 300 pixels right shift (bottom) correspondingly. Fig. 5 shows the power spectra obtained for the corresponding projections in Fig. 4. The similarity of the three spectra (top, middle and bottom) of Fig. 5 indicates that the power spectrum forms a feature vector that can be used for form identification in a shift invariant manner.

In order to solve the problem of correctly identifying a rotated and possibly shifted form we decided to train the system with each form prototype rotated to a number of pre-specified angles while obtaining a horizontal projection for every different rotation angle (see Fig. 7). At this stage projections are generated for each form in the train set for all angles in the [−10°, +10°] range with a step of 0.2°. The projections are stored in a database using the form name, the rotation angle and the projection type (Vertical or Horizontal) as the key for subsequent retrievals.

Fig. 2. Reconstruction Error vs. Number of Eigenvectors.


Fig. 3. Projections of the same form at different angles: 0° (top), −1.8° (middle), +1.8° (bottom).

Fig. 4. Projections of the same form with different amounts of shift: No shift (top), −200 pixel shift (middle), +300 pixel shift (bottom).

The vertical projections were obtained only for 0° of rotation since they do not contain any significant information about a document's skew angle in portrait orientation. The vertical projections however were used for horizontal shift detection, in the same manner as horizontal projections at 0° were used for vertical shift detection (described later in this section).

Fig. 5. The Power Spectral Densities of the same projection but with different amounts of shift: No shift (top), −200 pixel shift (middle), +300 pixel shift (bottom).
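The shift invariance of the power spectrum illustrated in Figs. 4 and 5 can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names, the Hann window, the 50% overlap and the section length `nfft = 2048` are all assumptions (2048 was picked because it yields 1025 frequency bins, the PSD length quoted in the text), and the log-scaling step is left out so that spectra can be compared directly.

```python
import numpy as np

def horizontal_projection(image):
    """Count the black pixels (value 1) in every row of a binary image."""
    return np.asarray(image, dtype=float).sum(axis=1)

def averaged_periodogram(profile, nfft=2048, overlap=0.5):
    """Welch-style power spectrum estimate of a 1-D projection profile."""
    x = np.asarray(profile, dtype=float)
    if x.size < nfft:                        # zero-pad profiles shorter than one section
        x = np.pad(x, (0, nfft - x.size))
    step = max(1, int(nfft * (1.0 - overlap)))
    t = np.arange(nfft)
    window = np.hanning(nfft)                # assumed window; the text does not name one
    periodograms = []
    for start in range(0, x.size - nfft + 1, step):
        section = x[start:start + nfft]
        slope, intercept = np.polyfit(t, section, 1)      # fit the linear trend
        section = (section - (slope * t + intercept)) * window
        periodograms.append(np.abs(np.fft.rfft(section)) ** 2)
    return np.mean(periodograms, axis=0)     # nfft // 2 + 1 frequency bins

# Toy "form": 4096 rows, 64 columns, ruled lines only in rows 1024..2047
form = np.zeros((4096, 64))
form[1024:2048:32, 8:56] = 1                 # one ruled line every 32 rows
shifted = np.roll(form, 200, axis=0)         # vertical shift that stays inside the margins

psd = averaged_periodogram(horizontal_projection(form))
psd_shifted = averaged_periodogram(horizontal_projection(shifted))
print(psd.shape)                             # (1025,)
print(np.linalg.norm(psd - psd_shifted) / np.linalg.norm(psd))  # relative PSD difference
```

With only three overlapping sections the invariance is approximate for an arbitrary shift, which matches the caveat in the text about small numbers of windows; it becomes exact in this sketch when the shift equals the section step and the content stays inside the margins.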

2.4. Training and identification

Forms that will be used during training should have no skew or shift. A user interface was built whereby, by superimposing the new image over a blank form prototype, which is displayed as a watermark, the user manually straightens all the samples in the train set. The whole process is depicted in Fig. 6. The horizontal projection along with the power spectrum for all the rotation angles, as specified earlier, are produced and stored, for every form that the system is trained with.

All the KL-transformed power spectrum vectors, which resulted from every rotation angle of a given form in the train set (see Fig. 7) were assigned to the same class labeled by the form's ID. Using Learning Vector Quantization [4] (LVQ) a total of 375 distributions were obtained, the centroids of which form the actual code-book. The Euclidean distance was used as a measure of similarity for identification.
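As an illustration of how a labeled code-book can be trained and then searched with the Euclidean distance, here is a small sketch of the basic LVQ1 update rule. The text does not say which LVQ variant or which parameters were used, so the variant, the learning rate, the number of epochs and all function names below are assumptions.

```python
import numpy as np

def train_lvq1(vectors, labels, codebook, codebook_labels,
               epochs=20, learning_rate=0.05):
    """LVQ1: pull the nearest code-book vector towards a sample of its own
    class and push it away from a sample of a different class."""
    codebook = np.array(codebook, dtype=float)
    for epoch in range(epochs):
        alpha = learning_rate * (1.0 - epoch / epochs)    # linearly decaying rate
        for x, label in zip(vectors, labels):
            winner = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
            sign = 1.0 if codebook_labels[winner] == label else -1.0
            codebook[winner] += sign * alpha * (x - codebook[winner])
    return codebook

def classify(x, codebook, codebook_labels):
    """Label of the code-book vector closest to x in Euclidean distance."""
    return codebook_labels[int(np.argmin(np.linalg.norm(codebook - x, axis=1)))]

# Toy feature vectors standing in for KL-transformed spectra of two form types
rng = np.random.default_rng(0)
form_a = rng.normal(loc=0.0, scale=0.3, size=(40, 2))
form_b = rng.normal(loc=5.0, scale=0.3, size=(40, 2))
vectors = np.vstack([form_a, form_b])
labels = ["A"] * 40 + ["B"] * 40

codebook = train_lvq1(vectors, labels, [form_a[0], form_b[0]], ["A", "B"])
print(classify(np.array([0.2, -0.1]), codebook, ["A", "B"]))   # A
```

In the system described here each class would hold vectors from every rotation angle of a form, so several code-book vectors per class would normally be needed; the sketch uses one per class only to stay short.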

When an unknown form is now presented to the system for identification, both horizontal and vertical projections are obtained simultaneously (for economy in computation) and subsequently, the power spectrum S of the horizontal projection is calculated.

Fig. 6. Train set construction.

Using the KL transform we obtain the vector:

$$y = A(S - m_S).$$

Vector y is the one used for best match in the codebook using Euclidean distance as a measure of similarity. The first label of the matched codebook vector is the name of the known form that best matches the unknown.
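The KL projection and the code-book lookup can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the eigenvectors are stored as the rows of `a_k` so that $y = A_K(S - m_S)$ is a plain matrix product, and random vectors stand in for real spectra.

```python
import numpy as np

def fit_kl_transform(spectra, k):
    """Return the mean spectrum, the k x d matrix A_K (rows are the eigenvectors
    with the largest eigenvalues) and all eigenvalues in descending order."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    covariance = centered.T @ centered / spectra.shape[0]
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)   # ascending order
    order = np.argsort(eigenvalues)[::-1]
    return mean, eigenvectors[:, order[:k]].T, eigenvalues[order]

def kl_transform(spectrum, mean, a_k):
    """y = A_K (S - m_S)"""
    return a_k @ (np.asarray(spectrum, dtype=float) - mean)

def reconstruct(y, mean, a_k):
    """S_hat = A_K^T y + m_S"""
    return a_k.T @ y + mean

def best_match(y, codebook, labels):
    """Label of the code-book vector closest to y in Euclidean distance."""
    distances = np.linalg.norm(np.asarray(codebook, dtype=float) - y, axis=1)
    return labels[int(np.argmin(distances))]

# Toy "spectra": 50 vectors of length 6 with very unequal variances
rng = np.random.default_rng(0)
spectra = rng.normal(size=(50, 6)) * np.array([5.0, 3.0, 2.0, 1.0, 0.5, 0.1])
mean, a_k, eigenvalues = fit_kl_transform(spectra, k=2)

features = np.array([kl_transform(s, mean, a_k) for s in spectra])
restored = np.array([reconstruct(y, mean, a_k) for y in features])
mean_error = np.mean(np.sum((spectra - restored) ** 2, axis=1))
print(np.isclose(mean_error, eigenvalues[2:].sum()))   # True
```

The final check reproduces the reconstruction-error relation of Section 2.3 on the toy data: the mean squared error over the train set equals the sum of the discarded eigenvalues.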


Fig. 7. Creation of vectors in train set.

In Fig. 8, Sammon's mapping [12] was used to show how well the classes are formed in the codebook. For clarity, the system in this case was trained only with seven different types of forms, chosen at random from the train set. The class separation is obvious and it shows pictorially that the form identification rate is expected to be very high since there is no overlap between the classes.

We were able to determine that the points that are farthest from the center of the graph are formed from the spectra at small rotation angles while the ones towards the center correspond to larger rotation angles. The higher density of the graph at the center shows that the capability of the system to discriminate decreases as rotation angle increases. This is a logical conclusion since the power spectrum at large rotation angles has smaller high frequency content and it is therefore incapable of representing the details of the form.

The points in each class can be further divided into two subgroups that are almost symmetrical about an axis. One of the two subgroups in every class represents positive rotation angles while the other one represents the negative ones.

Fig. 8. A pictorial view of the codebook: '*' = 300 dpi, '+' = 150 dpi, 'o' = 75 dpi, '×' = 35 dpi.

The type of rotation (left or right) as well as the rotation angle itself could be determined during identification. It can be seen from the graph that the system is expected to be very accurate not only for form identification but for skew angle detection as well, when the angle is relatively small.

2.5. Skew detection

Once a form is identified, the rotation angle, as well as the amounts of horizontal and vertical shift, have yet to be determined. Since the horizontal projections were generated for all predefined rotation angles we created a set of code-books, one per page per form. Each one of these code-books contained one vector per form per page per rotation angle, where each vector is now the KL-transformed PSD of the horizontal projection of a given form page at that angle. The rotation angle is now the actual vector label, while the codebook name is the same as the form type. LVQ was used to create one codebook for each one of these train sets (form types).

The reason that all these codebooks are created is that each codebook now contains more vectors for every angle and therefore yields more accurate skew estimation. In addition response time is shortened since the search is done in a specific codebook which is much smaller than the initial one which was created for form identification. With the form already identified the codebook to be used is uniquely determined. The label of the best match in the correct code-book is the skew angle of the unknown form.

Experimentally we determined that the system is accurate enough for up to ±3.5° of skew angle detection. When a larger angle is reported, the result is considered inaccurate. In the case where the angle is large we use the connected-components [9] algorithm for an initial determination of the rotation angle. The accuracy of the connected-components algorithm is not acceptable for this application. It is used only as an initial estimate because it works for angles larger than the ones our method can handle. Once the rotation is corrected initially using the connected-components algorithm, the process of identification is repeated since the rotation angle is now guaranteed to be within the acceptable limits of our method.
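The control flow of this skew detection step, including the fallback for large angles, might be sketched as follows. Everything here is an assumption made for illustration: the function names, the callables passed in for feature extraction, coarse (connected-components style) estimation and rotation, and the choice to report the sum of the coarse and fine angles as the total skew.

```python
import numpy as np

MAX_RELIABLE_SKEW = 3.5   # degrees, the accuracy limit quoted in the text

def estimate_skew(feature, angle_codebook, angle_labels):
    """Angle label of the closest vector in the per-form angle code-book."""
    distances = np.linalg.norm(np.asarray(angle_codebook, dtype=float) - feature, axis=1)
    return float(angle_labels[int(np.argmin(distances))])

def detect_skew(image, extract_feature, angle_codebook, angle_labels,
                coarse_skew, rotate):
    """Code-book estimate first; if it exceeds the reliable range, correct the
    image with a coarse estimate and repeat the code-book lookup."""
    angle = estimate_skew(extract_feature(image), angle_codebook, angle_labels)
    if abs(angle) <= MAX_RELIABLE_SKEW:
        return angle
    coarse = coarse_skew(image)               # hypothetical coarse estimator
    corrected = rotate(image, -coarse)
    fine = estimate_skew(extract_feature(corrected), angle_codebook, angle_labels)
    return coarse + fine                      # assumed way of combining both estimates

# Toy stand-ins: an "image" is represented only by its true skew angle
angles = np.round(np.linspace(-10, 10, 101), 1)          # 0.2 degree grid
codebook = angles.reshape(-1, 1)
skew = detect_skew(7.4,
                   extract_feature=lambda img: np.array([img]),
                   angle_codebook=codebook, angle_labels=angles,
                   coarse_skew=lambda img: float(round(img)),
                   rotate=lambda img, a: img + a)
print(skew)
```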

After the horizontal and vertical shifts are determined, instead of shifting the whole form, the prototype form's field coordinates are shifted in the opposite direction.

3. Experimental results

The system was trained for 26 different types of forms with 3 samples from each type (total 78 samples). It was tested with a set of 2284 forms that contained a variable amount of handwritten text in the blank fields. Many of them were contaminated with a significant amount of noise (i.e., handwriting on the margins and company seals). A subset (1200 forms) of the test set was taken from the NIST Scanned Forms database, which is considered to be the standard for reporting OCR results of this type. The reason that the NIST database set was not used exclusively is that all the forms in it are of the same structure. It would therefore be impossible to test the form identification part of the system. The system was tested for recognition accuracy at different scan resolutions and the results are summarized in Table 1.

The CPU time was measured on a 350 MHz Pentium-Pro CPU running Windows NT Workstation V4.0. The same set of forms was also used to test the skew detection capability of the system. The true skew angle of every form in the test set is not known in advance and it is impractical to determine it manually for a set this large. Furthermore a rigorous test should involve random rotations within a range of angles in order to determine the system limits.

Using connected components to determine skew angle is a slower process. It does not, however, have a significant impact on the overall performance for the following reasons:

1. The connected components are formed concurrently with the projection profile. Both processes consider only the black pixels, and they can both be created in one pass through the image.
2. The connected components are also needed by the OCR system to extract a list of character images from every field.
3. During scanning of a document a skew angle of more than 3.5° is seldom introduced. This process is therefore seldom called by the system.

2.6. Shift detection

With the form identified and deskewed, the problem is reduced to determining the horizontal and/or vertical shift. Two new codebooks were created, one for horizontal and one for vertical shift detection. These codebooks were formed from the corresponding projections at 0° of rotation. Each vector was labeled with the form type that it was created from.

In each codebook, only those vectors whose label matches the form, as it was identified, are now considered. By performing successive shifts on the projection of the incoming form, while comparing it to a projection in the codebook, eventually the amount of shift that results in minimum Euclidean distance is found. The horizontal projection was used for vertical shift detection and the vertical one to detect the horizontal shift in a similar fashion.
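The offset-based search for the shift with minimum Euclidean distance, together with the staged refinement this section goes on to describe (steps of 50, 10 and 1 pixels), might look like the sketch below. The function names, the search range `max_shift` and the zero-fill at the borders are assumptions; the staged search also assumes that the distance varies smoothly enough with the offset for the coarse stage to land near the true minimum.

```python
import numpy as np

def offset_distance(profile, reference, offset):
    """Euclidean distance between `reference` and `profile` read at `offset`.

    No array is shifted: only the read position moves. Samples that fall
    outside `profile` are treated as white margin (value 0).
    """
    positions = np.arange(reference.size) + offset
    inside = (positions >= 0) & (positions < profile.size)
    aligned = np.zeros(reference.size)
    aligned[inside] = profile[positions[inside]]
    return float(np.linalg.norm(aligned - reference))

def find_shift(profile, reference, max_shift=300):
    """Three-stage search: step 50 over the full range, then step 10 within
    +/-50 of the best offset, then step 1 within +/-10."""
    profile = np.asarray(profile, dtype=float)
    reference = np.asarray(reference, dtype=float)
    best = min(range(-max_shift, max_shift + 1, 50),
               key=lambda s: offset_distance(profile, reference, s))
    for step, radius in ((10, 50), (1, 10)):
        best = min(range(best - radius, best + radius + 1, step),
                   key=lambda s: offset_distance(profile, reference, s))
    return best

rows = np.arange(1000)
reference = np.exp(-((rows - 500) / 120.0) ** 2)   # smooth toy projection (prototype)
incoming = np.exp(-((rows - 537) / 120.0) ** 2)    # same profile, 37 rows lower
print(find_shift(incoming, reference))
```

The staged search evaluates 13 + 11 + 21 candidate offsets here instead of the 601 that an exhaustive one-pixel scan over the same range would need.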

To reduce computation, the projection is not shifted, but an offset to it is used instead, which is changed in every step. To reduce computation further the search for the best shift is done in three stages:

1. Find best shift using a shift step of 50 pixels.
2. Search for a better match in the neighborhood of [−50, +50] pixels from the previous match using a shift step of 10.
3. Change the neighborhood to [−10, +10] and repeat the search around the previous best match with a shift step of 1.

of forms, which makes this experiment the largest reported of this kind (Tables 2 and 3). It can be seen that this method is very robust even at low resolutions and therefore it lends itself to applications such as fax transmissions that are normally scanned at 75 dpi.

Table 3 is an example of how the skew detection rate increases as tolerance increases. The graph of Fig. 9 shows how angle detection accuracy increases as error tolerance increases at 300, 150, 75 and 35 dpi. It can be seen that for tolerances over 1.2° accuracy approaches 100% at 300 dpi resolution. As mentioned earlier (and proven experimentally in our labs) this tolerance is acceptable for satisfactory text extraction from the form fields.

The horizontal and vertical shift detection never failed once the form type and skew angle were detected correctly. A threshold of 25 pixels of shift was used in all experiments. The algorithm that was used to automate the shift detection testing process was a replica of the logic used for the skew detection tests.

The left part of Fig. 10 is the picture of an incoming form and shows how misplaced the fields are after identification. This particular form was shifted up and left during scanning. A significant amount of noise can be seen as a black smudge near the center of the form. The rotation is not quite visible but nevertheless the system

The following simple algorithm was used to automate the testing process.

Algorithm TestForms(Image List, Correct ID List)
    For Every Image in Image List
        Image Name = Next Image(Image List)
        I = Read Image(Image Name)
        Correct ID = Next ID in Correct ID List
        ID = Identify Image(I)
        If ID <> Correct ID then
            Report Incorrectly Identified(Image Name)
        Else
            Report Correctly Identified(Image Name)
            r = Detect Rotation(I)
            I′ = Correct Rotation(I, r)
            r′ = Detect Rotation(I′)
            If r′ < −Tolerance or r′ > +Tolerance then
                Report Incorrect Angle(Image Name)
            Else
                Report Correct Angle(Image Name)
                // The skew is at this point corrected
                I = I′
                // Repeat testing for a range of rotations
                For r = −Max Skew to +Max Skew step 0.1
                    Ir = Rotate(I, r)
                    r′ = Detect Rotation(Ir)
                    I′ = Rotate(Ir, −r′)
                    rr = Detect Rotation(I′)
                    If −Tolerance < rr < +Tolerance then
                        Report Correct Angle(Image Name)
                    Else
                        Report Incorrect Angle(Image Name)
                    End If
                End For
            End If
        End If
    End For.

The above algorithm is based on the simple fact that if

the skew angle is detected correctly, then after straighten-

ing the image, a second test should report a skew angle

very close to 03.

Theoretically, the algorithm cannot be proven correct

since there is a distinct possibility of two consecutive

errors in the skew angle estimation with the second test

falsely reporting 03. In practice this is seldom the case

since the system is very accurate and robust for angles

close to 03. A visual inspection of the results for about 200

sample forms revealed no instances of this condition

(double fault).

The range [!Tolerance, #Tolerance] is the accept-

Table 3

able angle error. Experimentally it was determined that

tolerances of up to 1.23 do not introduce any signi"cant

errors in the text extraction process, i.e., the "eld borders

do not cross the preprinted parts of the form.

Resolution vs. skew detection rate at 1.23 tolerance

Dpi

Number

of forms

Angle detected

correctly

Correct %

99.60

Using the algorithm above, the actual number of tests

performed to obtain the results is 200 times the number

300

2284

2275

1

502284

2251

98.55

7

35

5

2284

2239

2218

98.02

97.11

detected a skew angle of 0.4°, a left shift of 74 pixels and an upward shift of 237 pixels, which is a result well within the acceptable limits. The right half of Fig. 10 shows how well the fields are positioned after form identification, skew and shift correction.

Table 1
Resolution vs. recognition rate and CPU time

Dpi    Number of forms    Correctly identified    Correct %    CPU time
300    2284               2278                    99.73        0.82
150    2284               2278                    99.73        0.45
75     2284               2275                    99.60        0.33
35     2284               2259                    98.90        0.22

Table 2
Resolution vs. skew detection rate at 0.6° tolerance

Dpi    Number of forms    Angle detected correctly    Correct %
300    2284               2182                        95.53
150    2284               2178                        95.35
75     2284               2173                        95.14
35     2284               2166                        94.83

Fig. 9. Skew Detection Rate vs. Tolerance.

Fig. 10. Positioning of fields before (left) and after (right) skew and shift correction.
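The consistency check behind the skew tests (rotate by a known angle, detect, undo the detected angle, detect again) can be sketched in a few lines. This is a hedged Python sketch, not the authors' code: `run_skew_tests`, the `detect` and `rotate` callables, and the default `max_skew` of 10° are assumptions introduced for illustration, since the paper does not state its Max Skew value. Only the 0.1° step and the tolerance test come from the text.

```python
def run_skew_tests(image, detect, rotate, tolerance=1.2, max_skew=10.0, step=0.1):
    """Sweep test rotations and count residual skews inside the tolerance.

    `image` is assumed to be already deskewed. `detect(img)` returns a skew
    angle in degrees and `rotate(img, angle)` returns a rotated copy; both
    are supplied by the caller (hypothetical interfaces).
    """
    n_steps = int(round(2 * max_skew / step))
    correct, failures = 0, []
    for k in range(n_steps + 1):
        r = -max_skew + k * step               # test rotation for this pass
        skewed = rotate(image, r)
        r_detected = detect(skewed)            # first estimate
        restored = rotate(skewed, -r_detected) # undo the detected skew
        residual = detect(restored)            # second estimate, ideally 0
        if -tolerance < residual < tolerance:
            correct += 1
        else:
            failures.append(round(r, 1))
    return correct, failures


# Toy model: an "image" is represented only by its skew angle in degrees.
def add_rotation(angle, r):
    return angle + r


def perfect_detector(angle):
    return angle


def saturating_detector(angle):
    # Hypothetical detector that cannot report more than 8 degrees of skew.
    return max(-8.0, min(8.0, round(angle, 1)))


print(run_skew_tests(0.0, perfect_detector, add_rotation)[0])     # prints 201
print(run_skew_tests(0.0, saturating_detector, add_rotation)[1])  # angles that fail
```

As the paper notes, a check of this kind cannot catch two consecutive errors that cancel out, so a pass here is evidence of correct detection rather than proof.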

4. Conclusions and future work

We have built a system that solves the form identification and field extraction problem in the most generic case. It turns out that the power spectrum is an excellent choice as a shift invariant feature vector. It has enough variance to be useful for accurate identification, and its shift invariance contributes highly to the system's fast response time.

It does have a limitation related to the amount of skew that it can handle, since the high frequency content of the horizontal projection profile spectrum drops rather fast. However, it can be used effectively for OCR applications, where the amount of skew is almost always very small. We have alleviated the problem of high skew correction by combining the skew detection with the connected components method when a threshold is exceeded. Currently we are working on a new skew detection method based on connected components but with significantly improved accuracy when the skew angle is small.

In the near future, we plan to incorporate into the system a document defect model generator to create artificial test data from a database of existing forms. This is rather important to ensure that all extreme cases are considered during testing. Obviously, there is a need for a standard data set and testing method with which authors can compare their results in a concise fashion. The defect model generator could also decrease the need for the large storage capacity demanded for training and testing such a system, since the document variations can be generated on the fly.

The system described in this paper can be easily extended to work on a form in either landscape or portrait orientation. An obvious solution would be to obtain the vertical projections for every angle and test them for best fit as well. We are also working (with encouraging results) on a method to determine a document's orientation (landscape or portrait) from the projection profile spectra.

References

[1] A. Dengel, R. Bleisinger, R. Hock, F. Fein, F. Hones, From paper to office document standard representation, Comput. 25 (7) (1992) 63–67.
[2] S. Mori, C.Y. Suen, K. Yamamoto, Historical review of OCR research and development, Proc. IEEE 80 (7) (1992) 1029–1058.
[3] H.S. Baird, The skew angle of printed documents, Proceedings of Conference of the Society of Photographic Scientists and Engineers, 1987, pp. 14–21.
[4] S.C. Hinds, J.L. Fisher, D.P. D'Amato, A document skew detection method using run-length encoding and the Hough transform, Proceedings of the 10th International Conference Pattern Recognition, 1990, pp. 464–468.
[5] R. Casey, D. Ferguson, K. Mohiuddin, E. Walace, Intelligent forms processing system, Mach. Vision Appl. 5 (1992) 143–155.
[6] S.L. Taylor, R. Fridzson, J.A. Pastor, Extraction of data from preprinted forms, Mach. Vision Appl. 5 (1992) 211–222.
[7] P.J. Grother, NIST handprinted forms and characters database, National Institute of Standards and Technology, Advanced Systems Division, Visual Image Processing Group, March 16, 1995.
[8] H.S. Baird, The skew angle of printed documents, Proceedings of the Conference on Photographic Scientists and Engineers, SPIE, Bellingham, Wa., 1987, pp. 14–21.
[9] L. O'Gorman, The document spectrum for structural page layout analysis, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1162–1173.
[10] T. Pavlidis, A vectorizer and feature extraction for document recognition, Comput. Vision Graphics Image Processing 35 (1986) 111–127.
[11] G. Wesley, Markowsky, Fleshing out projections, IBM J. Res. Devel. 25 (1981) 934–954.
[12] J.W. Sammon Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput. C-18 (5) (1969) 401–409.

About the Author: NICKOLAS T. LIOLIOS was born in Elassona, Greece, on April 16, 1958. He received his first BSc degree in Industrial Engineering Technology and a second BSc degree in Computer Science from Western Michigan University (USA). He also received an MSc degree in Computer Science from the same university. He is currently a PhD candidate at the University of Patras, Department of Electrical and Computer Engineering.

From 1985 to 1987 he was a lab supervisor at Western Michigan University. From 1989 to 1995 he was a Software Engineer with Ford Motor Co., working on the design of diagnostic systems. Since 1995 he has been an instructor at the Technological Educational Institute of Lamia, Greece, Department of Electrical Engineering.

He has presented several papers at international conferences in the areas of Optimization, Parallel Computer Architecture and Pattern Recognition. He is currently working on "Automated Data Extraction from Preprinted Forms", which is the subject of his PhD dissertation.

About the Author: DR. NIKOS FAKOTAKIS received the BSc degree in Electronics from the University of London (UK) in 1978, the MSc degree in Electronics from the University of Wales (UK), and the PhD degree in Speech Processing from the University of Patras (Greece) in 1986.

From 1989 to 1996 he was Director of the Human–Machine Communications Dept. of the KNOWLEDGE SA Co. From 1986 to 1992 he was a lecturer in the Electrical and Computer Engineering Department of the University of Patras, from 1992 to 1999 Assistant Professor, and since 2000 he has been Associate Professor in the area of Speech and Natural Language Processing and Head of the Speech and Language Processing Group at the Wire Communications Laboratory.

Dr. Fakotakis is the author of over 100 publications in the area of Speech and Natural Language Engineering. His current research interests include Speech Recognition/Understanding, Speaker Recognition, Spoken Dialogue Processing, Natural Language Processing and Optical Character Recognition.

Dr. Fakotakis is editor in chief of the European Student Journal of Language and Speech (WEB-SLS) and a member of the executive board of the European Network in Language and Speech (ELSNET). He is also a member of the IEEE, TEE, EURASIP and ISCA.

About the Author: GEORGE KOKKINAKIS was born in Chios, Greece, on March 17, 1937. He received the Diploma in Electrical Engineering (Dipl.-Ing.) in 1961, the Doctor's Degree in Engineering (Dr.-Ing.) in 1966 and the Diploma in Engineering Economics (Dipl.-Wirt.-Ing.) in 1967, all from the Technical University of Munich (Technische Hochschule München).

During 1968–1969 he served at the Ministry of Coordination in Athens. Since 1969 he has been with the Department of Electrical Engineering at the University of Patras, where he has organized and is directing the Wire Communications Laboratory (WCL). His current activity in research and development, which coincides with the activity of WCL, includes the analysis, synthesis, recognition and linguistic processing of the Greek language and the design and optimization of telecom networks. The WCL is a partner in ESPRIT (Polygot), RACE (Telemed) and in other R&D projects financed by the Greek Telecom Industry, the Greek General Secretariat for Research and Technology, etc. He has published several books on Telecommunications and Electrotechnology and over 50 technical papers, articles and reports on Telecommunications and Speech Technology.

Dr. Kokkinakis is a senior member of IEEE and a member of the Technical Chamber of Greece (TEE), the VDE (Verein Deutscher Elektrotechniker), ESCA, the EURASIP (European Association for Signal Processing), the SEFI (Société Européenne pour la Formation des Ingénieurs), the EEEE (Greek Operations Research Society), and the Linguistics Society of America (LSA).

Article Doi

DOI: 10.1016/S0031-3203(01)00030-9
