A. Hirashima et al. / Bioorg. Med. Chem. 11 (2003) 95–103
The molecules, together with their conformational
models, were submitted to Catalyst hypothesis generation.
The present work shows how a set of various OA/TA
agonists, responsible for the inhibition of sex-pheromone
production in P. interpunctella, may be treated
statistically to uncover the molecular characteristics
that are essential for high activity. These characteristics
are expressed as chemical features disposed in
three-dimensional space and are collectively termed a
hypothesis. Hypotheses approximating the pharmacophore
are described as a set of features distributed
within a 3-D space. This process considered only
surface-accessible functions such as HBA, HBAl,
hydrogen-bond donor (HBD), Hp, HpAr, HpAl, negative charge,
positive charge, ring aromatic (RA), negative ionizable
(NI) and positive ionizable (PI).51 A preliminary test
was performed with these features. NI and PI were used
rather than negative charge and positive charge in order
to broaden the search for deprotonated and protonated
atoms or groups at physiological pH. Furthermore, in
order to emphasize the importance of an aromatic
group corresponding to the phenol moiety of the test
compounds, RA, which carries directionality, was chosen
for inclusion in the subsequent run. The hypothesis
generator was restricted to select only five features because
of the molecules’ flexibility and functional complexity.
For molecules larger than dipeptides, Catalyst will often
find five-feature hypotheses automatically, but for
smaller molecules, three- or four-feature hypotheses
might be in the majority. Since hypotheses with more
features are more likely to be stereospecific and are
generally more restrictive models, the total features min
value was set to 5 in order to force Catalyst to search
for five-feature hypotheses.32
chemist assess the validity of a hypothesis. One is the cost
of an ideal hypothesis, which is a lower bound on the cost
of the simplest possible hypothesis that still fits the data
perfectly. The other is the cost of the null hypothesis,
which presumes that there is no statistically significant
structure in the data, and that the experimental activities
are normally distributed about their mean. Generally, the
greater the difference between the two costs, the higher the
probability of finding useful models. In terms of hypothesis
significance, a generated hypothesis with a cost that is
substantially below that of the null hypothesis is likely to
be statistically significant and bears visual inspection.52
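The cost bookkeeping described in the validation discussion can be illustrated with a minimal Python sketch. It is not Catalyst code: the names `HypothesisCost`, `signed_error_ratio` and `cost_gap` are hypothetical, the numbers are made up, and the sign convention assumes that activities are positive values for which a larger number means lower activity (so a predicted value above the experimental one counts as an underestimated activity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HypothesisCost:
    """Cost terms of one hypothesis; field names are hypothetical."""
    weight: float         # weight cost
    error: float          # error cost
    configuration: float  # configuration (fixed) cost

    @property
    def total(self) -> float:
        # Overall cost = weight cost + error cost + configuration cost.
        return self.weight + self.error + self.configuration


def signed_error_ratio(predicted: float, experimental: float) -> float:
    """Signed fold error between predicted and experimental activity.

    Assumption: both values are positive and a larger value means lower
    activity. Predicted >= experimental gives predicted / experimental;
    otherwise the ratio is inverted and reported with a minus sign.
    """
    if predicted <= 0 or experimental <= 0:
        raise ValueError("activities must be positive")
    if predicted >= experimental:
        return predicted / experimental
    return -(experimental / predicted)


def cost_gap(null_cost: float, hypothesis_cost: float) -> float:
    """How far a hypothesis cost lies below the null-hypothesis cost."""
    return null_cost - hypothesis_cost


if __name__ == "__main__":
    h = HypothesisCost(weight=1.5, error=80.0, configuration=12.0)  # made-up values
    print("total cost:", h.total)
    print("gap to null cost:", cost_gap(null_cost=150.0, hypothesis_cost=h.total))
    print("error ratio:", signed_error_ratio(predicted=20.0, experimental=10.0))
```

A larger positive gap corresponds to a hypothesis whose cost lies further below that of the null hypothesis; the sketch deliberately sets no threshold, since none is stated here.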
Acknowledgements
We thank Dr. Ada Rafaeli (Department of Stored Products,
Pheromone Research Lab, Volcani Centre,
Israel) for valuable suggestions on rearing
P. interpunctella. This work was supported in part by a
Grant-in-Aid for Scientific Research from the Ministry of
Education, Science and Culture of Japan.
References and Notes
1. Raina, A. K. Ann. Rev. Entomol. 1993, 38, 329.
2. Ma, P. W. K.; Roelofs, W. Insect Biochem. Molec. Biol.
1995, 25, 467.
3. Rafaeli, A.; Gileadi, C. Invert. Neurosci. 1997, 3, 223.
4. Jurenka, R. A. Arch. Insect Biochem. Physiol. 1996, 33, 245.
5. Soroker, V.; Rafaeli, A. Insect Biochem. 1989, 19, 1.
6. Rafaeli, A.; Soroker, V.; Kamensky, B.; Raina, A. K. J.
Insect Physiol. 1990, 36, 641.
7. Arima, R.; Takahara, K.; Kadoshima, T.; Numazaki, F.;
Ando, T.; Uchiyama, M.; Nagasawa, H.; Kitamura, A.;
Suzuki, A. Appl. Entomol. Zool. 1991, 26, 137.
8. Jurenka, R. A.; Jacquin, E.; Roelofs, W. L. Proc. Natl.
Acad. Sci. U.S.A. 1991, 88, 8621.
9. Fonagy, A.; Matsumoto, S.; Schoofs, L.; De Loof, A.;
Mitsui, T. Biosci. Biotech. Biochem. 1992, 56, 1692.
10. Matsumoto, S.; Ozawa, R.; Nagamine, T.; Kim, G.-H.;
Uchiumi, K.; Shono, T.; Mitsui, T. Biosci. Biotech. Biochem.
1995, 59, 560.
11. Rafaeli, A.; Gileadi, C. Insect Biochem. Mol. Biol. 1995,
25, 827.
12. Rafaeli, A.; Gileadi, C. Insect Biochem. Mol. Biol. 1996,
26, 797.
13. Rafaeli, A.; Gileadi, C.; Fan, Y.; Meixun, C. J. Insect
Physiol. 1997, 43, 261.
14. Kuwahara, Y.; Kitamura, C.; Takahashi, S.; Hara, H.;
Ishii, S.; Fukami, H. Science 1971, 171, 801.
15. Brady, U. E.; Tumlinson, J. H.; Brownlee, R. G.; Silver-
stein, R. M. Science 1971, 171, 802.
16. Zhu, J.; Ryne, C.; Unelius, C. R.; Valeur, P. G.;
Lofstedt, C. Entomol. Exp. Appl. 1999, 92, 137.
17. Hirashima, A.; Eiraku, T.; Watanabe, Y.; Kuwano, E.;
Taniguchi, E.; Eto, M. Pest Manag. Sci. 2001, 57, 713.
18. Axelrod, J.; Saavedra, J. M. Nature 1977, 265, 501.
19. Evans, P. D. In Reviews in Comparative Molecular Neuro-
biology; Heller, S. R., Ed.; Birkhauser: Basel, 1993; p 287.
20. Evans, P. D. In Reviews in Comprehensive Insect Physiol-
ogy Biochemistry Pharmacology, Kerkut, G. A., Gilbert, G.,
Eds.; Pergamon: Oxford, 1985; Vol. 11, p 499.
21. Nathanson, J. A. Mol. Pharm. 1985, 28, 254.
22. Evans, P. D. J. Physiol. 1981, 318, 99.
Validation of the hypothesis. During a hypothesis generation
run, Catalyst considers and discards many
thousands of models. It attempts to minimize a cost
function consisting of two terms. One penalizes the
deviation between the estimated activities of the training
set molecules and their experimental values. The other
penalizes the complexity of the hypothesis. The overall
assumption is based on Occam’s razor: between
otherwise equivalent alternatives, the simplest
model is best. Simplicity is defined using the minimum
description length principle from information theory.
The overall cost of a hypothesis is calculated by summing
three cost terms (weight
cost, error cost, and configuration cost). Weight cost is
a value that increases in a Gaussian form as the feature
weight in a model deviates from an idealized value of
2.0. Error cost is the major term; it increases with the RMS
difference between estimated and measured activities. The
error is obtained by dividing the predicted activity by the
experimental value when the predicted activity is
underestimated. When the predicted activity is overestimated, it is
obtained by dividing the experimental activity by the
predicted value and is given a minus sign. Configuration
cost is a fixed cost that depends on the complexity of
the hypothesis and is equal to the entropy of the hypothesis space.
Besides providing a numerical score for each generated
hypothesis, Catalyst provides two numbers to help the
23. Roeder, T.; Nathanson, J. A. Neurochem. Res. 1993, 18, 921.