P. Prusis et al. / Bioorg. Med. Chem. 16 (2008) 9369–9377
4.6. Correlation by partial least-squares projections to latent structures
References and notes
1. Halstead, S. B. Science 1988, 239, 476.
2. Jacobs, M. G.; Young, P. R. Curr. Opin. Infect. Dis. 1998, 11, 319.
3. Monath, T. P. Proc. Natl. Acad. Sci. U.S.A. 1994, 91, 2395.
4. Gubler, D. J.; Clark, G. G. Emerg. Infect. Dis. 1995, 1, 55.
5. Guzman, M. G.; Kouri, G. Lancet Infect. Dis. 2002, 2, 33.
6. Gubler, D. J. Clin. Microbiol. Rev. 1998, 11, 480.
7. Gubler, D. J. Trends Microbiol. 2002, 10, 100.
8. Edelman, R. Clin. Infect. Dis. 2007, 45(Suppl. 1), S56.
9. Chambers, T. J.; Hahn, C. S.; Galler, R.; Rice, C. M. Annu. Rev. Microbiol. 1990, 44, 649.
10. Stadler, K.; Allison, S. L.; Schalich, J.; Heinz, F. X. J. Virol. 1997, 71, 8475.
11. Falgout, B.; Pethel, M.; Zhang, Y. M.; Lai, C. J. J. Virol. 1991, 65, 2467.
12. Zhang, L.; Mohan, P. M.; Padmanabhan, R. J. Virol. 1992, 66, 7549.
13. Ramachandran, M.; Sasaguri, Y.; Nakano, R.; Padmanabhan, R. Methods Enzymol. 1996, 275, 168.
14. Preugschat, F.; Yao, C. W.; Strauss, J. H. J. Virol. 1990, 64, 4364.
15. Chambers, T. J.; Weir, R. C.; Grakoui, A.; McCourt, D. W.; Bazan, J. F.; Fletterick, R. J.; Rice, C. M. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 8898.
Descriptors were correlated with the logarithms of the Km and kcat values of the 48 work-set peptides by partial least-squares projections to latent structures (PLS)⁴⁴ using the Unscrambler 9.7 software (CAMO Software AS, Norway). PLS is a widely used method for finding a quantitative relationship between a set of descriptors (X data) and one or several responses (Y data). It does so by simultaneously projecting the X and Y matrices onto a lower-dimensional variable space (the PLS components), under the additional constraint that the covariance between the projections of X and Y is maximized. For each response, PLS derives a regression equation in which the regression coefficients show the direction and magnitude of the influence of each descriptor on the response (for detailed descriptions see Refs. 45 and 46).
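To make the projection and the regression coefficients concrete, here is a minimal pure-Python sketch of a single-response PLS model (PLS1) computed with the NIPALS algorithm, one common way of fitting PLS. It is not the Unscrambler implementation used in the study; the function names (`fit_pls1`, `predict`, `coefficients`) and the toy data are hypothetical. In the study's setting, each row of `X` would presumably hold the descriptors of one substrate–protease combination and `y` the corresponding log Km or log kcat values.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]
# (x_mean, y_mean, weights W, X loadings P, y loadings c)
Model = Tuple[Vector, float, Matrix, Matrix, Vector]


def dot(a: Vector, b: Vector) -> float:
    return sum(u * v for u, v in zip(a, b))


def fit_pls1(X: Matrix, y: Vector, n_components: int) -> Model:
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centred copies of X and y; both are deflated after each component.
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]
    f = [v - y_mean for v in y]
    W, P, c = [], [], []
    for _ in range(n_components):
        # Weights w ∝ E^T f: the X direction whose scores have maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]  # scores: projection of X onto w
        tt = dot(t, t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        c_a = dot(f, t) / tt  # inner regression of y on the scores
        # Deflation: remove the part of X and y explained by this component.
        E = [[e - ti * lj for e, lj in zip(row, loading)] for row, ti in zip(E, t)]
        f = [fi - c_a * ti for fi, ti in zip(f, t)]
        W.append(w)
        P.append(loading)
        c.append(c_a)
    return x_mean, y_mean, W, P, c


def predict(model: Model, x: Vector) -> float:
    """Predict the response for one descriptor vector by replaying the deflation steps."""
    x_mean, y_mean, W, P, c = model
    e = [v - m for v, m in zip(x, x_mean)]
    y_hat = y_mean
    for w, loading, c_a in zip(W, P, c):
        t = dot(e, w)
        y_hat += c_a * t
        e = [ej - t * lj for ej, lj in zip(e, loading)]
    return y_hat


def coefficients(model: Model) -> Vector:
    """Regression coefficients b such that y_hat = y_mean + b . (x - x_mean)."""
    x_mean, y_mean = model[0], model[1]
    # Predictions are linear in the centred descriptors, so b_j equals the
    # change in y_hat produced by a unit step in descriptor j alone.
    return [
        predict(model, [m + (1.0 if k == j else 0.0) for k, m in enumerate(x_mean)]) - y_mean
        for j in range(len(x_mean))
    ]


# Toy check (hypothetical data): y = 2*x1 - x2 + 1 exactly, so with as many
# components as descriptors PLS reduces to least squares and recovers b = [2, -1].
toy_model = fit_pls1([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], [1, 4, 3, 6, 6], n_components=2)
toy_b = coefficients(toy_model)
```

With fewer components than descriptors the same code yields the lower-dimensional, regularized model that PLS is normally used for; choosing that number of components is the purpose of the validation described in Section 4.7.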
16. Preugschat, F.; Strauss, J. H. Virology 1991, 185, 689.
17. Arias, C. F.; Preugschat, F.; Strauss, J. H. Virology 1993, 193, 888.
18. Lin, C.; Amberg, S. M.; Chambers, T. J.; Rice, C. M. J. Virol. 1993, 67, 2327.
19. Lobigs, M. Proc. Natl. Acad. Sci. U.S.A. 1993, 90, 6218.
4.7. Validation of models
20. Teo, K. F.; Wright, P. J. J. Gen. Virol. 1997, 78, 337.
21. Li, H.; Clum, S.; You, S.; Ebner, K. E.; Padmanabhan, R. J. Virol. 1999, 73, 3016.
22. Clum, S.; Ebner, K. E.; Padmanabhan, R. J. Biol. Chem. 1997, 272, 30715.
23. Falgout, B.; Miller, R. H.; Lai, C. J. J. Virol. 1993, 67, 2034.
24. Chambers, T. J.; Grakoui, A.; Rice, C. M. J. Virol. 1991, 65, 6042.
25. Wengler, G.; Czaya, G.; Farber, P. M.; Hegemann, J. H. J. Gen. Virol. 1991, 72, 851.
26. Li, J.; Lim, S. P.; Beer, D.; Patel, V.; Wen, D.; Tumanut, C.; Tully, D. C.; Williams, J. A.; Jiricek, J.; Priestle, J. P.; Harris, J. L.; Vasudevan, S. G. J. Biol. Chem. 2005, 280, 28766.
27. Niyomrattanakit, P.; Yahorava, S.; Mutule, I.; Mutulis, F.; Petrovska, R.; Prusis, P.; Katzenmeier, G.; Wikberg, J. Biochem. J. 2006, 397, 203.
28. Yin, Z.; Patel, S. J.; Wang, W. L.; Wang, G.; Chan, W. L.; Rao, K. R.; Alam, J.; Jeyaraj, D. A.; Ngew, X.; Patel, V.; Beer, D.; Lim, S. P.; Vasudevan, S. G.; Keller, T. H. Bioorg. Med. Chem. Lett. 2006, 16, 36.
We used thorough validation to determine the optimal complexity (i.e., the number of components) of the PLS models and to assess their reliability for interpretation and prediction. The goodness of fit of a PLS model was assessed by r², the fraction of the response variance explained by the model. The predictive ability was characterized by q², the fraction of the response variance predicted in cross-validation.⁴⁵
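The two statistics are not written out in the text. Assuming the conventional chemometric definitions (see Ref. 45), with ŷ_i the fitted value for observation i and ŷ_i^(cv) its prediction from a model fitted with the cross-validation group containing i left out, they can be stated as:

```latex
r^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2},
\qquad
q^2 = 1 - \frac{\mathrm{PRESS}}{\sum_{i} (y_i - \bar{y})^2},
\qquad
\mathrm{PRESS} = \sum_{i} \bigl(y_i - \hat{y}_i^{(\mathrm{cv})}\bigr)^2
```

Because each ŷ_i^(cv) comes from a model that never saw observation i, q² is normally lower than r², and it becomes negative when PRESS exceeds the total sum of squares, i.e., when the cross-validated predictions are worse than simply predicting the mean ȳ.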
In the current study, cross-validation was performed in two ways. First, the dataset was randomly divided into seven groups; this assessed the predictive ability for new substrate–protease combinations. The cross-validation groups were then rearranged so that all four activity measurements for each of the 48 substrates were assigned to the same group. This was done to avoid overly optimistic assessments of the predictive ability in case the four protease subtypes shared similar substrate interaction profiles. (Results from conventional cross-validation are referred to here as q², and results from the modified variant as q²_substr.) Finally, we also estimated the external predictive ability of the models by predicting the activities of the eight test-set substrates. (The assay data for the test-set substrates were obtained independently, using the same assay method as for the work-set substrates, and are given in Table 1.) The resulting q²_ext estimate differs from q²_substr essentially in that positions P1′–P4′ of the test-set peptides resemble the native cleavage sites of the DEN-1–4 polyproteins; as a result, these peptides generally show higher activities than the work-set average. The q²_ext measure is therefore preferable for assessing the ability of the models to generalize towards high activities.
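The difference between the two cross-validation schemes can be illustrated with a small sketch, assuming seven folds as in the text. `random_folds` deals observations into folds regardless of their substrate, whereas `substrate_folds` keeps all measurements of one substrate together, as in the substrate-grouped variant described above. All names and data are hypothetical: the deliberately naive `fit_memoriser` model only remembers the mean response of each substrate, and the four "protease subtypes" are constructed to have almost identical profiles, which is exactly the situation in which conventional cross-validation becomes over-optimistic.

```python
import random
from typing import Callable, List, Sequence

# A fitted model is any callable mapping one descriptor row to a predicted response.
Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def random_folds(n_obs: int, n_groups: int = 7, seed: int = 0) -> List[int]:
    """Conventional CV: shuffle the observations and deal them into n_groups folds."""
    order = list(range(n_obs))
    random.Random(seed).shuffle(order)
    folds = [0] * n_obs
    for k, i in enumerate(order):
        folds[i] = k % n_groups
    return folds


def substrate_folds(substrate_ids: Sequence[int], n_groups: int = 7, seed: int = 0) -> List[int]:
    """Modified CV: shuffle the substrates and deal them into folds, so that every
    measurement of a given substrate ends up in the same fold."""
    unique = sorted(set(substrate_ids))
    random.Random(seed).shuffle(unique)
    fold_of = {s: k % n_groups for k, s in enumerate(unique)}
    return [fold_of[s] for s in substrate_ids]


def cross_validated_q2(X: Sequence[Sequence[float]], y: Sequence[float],
                       folds: Sequence[int], fit: Fitter) -> float:
    """q2 = 1 - PRESS / SS, each fold being predicted by a model fitted without it."""
    press = 0.0
    for g in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != g]
        model = fit([X[i] for i in train], [y[i] for i in train])
        press += sum((y[i] - model(X[i])) ** 2 for i, f in enumerate(folds) if f == g)
    y_mean = sum(y) / len(y)
    ss = sum((v - y_mean) ** 2 for v in y)
    return 1.0 - press / ss


def fit_memoriser(X_train: List[Sequence[float]], y_train: List[float]) -> Predictor:
    """Hypothetical naive model: predict the mean training response of the same
    substrate (column 0), or the overall training mean for unseen substrates."""
    sums: dict = {}
    counts: dict = {}
    for x, v in zip(X_train, y_train):
        sums[x[0]] = sums.get(x[0], 0.0) + v
        counts[x[0]] = counts.get(x[0], 0) + 1
    overall = sum(y_train) / len(y_train)
    return lambda x: sums[x[0]] / counts[x[0]] if x[0] in counts else overall


# Hypothetical data: 48 substrates x 4 protease subtypes with near-identical profiles.
X = [(s, p) for s in range(48) for p in range(4)]
y = [float(s % 5) + 0.1 * p for s, p in X]
q2_conv = cross_validated_q2(X, y, random_folds(len(X)), fit_memoriser)
q2_substr = cross_validated_q2(X, y, substrate_folds([s for s, _ in X]), fit_memoriser)
print(f"conventional q2 = {q2_conv:.2f}, substrate-grouped q2 = {q2_substr:.2f}")
```

With this construction the conventional estimate is high, because a held-out measurement can almost always be predicted from the same substrate's measurements on the other proteases. The substrate-grouped estimate cannot exceed zero, since the model never sees the held-out substrates and falls back to the training mean.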
29. Chanprapaph, S.; Saparpakorn, P.; Sangma, C.; Niyomrattanakit, P.; Hannongbua, S.; Angsuthanasombat, C.; Katzenmeier, G. Biochem. Biophys. Res. Commun. 2005, 330, 1237.
30. Lapinsh, M.; Prusis, P.; Gutcaits, A.; Lundstedt, T.; Wikberg, J. E. S. Biochim. Biophys. Acta 2001, 1525, 180.
31. Wikberg, J. E. S.; Lapinsh, M.; Prusis, P. Proteochemometrics: A Tool for Modelling the Molecular Interaction Space. In Chemogenomics in Drug Discovery: A Medicinal Chemistry Perspective; Kubinyi, H., Müller, G., Eds.; Wiley-VCH: Weinheim, 2004; pp 289–309.
32. Lapinsh, M.; Prusis, P.; Uhlén, S.; Wikberg, J. E. S. Bioinformatics 2005, 21, 4289.
33. Lapinsh, M.; Veiksina, S.; Uhlén, S.; Petrovska, R.; Mutule, I.; Mutulis, F.; Yahorava, S.; Prusis, P.; Wikberg, J. E. S. Mol. Pharmacol. 2005, 67, 50.
34. Prusis, P.; Uhlén, S.; Petrovska, R.; Lapinsh, M.; Wikberg, J. E. S. BMC Bioinformatics 2006, 7, 167.
35. Mandrika, I.; Prusis, P.; Yahorava, S.; Shikhagie, M.; Wikberg, J. E. S. Protein Eng. Des. Sel. 2007, 20, 301.
36. Kontijevskis, A.; Prusis, P.; Petrovska, R.; Yahorava, S.; Mutulis, F.; Mutule, I.; Komorowski, J.; Wikberg, J. E. S. PLoS Comput. Biol. 2007, 3, e48.
37. Eriksson, L.; Johansson, E. Chemom. Intell. Lab. Syst. 1996, 34, 1.
38. Lundstedt, T. E.; Seifert, E.; Abramo, L.; Thelin, B.; Nystrom, A.; Pettersen, J.; Bergman, R. Chemom. Intell. Lab. Syst. 1998, 42, 3.
39. De Aguiar, P. F.; Bourguignon, B.; Khots, M. S.; Massart, D. L.; Phan-Than-Luu, R. Chemom. Intell. Lab. Syst. 1995, 2, 199.
40. Eriksson, L.; Johansson, E.; Kettaneh-Wold, N.; Wikström, N.; Wold, S. Design of Experiments, Principles and Applications; Umetrics AB: Umeå, 2000.
41. Sandberg, M.; Eriksson, L.; Jonsson, J.; Sjöström, M.; Wold, S. J. Med. Chem. 1998, 41, 2481.
Acknowledgments
42. Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37.
43. Gottfries, J. Chemom. Intell. Lab. Syst. 2006, 83, 148.
44. Wold, S.; Sjöström, M.; Eriksson, L. Chemom. Intell. Lab. Syst. 2001, 58, 109.
45. Wold, S. PLS for Multivariate Linear Modeling. In Chemometric Methods in Molecular Design; van de Waterbeemd, H., Ed.; VCH: Weinheim, Germany, 1995; pp 195–218.
We thank Ewelina Fogelström for technical assistance. Support was obtained from SIDA-Swedish Research Links (348-2004-5993), the Swedish Research Council (VR, 04X-05957), basic science grant BRG 49800008 (to G.K.), and a Royal Golden Jubilee scholarship from the Thailand Research Fund (TRF).
46. Geladi, P.; Kowalski, B. R. Anal. Chim. Acta 1986, 185, 1.