ACCEPTED MANUSCRIPT
Amino acids in a protein sequence play a pivotal role in attaining structural stability and increase
the content of various non-covalent interactions. These have been considered the major
contributors to thermostability, making the protein compact and rigid enough to sustain structural and
functional integrity at elevated temperatures [29–31]. To understand the molecular basis of
protein thermostability, a dataset was created containing 116 non-redundant and homologous
thermophilic-mesophilic pairs. The mesophilic counterparts were chosen using the BLASTp
program with a threshold of ≥70% homology (Table S1). The PEPSTATS server was used to
enumerate percentage compositions of 29 amino acid features which included 20 standard amino
acids (Ala, Arg, Asn, Asp, Cys, Gln, Glu, Gly, His, Ile, Leu, Lys, Met, Phe, Pro, Ser, Thr, Trp,
Tyr and Val) and 9 amino acid classes (polar, Pol; non-polar, NPol; small, Sml; tiny, Tiny;
aromatic, Aro; aliphatic, Ali; charged, Chrg; basic, Bsc; and acidic, Acd). A two-sample Kolmogorov-Smirnov (KS) test
was then employed to identify the amino acid features that differed significantly between the compared proteins.
The KS test does not depend on the form of the cumulative distribution function of the samples being compared and
is adapted to give correct p-values. It measures the difference between the
distributions of the two groups in the dataset on the basis of the D statistic, which is the maximum absolute difference between the
empirical distribution functions of the two samples [32]. In the present study, the KS test showed
that, among the 29 features, 19 amino acid features (Ala, Arg, Asn, Asp, Cys, Glu, Gln, His, Ile, Lys,
Ser, Thr, Val, Acd, Bsc, Chrg, Pol, Sml and Tiny) were statistically significant, indicating that
they were preferred in either thermophilic or mesophilic proteins. A similar analysis was carried
out by Kumwenda et al. (2013) in homologous sequences of Thermus scotoductus and Thermus
thermophilus. The amino acids in these sequences were compared using Student's t-test and the Wilcoxon test,
which showed increased substitution of charged amino acids (Arg and Glu) in T.
thermophilus HB27 [33].
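
The composition and KS steps described above can be illustrated with a minimal sketch. It is not the PEPSTATS or KS implementation used in this study: the function names, the toy sequences and the residue set chosen for the example are assumptions made purely for illustration, and only the D statistic is computed (the p-value, which a statistics library would normally supply, is not shown).

```python
from bisect import bisect_right


def percent_composition(sequence, residues):
    """Percentage of positions in `sequence` occupied by any residue in `residues`.

    `residues` is a string or set of one-letter codes, e.g. "E" for Glu.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    hits = sum(1 for aa in sequence if aa in residues)
    return 100.0 * hits / len(sequence)


def ks_d_statistic(sample_a, sample_b):
    """Two-sample KS D statistic.

    D is the maximum absolute difference between the empirical
    distribution functions (EDFs) of the two samples. The EDFs are step
    functions, so it is enough to evaluate them at the observed values.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    for x in a + b:
        edf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        edf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(edf_a - edf_b))
    return d


# Toy, made-up sequences (NOT taken from the 116-pair dataset).
thermophilic = ["MKEELRKAEE", "MREEVKRLEE", "MEEKIRELKE"]
mesophilic = ["MSQNLTKASE", "MTQSVNALSQ", "MNSQITGLKE"]

# Percentage of Glu ("E") in each protein of the two groups.
glu_thermo = [percent_composition(s, "E") for s in thermophilic]
glu_meso = [percent_composition(s, "E") for s in mesophilic]

print("Glu % (thermophilic):", glu_thermo)
print("Glu % (mesophilic):  ", glu_meso)
print("KS D statistic:      ", ks_d_statistic(glu_thermo, glu_meso))
```

A D value close to 1 means the two empirical distributions barely overlap for that feature, whereas a value close to 0 means they are nearly identical. Repeating the calculation for each of the 29 features, and converting D to a p-value, would mirror the filtering step described in the text.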