ACCEPTED MANUSCRIPT
koshunensis [9, 24, 30, 35]. The Ca²⁺ binding pockets of GHs such as endoglucanase, α-amylase, pullulanase,
and CGTase have been well studied [51-55]. There is a lack of studies in the literature on the Ca²⁺ binding site of
BGLs. In one particular study, a Ca²⁺ ligand was identified in the crystal structure of a GH1 BGL from Thermotoga
maritima [56]. This structure consists of four chains (PDB id 2WC4). However, Ca²⁺ was bound only to
chain C, where it interacted with a single amino acid, Glu329 (T. maritima numbering). It is likely that the interaction
between Ca²⁺ and Glu329 was nonspecific. In another structure of T. maritima BGL (PDB id 2J7B), the Ca²⁺
interacted with Asp278, Ser281, and Glu282 [57]. Nevertheless, these residues are not conserved in BglD5.
Thus, at this point, the Ca²⁺ binding pocket of BglD5 remains to be determined.
A slight increase in BglD5 enzyme activity was also observed when magnesium chloride was added to
the reaction tube. A similar effect was previously reported, where MgCl₂ enhanced the BGLs from
Cellulomonasmicrobium cellulans and T. thermosaccharolyticum [8, 29]. As with the Ca²⁺ cation, the
binding site for the Mg²⁺ cation was not emphasized in any earlier studies. The GH1 BGL structures
found to have a Mg²⁺ binding site are 3VKK and 2ZOX, both of which are of human origin [58].
The physiological relevance of Mg²⁺ to BGL is yet to be confirmed.
To date, over 330 BGL structures from various GH families are available in the PDB database. GH1 BGLs
adopt a typical TIM-barrel fold similar to that of the BGLs from P. polymyxa (PDB id 2O9P), H. orenii (4PTX), or T.
harzianum (5BWF), and the active site pocket is located near the C-terminal region of the (α/β)₈ barrel [31]. The
structure of BglD5 predicted from template 4PTX has an overall fold similar to that of the template (Appendix A2).
The sequences of GH1 BGLs are very diverse. Among the 70 sequences examined (Appendix A3), only 17
residues are completely identical across all sequences. Almost all GH1 BGLs had an NEP amino acid stretch
(161-163, BglD5 numbering), in which the Glu is the catalytic acid/base residue. Interestingly, enzymes identified in
some Bacillus spp. and J. soli (A0A0C2RQE, A0A0Q6HVV2, D5E2L2, A0A0M1NP20, and A0A098FCN7;
these formed a group in Fig. 5) had the stretch NET, with threonine instead of proline following the acid/base catalytic
residue. In addition, the TENG stretch (347-350, BglD5 numbering), in which the Glu is the catalytic nucleophile,
is another conserved region in GH1 BGLs from bacteria, archaea, and eukaryotes. As an exception, the enzyme from the weed
Arabidopsis thaliana (AAB64244.1) had the sequence MENG.
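
The two checks described above (counting fully identical alignment columns and locating the NEP/NET and TENG/MENG stretches) can be sketched in a few lines of Python. This is only an illustrative sketch: the three short aligned sequences and the function names are hypothetical placeholders, not the 70 sequences of Appendix A3, and a plain regular-expression search on a full-length sequence may return spurious hits that would need to be checked against the alignment.

```python
import re

# Toy aligned sequences (hypothetical placeholders, NOT the Appendix A3 data).
ALIGNMENT = {
    "seqA": "GTNEPWTENG",
    "seqB": "GSNEPYTENG",
    "seqC": "GTNETWMENG",
}

# Acid/base stretch: NEP, or NET as reported for some Bacillus spp. and J. soli.
ACID_BASE = re.compile(r"NE[PT]")
# Nucleophile stretch: TENG, or MENG as reported for A. thaliana.
NUCLEOPHILE = re.compile(r"[TM]ENG")


def identical_columns(alignment):
    """Return 0-based alignment columns where all sequences share one residue."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [
        i for i in range(length)
        if seqs[0][i] != "-" and all(s[i] == seqs[0][i] for s in seqs)
    ]


def catalytic_stretches(seq):
    """Return the first match and its 1-based start for each catalytic stretch."""
    ungapped = seq.replace("-", "")
    found = {}
    for name, pattern in (("acid_base", ACID_BASE), ("nucleophile", NUCLEOPHILE)):
        m = pattern.search(ungapped)
        found[name] = (m.group(), m.start() + 1) if m else None
    return found


if __name__ == "__main__":
    print(identical_columns(ALIGNMENT))
    for name, seq in ALIGNMENT.items():
        print(name, catalytic_stretches(seq))
```

With a real alignment, the length of the list returned by `identical_columns` would correspond to the number of completely identical positions, and sequences for which `catalytic_stretches` reports NET or MENG would be the variants noted above.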
SPRINT cataloged the presence of five motifs for BGLs [19]. The nucleophile Glu348 was placed in motif 2.
Motifs 1, 4, and 5 contain several important substrate-interacting residues (Fig. 3). These five motifs are highly
variable in amino acid composition (Fig. 4). Interestingly, we noticed the presence of a highly conserved region
(BglD5: ¹¹GTATSSFQI¹⁹) near the N-terminus of GH1 beta-glucosidases that was not curated in the SPRINT
database. ScanProsite predicted that this conserved region is the Glycosyl Hydrolases family 1 N-terminal signature
(Fig. 3 and 4). Based on earlier findings related to the BGL obtained from O. sativa subsp. japonica, a Gln residue
(Q29) protrudes toward the glycone residue and forms hydrogen bonds around subsite -1 [26]. It is thus
possible that the N-terminal signature is important for substrate recognition or the catalytic reaction.
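
Signature scans of the kind ScanProsite performs rest on PROSITE-style patterns, which can be translated into ordinary regular expressions. The sketch below shows one way to do this for the simple pattern elements (fixed residues, `x` wildcards, `[...]` allowed sets, `{...}` forbidden sets, and `(n)` or `(n,m)` repeats). The pattern `TOY_PATTERN` is a hypothetical illustration built only from the nine-residue BglD5 stretch quoted above; it is not the curated PROSITE entry. The test sequence is a poly-Ala placeholder chosen so that the match falls at positions 11-19, not the BglD5 sequence.

```python
import re

# One pattern element: a residue, x, [set] or {set}, with an optional repeat count.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.strip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


# Hypothetical pattern for illustration only (NOT the curated PROSITE signature).
TOY_PATTERN = "G-x-[GSTA]-x-[GSTA](2)-F-Q-x"
# Placeholder sequence: ten Ala, the stretch from the text, two Ala.
TOY_SEQUENCE = "AAAAAAAAAA" + "GTATSSFQI" + "AA"

if __name__ == "__main__":
    regex = prosite_to_regex(TOY_PATTERN)
    hit = re.search(regex, TOY_SEQUENCE)
    print(regex, hit.group(), hit.start() + 1, hit.end())
```

For a real analysis, the curated signature should be taken from the PROSITE database rather than from this toy pattern.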
As mentioned previously, the BGL superfamily was grouped into four GH families. Pei et al. [8] proposed five
clades for BGLs affiliated to GH1 and GH3. According to that report, Clade I is represented by GH1
enzymes from mesophilic bacteria; Clade II comprises GH1 BGLs from fungi; Clade III, GH3 BGLs from
bacteria; Clade IV, GH3 enzymes from fungi; and Clade V contains a mixture of GH1 BGLs from thermophilic
bacteria and Bacillus. Because the analysis used limited datasets, it may not provide comprehensive insight.
Besides, the proposed grouping may be confusing, as the existing clade groupings for GH3 beta-glucosidases [59]
were not considered in the analysis [8]. In fact, CAZy has already classified families GH1, GH5, and GH30
into Clan GH-A [3]. Therefore, the classification suggested by Pei et al. [8] should be revisited, since
more comprehensive groupings are needed. In this work, we constructed a Maximum Parsimony tree
using 70 genuine GH1 sequences (criteria: (i) contains two highly conserved Glu as catalytic residues, and (ii)
contains a GH1 domain). The tree (Fig. 5) is robust, as the placement of taxa was similar when a
Neighbour-Joining tree was constructed (not shown). BglD5 clustered together with the BGLs from thermophilic
Anoxybacillus spp. Interestingly, two other uncharacterized BGLs, from J. soli (closer to the Bacillus megaterium
beta-glucosidase) and J. campisalis (closer to that of Bacillus halodurans), were located far away from BglD5, suggesting
that BglD5 has evolved more rapidly than its counterparts. BGLs from hyperthermophiles, in
particular archaea, clustered at the base of the tree, while enzymes from eukaryotic sources (fungi, termites,
plants, and silkworms) formed a distinct group. The rest of the tree consisted mainly of BGL sequences from
bacteria. From the current tree, we can only suggest relative relationships among these enzymes, and it is not
possible to propose defined clades, since the BGLs (in particular, those of bacterial origin) were not grouped according to
taxonomy, optimum growth temperature, or salinity requirements. Instead of sequence-based classification, the functional
classification summarized in a review article [13] may be more appropriate. According to the authors, BGLs are
categorized as class I (aryl beta-glucosidases), class II (true cellobiases), and class III (broad substrate specificity
enzymes), and we agree that this functional classification is important for differentiating BGLs. The group that
consists of BglD5 from J. malaysiensis, DT-Bgl from Anoxybacillus sp. DT3-1, and other uncharacterized BGLs
from Anoxybacillus spp. (Fig. 5) can be classified as class III.
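
The three functional classes can be written as a small decision rule. The sketch below is a deliberate simplification and rests on an assumption not stated in the text: each enzyme is reduced to two yes/no observations (activity on an aryl beta-glucoside and activity on cellobiose), whereas a real assignment would depend on measured activities and on how "active" is defined. The function name and arguments are hypothetical.

```python
def functional_class(aryl_glucoside_active: bool, cellobiose_active: bool) -> str:
    """Assign a functional class from two simplified yes/no activity observations."""
    if aryl_glucoside_active and cellobiose_active:
        return "class III (broad substrate specificity)"
    if aryl_glucoside_active:
        return "class I (aryl beta-glucosidase)"
    if cellobiose_active:
        return "class II (true cellobiase)"
    return "unclassified"


if __name__ == "__main__":
    # An enzyme active on both substrate types falls into class III.
    print(functional_class(True, True))
```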