K. T. Nguyen et al. / Bioorg. Med. Chem. Lett. 19 (2009) 3832–3835
Bayesian classifiers are computationally very fast and therefore applicable to screening large databases.11 Of the top-docking hits, 141 were selected as actives to train the classifier, favouring structures that would be relatively straightforward to prepare, in particular amides, while complex combinations of polyenes, amidines and hydrazones were rejected. All compounds docking more weakly than glycine were used as inactives. The classifier returned 35,810 additional compounds from GDB scoring higher than the lowest Bayesian score of the active set, of which 182 docked more strongly than glycine.
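The training and selection step above can be illustrated with a naive Bayesian substructure score. This is a minimal sketch, not the authors' implementation: it assumes each compound has already been reduced to a set of substructure keys (plain strings here, a hypothetical encoding), it adds a Laplace pseudo-count (an assumption, to avoid log(0)), and it scores a compound as the sum of log ratios of substructure frequencies in actives versus inactives, i.e. the log of the product described in the text. Library compounds are kept when they score strictly higher than the lowest-scoring training active, mirroring the selection rule described above.

```python
import math
from collections import Counter


def train(actives, inactives, alpha=1.0):
    """Return a log-ratio weight for every substructure key seen in training.

    actives / inactives: lists of sets of substructure keys (hypothetical encoding).
    alpha: Laplace pseudo-count, an assumption that avoids log(0).
    """
    act_counts = Counter(f for mol in actives for f in mol)
    inact_counts = Counter(f for mol in inactives for f in mol)
    n_act, n_inact = len(actives), len(inactives)
    weights = {}
    for f in set(act_counts) | set(inact_counts):
        # smoothed relative frequency of occurrence in each class
        p_act = (act_counts[f] + alpha) / (n_act + 2 * alpha)
        p_inact = (inact_counts[f] + alpha) / (n_inact + 2 * alpha)
        weights[f] = math.log(p_act / p_inact)
    return weights


def score(mol, weights):
    # log of the product of per-substructure ratios; unseen keys contribute 0
    return sum(weights.get(f, 0.0) for f in mol)


def select_hits(library, actives, weights):
    # keep library compounds scoring above the lowest-scoring training active
    threshold = min(score(m, weights) for m in actives)
    return [m for m in library if score(m, weights) > threshold]
```

With toy actives `[{"amide", "amine"}, {"amide"}]` and inactives `[{"ester"}, {"ether", "ester"}]`, only a library compound carrying both `amide` and `amine` clears the threshold set by the weaker active.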
Analysis of functional group enrichment showed that docking did not select for monoamines but strongly enriched amino acids and amides, similar to our previously reported study3 (Table 1). Amides were very frequent in the final hit set because we favoured this functional group, for ease of synthesis, when selecting top-docking compounds from the first round of docking. Docking also enriched hydrogen bond donor (HBD) and acceptor (HBA) sites. The random GDB selection used in this study had fewer HBD (2.08) than the compounds previously selected on the basis of known actives (5.45), but significantly more HBA (3.17 vs 2.13). The higher HBA count of the GDB random set compared with the Bayesian hit sets might explain its better mean docking energy, reflecting high-scoring interactions with donor sites on the protein.
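The enrichment comparison above amounts to comparing how often a functional class occurs in a hit set versus the set it was drawn from. The sketch below assumes each compound has already been flagged True/False for a class (for example "is an amide"); the flagging step and the function names are hypothetical illustrations, not the authors' code.

```python
def percent(flags):
    """Percentage of compounds carrying a property (flags: list of bools)."""
    if not flags:
        raise ValueError("empty compound set")
    return 100.0 * sum(flags) / len(flags)


def enrichment(hit_flags, library_flags):
    """Frequency of a property in the hit set divided by its frequency in the library.

    Values well above 1 indicate that docking selected for the property;
    values near 1 indicate no selection.
    """
    base = percent(library_flags)
    if base == 0:
        raise ValueError("property absent from the reference set")
    return percent(hit_flags) / base
```

For instance, taking the Table 1 amide percentages of the random set (9.2%) and of the first-round top-docking set (16.0%) gives a ratio of about 1.7.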
As in our previous study,3 virtual screening favoured acyclic compounds, with many virtual hits showing structures reminiscent of 1 and 2. On the other hand, the hit set also contained a few six-membered heterocycles, in particular the intriguing diketopiperazine ligands (S)-3 and (S)-4 at ranks #58 and #20, with docking energies of −8.36 kcal/mol and −8.81 kcal/mol, respectively. In their best docking pose, the diketopiperazine group replaced the key carboxylate pharmacophore of known amino acid inhibitors, and the primary amino group, although placed quite differently from that of glycine, interacted with anionic residues similarly to the amino group of dipeptides 1 and 2 (Fig. 3). Since 3 and 4 had never been reported previously, we set out to synthesize them and test them against the NMDA glycine site.
[Figure 2: docking-energy distributions for sets A, B and C; x-axis in kcal/mol; vertical line at glycine]
Figure 2. Distribution of docking energies of (A, red line) the 31,121 stereoisomers generated from 8000 randomly selected GDB structures; (B, blue line) the 81,877 stereoisomers generated from the 35,810 virtual hits from the Bayesian classifier trained with the docking results of A; (C, grey line) the 69,367 stereoisomers generated from the 15,061 virtual hits from the Bayesian classifier trained with known NMDA-receptor inhibitors, as described in Ref. 3. The black vertical line indicates the docking energy (DE) of glycine (−7.82 kcal/mol). Stereoisomers were generated from SMILES using CORINA and docked into the glycine binding site of the NMDA receptor (PDB: 1PB7)20 using AUTODOCK 3.0.5 for 10 cycles. In each case, the energy of the most favourable pose was used for scoring.
In our search for NMDA glycine site inhibitors in GDB-11, we hypothesized that innovative ligands might be found if virtual screening were guided by the protein structure alone, without using information from existing ligands. With docking as our main tool,8 we first screened a randomly selected subset of 8000 GDB structures. The structures were expanded to 31,121 stereoisomers using CORINA9 and evaluated with AUTODOCK 3.0.5,10 resulting in a typical Gaussian distribution of docking energies (Fig. 2). There were 702 molecules docking better than glycine, among which amino acids were significantly enriched, showing the natural tendency of the binding pocket to select for that class of compounds.
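The count of molecules "docking better than glycine" can be sketched as follows, assuming docking results are already available as plain numbers (no parsing of real AUTODOCK output is attempted). Per the Figure 2 caption, the most favourable pose is used for scoring; taking the best value over a molecule's stereoisomers is an additional assumption, since the text does not state how stereoisomer scores were combined per molecule. Molecule names are hypothetical.

```python
GLYCINE_DE = -7.82  # kcal/mol, docking energy of glycine (Fig. 2 caption)


def molecule_energy(stereoisomer_poses):
    """Best (lowest) docking energy of one molecule.

    stereoisomer_poses: list of lists of pose energies in kcal/mol,
    one inner list per stereoisomer. The minimum over stereoisomers is an
    assumption; the caption only states that the best pose was used.
    """
    return min(min(poses) for poses in stereoisomer_poses)


def better_than_reference(docked, reference=GLYCINE_DE):
    """Sorted names of molecules whose best energy is lower (more favourable) than the reference."""
    return sorted(
        name for name, isomers in docked.items()
        if molecule_energy(isomers) < reference
    )
```

A lower (more negative) energy counts as better, so a molecule whose best pose scores −7.5 kcal/mol is not counted, while one at −8.0 kcal/mol is.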
A Bayesian classifier was applied next to extract further ligands from GDB. Bayesian classifiers determine a bioactivity probability score for any compound from the product of the relative frequencies of occurrence of all its substructures in known active versus inactive compounds.
Diketopiperazine 3 was prepared as the (S) enantiomer in four steps and 33% overall yield (Scheme 1). Peptide coupling between Cbz-protected L-asparagine (5) and glycine methyl ester gave dipeptide 6. Oxidative Hofmann degradation12 of the asparagine carboxamide side chain using I,I-bis(trifluoroacetoxy)iodobenzene13 and protection of the resulting primary amine gave the Boc derivative 7.14 Hydrogenation of the Cbz group of the terminal α-amino group and intramolecular cyclization under mild basic
Table 1
Enrichment statistics during virtual screening
Library                      Size     Monoamines (e) %   Amino acids (f) %   Amides (g) %   HBD average (h)   HBA average (i)   Acyclics %
GDB-11 random set            8000     25.8               0.36 (29)           9.2            2.08              3.17              27.5
1st round top docking (a)    702      28.4               1.00 (7)            16.0           3.32              3.38              71.8
Selected (b)                 141      42.6               2.1 (3)             24.8           1.59              2.84              55.3
Bayesian hits (c)            35,810   33.8               0.42                23.3           2.29              2.35              55.9
2nd round top docking (a)    182      24.7               18.1 (150)          41.8           2.82              4.94              91.2
Ref. 3 Bayesian hits (d)     15,061   21.7               7.3 (1100)          3.6            5.45              2.13              28.3
Ref. 3 top docking (a)       712      20.8               22.9 (163)          10.3           5.25              2.63              86.2
(a) Molecules of the random set (1st round), from the Bayesian hits in this work (2nd round) or from Ref. 3 docking more strongly than the crystallographic ligand glycine (DE = −7.83 kcal/mol) in the NMDA glycine site binding pocket, as estimated by AUTODOCK 3.0.5.
(b) Subset of the 1st round top docking with simple functional groups.
(c) All compounds from GDB-11 scoring better than the worst of the selected compounds in a Bayesian classifier trained with the 141 selected compounds as actives and all structures with DE > −7.82 kcal/mol as inactives.
(d) Bayesian hits from Ref. 3; in this case the classifier was trained with known actives and an ACX random set as inactives.
(e) Monoamines are molecules without carboxylic groups and with exactly one nitrogen atom whose neighbours are only H or saturated C atoms.
(f) Amino acids are monoamines with exactly one carboxylic acid. The absolute number of molecules is indicated in parentheses.
(g) Amides are molecules with at least one amide function.
(h) HBD is the hydrogen bond donor site count, that is, the total number of NH and OH bonds.
(i) HBA is the hydrogen bond acceptor site count, that is, the total number of lone pairs.
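The HBD and HBA definitions in footnotes (h) and (i) can be made concrete with a small counting sketch. It assumes a hypothetical per-atom record (element, sum of bond orders to heavy atoms, attached hydrogens, formal charge) rather than any particular cheminformatics toolkit, restricts elements to those in GDB-11 (C, N, O, F), and derives lone pairs from the formal-charge bookkeeping (nonbonding electrons = valence electrons − formal charge − bonds). Counting lone pairs on every atom, including F, is an interpretation of "the total number of lone pairs", not something the footnote spells out.

```python
from dataclasses import dataclass

# Valence electrons for the elements present in GDB-11.
VALENCE_ELECTRONS = {"C": 4, "N": 5, "O": 6, "F": 7}


@dataclass
class Atom:
    element: str
    bond_order_sum: int  # sum of bond orders to heavy atoms
    hydrogens: int       # number of attached hydrogens
    charge: int = 0      # formal charge


def lone_pairs(atom):
    # nonbonding electrons = valence electrons - formal charge - bonds (including H)
    bonds = atom.bond_order_sum + atom.hydrogens
    nonbonding = VALENCE_ELECTRONS[atom.element] - atom.charge - bonds
    return nonbonding // 2


def hbd(atoms):
    """Hydrogen bond donor count: total number of N-H and O-H bonds (footnote h)."""
    return sum(a.hydrogens for a in atoms if a.element in ("N", "O"))


def hba(atoms):
    """Hydrogen bond acceptor count: total number of lone pairs (footnote i)."""
    return sum(lone_pairs(a) for a in atoms)


if __name__ == "__main__":
    # neutral glycine, H2N-CH2-C(=O)OH
    glycine = [
        Atom("N", 1, 2),  # H2N-
        Atom("C", 2, 2),  # CH2
        Atom("C", 4, 0),  # C(=O)
        Atom("O", 2, 0),  # =O
        Atom("O", 1, 1),  # -OH
    ]
    print(hbd(glycine), hba(glycine))  # 3 5
```

Under these assumptions the glycine zwitterion gives the same counts (3 donors on the ammonium N, 2 + 3 lone pairs on the carboxylate oxygens), so the averages in Table 1 would not depend on which protonation state was enumerated.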