Communications
site. Furthermore, all serine proteases bind their substrates on
the unprimed side through hydrogen bonds to the peptide
backbone.
subpockets and ligand occupants across a protein family is
easily possible.
Once a particular ligand fragment had been selected as a
putative building block for the construction of a combinato-
rial library, a search in the Sigma–Aldrich catalogue was
performed using the program JME.[16] For the different
subpockets, these searches retrieved possible reaction com-
ponents, which are given in the Supporting Information. The
list of commercially available starting materials was then
evaluated with respect to attached functional groups that are
suited to connect the individual building blocks by a generally
applicable synthetic route. In the synthesis design, we decided
to focus on ester, amide, and sulfonamide bond formation as
well as on nucleophilic substitution reactions. Three frag-
ments, (S)-prolinol, (S)-2-(hydroxymethyl)piperidine, and
(S)-valinol, were finally selected as central moieties to
address the S2 pocket (Scheme 1).
We picked thrombin as a reference for this case study. This
enzyme exposes three well-defined binding subpockets to
recognize the N-terminal substrate. Using these cavities as
queries for a Cavbase search revealed about 4000 hits in each
case. For the S1 subpocket, 646 cavities from other serine
proteases were matched; for the S3 pocket, 550 such cavities
were found. In the case of the S2 pocket, only 274 entries from
other serine proteases were retrieved, thus indicating the
structural uniqueness of S2 in thrombin. This selectivity-
determining pocket is rather tight and spatially restricted by
the thrombin-specific 60 loop.
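The hit counts above can be put on a common scale with a short calculation. The following is a minimal sketch, not part of the original study: it takes the total of 4000 hits per query at face value (the text says "about 4000"), so the resulting fractions are approximate, and the variable and function names are illustrative.

```python
# Hit counts quoted in the text. The total is only given as "about 4000"
# per query, so every fraction computed here is approximate.
TOTAL_HITS_APPROX = 4000
serine_protease_hits = {"S1": 646, "S3": 550, "S2": 274}


def hit_fraction(count, total=TOTAL_HITS_APPROX):
    """Fraction of all hits that stem from other serine proteases."""
    return count / total


fractions = {pocket: hit_fraction(n) for pocket, n in serine_protease_hits.items()}

# Print the pockets from most to least frequently matched.
for pocket, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pocket}: {serine_protease_hits[pocket]} hits, ~{frac:.1%} of all hits")

# The pocket with the fewest serine protease matches.
least_matched = min(fractions, key=fractions.get)
```

Under this assumption, S2 has fewer than half as many serine protease matches as S1 or S3, which is consistent with the structural uniqueness of S2 described above.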
Closer analysis of the matching S3 pockets highlights the
hydrophobic character of this site, which is strongly deter-
mined by a highly conserved tryptophan residue forming the
floor of the pocket (e.g. acrosin, cathepsin G, factor VIIa,
factor IXa, factor Xa, factor XIa, chymase, trypsin, C1s-
elastase, tryptase, t-plasminogen activator, protein C, and
granzyme A). Similarly, the search retrieved S1 pockets from,
for example, cathepsin G, factor VIIa, factor IXa, factor XIa,
trypsin, tryptase, protein C, factor Xa, t-plasminogen activa-
tor, enteropeptidase, and urokinase.
Subsequently, the actual occupants were extracted in
terms of ligand fragments. As a first attempt, we decided to
design a compound library for thrombin. Thus, in each step, a
comparison with the thrombin reference subpocket was
performed to retrieve only fragments that potentially exhibit
a similar interaction pattern with the target protein. Evaluat-
ing the occupants of the S2 pocket matches highlights the
unique shape of this pocket in thrombin: among the 100 best-
ranked solutions, 98 examples are found in other thrombin
structures.
To further diversify the central P2 moiety, a glycine spacer
was attached between P2 and P3. The functional groups of
this additional spacer are appropriate to address the non-
specific backbone recognition site exposed by all serine
proteases. We chose ester bond formation to connect the
central moiety with a group suited to address the S1
pocket. Finally, to place a suitable building block into S3, an
amide bond was formed using the amino functionality of
either the P2 central moiety or the glycine spacer. Alternatively,
bond formation through nucleophilic substitution was envis-
aged. In Scheme 1, carboxylic acid derivatives suited to
address the S1 pocket are listed together with carboxylic and
sulfonic acid derivatives as well as the chloro derivatives to
address the S3 pocket. The selected P1 to P3 building blocks
were assembled on the computer following Scheme 2 and
docked into the binding pocket of thrombin using the
combinatorial module of FlexX.[17,18] Two libraries, with and
without the glycine spacer, were assembled, each comprising
507 possible members. For each docking run, the 10 best
solutions were stored and rescored using DrugScoreCSD.[19]
Visual inspection of the docking solutions was performed
using the graphical interface to evaluate the generated
docking modes in terms of per-atom scoring contributions.[20]
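The library assembly described above amounts to a Cartesian product over the building-block lists. The sketch below illustrates that enumeration only; it does not reproduce the FlexX combinatorial module. The three central moieties are taken from the text, whereas the P1 and P3 labels are placeholders, and the split into 13 P1 and 13 P3 blocks is an assumption: it is merely one factorization consistent with 3 central moieties and 507 members, and the actual lists are those of Scheme 1.

```python
from itertools import product

# Central P2 moieties named in the text.
P2_CORES = ["(S)-prolinol", "(S)-2-(hydroxymethyl)piperidine", "(S)-valinol"]

# ASSUMPTION: placeholder labels and counts. The real P1 and P3 building
# blocks are listed in Scheme 1; 13 x 13 is only one split that yields
# 507 members together with the three central moieties.
P1_BLOCKS = [f"P1_{i:02d}" for i in range(1, 14)]
P3_BLOCKS = [f"P3_{i:02d}" for i in range(1, 14)]


def enumerate_library(p1_blocks, p2_cores, p3_blocks, glycine_spacer):
    """Return every P3-(Gly)-P2-P1 combination as a tuple of labels."""
    spacer = ("Gly",) if glycine_spacer else ()
    return [
        (p3, *spacer, p2, p1)
        for p1, p2, p3 in product(p1_blocks, p2_cores, p3_blocks)
    ]


library_plain = enumerate_library(P1_BLOCKS, P2_CORES, P3_BLOCKS, glycine_spacer=False)
library_gly = enumerate_library(P1_BLOCKS, P2_CORES, P3_BLOCKS, glycine_spacer=True)

print(len(library_plain), len(library_gly))
```

Each tuple would then be handed to the docking step; keeping the spacer as an optional element lets both libraries share one enumeration routine.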
For synthetic simplicity, we decided to use the 3-chlorobenzyl
and 4-cyanophenyl portions from the list of best-scored
solutions to address the S1 pocket (leaving aside ligands
exhibiting an amidino or guanidino portion at this site,
which involve more synthesis steps). Selection of synthesis
candidates for S2 and S3 was guided by synthetic feasibility
and reactivity differences of the starting materials. The
building blocks finally selected for S3, namely 2-pyridylacetic
acid, p-fluorobenzoic acid, and the tert-butyloxycarbonyl
moiety, showed up in several of
the best-scored hits. From the list of best-scored derivatives,
compounds 2–6 were selected, synthesized (Scheme 3), and
subjected to a photometric enzyme kinetics assay.[21] They
showed inhibition in the micromolar range (see Scheme 3 and
the Supporting Information). Subsequently, we succeeded in
diffusing 5 into crystals of thrombin, from which the complex
structure could be determined (Figure 2). This structure can
serve as a starting point to embark upon structure-based lead
optimization.
Ligand fragments with an isopropyl group, a small
aliphatic or heteroaliphatic ring such as a pyrrolidine
moiety, or partially aromatic building blocks were preferen-
tially detected (Figure 1).
In thrombin and many other members of the trypsin-like
serine proteases, the S1 subpocket is dominated by the
aspartate189 residue, which is embedded in a highly con-
served hydrophobic environment. The comparison of binding
pockets reveals ligand building blocks with comparable
physicochemical properties, while the exact degree of similarity
in the amino acid sequence of the surrounding protein is much
less important. Nevertheless, besides the popular basic
residues derived from benzamidine, guanidine, or amino-
pyridine building blocks, halogen-substituted aromatic moi-
eties were also suggested; these interact with a conserved
tyrosine residue at the floor of the S1 pocket (Figure 1). The
most pronounced chemical diversity of occupants is proposed
by the Cavbase search for the S3 pocket. For this pocket,
possible side chains originate from other members of the
serine protease family (Figure 1).
This illustrates the strength of the approach: the
combinatorial library is not built only from well-known
thrombin inhibitors. Cavbase provides easy access to an
entire class of proteins, and simple “hopping” between
© 2007 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
Angew. Chem. Int. Ed. 2007, 46, 9105 –9109