Catalyst Library Evolution
FULL PAPER
tion G0 was randomly selected. Simulated evolution results
were not affected by RR variations (data not shown). The
explanation lies in the size of the replacement space compared
to the proportion (about 20%) of the library that contains
moderate to efficient catalysts (Figure 4). One can assume
that a randomly selected G0 contains the same proportion
of efficient catalysts. The largest replacement space we tested
corresponds to RR=18, that is, 75% of each generation. Since
RR can influence the first evolution loops only if it is larger
than 80% of the G0 size (otherwise the efficient catalysts of G0,
which rank in its top 20%, are never replaced), the simulated
evolution results were not affected by RR variations. This also
implies that the algorithm should be robust with regard to RR
variations during the subsequent evolution loops.
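The threshold argument above can be checked with simple arithmetic. The sketch below is an illustration under stated assumptions, not the authors' code: it assumes that about 20% of a randomly selected G0 are moderate to efficient catalysts (the estimate drawn from Figure 4) and that the RR replaced members are always the worst-ranked ones. The function name `efficient_catalysts_survive` is hypothetical.

```python
def efficient_catalysts_survive(g0_size: int, rr: int,
                                efficient_fraction: float = 0.20) -> bool:
    """Check whether the members kept after replacement can still hold
    all the efficient catalysts expected in G0.

    Assumes the rr replaced members are the worst-ranked ones, so the
    g0_size - rr kept members are the best of the generation.
    """
    expected_efficient = efficient_fraction * g0_size
    kept = g0_size - rr
    return kept >= expected_efficient


# Illustrative RR values for a 24-member G0. RR = 18 (75%) is the largest
# value tested; RR = 20 (about 83%) is hypothetical, beyond the tested range.
for rr in (6, 10, 14, 18, 20):
    share = rr / 24
    print(f"RR={rr:2d} ({share:.0%} of G0): {efficient_catalysts_survive(24, rr)}")
```

With 24 members, about 4.8 efficient catalysts are expected, so the check only fails once RR exceeds 80% of G0 (RR = 20 or more), in line with the reasoning in the text.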
Lastly, we studied the effect of the size of the mother
generation G0. We expected that increasing the size of G0
would incorporate more chemical diversity and thus increase
the GA performance, at the cost of more HTE required at
each generation. The G0 sizes we implemented were 24, 36
and 48, the catalysts being randomly selected. Given the lack
of influence of RR variations on the GA performance, we
selected the maximum RR (14, 26 and 38, respectively), so
that the 10 best catalysts could be recovered and kept
throughout the simulated evolution experiments. NI was set
to 40 to reveal any effect of generation size. The mutation
and crossover rates were RE=2 and RM=3. To our surprise,
the size of G0 did not seem to have any influence on the GA
performance (Table 5): the MNT was unaffected by G0 size
variations. As learned from former simulations, it is
advantageous to start with a randomly selected mother
generation. On the other hand, such a first generation could
contain only inefficient catalysts, which can seriously impede
the evolution process, as demonstrated previously when we
selected the mother generation.
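The choice of RR in these runs (the size of G0 minus 10) can be expressed as an elitist replacement step. The sketch below is a hypothetical illustration, not the authors' implementation: it assumes that each generation is ranked by a fitness value (for instance the NPF) and that the RR worst-ranked catalysts are replaced by new candidates, so the best len(generation) − RR catalysts are always carried over. The helper names, catalyst labels and fitness values are invented for the example.

```python
def max_rr(g0_size: int, n_kept: int = 10) -> int:
    """Largest RR that still keeps the n_kept best catalysts of a generation."""
    return g0_size - n_kept


def replace_worst(generation: list[str], fitness: dict[str, float],
                  offspring: list[str], rr: int) -> list[str]:
    """Replace the rr worst-ranked members of a generation by new candidates.

    The best len(generation) - rr catalysts are carried over unchanged.
    """
    if rr > len(offspring):
        raise ValueError("not enough offspring to fill the replacement space")
    ranked = sorted(generation, key=lambda c: fitness[c], reverse=True)
    return ranked[: len(generation) - rr] + offspring[:rr]


# Maximum RR values used for G0 sizes of 24, 36 and 48.
print([max_rr(size) for size in (24, 36, 48)])

# Toy example with hypothetical catalyst labels and fitness values.
g0 = [f"C{i}" for i in range(24)]
fitness = {c: float(i) for i, c in enumerate(g0)}  # C23 is the best
new = [f"N{i}" for i in range(14)]
g1 = replace_worst(g0, fitness, new, rr=max_rr(24))
print(g1[:10])  # the ten best catalysts of G0, from C23 down to C14
```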
The efficiency of an optimization process is thus directly
linked to the quality of the mother generation. Moreover,
because we evaluated the whole library, we know that about
75% of its catalysts are inefficient. The probability P that all
n members of a randomly selected G0 originate from this zone
of inactivity can be approximated by Equation (1).
$$P(c = n) = \left(\frac{m}{N}\right)^{n} \qquad (1)$$
where N is the total number of catalysts and m the
number of catalysts in the zone of inactivity. For this library
(m/N ≈ 0.75), the probability that a mother generation of 24
catalysts contains only inefficient catalysts is thus
$P(c = 24) \approx 0.75^{24} \approx 1.0 \times 10^{-3}$, that is, about 0.1%.
Although this probability is low, we have already shown that a
simulated evolution starting from such a mother generation
is unsuccessful. To examine this case, we forced the GA to
select G0 randomly, but solely from the zone of inactivity, that
is, catalysts built with A1, A7 and A11 were excluded from G0.
The GA parameters were RE=2, RM=3 and RR=14. In addition,
the GA could perform a maximum of 199 evaluations, which is
close to 10% of the library. As shown in Figure 10, the
performance of the GA was poor: the first generations were
associated with small NPF values, and only two of the top ten
catalysts were found.
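Equation (1) treats the draws as independent, which amounts to sampling with replacement. The sketch below compares it with the exact probability of drawing n distinct catalysts without replacement, given by a ratio of binomial coefficients. The library size N = 2000 is a hypothetical round value chosen only for illustration (the text states only that 199 evaluations are close to 10% of the library), and m = 0.75 N follows the estimate above.

```python
import math


def p_all_inactive_approx(n: int, inactive_fraction: float = 0.75) -> float:
    """Equation (1): P(c = n) = (m/N)**n, i.e. independent draws."""
    return inactive_fraction ** n


def p_all_inactive_exact(n: int, m: int, N: int) -> float:
    """Probability that n distinct catalysts drawn without replacement
    from a library of N catalysts all belong to the m inactive ones."""
    return math.comb(m, n) / math.comb(N, n)


N = 2000           # hypothetical library size, for illustration only
m = int(0.75 * N)  # inactive catalysts (about 75% of the library)

for n in (12, 24, 48):
    approx = p_all_inactive_approx(n)
    exact = p_all_inactive_exact(n, m, N)
    print(f"n={n:2d}  approx={approx:.2e}  exact={exact:.2e}")
```

For a G0 of 24 catalysts both values are close to 10^-3. The approximation slightly overestimates the exact value, because each inactive catalyst drawn makes the next draw marginally less likely to be inactive.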
Table 5. Variation of the size of G0.
To incorporate more chemical diversity and thus improve the
performance of the evolution process, one could increase the
number of catalysts in the mother generation, from 24 to 48
for instance. However, this would prevent the GA from meeting
our requirement of testing no more than 10% of the catalysts.
One could also increase RE and RM, but we have shown that
only about 3 of the top 10 catalysts would be found within the
first 200 experiments (3 ≈ 5.7 × (200 HTE/340 HTE), Table 4),
which is a rather small improvement.
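The estimate in parentheses scales the mean number of top 10 catalysts found (5.7 within 340 HTE, Table 4) linearly to a budget of 200 HTE. A minimal sketch of that proportional estimate, assuming a constant discovery rate over the run (the assumption behind the estimate itself); the function name is hypothetical:

```python
def scaled_top10(mean_top10: float, hte_used: int, hte_budget: int) -> float:
    """Linearly rescale a mean top-10 count to a different HTE budget.

    Assumes top-10 catalysts are found at a constant rate per experiment,
    which is only a rough approximation for an evolutionary search.
    """
    return mean_top10 * hte_budget / hte_used


estimate = scaled_top10(mean_top10=5.7, hte_used=340, hte_budget=200)
print(f"{estimate:.2f}")  # 3.35, i.e. about 3 of the top 10 catalysts
```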
To deal with simulated evolutions that could start from a
mother generation containing only inefficient catalysts, we
developed a double algorithm (DA), which was evaluated by
simulated evolution. The first step of this algorithm consisted
of turning G0 into G1 using 14 mutations: the 10 best catalysts
from G0 were transmitted to G1, and the other catalysts
underwent a mutation, which introduced more chemical
diversity into the mother generation. After this
high-mutation-rate step, the probability that all catalysts of G1
came from the inefficient part of the library dramatically
decreased, to about 0.001%. We then applied the “common”
algorithm to G1, with RE=2 and RM=3, for the next 23
generations, for a total of 199 evaluations (including the 14
tests required by the G0–G1 high-mutation step). As depicted in
Figure 11, the results of the GA were significantly improved:
the mean NPF was increased, as well as the number of top
Size of G0   Mean top 10   MNT[b]   σMNT   A1B19M2 [%]
24[a]        6.5           40       14     90
36[a]        6.5           41       9      80
48[a]        6.1           48       15     70
[a] Results given for 10 simulated evolutions in every set. [b] The lower
the MNT, the faster the evolution.
As for the number of top 10 catalysts recovered (Table 5), it
was not affected by the size of G0 either, although it was more
than doubled owing to the larger NI. Thus, a mother generation
of 24 catalysts was sufficient to find the best catalysts. The
simulated evolution processes performed with these intermediate
RE and RM values gave good results. Finally, the size of G0
might have more influence on simulated evolutions performed
within a larger library, since a G0 of 24 members, that is, more
than 1% of this library, is still a large mother generation. Note
that each simulation process led to at least one catalyst from
the top 10, which is why the column for processes with zero
top 10 catalysts is no longer shown.
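The evaluation totals quoted for the double algorithm and for the inactive-G0 runs can be reconciled with a simple budget count. The sketch below rests on an assumption that the text does not state: each generation evaluates 2·RE + RM new catalysts (two offspring per crossover plus RM mutants), that is, 7 with RE=2 and RM=3. Under that assumption the DA total of 199 evaluations follows from 24 + 14 + 23 × 7, and the inactive-G0 runs would correspond to 25 generations (an inferred, not reported, value). The function and parameter names are hypothetical.

```python
def total_evaluations(g0_size: int, n_generations: int, re_ops: int,
                      rm_ops: int, high_mutation_step: int = 0) -> int:
    """Count the HTE evaluations of a simulated evolution.

    Assumption (not stated in the text): each generation evaluates
    2 * re_ops crossover offspring plus rm_ops mutants.
    """
    per_generation = 2 * re_ops + rm_ops
    return g0_size + high_mutation_step + n_generations * per_generation


# Double algorithm: G0 (24) + G0->G1 step (14 mutants) + 23 generations.
da_total = total_evaluations(24, 23, re_ops=2, rm_ops=3, high_mutation_step=14)
# Common algorithm from an inactive-only G0; 25 generations is inferred.
common_total = total_evaluations(24, 25, re_ops=2, rm_ops=3)
print(da_total, common_total)  # 199 199
```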
Simulated evolution with a double algorithm (DA): We now
address a realistic optimization case, as it would have to be
handled by a simulated evolution process: the chemist can
picture the whole library, but only a few of its catalysts would
actually be synthesized and evaluated. On one hand, we
Chem. Eur. J. 2009, 15, 6267 – 6278
© 2009 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim
6275