B. Vyas et al. / Journal of Molecular Graphics and Modelling 59 (2015) 59–71
The best model was selected based on various statistical parameters including $Q^2_{\text{test}}$, $R^2_{\text{train}}$ […]
Among the selected proteins, a universal crystal structure that could serve as a representative of the ALR2 protein for analyzing the interactions of its ligands was selected based on the cross-validation results of all proteins [44]. In the cross docking protocol, the crystal ligands of all selected protein structures were sketched and energy minimized, and their global minimum conformations were generated using ‘systematic conformational sampling’. Thereafter, the generated conformation of each crystal ligand was re-docked into the active site of each protein structure. After re-docking, the docked conformation of each crystal ligand was compared with its respective native crystal conformation, and the all-atom RMSD between these two conformations was determined. Finally, an average RMSD value was calculated from the RMSD values of all crystal ligands in each protein system, and the protein with the lowest average RMSD was selected for ligand docking.
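The receptor selection step of the cross docking protocol can be illustrated with a minimal sketch. It assumes that docked and native poses list the same atoms in the same order and share the receptor coordinate frame, so no superposition is applied; the function names and the receptor labels in the usage note are hypothetical and are not taken from the original work.

```python
import math

def rmsd(coords_a, coords_b):
    """All-atom RMSD between two conformations given as lists of (x, y, z).

    Assumes identical atom ordering and a shared coordinate frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformations must have the same, non-zero atom count")
    squared = sum(
        (a - b) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for a, b in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def select_receptor(rmsd_table):
    """Pick the receptor with the lowest average RMSD.

    rmsd_table maps a receptor label to the list of RMSD values obtained
    by re-docking every crystal ligand into that receptor.
    """
    averages = {rec: sum(vals) / len(vals) for rec, vals in rmsd_table.items()}
    best = min(averages, key=averages.get)
    return best, averages
```

For example, `select_receptor({"receptor_A": [1.0, 2.0], "receptor_B": [0.5, 0.7]})` returns `"receptor_B"` as the structure to use for ligand docking, together with the per-receptor averages.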
For the docking runs, the maximum number of docked confor-
mations was set as ten, and the best docked conformation of each
molecule was selected based on docking energy and visual inspec-
tion. The visual inspection includes analysis of hydrogen bonding,
orientation and position of molecules.
The selected model was validated for its prediction ability and reliability by performing the “Y-randomization” test, leave group out (LGO) cross-validation, and calculating the “applicability domain (APD)” [50,51] and statistical […]
The “applicability domain” was calculated to describe the reliability of the generated model in predicting new molecules. For the “APD” calculation, similarity measurements were calculated based on Euclidean distances between all pairs of training and test set compounds separately [50,51]. Initially, the average of all Euclidean distances between the training set compounds was calculated; thereafter, a new average $\langle d \rangle$ and standard deviation ($\sigma$) were computed over the training set distances lower than this initial average. The APD was then obtained as
$\mathrm{APD} = \langle d \rangle + \sigma Z$
where $Z$ is an empirical cutoff with a default value of 0.5.
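The APD threshold described above can be sketched in a few lines. The sketch assumes each compound is represented by a numeric descriptor vector and uses the population standard deviation; both choices, the fallback for the degenerate case, and the nearest-neighbour rule in `in_domain` are illustrative assumptions, since the text does not specify them.

```python
import math
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def apd_threshold(train_descriptors, z=0.5):
    """APD = <d> + sigma * Z over training distances below the overall average."""
    dists = [euclidean(u, v) for u, v in combinations(train_descriptors, 2)]
    overall_mean = sum(dists) / len(dists)
    close = [d for d in dists if d < overall_mean]
    if not close:
        # Assumption: if no distance lies below the average (all equal),
        # fall back to the full set of distances.
        close = dists
    d_mean = sum(close) / len(close)
    sigma = math.sqrt(sum((d - d_mean) ** 2 for d in close) / len(close))
    return d_mean + z * sigma

def in_domain(query, train_descriptors, threshold):
    """Assumed rule: a compound is inside the domain when its nearest
    training neighbour lies within the APD threshold."""
    return min(euclidean(query, t) for t in train_descriptors) <= threshold
```

With training descriptors `[(0.0,), (1.0,), (3.0,)]` the pairwise distances are 1, 3 and 2, only the distance 1 lies below the average of 2, and the threshold therefore equals 1.0.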
To evaluate the sensitivity of the generated models to chance correlations, the ‘leave many out (LMO)’ parameter was calculated. In LMO cross-validation, ‘n’ molecules were omitted from the data set, and the remaining molecules were used to develop a new model, which was then used to predict the omitted ‘n’ molecules. The same process was repeated until each molecule had been omitted and predicted once.
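The omit, refit and predict loop can be sketched as follows. A one-descriptor least-squares line stands in for the PLS model purely for illustration, groups are taken as contiguous blocks of `n_out` molecules, and the cross-validated $Q^2$ is computed against the overall mean activity; all three are assumptions rather than details given in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x

def leave_many_out(xs, ys, n_out):
    """Omit n_out molecules at a time, refit on the rest, predict the omitted
    ones, and repeat until every molecule has been predicted exactly once."""
    predictions = [None] * len(xs)
    for start in range(0, len(xs), n_out):
        held_out = range(start, min(start + n_out, len(xs)))
        kept = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
        for i in held_out:
            predictions[i] = slope * xs[i] + intercept
    return predictions

def q_squared(ys, predictions):
    """Cross-validated Q^2 = 1 - PRESS / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    total = sum((y - mean_y) ** 2 for y in ys)
    return 1 - press / total
```

A $Q^2$ that stays close to the fitted $R^2$ as `n_out` grows suggests that the model does not depend on a handful of molecules.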
2.3. Protein–ligand complex minimization
To remove the unfavourable steric clashes, the ligand–protein complexes obtained from “Glide” docking were processed using “MM-GBSA” (Molecular Mechanics/Generalized Born Surface Area) calculations in the “Prime” (version 3.1) program [45]. For this purpose, the receptor segment within 7 Å of the docked ligand was kept flexible during complex minimization. In addition to minimization of the ligand–protein complex, the strain and binding free energy of the ligand were also calculated. The binding free energy ($\Delta G_{\text{bind}}$), calculated with “MM-GBSA”, can be formalized as $\Delta G_{\text{bind}} = \Delta E_{\text{MM}} + \Delta G_{\text{solv}} - T\Delta S$, where $\Delta E_{\text{MM}}$, $\Delta G_{\text{solv}}$ and $-T\Delta S$ are the changes in the gas phase molecular mechanics energy, solvation free energy and conformational entropy upon binding, respectively. $\Delta E_{\text{MM}}$ includes the internal (bond, angle and dihedral energies), $\Delta E_{\text{elec}}$ (electrostatic) and $\Delta E_{\text{vdw}}$ (van der Waals) energy. $\Delta G_{\text{solv}}$ (solvation energy) is the sum of the electrostatic solvation energy (polar contribution), $\Delta G_{\text{GB}}$, and the non-electrostatic solvation component (non-polar contribution of surface energy), $\Delta G_{\text{SA}}$ [46].
[…]sion quality of generated model. For this test, the activity data of the training set molecules were scrambled to generate new training sets, which were then used to develop new QSAR models and determine new $R^2_{\text{train}}$ values [50,51]. The average $R^2_{\text{train}}$ obtained from the scrambled (new) training sets, denoted by $R^2_{\text{scramble}}$, must be low compared to the original $R^2_{\text{train}}$ value of the selected model; if not, the selected model would be meaningless and generated by chance.
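The activity scrambling test can be sketched as below. A single-descriptor least-squares fit again stands in for the QSAR model, and the number of scrambled training sets (`n_trials`) and the random seed are hypothetical parameters, since the text does not state how many scrambled sets were generated.

```python
import random

def r_squared(xs, ys):
    """R^2 of a one-descriptor least-squares fit (squared Pearson correlation)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def scrambled_r_squared(xs, ys, n_trials=100, seed=0):
    """Average R^2 over models refitted to randomly permuted activities."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_trials):
        shuffled = list(ys)
        rng.shuffle(shuffled)  # break the descriptor-activity pairing
        scores.append(r_squared(xs, shuffled))
    return sum(scores) / len(scores)
```

If the value returned by `scrambled_r_squared` is far below `r_squared(xs, ys)` for the real data, the original correlation is unlikely to be a chance result.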
2.4. Atom-based 3D-QSAR: model generation and validation
In computer assisted drug design, the term 3D-QSAR describes the development of a mathematical regression model between the 3D structural features of molecules (i.e. the independent variable, denoted as X) and their corresponding biological activity values (the dependent variable, denoted as Y), with the help of chemometric techniques, e.g. Partial Least Square (PLS) analysis. The generated 3D-QSAR model could be used to extract the vital structural features of molecules required to improve their biological activity, and to predict the activity of newly designed molecules [47–49].
In addition to these validation parameters, Golbraikh and Tropsha have described a set of statistical criteria that a developed QSAR model should satisfy in order to be considered reliable. A good QSAR model should have a coefficient of correlation close to that of the ideal model ($R^2_{\text{train}} = 1$) to have high predictive reliability. In addition to the original $R^2_{\text{train}}$ of the generated model, the correlation coefficients of the regression lines passing through the origin, i.e. $R_0^2$ (predicted versus observed activities) and $R'^2_0$ (observed versus predicted activities), should also be close to $R^2_{\text{train}}$. Moreover, $(R^2 - R_0^2)/R^2$ and $(R^2 - R'^2_0)/R^2$ should be less than 0.1, and the corresponding slopes of the $R_0^2$ and $R'^2_0$ regression lines, i.e. $k$ and $k'$, should be between 0.85 and 1.15 ($0.85 \le k, k' \le 1.15$) [52].
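The criteria listed above can be checked with a short sketch. The formulas used for $k$, $k'$, $R_0^2$ and $R'^2_0$ follow the commonly quoted through-origin regression definitions and are an assumption here, because the text names the quantities without giving their expressions; the pairing of each slope with each regression direction may differ between sources. All conditions are required jointly, as the text states them.

```python
def golbraikh_tropsha(observed, predicted):
    """Evaluate the through-origin regression criteria for a QSAR model."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    ss_o = sum((y - mean_o) ** 2 for y in observed)
    ss_p = sum((p - mean_p) ** 2 for p in predicted)
    s_op = sum((y - mean_o) * (p - mean_p) for y, p in zip(observed, predicted))
    r2 = s_op * s_op / (ss_o * ss_p)

    # Slopes of the regression lines forced through the origin.
    cross = sum(y * p for y, p in zip(observed, predicted))
    k = cross / sum(p * p for p in predicted)
    k_prime = cross / sum(y * y for y in observed)

    # Correlation coefficients of the through-origin regressions.
    r0_sq = 1 - sum((y - k * p) ** 2 for y, p in zip(observed, predicted)) / ss_o
    r0_prime_sq = 1 - sum((p - k_prime * y) ** 2 for y, p in zip(observed, predicted)) / ss_p

    reliable = (
        (r2 - r0_sq) / r2 < 0.1
        and (r2 - r0_prime_sq) / r2 < 0.1
        and 0.85 <= k <= 1.15
        and 0.85 <= k_prime <= 1.15
    )
    return {"r2": r2, "k": k, "k_prime": k_prime,
            "r0_sq": r0_sq, "r0_prime_sq": r0_prime_sq, "reliable": reliable}
```

Predictions that track the observed activities closely pass every check, whereas predictions that are well correlated but systematically shifted or inverted fail on the $R_0^2$ terms.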
2.5. Chemical synthesis
2.5.1. Material required
All chemicals required for the synthetic protocol were purchased from Sigma Aldrich; they were >99% pure as certified by the manufacturer and were therefore used without further purification. At each step, the completion of the chemical reaction was monitored with thin layer chromatography {DC-Alufolien (20 cm × 20 cm) Kieselgel 60 F254 chromatic plates} using hexane:ethyl acetate (6:4) as the TLC development solvent system. The products of all chemical reactions were purified either by elution on silica columns or by re-crystallization from an appropriate solvent. The melting point of each compound was recorded on a Labtronics digital automatic melting point apparatus. All compounds were characterized based on their respective spectral data. Infrared (IR) spectra of the compounds were recorded on a Bruker® (Alpha E) FT-IR spectrometer, mass spectra on
To develop an atom based 3D-QSAR model, an aligned bundle of all training set molecules was placed into a regular grid of cubes (1 Å × 1 Å × 1 Å). Each cube of the grid is allotted a 0 or 1 “bit value” to account for the different types of atom features present in the molecules that occupy these cubes. In this way, binary expression […] values, which was utilized as the independent variable and subsequently correlated with biological activity to develop a 3D-QSAR model. For model generation, the “t-value” was set as less than 2, and the grid spacing as 1 Å. The “PHASE” program [47,48] generates a series of 3D-QSAR models considering progressively more PLS factors, which should not number more than 1/5th of the total number of molecules in the training set.
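The grid encoding and the limit on PLS factors can be illustrated with a simplified sketch. It treats each atom as a point that occupies only the cube containing its centre, which is a simplification and not a description of how “PHASE” assigns occupancy; the atom class labels used in the example are hypothetical.

```python
import math

def occupied_bits(molecule, spacing=1.0):
    """Return the set of (i, j, k, atom_class) bits that a molecule switches on.

    molecule is a list of (x, y, z, atom_class) tuples in the aligned frame;
    spacing is the cube edge length in angstroms.
    """
    return {
        (math.floor(x / spacing), math.floor(y / spacing), math.floor(z / spacing), cls)
        for x, y, z, cls in molecule
    }

def bit_matrix(molecules, spacing=1.0):
    """Build the 0/1 independent-variable matrix, one row per molecule."""
    per_molecule = [occupied_bits(m, spacing) for m in molecules]
    columns = sorted(set().union(*per_molecule))
    rows = [[1 if col in bits else 0 for col in columns] for bits in per_molecule]
    return columns, rows

def max_pls_factors(n_training):
    """Upper bound on PLS factors: one fifth of the training set size."""
    return n_training // 5
```

Under this rule a training set of 23 molecules would allow at most 4 PLS factors.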