Content uploaded by Abdallah Rhihil
Author content
All content in this area was uploaded by Abdallah Rhihil on Mar 10, 2022
Content may be subject to copyright.
Abstract Structure-toxicity relationships were studied for a set of 47 insecticides by means of multiple linear regression (MLR) and an artificial neural network (ANN). A model with three descriptors, shape surface [S(R2)], hydrogen-bonding acceptors [HBA(R2)] and molar refraction [MR(R1)], showed good statistics both in the regression (r = 0.875, s = 0.417 and q² = 0.675) and in the artificial neural network model with a [3-5-1] configuration (r = 0.966, s = 0.200 and q² = 0.647). The statistics for the prediction of toxicity [log LD50 (lethal dose 50, oral, rat)] in the test set of 20 organophosphorus insecticide derivatives are r = 0.849, s = 0.435 for MLR and r = 0.748, s = 0.576 for ANN. The model descriptors indicate the importance of molar refraction and shape contributions to the toxicity of the organophosphorus insecticide derivatives used in this study. This information is pertinent to the further design of new insecticides.
Keywords Multiple linear regression · Artificial neural
network · Organophosphorus · Insecticides · Acute LD50 ·
Descriptors
Introduction
Although the benefits of pesticides [1] are undeniable, attention has focused in recent years on their impact on human health and the environment. "Pesticide" is a generic term covering a variety of chemical classes such as insecticides, herbicides, fungicides and nematicides.
Although pesticide laws require that both kinds of risk (to human health and to the environment) drive the regulatory process in terms of depth of analysis and allocation of federal resources, two questions about risk remain pertinent: What is "acceptable risk"? And how can risk be assessed and minimized?
Computer simulation techniques potentially offer a further means of probing structure-toxicity relationships. Quantitative structure-activity relationships (QSAR) [2, 3] are among the most effective computational approaches in drug design, and QSAR is widely used to predict activities and to build models for pesticides [4, 5, 6].
In the present article, we have attempted to establish structure-toxicity relationships for organophosphorus insecticide derivatives by multiple linear regression (MLR) and artificial neural networks (ANN).
The objectives of our study were both to provide supplementary information on the behavior of these compounds and to further define the criteria necessary for the rational design of a new generation of organophosphorus insecticides.
Material and methods
Experimental data
The organophosphorus insecticide derivatives were taken from several literature sources [7, 8, 9]. The log LD50 values (lethal dose 50, acute, oral, rat) were used as the dependent variable. LD50 was expressed in moles of toxicant per kilogram of body weight. When LD50 was reported as a range, the minimum value was used.
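A minimal sketch of this data preparation follows. It assumes the literature values are given in mg/kg and that "log" means log10; neither is stated explicitly in the text, and the function name and example values are hypothetical.

```python
import math

def log_ld50_mol_per_kg(ld50_mg_per_kg, molar_mass_g_per_mol):
    """Return log10 of an LD50 converted from mg/kg (assumed unit) to mol/kg.

    A reported interval (low, high) is reduced to its minimum value,
    as stated in the text.
    """
    if isinstance(ld50_mg_per_kg, (tuple, list)):
        ld50_mg_per_kg = min(ld50_mg_per_kg)
    grams_per_kg = ld50_mg_per_kg / 1000.0              # mg/kg -> g/kg
    mol_per_kg = grams_per_kg / molar_mass_g_per_mol    # g/kg -> mol/kg
    return math.log10(mol_per_kg)                       # log10 is an assumption

# Hypothetical compound: LD50 reported as 10-25 mg/kg, molar mass 100 g/mol.
print(log_ld50_mol_per_kg((10, 25), 100.0))   # about -4.0
```

Working in moles rather than milligrams makes compounds of different molar mass comparable on a per-molecule basis.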
The chemical structures along with experimental tox-
icity (log LD50) data of the compounds used in this study
are shown in Table 1.
M. Zahouily (✉) · A. Rhihil · H. Bazoui · D. Zakarya
UFR Chimie Appliquée,
Laboratoire de Synthèse Organique
et Traitement de l’Information Chimique,
Département de Chimie, Faculté des Sciences et Techniques,
B.P. 146, Mohammedia 20650, Maroc
e-mail: zahouily@voila.fr
Tel.: +212-023-314705/08, Fax: +212-3-315353
A. Rhihil · H. Bazoui · S. Sebti
Laboratoire de Chimie Organique
Appliquée et Catalyse (LCOAC),
Faculté des Sciences Ben M’Sik B.P. 7955 Sidi Othmane,
Casablanca, Maroc
J Mol Model (2002) 8:168–172
DOI 10.1007/s00894-002-0074-0
ORIGINAL PAPER
Mohamed Zahouily · Abdallah Rhihil
Halima Bazoui · Saïd Sebti · Driss Zakarya
Structure-toxicity relationships study of a series
of organophosphorus insecticides
Received: 3 December 2001 / Accepted: 4 February 2002 / Published online: 16 May 2002
© Springer-Verlag 2002
Table 1 Chemical structures of the compounds studied and experimental toxicity (log LD50) values
Structural descriptors
Several structural descriptors and physicochemical variables were used to characterize the organophosphorus insecticide derivatives under study. These descriptors were calculated for the substituents R1, R2 and X.
They include the octanol/water partition coefficient (log P) [10], used as a descriptor of molecular hydrophobicity, electronegativity [11], hydrogen-bonding donors (HBD) and hydrogen-bonding acceptors (HBA) [12], and molar refraction (MR) [13].
The size and shape of the substituents were quantified by their van der Waals volume (V), molecular weight (MW), surface (S) [14], length (L), V/L [15] and topological descriptors [16].
Sixty parameters were calculated for each compound.
Statistical methods
Multiple linear regression (MLR)
Multiple linear regression was used to generate the linear models; the calculations were performed with the Unistat statistical package on a Pentium PC.
Because of the large number of descriptors considered, a stepwise multiple linear regression procedure combining forward selection and backward elimination was used to select the most significant descriptors.
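The forward-selection half of such a procedure can be sketched as follows. This is not the Unistat implementation: it is a simplified NumPy illustration that assumes a partial F-to-enter threshold of 4.0 (a common textbook choice, not taken from the paper), starts from an intercept-only model, and omits the backward-elimination step. The data are synthetic.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def forward_select(X, y, max_terms, f_to_enter=4.0):
    """Greedily add the descriptor that most reduces the RSS, while its
    partial F statistic exceeds f_to_enter (assumed threshold)."""
    n, p = X.shape
    selected = []
    current = np.ones((n, 1))                 # intercept-only starting model
    current_rss = rss(current, y)
    while len(selected) < max_terms:
        candidates = [j for j in range(p) if j not in selected]
        if not candidates:
            break
        j_best = min(candidates,
                     key=lambda j: rss(np.column_stack([current, X[:, j]]), y))
        new_rss = rss(np.column_stack([current, X[:, j_best]]), y)
        df_resid = n - current.shape[1] - 1   # residual df of the enlarged model
        f_partial = (np.inf if new_rss == 0
                     else (current_rss - new_rss) * df_resid / new_rss)
        if f_partial < f_to_enter:
            break                             # best candidate is not significant
        selected.append(j_best)
        current = np.column_stack([current, X[:, j_best]])
        current_rss = new_rss
    return selected

# Synthetic data: 47 "compounds", 6 candidate descriptors, only 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(47, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=47)
print(forward_select(X, y, max_terms=3))
```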
To avoid difficulties in interpreting the resulting models, pairs of variables with a correlation coefficient larger than 0.7 were classified as intercorrelated, and only one variable of each such pair was included in the screened models. The quality of a model was assessed by the squared correlation coefficient r², the standard deviation s and the Fisher test value (F), with all parameters in the model required to be significant at the 95% confidence level. The predictive ability was analyzed in two ways. First, the predictive ability on the training set (47 compounds, N°1 to 47) was evaluated by leave-one-out cross-validation; for a reliable model, the squared predictive correlation coefficient q² [17] should be >0.60 [18]. Second, 20 organophosphorus insecticide derivatives (N°48 to 67) were held out as a test set to assess the actual predictive power of the model (in terms of r² and s).
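The intercorrelation screen can be sketched as below. The text does not say which member of a correlated pair was kept; the sketch simply keeps the descriptor that comes first in column order (an assumption), and the data are synthetic.

```python
import numpy as np

def drop_intercorrelated(X, names, threshold=0.7):
    """Keep descriptors so that no kept pair has |r| > threshold.
    Earlier columns win ties (an assumption, not the paper's rule)."""
    corr = np.corrcoef(X, rowvar=False)       # descriptor-by-descriptor r matrix
    kept = []
    for j in range(X.shape[1]):
        if all(abs(corr[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

rng = np.random.default_rng(0)
a = rng.normal(size=47)
c = rng.normal(size=47)
b = a + 0.01 * rng.normal(size=47)            # nearly a copy of a
X = np.column_stack([a, b, c])
print(drop_intercorrelated(X, ["a", "b", "c"]))   # -> ['a', 'c']
```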
Artificial neural network (ANN)
The ANN [19], trained with the error back-propagation (BP) algorithm [20], had the following architecture:
● An input layer containing the pertinent descriptors selected by MLR.
● A hidden layer, for which the ratio ρ of the number of data points in the training set to the number of connections controlled by the network is critical to the predictive power of the neural net. The range 1.8 < ρ < 2.2 [ρ = (number of data points in the training set)/(number of adjustable weights controlled by the network)] [21] was used as a guideline for an acceptable number of neurons in the hidden layer. It is claimed that, for ρ << 1.0, the network simply memorizes the data, whereas for ρ >> 3.0, the network loses its ability to generalize.
● An output layer of one neuron, representing the toxicity (log LD50). The input and output values were normalized.
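As an illustration of the ρ guideline, the sketch below counts the adjustable weights of a fully connected 3-h-1 network and computes ρ for the 47 training compounds. It assumes that every hidden and output neuron carries a bias weight; the text does not say how the weights were counted. Under that assumption only five hidden neurons give 1.8 < ρ < 2.2 (ρ = 47/26 ≈ 1.81), whereas without biases the 3-5-1 network has 20 weights and ρ ≈ 2.35.

```python
def n_weights(n_in, n_hidden, n_out=1, biases=True):
    """Adjustable weights of a fully connected n_in-n_hidden-n_out network."""
    extra = 1 if biases else 0       # one bias per hidden/output neuron (assumption)
    return (n_in + extra) * n_hidden + (n_hidden + extra) * n_out

def rho(n_train, n_in, n_hidden, n_out=1, biases=True):
    """rho = number of training data points / number of adjustable weights."""
    return n_train / n_weights(n_in, n_hidden, n_out, biases)

for h in range(3, 9):                # hidden-layer sizes tried in Table 3
    r = rho(47, 3, h)
    print(h, round(r, 2), 1.8 < r < 2.2)
```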
The learning rate was then varied from 0.01 to 0.9 and, for each learning rate, the momentum was varied from 0.1 to 0.9. The number of neurons in the hidden layer was then determined using the optimized learning rate and momentum.
Finally, to preclude overtraining [22], we studied the variation of the root-mean-square (RMS) error with the number of iterations, and we used two strategies to test the validity of the selected ANN model.
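The training procedure (back-propagation with momentum, with the RMS error recorded at each iteration) can be sketched in NumPy. This is not the Qnet 2000 software [19] used in the study: the sigmoid activations, batch updates, weight initialization and min-max normalization are all assumptions, and the data are synthetic. The default learning rate (0.2) and momentum (0.9) are the optimum values reported in the Results for the 3-5-1 network, but optima found with one implementation need not carry over to another.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minmax(a, lo=None, hi=None):
    """Scale to [0, 1] column-wise; reuse the training lo/hi for held-out data."""
    lo = a.min(axis=0) if lo is None else lo
    hi = a.max(axis=0) if hi is None else hi
    return (a - lo) / (hi - lo)

def train_bp(X, y, X_val=None, y_val=None, n_hidden=5, lr=0.2,
             momentum=0.9, n_iter=5000, seed=0):
    """Batch back-propagation with momentum for a d-n_hidden-1 sigmoid network.
    Returns the parameters and per-iteration RMS errors (train, validation)."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(scale=0.5, size=(X.shape[1], n_hidden)), np.zeros(n_hidden),
              rng.normal(scale=0.5, size=(n_hidden, 1)), np.zeros(1)]
    velocity = [np.zeros_like(p) for p in params]
    y = y.reshape(-1, 1)
    rms_train, rms_val = [], []

    def forward(A):
        W1, b1, W2, b2 = params
        hid = sigmoid(A @ W1 + b1)              # hidden-layer activations
        return hid, sigmoid(hid @ W2 + b2)      # single output neuron

    for _ in range(n_iter):
        hid, out = forward(X)
        err = out - y
        rms_train.append(float(np.sqrt(np.mean(err ** 2))))
        if X_val is not None:
            out_val = forward(X_val)[1].ravel()
            rms_val.append(float(np.sqrt(np.mean((out_val - y_val) ** 2))))
        # Gradient of the mean squared error (factor 2 absorbed into lr).
        d_out = err * out * (1 - out) / len(X)
        d_hid = (d_out @ params[2].T) * hid * (1 - hid)
        grads = [X.T @ d_hid, d_hid.sum(axis=0), hid.T @ d_out, d_out.sum(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum                       # keep part of the previous step
            v -= lr * g                         # add the new gradient step
            p += v                              # in-place weight update
    return params, rms_train, rms_val

# Synthetic data: 47 training and 20 held-out "compounds", 3 descriptors.
rng = np.random.default_rng(1)
X = rng.normal(size=(67, 3))
y = X @ np.array([0.8, -0.5, 0.3]) + 0.3 * np.tanh(2.0 * X[:, 0])
Xs = minmax(X, X[:47].min(axis=0), X[:47].max(axis=0))
ys = minmax(y, y[:47].min(), y[:47].max())
_, rms_tr, rms_val = train_bp(Xs[:47], ys[:47], Xs[47:], ys[47:], n_iter=2000)
best_iter = int(np.argmin(rms_val)) + 1        # iteration with lowest held-out RMS
print(round(rms_tr[0], 3), round(rms_tr[-1], 3), best_iter)
```

Looping `train_bp` over a grid of learning rates (0.01 to 0.9) and momenta (0.1 to 0.9) mirrors the search described above; stopping near the iteration with the lowest RMS on held-out data is one common way to avoid overtraining.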
Results and discussion
Multiple linear regression analysis
Multiple linear regression was performed on the compounds described in Table 1. All 47 organophosphorus insecticide derivatives of the training set (compounds N°1 to 47) were used for model generation. All parameters were submitted to the regression, and many models were generated in this way. The best models were obtained without a constant term [Eq. (2)], because the constant term was not statistically significant. The retained model [Eq. (2)] is the one combining high r² and F values, a low standard deviation, the fewest independent variables and a high predictive ability.
(1)
(2)
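The decision to drop the constant term can be sketched as a t-test on the intercept of a least-squares fit. The threshold |t| < 2.0 only approximates the two-sided 95% critical value for about 43 degrees of freedom, the data are synthetic, and the coefficients of Eq. (2) are not reproduced here.

```python
import numpy as np

def ols(X, y):
    """Least-squares fit; returns coefficients, their standard errors and s."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = float(resid @ resid) / (n - k)        # residual variance, n - k df
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return coef, se, np.sqrt(s2)

# Synthetic, hypothetical data: 47 compounds, 3 descriptors, no true constant.
rng = np.random.default_rng(0)
D = rng.normal(size=(47, 3))
y = D @ np.array([0.6, 0.4, -0.9]) + 0.3 * rng.normal(size=47)

coef, se, s = ols(np.column_stack([np.ones(len(y)), D]), y)
t_const = coef[0] / se[0]
if abs(t_const) < 2.0:          # constant not significant at roughly the 95% level
    coef, se, s = ols(D, y)     # refit through the origin
print(len(coef), round(float(s), 3))
```

Note that for a model without a constant, r² is sometimes computed about zero rather than about the mean, which inflates it; the text does not state which convention was used.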
The statistical quality of Eq. (2) is fairly good: it accounts for 77% of the variance in log LD50. Low toxicity (high log LD50 values) is associated with a high shape surface [S(R2)] and a high number of hydrogen-bonding acceptors [HBA(R2)], together with a low molar refractivity [MR(R1)].
The plot of experimental versus calculated log LD50 is given in Fig. 1a. Cross-correlation analysis showed that all pairwise correlations between the descriptors of this equation were ≤0.229, indicating low collinearity (see Table 2).
In the cross-validation phase, 47 subsets were created according to the leave-one-out method, and the log LD50 of the removed compound was predicted from each subset. The cross-validation coefficient obtained was q² = 0.675, so the model can be considered a good predictive one according to Wold [18].
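Leave-one-out q² is commonly computed as 1 − PRESS/SS, where PRESS is the sum of squared errors of the left-out predictions and SS is the total sum of squares of log LD50 about its mean. The paper cites [17, 18] but does not give the formula, so the minimal NumPy sketch below assumes this common definition for a linear model.

```python
import numpy as np

def loo_q2(X, y):
    """Leave-one-out q^2 = 1 - PRESS / SS for a least-squares model y ~ X."""
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                       # drop compound i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        press += float((y[i] - X[i] @ coef) ** 2)      # error on compound i
    ss = float(np.sum((y - y.mean()) ** 2))            # total sum of squares
    return 1.0 - press / ss

# A mean-only model has q^2 = 1 - (n/(n-1))^2 < 0, i.e. no predictive power.
print(loo_q2(np.ones((10, 1)), np.arange(10.0)))       # about -0.235
```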
As a second strategy, the toxicity of 20 organophos-
phorus insecticides was predicted by using the best MLR
model [Eq. (2)].
For the test set of 20 compounds, the prediction gave r² = 0.721 and s = 0.435. Two compounds (N°59 and 64) had large estimation errors with Eq. (2); excluding them from the calculation lowers the standard deviation of the predictions to s = 0.369. These results agree well with those obtained for the training set and indicate good predictive quality for the MLR model.
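The test-set statistics can be computed as below. The sketch assumes that r² is the squared Pearson correlation between observed and predicted log LD50 (consistent with the r = 0.849 and r² = 0.721 quoted for MLR) and that s is the root-mean-square prediction error; the paper may use a degrees-of-freedom correction for s. Excluding outliers such as compounds N°59 and 64 is shown with positional indices on made-up values.

```python
import numpy as np

def prediction_stats(y_obs, y_pred, exclude=()):
    """Squared Pearson r and RMS error of predictions, optionally
    leaving out some compounds (given by position)."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    keep = np.ones(len(y_obs), dtype=bool)
    keep[np.asarray(exclude, dtype=int)] = False
    r = np.corrcoef(y_obs[keep], y_pred[keep])[0, 1]
    s = np.sqrt(np.mean((y_obs[keep] - y_pred[keep]) ** 2))   # assumed definition
    return float(r ** 2), float(s)

# Made-up values: the last compound is badly predicted.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 1.0]
print(prediction_stats(obs, pred))
print(prediction_stats(obs, pred, exclude=(4,)))
```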
Because biological phenomena are generally considered to be non-linear, it is of interest to study the present series of compounds with the ANN technique, in order to detect possible non-linear relationships between toxicity (log LD50) and the molecular descriptors found pertinent in the linear model.
Artificial neural network analysis
The ANN was built using the pertinent descriptors of the MLR model as input. A 3-5-1 neural network architecture was developed with optimum learning rate and momentum values of 0.2 and 0.9, respectively, and 5 000 iterations (the results of the ANN did not vary significantly between 4 800 and 5 000 iterations). Five hidden neurons were chosen to keep ρ between 1.8 and 2.2. To verify this choice, we tried three to eight neurons in the hidden layer and found that five hidden neurons give the best results for both the training and test sets, as shown in Table 3.
With the [3-5-1] architecture, the standard deviation between calculated and observed toxicity was 0.200, lower (better) than that obtained with MLR (s = 0.417). In addition, the squared correlation coefficient between observed and calculated values was 0.933. These results indicate the existence of non-linear relationships between toxicity and the molecular descriptors found pertinent in the linear model. The variation of the root-mean-square (RMS) error with the number of iterations is plotted in Fig. 2.
The plot in Fig. 1b indicates a significant correlation between the observed and calculated values of log LD50.
We used the same procedure as for the MLR analysis to test the validity of the selected ANN model. The corresponding r² and s for the prediction of the test set were 0.560 and 0.576, respectively, and the corresponding cross-validated q² was 0.647 [18].

Fig. 1 Experimental and predicted values from MLR (a) and ANN (b) for the training set.

Table 2 Correlation matrix
            S(R2)    HBA(R2)   MR(R1)
S(R2)       1
HBA(R2)     0.045    1
MR(R1)      0.039    0.229     1

Table 3 Variation of r² and s with the number of hidden neurons
Hidden neurons   r² (training)   s (training)   r² (test)   s (test)
3                0.8964          0.2389         0.5445      0.5972
4                0.9093          0.2230         0.4925      0.6185
5                0.9333          0.2000         0.5597      0.5761
6                0.9170          0.2130         0.5521      0.5813
7                0.9181          0.2113         0.5287      0.5964
8                0.9155          0.2152         0.5152      0.6028

Fig. 2 Variation of the RMS error with the number of iterations.
Analysis of descriptor contributions in the ANN and MLR models
To evaluate the influence of each descriptor on the calcu-
lated toxicity, we used two methods.
The first method consists of removing one descriptor and computing the statistics between observed and calculated toxicity with MLR and ANN. Comparing these statistics with those obtained by MLR and ANN when no descriptor is removed gives an idea of the importance of the removed descriptor [23]. Indeed, when the descriptor MR(R1) is removed, the resulting model is of much lower quality (r² is only 0.212 and 0.370 for MLR and ANN, respectively; Table 4).
The second method uses the relation established by Chastrette [24] [Eq. (3)] to calculate the contribution of each descriptor (Table 4).
(3)
where Ci is the contribution of descriptor i and ∆mi is the mean of the absolute deviations between the observed and estimated toxicity over all compounds.
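Eq. (3) itself did not survive in this copy. The C% values in Table 4 sum to 100 for both MLR and ANN, which suggests a normalized form C_i = 100·∆m_i/Σ_j ∆m_j; the sketch below assumes that form, and assumes that ∆m_i is obtained with descriptor i removed. As a rough check only, using the s values of Table 4 (columns b) as stand-ins for ∆m_i gives about 34.5, 23.0 and 42.5% for MLR (tabulated 34, 23, 43) and about 30, 28 and 42% for ANN (tabulated 30, 28, 42).

```python
def mean_abs_dev(y_obs, y_est):
    """Mean absolute deviation between observed and estimated toxicity
    (how each dm_i would be computed, under the stated assumption)."""
    return sum(abs(o - e) for o, e in zip(y_obs, y_est)) / len(y_obs)

def contributions(delta_m):
    """C_i = 100 * dm_i / sum_j dm_j  (assumed normalized form of Eq. (3))."""
    total = sum(delta_m.values())
    return {name: 100.0 * dm / total for name, dm in delta_m.items()}

# Rough check with the MLR s values of Table 4 used as stand-ins for dm_i.
mlr = contributions({"S(R2)": 0.657, "HBA(R2)": 0.439, "MR(R1)": 0.810})
print({k: round(v, 1) for k, v in mlr.items()})
# -> {'S(R2)': 34.5, 'HBA(R2)': 23.0, 'MR(R1)': 42.5}
```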
These contributions give the following ranking: MR(R1) > S(R2) > HBA(R2). These results confirm the large effect of the substituent R1 on toxicity; see, for example, molecules 9 (R1 = Me) and 11 (R1 = Et), ∆activity(9 − 11) = −2.03, and molecules 19 (R1 = Me) and 21 (R1 = Et), ∆activity(19 − 21) = 0.88.
To check that the MLR and ANN results were not due to chance correlation, we ran a scrambling experiment: the dependent variable log LD50 was randomly scrambled, and the same MLR and ANN algorithms were run again. The squared correlation coefficient r² and the standard deviation s of the scrambled models were then compared with those of the MLR and ANN models developed in this work. For the training set, the scrambled models gave r² values of 0.017 (MLR) and 0.561 (ANN), compared with 0.766 and 0.933, and s values of 0.793 (MLR) and 0.513 (ANN), compared with 0.417 and 0.200. This test confirms that the descriptors selected in this study describe the studied toxicity well.
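The scrambling test can be sketched for the MLR part as below. The paper scrambles log LD50 once; repeating the permutation many times (100 here, an arbitrary choice) gives a distribution of chance r² values to compare against. The data are synthetic.

```python
import numpy as np

def r2_s(X, y):
    """r^2 (squared correlation) and s of a least-squares fit through the origin."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    resid = y - fitted
    r2 = np.corrcoef(y, fitted)[0, 1] ** 2
    s = np.sqrt(float(resid @ resid) / (len(y) - X.shape[1]))
    return float(r2), float(s)

def y_scramble(X, y, n_rounds=100, seed=0):
    """Refit the model on randomly permuted responses, n_rounds times."""
    rng = np.random.default_rng(seed)
    return [r2_s(X, rng.permutation(y)) for _ in range(n_rounds)]

rng = np.random.default_rng(3)
X = rng.normal(size=(47, 3))
y = X @ np.array([0.7, 0.5, -1.0]) + 0.3 * rng.normal(size=47)
r2_real, s_real = r2_s(X, y)
scrambled = y_scramble(X, y)
print(round(r2_real, 3), round(max(r for r, _ in scrambled), 3))
```

A real model should clearly beat every scrambled one; scrambled r² values close to the real one would suggest chance correlation.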
Conclusion
Two important consequences emerge from the present
report.
First, given the complex nature of the modeled biological phenomena on the one hand, and the large number of compounds analyzed on the other, our results indicate that molar refraction is of prime importance for the toxicity of the organophosphorus insecticide derivatives under study. In addition, the approach used to compute the contributions and rank the descriptors in MLR and ANN may be helpful in interpreting QSAR models.
Second, these results reveal good stability of the studied structure-toxicity relationships and confirm that toxicity depends largely on the structural features of the insecticide.
References
1. Young AL (1987) Pesticides minimising the risk. In: Ragsdale NN, Kuhr RJ (eds) ACS symposium series 336. American Chemical Society, Washington, pp 10–40
2. Devillers J, Karcher W (1990) Environmental chemistry and toxicology. In: Karcher W, Devillers J (eds) Kluwer Academic Publishers, Dordrecht, pp 181–195
3. Bazoui H, Zahouily M, Sebti S, Boulajaaj S, Zakarya D (2002) Structure-toxicity relationships study of a series of organophosphorus insecticides. J Mol Model DOI 10.1007/s00894-001-0054-9
4. Livingstone DJ (1989) Res Pest Sci 27:287–304
5. Nendza M (1991) Chemosphere 22:613–623
6. Vighi M, Garlanda MM, Calamari D (1991) Sci Total Environ
109/110:605–622
7. Büchel KH (1983) Chemistry of pesticides. John Wiley &
Sons, pp 48–124
8. Meister RT, Fitzgerald GT, Zilenziger A (1987) Farm chemical handbook. Meister Publishing Co, pp 42–208
9. Thomson TW (1972) Agricultural chemicals book I. Insecticides, pp 160–266
10. Nys GG, Rekker RF (1974) Eur J Med Chem Ther 4:361–375
11. Pauling L (1960) The nature of the chemical bond, 3rd edn. Cornell University Press, Ithaca NY, p 85
12. Yokoyama T, Taft RW, Kamlet MJ (1976) J Am Chem Soc 98:3233–3235
13. Weast RC (1988) Handbook of chemistry and physics, 1st edn. CRC, p E 318
14. Bondi A (1964) J Phys Chem 68:441–451
15. Zakarya D, Rayadh A, Samih M, Lakhlifi T (1994) Tetrahedron Lett 35:2345–2348
16. Randic M (1984) J Chem Inf Comput Sci 24:164–175
17. Tetko IV, Villa AEP, Livingstone DJ (1996) J Chem Inf Comput Sci 36:794–803
18. Wold S (1991) Quant Struct Act Relat 10:191–193
19. Data pro Qnet 2000 for Windows V2 K build 721 neural net-
work modelling. Vesta Services Inc, Winnetka, IL 60093, USA
20. Rumelhart DE, Hinton GE, Williams RJ (1986) Nature 323:533–536
21. So S, Richards WG (1992) J Med Chem 35:3201–3207
22. Defernez M, Kemsley EK (1999) Analyst 124:1675–1681
23. Chastrette M, Zakarya D, Peyraud JF (1994) Eur J Med Chem
29:343–348
24. Cherquaoui D, Esseffar M, Villemin D, Cense JM, Chastrette M, Zakarya D (1998) New J Chem 22:839–843
Table 4 Evaluating the impact of each descriptor in ANN and MLR
Removed descriptor   C% MLR(a)   C% ANN(a)   r² MLR(b)   s MLR(b)   r² ANN(b)   s ANN(b)
S(R2)                34          30          0.3091      0.657      0.6708      0.444
HBA(R2)              23          28          0.6906      0.439      0.7242      0.407
MR(R1)               43          42          0.2116      0.810      0.3697      0.615
a The contribution (C%) of the descriptor given by the second method described in the text.
b Given by the first method described in the text (statistics of the model with the descriptor removed).