Main results of the CALEIDOS project

One of the main activities in CALEIDOS was to check, for a number of properties, whether the results given by QSAR methods agree with the values submitted by the registrants. The main results are discussed below.

The evaluation below refers to the general assessment done on a population of substances. The user of in silico models should always pay attention to the results for the specific substance of interest. In other words, a model that usually gives good results may not be predictive for a specific substance, and conversely a model that often does not give good results may work for a specific compound. One of the main results of the CALEIDOS project was the demonstration that the results are better when supported by a suitable applicability domain assessment. This is also in line with the REACH requirements (Annex XI).

Models for mutagenicity

Most of the models are for the Ames test.


The concordance of the mutagenicity evaluations from the experimental test is about 85-90%. The best models gave results with accuracy in the same range when the applicability domain was taken into account.

CALEIDOS built a consensus model that combines models based on statistical and mechanistic (rule-based) approaches. These models are freely available within VEGA. The applicability domain is calculated within the VEGA system.
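The idea of combining several models into one consensus call can be illustrated with a minimal sketch. This is not the algorithm used in VEGA, which is not described here. It assumes, purely for illustration, that each model returns a label together with a reliability score between 0 and 1 (for example derived from its applicability domain assessment), and that the consensus is the label with the highest summed reliability. The function name and the example values are hypothetical.

```python
from collections import defaultdict


def consensus_call(predictions):
    """Combine (label, reliability) pairs into a single call.

    Assumption for illustration only: the label with the highest
    summed reliability wins; ties and empty input return None.
    """
    scores = defaultdict(float)
    for label, reliability in predictions:
        scores[label] += reliability
    if not scores:
        return None
    best = max(scores.values())
    winners = [label for label, score in scores.items() if score == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical outputs of three models for one substance.
example = [("mutagenic", 0.9), ("non-mutagenic", 0.5), ("non-mutagenic", 0.3)]
print(consensus_call(example))
```

In this sketch a single model with high reliability can outweigh two models with low reliability, which is one way of letting the applicability domain influence the final call.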

The consensus model gave good results in the prediction of the chemicals submitted by registrants for REACH. Results were also good when the study was restricted to the “new” chemicals, that is, those without information from previously published studies or from studies used for the training and test sets of the VEGA models.

Other models that gave similar or somewhat lower results were ToxTree and TEST. Commercial models gave results in the same range.

In conclusion, many in silico models gave predictions in good agreement with the experimental values submitted by registrants.

Models for carcinogenicity

A couple of models gave acceptable results:

ToxTree gave good accuracy values in the prediction of the substances submitted for REACH, while sensitivity was lower (in the JRC version or in the VEGA version).

SARpy gave good results for CLP. SARpy is a new model based on structural alerts, and will soon be available within VEGA.
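A model can show good accuracy and still have low sensitivity when positive substances are a minority of the data set. The sketch below shows how the two statistics are computed from experimental and predicted labels. The function name and the toy data are hypothetical and are not CALEIDOS results; 1 stands for a positive (carcinogenic) label and 0 for a negative one.

```python
def accuracy_and_sensitivity(experimental, predicted):
    """Return (accuracy, sensitivity) for two equal-length lists of 0/1 labels.

    Sensitivity is None when there are no experimental positives.
    """
    if len(experimental) != len(predicted) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(experimental, predicted))
    tp = sum(1 for e, p in pairs if e == 1 and p == 1)  # true positives
    tn = sum(1 for e, p in pairs if e == 0 and p == 0)  # true negatives
    fn = sum(1 for e, p in pairs if e == 1 and p == 0)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    positives = tp + fn
    sensitivity = tp / positives if positives else None
    return accuracy, sensitivity


# Toy data: 2 positives and 8 negatives, one positive is missed.
experimental = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(accuracy_and_sensitivity(experimental, predicted))  # (0.9, 0.5)
```

With these toy labels nine of ten substances are classified correctly, yet only one of the two positives is found, so accuracy alone would overstate how well positives are detected.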

Models for developmental toxicity

This endpoint is one of the most complex, since it is related to a series of toxicological processes and the number of chemicals with experimental values is relatively low. As a consequence, the prediction of this endpoint is difficult. Very recently some new models have appeared. The model developed by Wu et al. (Chem. Res. Toxicol. 2013) gave promising results. This model is now available within VEGA. Another new model has been implemented with SARpy, and will also be available within VEGA. Commercial models for this endpoint gave worse results.

Models for logP

A number of models have been tested, including EPISuite and VEGA.


New models have been developed by Mario Negri (CORAL and QSARpy).

The prediction of the chemicals registered for REACH gave R2 lower than 0.6, with the exception of CORAL and QSARpy.

When the applicability domain was taken into account, as within VEGA, R2 greatly improved.
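The effect of the applicability domain on R2 can be illustrated with a small sketch. It assumes that R2 is the coefficient of determination, 1 - SS_res / SS_tot (this summary does not state which definition was used), and that each prediction carries a numerical applicability domain index that can be compared with a threshold. The function names, the threshold and the toy values are hypothetical and are not CALEIDOS data.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def r_squared_in_domain(records, threshold):
    """R2 restricted to records whose applicability domain index >= threshold.

    Each record is (observed, predicted, ad_index).
    """
    kept = [(o, p) for o, p, ad in records if ad >= threshold]
    observed = [o for o, _ in kept]
    predicted = [p for _, p in kept]
    return r_squared(observed, predicted)


# Toy logP records: (experimental, predicted, applicability domain index).
records = [
    (1.0, 1.1, 0.90),
    (2.0, 1.9, 0.80),
    (3.0, 3.2, 0.95),
    (4.0, 1.0, 0.20),  # poorly predicted and outside the domain
]
r2_all = r_squared([o for o, _, _ in records], [p for _, p, _ in records])
r2_in_domain = r_squared_in_domain(records, threshold=0.7)
print(r2_all, r2_in_domain)
```

In the toy data a single out-of-domain substance with a large error dominates the statistic, and removing it raises R2 sharply. The price is reduced coverage, since no prediction is accepted for the excluded substance.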

Models for BCF

On the basis of the previous exercise done within ANTARES to identify models with good performance, EPISuite and VEGA were used. The prediction of the chemicals registered for REACH gave R2 lower than 0.5. When the applicability domain was taken into account, as within VEGA, R2 greatly improved.

Models for fish acute toxicity

On the basis of the previous exercise done within ANTARES to identify models with good performance, ECOSAR, TerraQSAR, TEST and VEGA were used. A new model has been developed using kNN.

The prediction of the chemicals registered for REACH gave low R2.


When the applicability domain was taken into account, as within VEGA, there was some improvement, but results were still not good. Results on some fish species (guppy) were better using the kNN model.

Users should be careful when using QSAR models for fish: they should preferably run more than one model and consider the applicability domain of each model.
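For readers unfamiliar with kNN, the general technique can be sketched as follows. This is not the CALEIDOS model, whose descriptors and settings are not given in this summary. The sketch assumes that each substance is described by a set of structural features (for example fingerprint bits), that similarity is the Tanimoto coefficient, and that the prediction is the similarity-weighted mean of the k most similar training substances. All names and values are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto similarity between two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def knn_predict(query, training, k=3):
    """Similarity-weighted mean of the k most similar training substances.

    training is a list of (feature_set, value) pairs. Returns None when
    no neighbour shares any feature with the query, which can be read as
    the query being outside the applicability domain.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    neighbours = ranked[:k]
    weights = [tanimoto(query, features) for features, _ in neighbours]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * value for w, (_, value) in zip(weights, neighbours)) / total


# Toy training set: feature sets with hypothetical toxicity values (log units).
training = [({1, 2, 3}, 1.0), ({1, 2, 4}, 2.0), ({7, 8}, 10.0)]
print(knn_predict({1, 2, 3}, training, k=2))
print(knn_predict({99}, training, k=2))
```

Returning no prediction when nothing similar is found is a simple way to build an applicability domain check into the model itself.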

The read across exercise

27 substances were used for this exercise, on three endpoints: mutagenicity (Ames test), BCF and fish acute toxicity.

About 200 questionnaires were received from anonymous participants. We greatly thank them for their work.

The BCF values given by the participants for the 9 substances showed quite good agreement for most of the substances.

In the case of fish acute toxicity, the agreement among the values was lower.

For mutagenicity, the results were reproducible among participants using ToxRead, while this was not the case for participants using the OECD Toolbox. Participants using the OECD Toolbox used different profiles.

For mutagenicity, we received the experimental values from Health Japan (Dr Masa Honma) at the end of the exercise. Many false positive predictions were obtained.

Regarding the simplicity of the software, participants found ToxRead easy to use, while the OECD Toolbox was rated as less simple.

Other CALEIDOS outcomes

Within CALEIDOS a number of new models have been developed, also in collaboration with other EC-funded projects (PROSIL and ToxBank).

These new models are available within VEGA and ToxRead, which is a program specifically for read across. About 30 models are now available within VEGA. VEGA has about 2400 users, and ToxRead more than 300.

We gratefully acknowledge the collaboration with the German UBA (also within the PROMETHEUS project), Health Canada, Health Japan, Ministero della Salute, US EPA, and the EC projects PROSIL, EDESIA and ToxBank.