Environmental Modeling & Computational Mass Spectrometry (EMCMS)

This is the official website of the EMCMS group led by Dr. Saer Samanipour.

New Preprint - Optimization of Molecular Fingerprints

[Molecular Fingerprints Optimization for Enhanced Predictive Modeling](https://www.sciencedirect.com/science/article/pii/S0304389423007690)

The human exposome comprises a vast number of chemicals whose fate and behavior remain largely unexplored. While modeling approaches are commonly employed to address this challenge, there is a recognized need for alternative molecular representations, such as molecular fingerprints. However, existing algorithms for computing molecular fingerprints may encode information that is irrelevant or insufficient for accurate activity prediction. In this study, we present an algorithm designed to optimize molecular fingerprints. The algorithm combines the relevant bits of information, aiming to enrich the final fingerprint for predicting a specific behavioral property. To achieve this, the variables (i.e., bits) relevant for prediction were collected from six non-hashed fingerprints and fused into a master fingerprint. We used fish toxicity as a proof of concept. A random forest regression (RFR) model built on the master fingerprint performed comparably to conventional descriptor-based models, with $R^2 \approx 0.9$ for the training set and $R^2 \approx 0.6$ for the test set. Molecular fingerprints have the advantage of being consistent and interpretable, which allowed us to confirm the relevance of the selected variables to toxicity prediction. The final model outperformed each of the models based on individual fingerprints in the number of chemicals whose prediction error fell within ±1 standard deviation of the residuals; on average, the number of these low-error cases was four times higher for the master fingerprint-based model. The algorithm developed for optimizing molecular fingerprints is universal and can be applied to various case studies.
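The fusion and evaluation steps described in the abstract can be sketched in Python. This is a minimal, dependency-free illustration under stated assumptions, not the published implementation: it assumes every bit already carries an importance score (how the study ranked bits is not described here), keeps bits whose score meets a hypothetical `threshold`, and concatenates the kept bits from each fingerprint type into one master vector. The fingerprint names `fp_a` and `fp_b`, all function names, and the reading of "±1 standard deviation of the residuals" as |error| ≤ SD of the residuals are assumptions. Training the RFR model itself is left out.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def select_relevant_bits(
    importances: Dict[str, Sequence[float]], threshold: float
) -> Dict[str, List[int]]:
    """For each fingerprint type, keep the indices of bits whose importance >= threshold.

    The importance scores are assumed to be precomputed (hypothetical input).
    """
    return {
        name: [i for i, score in enumerate(scores) if score >= threshold]
        for name, scores in importances.items()
    }


def build_master_fingerprint(
    fingerprints: Dict[str, Sequence[int]], selected: Dict[str, List[int]]
) -> List[int]:
    """Concatenate the selected bits of one molecule's fingerprints.

    Fingerprint types are visited in sorted-name order so that every molecule
    gets the same bit layout in the master fingerprint.
    """
    master: List[int] = []
    for name in sorted(selected):
        bits = fingerprints[name]
        master.extend(bits[i] for i in selected[name])
    return master


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def count_within_one_sd(
    y_true: Sequence[float], y_pred: Sequence[float], sd: Optional[float] = None
) -> int:
    """Count predictions whose absolute error is at most one standard deviation.

    If `sd` is None, the sample standard deviation of this model's own residuals
    is used (an assumption; a shared reference value may suit model comparison better).
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    if sd is None:
        sd = statistics.stdev(residuals)
    return sum(1 for r in residuals if abs(r) <= sd)


if __name__ == "__main__":
    # Illustrative importance scores for two hypothetical fingerprint types.
    importances = {"fp_b": [0.25, 0.01, 0.40], "fp_a": [0.02, 0.30, 0.00, 0.12]}
    selected = select_relevant_bits(importances, threshold=0.10)
    print(selected)  # -> {'fp_b': [0, 2], 'fp_a': [1, 3]}

    molecule = {"fp_a": [1, 1, 0, 1], "fp_b": [1, 0, 0]}
    print(build_master_fingerprint(molecule, selected))  # -> [1, 1, 1, 0]

    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print(round(r_squared(y_true, y_pred), 3))  # -> 0.98
    print(count_within_one_sd(y_true, y_pred))  # -> 2
```

Because the source fingerprints are non-hashed, each retained bit still corresponds to a predefined structural key, which is what keeps a fused fingerprint of this kind interpretable.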