Effect of adjusting the temperature parameter in the contrastive learning loss on the distribution of molecules in the latent space, as visualized via the t-SNE algorithm. All drugs, fluorophores, and Recon2 metabolites are plotted, along with a random subset of 2000 natural products (as in [113]), shown for clarity. (A) Learning based purely on the cross-entropy objective function. (B-E) The temperature scalar (as in [112]) was varied between 0.02 and 0.5 as indicated; reducing t below this range led to numerical instabilities.
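The temperature referred to in this caption enters the contrastive loss as a divisor of the pairwise similarities before a softmax. Below is a minimal PyTorch sketch of such a temperature-scaled (NT-Xent-style) loss, assuming a batch arranged so that rows i and i+N are two views of the same molecule; it illustrates the mechanism only and is not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z: torch.Tensor, temperature: float = 0.05) -> torch.Tensor:
    """z: (2N, d) latent vectors; rows i and i+N are positive pairs."""
    z = F.normalize(z, dim=1)                   # work with cosine similarities
    n = z.shape[0] // 2
    sim = z @ z.T / temperature                 # (2N, 2N) scaled similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))  # exclude self-similarity
    # each row's "correct class" is the index of its positive partner
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)
```

Smaller temperatures sharpen the softmax and pull positive pairs into tighter clusters; very small values can overflow the scaled exponentials, consistent with the numerical instabilities noted above.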


Source publication
Article
Full-text available
The question of molecular similarity is central to cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed...
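A minimal sketch of the similarity query this abstract describes, assuming a trained encoder has already produced the latent embeddings: similarity search then reduces to nearest neighbours under Euclidean distance.

```python
import numpy as np

def nearest_neighbours(query_z: np.ndarray, library_z: np.ndarray, k: int = 5):
    """query_z: (d,) latent vector of the query molecule; library_z: (n, d)."""
    dists = np.linalg.norm(library_z - query_z, axis=1)  # Euclidean distances
    order = np.argsort(dists)[:k]                        # indices of k closest
    return order, dists[order]
```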

Contexts in source publication

Context 1
... clock time for training an epoch on a single NVIDIA V100 GPU system was ca. 30 s and 23 min for the two datasets illustrated. Figure 3 gives an overall picture, using t-SNE [127,128], of the dataset used. Figure 3A recapitulates that published previously, using standard VAE-type ELBO/K-L divergence learning alone, while panels Figure 3B-E show the considerable effect of varying the temperature scalar (as in [112]) between 0.02 and 0.5 as indicated. ...
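The visualization step mentioned in this context can be sketched as follows; the perplexity value and colouring scheme are illustrative assumptions, not the paper's settings.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_latent_tsne(latent, labels, perplexity: float = 30.0):
    """latent: (n, d) array of embeddings; labels: length-n class names."""
    xy = TSNE(n_components=2, perplexity=perplexity, init="pca").fit_transform(latent)
    for cls in sorted(set(labels)):            # one colour per compound class
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        plt.scatter(xy[idx, 0], xy[idx, 1], s=4, label=cls)
    plt.legend()
    plt.show()
```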
Context 3
... can clearly be seen from Figure 3B-E that as the temperature was increased in the series 0.02, 0.05, 0.1, and 0.5, the tightness and therefore the separability of the clusters progressively decreased. For instance, looking mainly at the fluorophores (red colors) in the plotted latent space for each of the four temperatures, the separability and tightness of the clusters were best at the 0.02 and 0.05 temperatures. ...
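The visual judgement of tightness and separability described here could also be quantified, for instance with a silhouette score computed at each temperature. This is an illustrative add-on, not the paper's protocol; `embed_with_temperature` and `labels` are hypothetical stand-ins for a model retrained at each temperature and the compound-class labels.

```python
from sklearn.metrics import silhouette_score

# Hypothetical: embed_with_temperature(t) returns (n, d) latent vectors from a
# model trained with temperature t; labels holds the compound classes.
for t in (0.02, 0.05, 0.1, 0.5):
    latent = embed_with_temperature(t)
    score = silhouette_score(latent, labels)  # higher = tighter, better separated
    print(f"temperature={t}: silhouette={score:.3f}")
```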

Citations

... However, the vanilla Transformer has no explicit latent space, such as those found in RNN autoencoders, as used in [8]. There exist works in the literature that construct a Transformer-based autoencoder, such as FragNet (a contrastive learning-based Transformer model) [12] and ReLSO (Regularized Latent Space Optimization) [13]. However, these architectures employ differing approaches: contrastive learning in FragNet, and property prediction along with three regularization penalty terms in ReLSO. ...
... We require a latent space, along with a decoder, to construct a decision space for optimization, along with the ability to generate a molecule from the vectorized latent representation. Some models that fit these criteria include SMILES Transformer [22], FragNet [12], and MolMIM [15]. For our experimentation, we employ the FragNet architecture over SMILES Transformer, as it uses learnable compression methods and contrastive learning for latent space regularization. ...
... Using contrastive learning, we apply two latent Transformers for molecular generation, FragNet [12] and ReLSO [13], and determine the best model for latent molecular representation. Figures 2 and 3 illustrate the architectures of FragNet and ReLSO. ...
Article
Full-text available
Background: Drug design is a challenging and important task that requires the generation of novel and effective molecules that can bind to specific protein targets. Artificial intelligence algorithms have recently shown promising potential to expedite the drug design process. However, existing methods adopt multi-objective approaches, which limits the number of objectives. Results: In this paper, we expand this thread of research from the many-objective perspective, by proposing a novel framework that integrates a latent Transformer-based model for molecular generation with a drug design system that incorporates absorption, distribution, metabolism, excretion, and toxicity prediction, molecular docking, and many-objective metaheuristics. We compared the performance of two latent Transformer models (ReLSO and FragNet) on a molecular generation task and show that ReLSO outperforms FragNet in terms of reconstruction and latent space organization. We then explored six different many-objective metaheuristics based on evolutionary algorithms and particle swarm optimization on a drug design task involving potential drug candidates to the human lysophosphatidic acid receptor 1, a cancer-related protein target. Conclusion: We show that the multi-objective evolutionary algorithm based on dominance and decomposition performs best in terms of finding molecules that satisfy many objectives, such as high binding affinity, low toxicity, and high drug-likeness. Our framework demonstrates the potential of combining Transformers and many-objective computational intelligence for drug design.
... Transformers models [47][48][49] have shown to be greatly beneficial for molecular generative tasks as well. Yang et al. [48] proposed a Transformer-encoder-based generative model with transfer learning and reinforcement learning for designing new drugs with desirable activity against the protein target BRAF. ...
... Yang et al. [48] proposed a Transformer-encoder-based generative model with transfer learning and reinforcement learning for designing new drugs with desirable activity against the protein target BRAF. Shrivastava and Kell [47] established a novel and disentangled latent space by coupling Transformers with contrastive learning, which allows similar molecules to cluster together in an effective and interpretable way. ReLSO [49] combines the encoding ability of the Transformer with an autoencoder-type bottleneck that produces information-rich, interpretable, and low-dimensional latent representations, jointly generating protein sequences and predicting fitness from the latent representations. ...
... (1) Our experiments were conducted using deep generative models on the DEL framework. Even though our objective is to explore the potential of modern AI models for multi-target drug design, rather than compete with all the state-of-the-art AI models, it would be beneficial to expand our work by integrating the most recent models (such as equivariant neural networks [9,[31][32][33], diffusion models [42][43][44][45][46], and Transformers [47][48][49]) for multi-objective and multi-target drug design. With regard to the latent representation space, it is acknowledged that since the vanilla Transformer does not have an obvious latent space at the bottleneck, such as that produced by a VAE [47], it cannot be directly integrated in the DEL framework. ...
Preprint
Full-text available
Background: Drug discovery is a time-consuming and expensive process. Artificial intelligence (AI) methodologies have been adopted to cut costs and speed up the drug development process, serving as promising in silico approaches to efficiently design novel drug candidates targeting various health conditions. Most existing AI-driven drug discovery studies follow a single-target approach, which focuses on identifying compounds that bind a single target (i.e., a one-drug-one-target approach). Polypharmacology is a relatively new concept that takes a systematic approach to search for a compound (or a combination of compounds) that can bind two or more carefully selected protein biomarkers simultaneously to synergistically treat the disease. Recent studies have demonstrated that multi-target drugs offer superior therapeutic potential compared to single-target drugs. However, it is intuitively thought that searching for multi-target drugs is more challenging than finding single-target drugs. At present, it is unclear how AI approaches perform in designing multi-target drugs. Results: In this paper, we comprehensively investigated the performance of multi-objective AI approaches for multi-target drug design. Conclusion: Our findings are quite counterintuitive, demonstrating that AI approaches for multi-target drug design are in fact able to efficiently generate more high-quality novel compounds than single-target approaches while satisfying a number of constraints.
... Examples of autoencoder-based methods include ChemVAE (Gómez-Bombarelli et al. 2018) and AllSMILES VAE (Alperstein, Cherkasov, and Rolfe 2019). Transformer-based models include ChemBERTa (Chithrananda, Grand, and Ramsundar 2020), SMILES Transformer (Honda, Shi, and Ueda 2019), and FragNet (Shrivastava and Kell 2021). Less common, however, is the composed architecture of a transformer autoencoder. ...
... For molecular data, both SMILES and graph representations have been explored in the context of contrastive learning. The FragNet model proposed by Shrivastava and Kell (2021) utilized the normalized temperature-scaled cross-entropy (NT-Xent) loss (Sohn 2016) to map enumerated SMILES of identical molecules nearby in the latent space. As for graphs, Wang et al. (2021) similarly used the NT-Xent loss to maximize the agreement between pairs of augmented graphs ("views") describing the same molecule; here, each view (i.e. ...
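Positive pairs of the kind described here, enumerated SMILES of the same molecule, can be generated with RDKit's randomized SMILES writer. This sketch shows one plausible way to produce such pairs, not necessarily the exact enumeration used by FragNet.

```python
from rdkit import Chem

def enumerate_smiles(smiles: str, n: int = 4) -> list:
    """Return n randomized (non-canonical) SMILES spellings of one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return [Chem.MolToSmiles(mol, canonical=False, doRandom=True) for _ in range(n)]

# e.g. enumerate_smiles("c1ccccc1O") may yield ["Oc1ccccc1", "c1ccc(O)cc1", ...]
```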
Article
In deep learning for drug discovery, molecular representations are often based on sequences, known as SMILES, which allow for straightforward implementation of natural language processing methodologies, one being the sequence-to-sequence autoencoder. However, we observe that training an autoencoder solely on SMILES is insufficient to learn molecular representations that are semantically meaningful, where semantics are specified by the structural (graph-to-graph) similarities between molecules. We demonstrate by example that SMILES-based autoencoders may map structurally similar molecules to distant codes, resulting in an incoherent latent space that does not necessarily respect the semantic similarities between molecules. To address this shortcoming, we propose the Semantically-Aware Latent Space Autoencoder (SALSA) for molecular representations: a SMILES-based transformer autoencoder modified with a contrastive task aimed at learning graph-to-graph similarities between molecules. To accomplish this, we develop a novel dataset comprising sets of structurally similar molecules and opt for a supervised contrastive loss that is able to incorporate full sets of positive samples. We evaluate the semantic awareness of SALSA representations by comparing them to ablated counterparts, and show empirically that SALSA learns representations that maintain 1) structural awareness, 2) physicochemical awareness, 3) biological awareness, and 4) semantic continuity.
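A hedged sketch of a supervised contrastive loss in the spirit SALSA describes, where every pair of samples sharing a set label counts as a positive, so whole sets of structurally similar molecules attract one another. It follows the general SupCon formulation rather than SALSA's exact loss.

```python
import torch
import torch.nn.functional as F

def sup_con_loss(z: torch.Tensor, set_ids: torch.Tensor, temperature: float = 0.1):
    """z: (n, d) embeddings; set_ids: (n,) integer label per similarity set."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (set_ids[:, None] == set_ids[None, :]) & ~self_mask
    # row-wise log-softmax with self-similarity excluded from the denominator
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")),
                                     dim=1, keepdim=True)
    # average the log-probability over every positive of each anchor
    per_anchor = (log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return -per_anchor.mean()
```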
... Attention mechanisms [47] were initially introduced for neural machine translation to allow models to focus on the most important input data. In deep clustering, they have been used to enhance the embedded representation in speech separation [48], but also combined with autoencoders for handwriting recognition [49] and molecular similarity [50]. The requirement of a two-step learning process in deep clustering algorithms derives from the different nature of the network and clustering losses, which hinders their integration. ...
Article
Full-text available
Deep learning has recently been used to extract the relevant features for representing input data, also in the unsupervised setting. However, state-of-the-art techniques focus mostly on algorithmic efficiency and accuracy rather than on mimicking the input manifold. On the contrary, competitive learning is a powerful tool for replicating the input distribution topology. It is cognitively/biologically inspired, as it is founded on Hebbian learning, a neuropsychological theory claiming that neurons can increase their specialization by competing for the right to respond to/represent a subset of the input data. This paper introduces a novel perspective by combining these two techniques: unsupervised gradient-based and competitive learning. The theory is based on the intuition that neural networks can learn topological structures by working directly on the transpose of the input matrix. For this purpose, the vanilla competitive layer and its dual are presented. The former is representative of a standard competitive layer for deep clustering, while the latter is trained on the transposed matrix. The equivalence of the layers is extensively proven both theoretically and experimentally. The dual competitive layer has better properties. Unlike the vanilla layer, it directly outputs the prototypes of the data inputs, while still allowing learning by backpropagation. More importantly, this paper proves theoretically that the dual layer is better suited for handling high-dimensional data (e.g., for biological applications), because the estimation of the weights is driven by a constraining subspace which does not depend on the input dimensionality, but only on the dataset cardinality. This paper has thus introduced a novel approach for unsupervised gradient-based competitive learning. This approach is very promising, both in the case of small datasets of high-dimensional data and for better exploiting the advantages of a deep architecture: the dual layer integrates perfectly with the deep layers. A theoretical justification is also given by using the analysis of the gradient flow for both the vanilla and dual layers.
... This still allows considerable discrimination; however, even a normalized vector of just 20 elements, in each of which an individual may be in the upper or lower half, admits 2^20 (approximately 1 million) possibilities. 109,117 In the case of microclots, we consider that (distributions in) the many thousands of individual metabolites [118][119][120] and proteins 121 in serum or plasma can potentially each affect the size and shape of the microclots. This "harvesting" of all the molecules to which the fibrinogen is exposed, and which then determines how it polymerizes, effectively concentrates the vast numbers of metabolites, proteins, and even transcripts into a smaller number of dimensions; the microclots essentially act as a surrogate for the metabolome, proteome, and transcriptome present in the plasma at the time of clotting. ...
Article
Full-text available
Microscopy imaging has enabled us to establish the presence of fibrin(ogen) amyloid (fibrinaloid) microclots in a range of chronic, inflammatory diseases. Microclots may also be induced by a variety of purified substances, often at very low concentrations. These molecules include bacterial inflammagens, serum amyloid A, and the S1 spike protein of severe acute respiratory syndrome coronavirus 2. Here, we explore which of the properties of these microclots might be used to contribute to differential clinical diagnoses and prognoses of the various diseases with which they may be associated. Such properties include the distributions of their size and number before and after the addition of exogenous thrombin, their spectral properties, the diameter of the fibers of which they are made, their resistance to proteolysis by various proteases, their cross-seeding ability, and the concentration dependence of their ability to bind small molecules, including fluorogenic amyloid stains. Measuring these microclot parameters, together with microscopy imaging itself, along with methodologies like proteomics and imaging flow cytometry, as well as more conventional assays such as those for cytokines, might open up the possibility of a much finer use of these microclot properties in generative methods for a future where personalized medicine will be standard procedure in the diagnosis of all clotting pathologies.
... ChemMaps [NMF17] uses PCA for visualizing correlations between compound datasets. FragNet [SK21] computes molecular similarity across huge databases and visualizes the distribution of molecules by applying t-SNE. Naveja et al. [NMF19] introduced constellation plots, identifying groups of compounds using t-SNE to interpret structure-activity relationships in chemical space. ...
Article
Full-text available
Exploratory analysis of the chemical space is an important task in the field of cheminformatics. For example, in drug discovery research, chemists investigate sets of thousands of chemical compounds in order to identify novel yet structurally similar synthetic compounds to replace natural products. Manually exploring the chemical space inhabited by all possible molecules and chemical compounds is impractical, and therefore presents a challenge. To fill this gap, we present ChemoGraph, a novel visual analytics technique for interactively exploring related chemicals. In ChemoGraph, we formalize a chemical space as a hypergraph and apply novel machine learning models to compute related chemical compounds. It uses a database to find related compounds from a known space and a machine learning model to generate new ones, which helps enlarge the known space. Moreover, ChemoGraph highlights interactive features that support users in viewing, comparing, and organizing computationally identified related chemicals. With a drug discovery usage scenario and initial expert feedback from a case study, we demonstrate the usefulness of ChemoGraph.
... The MegaMolBART model is not invariant to randomized SMILES inputs, even though SMILES randomization was used as data augmentation during training [52]. Contrastive training approaches have been proposed [58] to enforce SMILES invariance in the model, but this is outside the scope of this study due to the computational effort required to pretrain the model from scratch. ...
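The invariance property at issue here can be probed directly: embed several randomized spellings of one molecule and measure how far the embeddings spread. `encode` below is a stand-in for whatever model maps a SMILES string to a vector; the check itself is an illustrative assumption, not a procedure from the cited study.

```python
import numpy as np

def invariance_spread(encode, smiles_variants) -> float:
    """Mean distance of each variant's embedding from their centroid (0 = invariant)."""
    zs = np.stack([encode(s) for s in smiles_variants])  # (n, d) embeddings
    centroid = zs.mean(axis=0)
    return float(np.linalg.norm(zs - centroid, axis=1).mean())
```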
Preprint
Full-text available
Explainability techniques are crucial in gaining insights into the reasons behind the predictions of deep learning models, which have not yet been applied to chemical language models. We propose an explainable AI technique that attributes the importance of individual atoms towards the predictions made by these models. Our method backpropagates the relevance information towards the chemical input string and visualizes the importance of individual atoms. We focus on self-attention Transformers operating on molecular string representations and leverage a pretrained encoder for finetuning. We showcase the method by predicting and visualizing solubility in water and organic solvents. We achieve competitive model performance while obtaining interpretable predictions, which we use to inspect the pretrained model.
... [5] A latent space representation of organic molecules was constructed with a transformer model and a contrastive learning approach [22]. A transformer model, pre-trained on ChEMBL to generate valid SMILES, was trained on molecules active against a given protein target by transfer learning and reinforcement learning, and used to generate novel molecules [8]. Note that in all these papers the generated molecules were validated only on the basis of their easily computable properties, such as logP and QED (typically, by comparing the distributions of these properties with those of the input or reference sets); the most advanced comparison, as far as we know, used docking scores and similarity to known bioactive molecules. ...
Preprint
Full-text available
Meaningful exploration of the chemical space of druglike molecules in drug design is a highly challenging task due to a combinatorial explosion of possible modifications of molecules. In this work, we address this problem with transformer models, a type of machine learning (ML) model with recently demonstrated success in applications to machine translation and other tasks. By training transformer models on pairs of similar bioactive molecules from the public ChEMBL dataset, we enable them to learn medicinal-chemistry-meaningful, context-dependent transformations of molecules, including those absent from the training set. Most generated molecules are highly plausible and follow similar distributions of simple properties (molecular weight, polarity, hydrogen bond donor and acceptor numbers) to the training dataset. By retrospective analysis of the performance of transformer models on ChEMBL subsets of ligands binding to the COX2, DRD2, or HERG protein targets, we demonstrate that the models can generate structures identical or highly similar to highly active ligands, despite the models not having seen any ligands active against the corresponding protein target during training. Thus, our work demonstrates that transformer models, originally developed to translate texts from one natural language to another, can be easily and quickly extended to "translations" from known molecules active against a given protein target to novel molecules active against the same target, and thereby contribute to hit expansion in drug design.
... Compared with traditional representation methods, automatic molecular representation learning models perform better on most drug discovery tasks [23][24][25]. With the rise of unsupervised learning in natural language processing 26,27, recent approaches that incorporate unsupervised learning with one-dimensional sequential strings, such as the simplified molecular-input line-entry system (SMILES) [28][29][30][31] and the International Chemical Identifier (InChI) [32][33][34], or two-dimensional (2D) graphs [35][36][37][38][39], have also been developed for various computational drug discovery tasks. Yet their accuracy in extracting informative vectors that describe the molecular identities and biological characteristics of molecules is limited. ...
Article
Full-text available
The clinical efficacy and safety of a drug is determined by its molecular properties and targets in humans. However, proteome-wide evaluation of all compounds in humans, or even animal models, is challenging. In this study, we present an unsupervised pretraining deep learning framework, named ImageMol, pretrained on 10 million unlabelled drug-like, bioactive molecules, to predict molecular targets of candidate compounds. The ImageMol framework is designed to pretrain chemical representations from unlabelled molecular images on the basis of local and global structural characteristics of molecules from pixels. We demonstrate high performance of ImageMol in evaluation of molecular properties (that is, the drug’s metabolism, brain penetration and toxicity) and molecular target profiles (that is, beta-secretase enzyme and kinases) across 51 benchmark datasets. ImageMol shows high accuracy in identifying anti-SARS-CoV-2 molecules across 13 high-throughput experimental datasets from the National Center for Advancing Translational Sciences. Via ImageMol, we identified candidate clinical 3C-like protease inhibitors for potential treatment of COVID-19.