Molecular optimization by capturing chemist’s intuition using deep neural networks

Jiazhen He; Huifang You; Emil Sandström; Eva Nittinger; Esben Jannik Bjerrum; Christian Tyrchan; Werngard Czechtizky; Ola Engkvist

doi:10.1186/s13321-021-00497-0

Molecular Optimization by Capturing Chemist’s Intuition Using Deep Neural Networks

10.26434/chemrxiv.12941744.v1 ◽

2020 ◽

Author(s):

Jiazhen He ◽

Huifang You ◽

Emil Sandström ◽

Eva Nittinger ◽

Esben Jannik Bjerrum ◽

...

Keyword(s):

Machine Translation ◽

Language Processing ◽

Deep Neural Networks ◽

Chemical Transformations ◽

Proof Of Concept ◽

Matched Molecular Pairs ◽

Main Challenge ◽

Property Changes ◽

Transformer Model ◽

The Given

A main challenge in drug discovery is finding molecules with a desirable balance of multiple properties. Here, we focus on the task of molecular optimization, where the goal is to optimize a given starting molecule towards desirable properties. This task can be framed as a machine translation problem in natural language processing, where, in our case, a molecule is translated into a molecule with optimized properties based on the SMILES representation. Typically, chemists use their intuition to suggest chemical transformations for the starting molecule being optimized. A widely used strategy is the concept of matched molecular pairs, where two molecules differ by a single transformation. We seek to capture the chemist's intuition from matched molecular pairs using machine translation models. Specifically, the sequence-to-sequence model with attention mechanism and the Transformer model are employed to generate molecules with desirable properties. As a proof of concept, three ADMET properties are optimized simultaneously: logD, solubility, and clearance, which are important properties of a drug. Since desirable properties often vary from project to project, the user-specified desirable property changes are incorporated into the input as an additional condition together with the starting molecules being optimized. Thus, the models can be guided to generate molecules satisfying the desirable properties. Additionally, we compare the two machine translation models based on the SMILES representation with a graph-to-graph translation model, HierG2G, which has shown state-of-the-art performance in molecular optimization. Our results show that the Transformer can generate more molecules with desirable properties by making small modifications to the given starting molecules, which can be intuitive to chemists. A further enrichment of diverse molecules can be achieved by using an ensemble of models.
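
The conditioning idea in this abstract, where the desired property changes are placed in the input together with the starting molecule, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the regular expression is a simplified SMILES tokenizer, and the property-token format (`name_change`), the change labels, and the function names are hypothetical rather than taken from the paper.

```python
import re
from typing import Dict, List

# Simplified, hypothetical SMILES tokenizer: bracket atoms, two-letter
# halogens, two-digit ring closures, then single letters, digits, and
# bond/branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#+\-\\/.:~@*$]")


def tokenize_smiles(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, failing on unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_source(smiles: str, property_changes: Dict[str, str]) -> List[str]:
    """Prepend one condition token per property to the tokenized molecule.

    The 'name_change' token format is an illustrative assumption.
    """
    condition = [f"{name}_{change}" for name, change in property_changes.items()]
    return condition + tokenize_smiles(smiles)


# Example: ask for lower logD and higher solubility, with clearance unchanged.
source = build_source(
    "CCO",
    {"logD": "decrease", "solubility": "low->high", "clearance": "no_change"},
)
print(source)
```

In a translation setup of this kind, a sequence like `source` would be the model input, and the tokenized SMILES of the other molecule in a matched molecular pair would be the target. Keeping the condition as ordinary tokens means the same vocabulary and model can serve different projects by changing only the prefix.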

Molecular Optimization by Capturing Chemist's Intuition Using Deep Neural Networks

10.21203/rs.3.rs-101137/v1 ◽

2020 ◽

Molecular Optimization by Capturing Chemist’s Intuition Using Deep Neural Networks

10.26434/chemrxiv.12941744.v2 ◽

2020 ◽

Molecular Optimization by Capturing Chemist’s Intuition Using Deep Neural Networks

10.26434/chemrxiv.12941744 ◽

2020 ◽

Transformer Neural Network for Structure Constrained Molecular Optimization

10.26434/chemrxiv.14416133 ◽

2021 ◽

Author(s):

Jiazhen He ◽

Felix Mattsson ◽

Marcus Forsberg ◽

Esben Jannik Bjerrum ◽

Ola Engkvist ◽

...

Keyword(s):

Neural Network ◽

Drug Discovery ◽

Antiviral Drug ◽

The Other ◽

Chemical Transformations ◽

Drug Candidates ◽

The Core ◽

Matched Molecular Pairs ◽

Main Challenge ◽

Transformer Model

Finding molecules with a desirable balance of multiple properties is a main challenge in drug discovery. Here, we focus on the task of molecular optimization, where a starting molecule with promising properties needs to be further optimized towards the desirable properties. Typically, chemists would apply chemical transformations to the starting molecule based on their intuition. A widely used strategy is the concept of matched molecular pairs where two molecules differ by a single transformation. In particular, a chemist would be interested in keeping one part of the starting molecule (core) constant, while substituting the other part (R-group), to optimize the starting molecule towards desirable properties. Motivated by this, we train a Transformer model, Transformer-R, to generate R-groups given the starting molecule (with its core and R-group specified) and the specified desirable properties. The generated R-groups will be attached to the core to form the final molecules, which are guaranteed to keep the core of interest and are expected to satisfy the desirable properties in the input. Our model could accelerate the process of optimizing antiviral drug candidates in terms of various properties of interest, e.g. pharmacokinetics.

Transformer Neural Network for Structure Constrained Molecular Optimization

10.26434/chemrxiv.14416133.v1 ◽

2021 ◽

Context-Aware Neural Machine Translation for Korean Honorific Expressions

Electronics ◽

10.3390/electronics10131589 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1589

Author(s):

Yongkeun Hwang ◽

Yanghoon Kim ◽

Kyomin Jung

Keyword(s):

Machine Translation ◽

Deep Neural Networks ◽

Contextual Information ◽

Context Aware ◽

Neural Machine Translation ◽

Translation Quality ◽

Sentence Level ◽

Proposed Model ◽

The Given ◽

The Relationship

Neural machine translation (NMT) is one of the text generation tasks that has achieved significant improvement with the rise of deep neural networks. However, language-specific problems such as handling the translation of honorifics have received little attention. In this paper, we propose a context-aware NMT to promote translation improvements of Korean honorifics. By exploiting information such as the relationship between speakers from the surrounding sentences, our proposed model effectively manages the use of honorific expressions. Specifically, we utilize a novel encoder architecture that can represent the contextual information of the given input sentences. Furthermore, a context-aware post-editing (CAPE) technique is adopted to refine a set of inconsistent sentence-level honorific translations. To demonstrate the efficacy of the proposed method, honorific-labeled test data is required. Thus, we also design a heuristic that labels Korean sentences to distinguish between honorific and non-honorific styles. Experimental results show that our proposed method outperforms sentence-level NMT baselines both in overall translation quality and honorific translations.

Molecular Transformer for Chemical Reaction Prediction and Uncertainty Estimation

10.26434/chemrxiv.7297379.v1 ◽

2018 ◽

Cited By ~ 3

Author(s):

Philippe Schwaller ◽

Teodoro Laino ◽

Theophile Gaudin ◽

Peter Bolgar ◽

Costas Bekas ◽

...

Keyword(s):

Machine Translation ◽

Organic Synthesis ◽

Chemical Reaction ◽

Medicinal Chemistry ◽

Uncertainty Estimation ◽

Forward Problem ◽

Chemical Transformations ◽

Reaction Prediction ◽

Transformer Model ◽

Uncertainty Score

Organic synthesis is one of the key stumbling blocks in medicinal chemistry. A necessary yet unsolved step in planning synthesis is solving the forward problem: given reactants and reagents, predict the products. Similar to other works, we treat reaction prediction as a machine translation problem between SMILES strings of reactants-reagents and the products. We show that a multi-head attention Molecular Transformer model outperforms all algorithms in the literature, achieving a top-1 accuracy above 90% on a common benchmark dataset. Our algorithm requires no handcrafted rules, and accurately predicts subtle chemical transformations. Crucially, our model can accurately estimate its own uncertainty, with an uncertainty score that is 89% accurate in terms of classifying whether a prediction is correct. Furthermore, we show that the model is able to handle inputs without reactant-reagent split and including stereochemistry, which makes our method universally applicable.
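
As a rough illustration of how a reaction can be posed as a translation pair, the Python sketch below splits a reaction SMILES string of the standard form `reactants>reagents>products` into a source and a target. The function name, the `separated` flag, and the exact way the two input variants are written out are assumptions made for this example, not the authors' preprocessing code.

```python
from typing import Tuple


def to_translation_pair(reaction_smiles: str, separated: bool = True) -> Tuple[str, str]:
    """Turn 'reactants>reagents>products' into (source, target) strings.

    separated=True keeps the '>' between reactants and reagents;
    separated=False mixes both into one dot-joined list of molecules,
    i.e. an input without a reactant-reagent split.
    """
    parts = reaction_smiles.split(">")
    if len(parts) != 3:
        raise ValueError("expected a reaction SMILES 'reactants>reagents>products'")
    reactants, reagents, products = parts
    if separated:
        source = f"{reactants}>{reagents}"
    else:
        # Skip an empty reagent field so no stray '.' is produced.
        source = ".".join(p for p in (reactants, reagents) if p)
    return source, products


# Acid-catalysed esterification of acetic acid with ethanol.
rxn = "CC(=O)O.OCC>[H+]>CC(=O)OCC.O"
print(to_translation_pair(rxn))
print(to_translation_pair(rxn, separated=False))
```

The mixed variant corresponds to the case where the model is not told which input molecules are reactants and which are reagents; the target side is the same in both variants.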

BioNMT: A Biomedical Neural Machine Translation System

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.6.3988 ◽

2020 ◽

Vol 15 (6) ◽

Author(s):

Hongtao Liu ◽

Yanchun Liang ◽

Liupu Wang ◽

Xiaoyue Feng ◽

Renchu Guan

Keyword(s):

Foreign Language ◽

Machine Translation ◽

Deep Neural Networks ◽

Translation System ◽

Neural Machine Translation ◽

Translation Model ◽

Machine Translation System ◽

Biomedical Field ◽

Biomedical Texts ◽

Transformer Model

To address the translation of professional vocabulary in the biomedical field and to help biological researchers translate and understand foreign-language documents, we propose a novel translation model for biomedical texts that combines a semantic disambiguation model and external dictionaries with the Transformer model. The proposed biomedical neural machine translation system (BioNMT) adopts the sequence-to-sequence translation framework, which is based on deep neural networks. To construct the specialized vocabulary of biology and medicine, a hybrid corpus was obtained using a crawler system that extracts text from a universal corpus and a biomedical corpus. The experimental results showed that BioNMT, which combines a professional biological dictionary with the Transformer model, increased the bilingual evaluation understudy (BLEU) score by 14.14% and reduced perplexity by 40%. Compared with the Google and Baidu translation systems, BioNMT produced better paragraph-level translations and greatly improved the disambiguation of biomedical named entities.

Advanced Tamil POS Tagger for Language Learners

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j8886.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 741-745

Keyword(s):

Machine Translation ◽

Language Learners ◽

Language Processing ◽

Research Work ◽

Important Work ◽

Parts Of Speech ◽

Pos Tagging ◽

Pos Tagger ◽

The Given ◽

Speech Identification

Machine translation, the automatic translation of text from one language into another, is one of the important applications of the emerging field of natural language processing. Part-of-speech (POS) tagging is one of the most basic and important tasks in machine translation: it assigns a part of speech to each word in a given sentence. In my research work, I attempted POS tagging for the Tamil language. Numerous studies may already exist on this topic; I approach it differently, with a very detailed implementation. The proposed work makes most of the detailed grammatical identifications, which is very useful for learning the basic grammar of the Tamil language.
