Chemical Shift Prediction using Message Passing Neural Networks

Cobas, Carlos, Iglesias, Isaac, Kemsley, E. Kate ORCID: https://orcid.org/0000-0003-0669-3883, Lachenmann, Marcel, Ponte, Santi, Tonge, Nicola and Williamson, David (2024) Chemical Shift Prediction using Message Passing Neural Networks. In: SMASH Small Molecule Conference, 2024-09-15 - 2024-09-18, Hotel Champlain.

PDF (SMASH_MPNN_Poster) - Published Version
Available under License Creative Commons Attribution No Derivatives.

Abstract

We are using advanced artificial intelligence approaches (artificial neural networks, ensembles, and deep learning) to enhance chemical shift prediction, spectral assignment, and automated structural verification in NMR spectroscopy, collectively known as the ‘forward’ problems. Message Passing Neural Networks (MPNNs) have emerged as a promising architecture for this purpose. These networks naturally handle molecular structures as graphs, with atoms as nodes and bonds as edges. The key advantage of MPNNs is that they use node feature information together with node connectivity, as described by the graph adjacency matrix. Our ongoing work involves training MPNNs on large collections (tens of thousands) of molecular structures, fully annotated with experimentally observed proton (1H) and carbon (13C) chemical shifts. The stochastic nature of the approach means that performance can be improved by pooling predictions from ensembles of trained MPNNs for each target nucleus. This is conveniently executed in parallel on the multiple GPU nodes of an HPC facility. Initial results for both nuclei have yielded prediction errors that compare favourably with those reported in the literature. For example, when applied to a large test set (n ~ 28,000 nodes) of previously unseen structures, the median absolute prediction error is ~1.2 ppm for 13C and 0.09 ppm for 1H. The error distributions are fat-tailed compared to the normal distribution but are smooth and symmetric, and can be well represented by a Gaussian kernel density method. This suggests a data-driven, probabilistic route to structural assignment and verification.
Key areas for further research include: investigating the balance of node subgraph representations in the training set and their impact on prediction performance; exploring alternative graph-theoretical representations of molecular structures to better characterize molecular diversity; and extending the capabilities of the model beyond diastereotopic protons to address stereoisomerism more widely.
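
The ideas in the abstract (message passing over a molecular graph, pooling an ensemble of predictions, the median absolute error, and a Gaussian kernel density estimate of the errors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the sum-aggregation update, the use of the median as the pooling rule, and the fixed bandwidth are all assumptions made for illustration, and a real MPNN would use learned message and update functions.

```python
import math
import statistics


def message_passing_step(features, adjacency):
    """One round of sum-aggregation message passing (illustrative, no learned weights).

    features:  list of per-atom feature vectors (lists of floats)
    adjacency: square 0/1 matrix; adjacency[i][j] == 1 if atoms i and j are bonded
    Returns new vectors: each atom's own features plus the sum of its neighbours'.
    """
    n = len(features)
    dim = len(features[0])
    updated = []
    for i in range(n):
        new = list(features[i])
        for j in range(n):
            if adjacency[i][j]:
                for k in range(dim):
                    new[k] += features[j][k]
        updated.append(new)
    return updated


def pool_ensemble(predictions):
    """Pool per-atom predictions across models.

    predictions[m][i] is the shift predicted for atom i by model m.
    The median is an assumed pooling rule; the source does not specify one.
    """
    return [statistics.median(column) for column in zip(*predictions)]


def median_absolute_error(predicted, observed):
    """Median of |predicted - observed| over all atoms."""
    return statistics.median(abs(p - o) for p, o in zip(predicted, observed))


def gaussian_kde(errors, bandwidth):
    """Return a density function estimated from errors with a Gaussian kernel."""
    norm = 1.0 / (len(errors) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - e) / bandwidth) ** 2) for e in errors
        )

    return density


# Toy graph: a C-C-O heavy-atom chain with one-hot features [is_C, is_O].
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
updated = message_passing_step(features, adjacency)

# Made-up predictions from three hypothetical models for two atoms (not real data).
pooled = pool_ensemble([[18.0, 58.5], [18.4, 57.9], [17.9, 58.1]])
```

After one step, each atom's vector encodes its own element type and those of its bonded neighbours, which is how connectivity and node features are combined. The density function returned by `gaussian_kde` could, under these assumptions, be used to score how plausible an observed-minus-predicted shift difference is for a candidate assignment.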

Item Type: Conference or Workshop Item (Poster)
Faculty \ School: Faculty of Science > School of Chemistry, Pharmacy and Pharmacology
Depositing User: LivePure Connector
Date Deposited: 16 Aug 2024 11:30
Last Modified: 18 Sep 2024 01:38
URI: https://ueaeprints.uea.ac.uk/id/eprint/96264
