Predicting blood–brain barrier permeability of molecules with a large language model and machine learning

Eddie T.C. Huang, Jai Sing Yang, Ken Y.K. Liao, Warren C.W. Tseng, C. K. Lee, Michelle Gill, Colin Compas, Simon See, Fuu Jen Tsai*

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-review

1 Scopus citation

Abstract

Predicting the blood–brain barrier (BBB) permeability of small-molecule compounds with a novel artificial intelligence (AI) platform is an essential step in drug discovery. AI tools built on machine learning and large language models can improve accuracy and shorten the time needed for new drug development. The primary goal of this research is to develop AI computing models and novel deep learning architectures capable of predicting whether molecules can permeate the human BBB. The in silico (computational) and in vitro (experimental) results were validated by the Natural Products Research Laboratories (NPRL) at China Medical University Hospital (CMUH). As the in silico method for checking whether a molecule could cross the BBB, the transformer-based MegaMolBART was used as the simplified molecular input line entry system (SMILES) encoder, paired with an XGBoost classifier. As a baseline for comparison, we used Morgan (circular) fingerprints, which apply the Morgan algorithm to a set of atomic invariants, as the encoder, again paired with an XGBoost classifier. BBB permeability was assessed in vitro using three-dimensional (3D) human BBB spheroids (human brain microvascular endothelial cells, brain vascular pericytes, and astrocytes). Using multiple BBB databases, the final in silico transformer and XGBoost model achieved an area under the receiver operating characteristic curve of 0.88 on the held-out test dataset. Temozolomide (TMZ) and 21 randomly selected BBB-permeable compounds (Pred scores = 1, indicating BBB-permeable) from the NPRL penetrated human BBB spheroid cells. No evidence suggests that ferulic acid or five BBB-impermeable compounds (Pred scores < 1.29423E−05, indicating compounds predicted not to pass through the human BBB) can pass through the spheroid cells of the BBB. Our in vitro validation experiments indicated that the in silico predictions of small-molecule BBB permeation are accurate.
Transformer-based models like MegaMolBART, leveraging the SMILES representations of molecules, show great promise for applications in new drug discovery. These models have the potential to accelerate the development of novel targeted treatments for disorders of the central nervous system.
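
The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen BBB-permeable molecule receives a higher predicted score than a randomly chosen impermeable one. The following is a minimal sketch of that pairwise definition in plain Python. It is not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are invented for illustration and are not data from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    labels: 1 for the positive class (here: BBB-permeable), 0 for the negative class.
    scores: predicted scores, where higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (NOT from the paper): three permeable, three impermeable.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.99, 0.80, 0.30, 0.40, 0.10, 0.00001]
toy_auc = roc_auc(toy_labels, toy_scores)  # 8 of 9 pairs ranked correctly
```

In this toy case one permeable molecule (score 0.30) is ranked below one impermeable molecule (score 0.40), so 8 of the 9 positive/negative pairs are ordered correctly. The quadratic loop is chosen for readability; a sort-based rank computation gives the same value and scales better to large test sets.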

Original language: English
Article number: 15844
Pages (from-to): 15844
Journal: Scientific Reports
Volume: 14
Issue number: 1
DOIs
State: Published - 09 07 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© The Author(s) 2024.

Keywords

  • Artificial intelligence (AI)
  • Blood–brain barrier (BBB) permeability
  • Machine learning
  • Natural Products Research Laboratories (NPRL)
  • Drug Discovery/methods
  • Endothelial Cells/metabolism
  • Humans
  • Computer Simulation
  • Blood-Brain Barrier/metabolism
  • Permeability
  • Machine Learning
