Comparing the performance of ChatGPT GPT-4, Bard, and Llama-2 in the Taiwan Psychiatric Licensing Examination and in differential diagnosis with multi-center psychiatrists

Dian Jeng Li, Yu Chen Kao, Shih Jen Tsai, Ya Mei Bai, Ta Chuan Yeh, Che Sheng Chu, Chih Wei Hsu, Szu Wei Cheng, Tien Wei Hsu*, Chih Sung Liang*, Kuan Pin Su

*Corresponding author for this work

Research output: Contribution to journal, Journal Article, peer-reviewed

3 Scopus citations

Abstract

AIM: Large language models (LLMs) have been suggested to play a role in medical education and medical practice. However, the potential of their application in the psychiatric domain has not been well-studied.

METHOD: In the first step, we compared the performance of ChatGPT GPT-4, Bard, and Llama-2 in the 2022 Taiwan Psychiatric Licensing Examination conducted in traditional Mandarin. In the second step, we compared the scores of these three LLMs with those of 24 experienced psychiatrists in 10 advanced clinical scenario questions designed for psychiatric differential diagnosis.

RESULT: Only GPT-4 passed the 2022 Taiwan Psychiatric Licensing Examination, scoring 69 (a score of ≥ 60 is a passing grade), while Bard scored 36 and Llama-2 scored 25. GPT-4 outperformed Bard and Llama-2, especially in the areas of 'Pathophysiology & Epidemiology' (χ² = 22.4, P < 0.001) and 'Psychopharmacology & Other therapies' (χ² = 15.8, P < 0.001). In the differential diagnosis, the mean score of the 24 experienced psychiatrists (mean 6.1, standard deviation 1.9) was higher than the scores of GPT-4 (5), Bard (3), and Llama-2 (1).

CONCLUSION: Compared to Bard and Llama-2, GPT-4 demonstrated superior abilities in identifying psychiatric symptoms and making clinical judgments. In addition, GPT-4's performance in differential diagnosis closely approached that of the experienced psychiatrists. Among the three LLMs, GPT-4 showed promising potential as a tool in psychiatric practice.
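As a rough illustration of how the differential diagnosis scores relate to one another, the sketch below expresses each LLM's score as a distance from the psychiatrists' mean, in units of the psychiatrists' standard deviation. It uses only the figures reported in the abstract. The function and variable names are hypothetical, the calculation assumes all scores are on the same scale, and it is a descriptive aid rather than an analysis performed by the authors.

```python
# Figures reported in the abstract (differential diagnosis task).
PSYCHIATRIST_MEAN = 6.1   # mean score of the 24 psychiatrists
PSYCHIATRIST_SD = 1.9     # standard deviation of the psychiatrists' scores
LLM_SCORES = {"GPT-4": 5, "Bard": 3, "Llama-2": 1}


def standardized_gap(score, mean=PSYCHIATRIST_MEAN, sd=PSYCHIATRIST_SD):
    """Return (score - mean) / sd: how many psychiatrist SDs a score lies from the mean."""
    return (score - mean) / sd


def all_gaps(scores=LLM_SCORES):
    """Map each model name to its standardized gap, rounded to two decimals."""
    return {name: round(standardized_gap(s), 2) for name, s in scores.items()}


if __name__ == "__main__":
    for name, gap in all_gaps().items():
        print(f"{name}: {gap:+.2f} SD relative to the psychiatrist mean")
```

Under these assumptions, GPT-4 falls about 0.58 standard deviations below the psychiatrists' mean, Bard about 1.63, and Llama-2 about 2.68. This is purely descriptive and is not a significance test.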

Original language: English
Pages (from-to): 347-352
Number of pages: 6
Journal: Psychiatry and Clinical Neurosciences
Volume: 78
Issue number: 6
Early online date: 26 Feb 2024
State: Published - Jun 2024

Bibliographical note

© 2024 The Authors. Psychiatry and Clinical Neurosciences © 2024 Japanese Society of Psychiatry and Neurology.

Keywords

  • ChatGPT
  • Taiwanese psychiatric licensing examination
  • chatbot
  • differential diagnosis in psychiatry
  • psychiatric application
  • Diagnosis, Differential
  • Mental Disorders/diagnosis
  • Educational Measurement/standards
  • Humans
  • Psychiatrists
  • Taiwan
  • Adult
  • Psychiatry
