ChatGPT Can Score At Or Near The Approximately 60% Passing Threshold For US Medical Licensing Exam: Study

ChatGPT, a machine learning platform developed by the American artificial intelligence research laboratory OpenAI, can almost pass the United States Medical Licensing Exam (USMLE), according to a study published February 9 in the open-access journal PLOS Digital Health. The platform has been at the centre of several controversies over the accuracy of its content. The study was conducted by Tiffany Kung, Victor Tseng and their colleagues at Ansible Health, a healthcare startup that provides technology-enhanced treatments for pulmonary diseases.

Milestones achieved by ChatGPT in the USMLE

The new study found that ChatGPT can score at or near the USMLE passing threshold of approximately 60 per cent. Its responses were also found to be internally coherent and to contain frequent insights.

How ChatGPT works

ChatGPT is a new artificial intelligence system known as a large language model, a deep learning algorithm that can process large amounts of text, and recognise, summarise, translate, predict and generate text and other content based on knowledge obtained from massive datasets. ChatGPT is designed to generate human-like writing by predicting upcoming word sequences, and unlike most chatbots, it cannot search the Internet. The machine learning platform uses word relationships predicted by its internal process to generate text.
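As a rough illustration of what "predicting upcoming word sequences" means, the toy sketch below counts which word most often follows another in a tiny sample text and uses those counts to guess the next word. This is not how ChatGPT is implemented: large language models use deep neural networks trained on massive datasets. The sample text and the function names `train_bigrams` and `predict_next` are assumptions made up for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    follows = defaultdict(Counter)
    words = text.lower().split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up sample text, purely for illustration.
corpus = "the patient has a fever the patient has a cough the doctor has a plan"
model = train_bigrams(corpus)
print(predict_next(model, "patient"))  # prints "has"
```

A real language model does the same kind of thing at a vastly larger scale, learning word relationships from its training data rather than looking anything up online.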


What is USMLE?

USMLE, the test ChatGPT can almost pass, is a highly standardised and regulated series of three examinations — Steps 1, 2CK and 3. One needs to pass these examinations to obtain medical licensure in the US. As part of the study, the team of researchers tested ChatGPT’s performance on the USMLE.

Medical students and physicians-in-training can appear for the USMLE, which assesses knowledge spanning most medical disciplines, ranging from biochemistry to diagnostic reasoning and bioethics.

ChatGPT demonstrated a high degree of concordance across its responses

The study authors removed the image-based questions, and tested ChatGPT on 350 of the 376 public questions available from the June 2022 USMLE release.

According to the study, after the software's indeterminate responses were removed, it scored between 52.4 per cent and 75 per cent across the three USMLE exams. The passing threshold is approximately 60 per cent each year.
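To show why removing indeterminate responses matters for the score, here is a minimal sketch that computes accuracy with and without them and compares the result with the approximate 60 per cent threshold. The outcome labels, the `accuracy` function and the ten sample outcomes are hypothetical and are not data from the study. The study's exact scoring procedure is not described here.

```python
PASS_THRESHOLD = 0.60  # approximate USMLE passing threshold cited above


def accuracy(outcomes, include_indeterminate):
    """Share of correct answers, optionally ignoring indeterminate ones."""
    correct = outcomes.count("correct")
    if include_indeterminate:
        total = len(outcomes)
    else:
        total = sum(1 for o in outcomes if o != "indeterminate")
    return correct / total if total else 0.0


# Hypothetical outcomes for ten questions, NOT figures from the study.
outcomes = ["correct"] * 5 + ["incorrect"] * 3 + ["indeterminate"] * 2

for include in (True, False):
    score = accuracy(outcomes, include)
    label = "included" if include else "excluded"
    verdict = "at or above" if score >= PASS_THRESHOLD else "below"
    print(f"indeterminate {label}: {score:.1%} ({verdict} threshold)")
```

In this made-up case the same five correct answers give 50 per cent when indeterminate responses are counted and 62.5 per cent when they are excluded, so the choice can move a score across the threshold.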

ChatGPT was also found to demonstrate 94.6 per cent concordance across all its responses. Here, concordance refers to the internal consistency of a response.

For 88.9 per cent of its responses, the software produced at least one significant insight, meaning one that was new, non-obvious and clinically valid.

ChatGPT beat its counterpart model

Another model, PubMedGPT, which was trained exclusively on biomedical domain literature, scored 50.8 per cent on an older dataset of USMLE-style questions. ChatGPT thus exceeded the performance of its counterpart model, although the two were tested on different question sets.

Significance of the study

According to the authors, the study findings provide a glimpse of ChatGPT’s potential to enhance medical education. The software also has the potential to improve clinical practice, the authors believe.

The authors noted that clinicians at Ansible Health already use ChatGPT to rewrite jargon-heavy reports so that patients can understand them more easily.

ChatGPT is the first software to achieve the benchmark of scoring approximately 60 per cent in the USMLE, the authors said, adding that it marks a notable milestone in AI maturation.

The authors also believe that large language models such as ChatGPT may assist human learners in medical education settings, and could be used during clinical decision-making.