Researchers from the University of Toronto administered a radiology board-style exam to ChatGPT — an artificial intelligence (AI) chatbot developed by OpenAI — and the chatbot reportedly earned an overall score of 81%.

According to the researchers, the 150-question text-based exam, which mimicked those given by radiology boards in both the U.S. and Canada, was administered to the two currently available versions of ChatGPT: the older GPT-3.5 and the enhanced GPT-4.

Both versions of the chatbot were reportedly asked to answer the same question sets, which were divided into lower-order and higher-order questions. The researchers explained that the lower-order questions focused on basic understanding of the topic areas and knowledge recall, while the higher-order questions involved applying, analyzing and synthesizing information.

According to the test results, the GPT-3.5 version of ChatGPT scored just 69% overall — earning 84% on the lower-order questions but just 60% on the higher-order questions. Meanwhile, the GPT-4 version of ChatGPT scored 81% overall, answering 81% of the higher-order questions correctly.

What surprised the researchers, however, was GPT-4's performance on the lower-order questions: the newer version reportedly got 12 questions wrong that GPT-3.5 had answered correctly.

"We were initially surprised by ChatGPT’s accurate and confident answers to some challenging radiology questions, but then equally surprised by some very illogical and inaccurate assertions,” the researchers explained.

The study is detailed in the article "Performance of ChatGPT on a Radiology Board-style Examination: Insights into Current Strengths and Limitations," which appears in the journal Radiology.
