AI Systems Struggle to Analyze Data from Multiple Hospitals

Siobhan Treacy | November 08, 2018

A new study from the Icahn School of Medicine at Mount Sinai found that artificial intelligence (AI) tools trained to detect pneumonia on chest X-rays perform worse when they analyze data from hospital computer systems other than the one they were trained on. If a patient transfers between hospitals, for example, the AI system may perform worse overall when it has to analyze data from the new hospital's computer system.

Artificial intelligence seems like the next big thing for medical diagnoses, but it is not perfect yet. (Source: Pixabay)

The study’s findings suggest that AI tools need to be tested for long-term performance and across a wide range of hospital computer systems before they are deployed in a hospital. If these tools are not thoroughly tested, patients' lives could be put in danger.

Other studies suggest that AI image classification doesn’t generalize perfectly to new data. There is growing interest among medical researchers in using convolutional neural networks (CNNs), a kind of AI software, to analyze medical images for computer-aided disease diagnosis.

For the study, the researchers assessed how well CNNs identified pneumonia in 158,000 chest X-rays from three hospitals. In three out of five comparisons, the CNNs performed worse when analyzing X-rays from outside their original health system.

The CNNs were able to accurately identify which hospital an X-ray was taken in. They also cheated at the predictive task, leaning on how prevalent pneumonia was at the institution where they were trained. Using deep learning models in medicine is difficult because these systems have a very large number of parameters, making it hard to identify which specifics in an X-ray are driving a prediction.
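To see why knowing the hospital can look like diagnostic skill, here is a minimal Python sketch with made-up numbers. It is not the study's data or method: the two hospitals, their case counts, and the `auc` helper are all assumptions for illustration. The "model" ignores the image entirely and simply outputs the pneumonia rate of the hospital the X-ray came from.

```python
# Illustrative only: hypothetical hospitals and case counts, not the study's data.

def auc(labels, scores):
    """Area under the ROC curve: the chance that a random positive case
    scores higher than a random negative case (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical hospital A: 100 X-rays, 30 with pneumonia (30% prevalence).
labels_a = [1] * 30 + [0] * 70
# Hypothetical hospital B: 100 X-rays, 2 with pneumonia (2% prevalence).
labels_b = [1] * 2 + [0] * 98

# A "shortcut" model: the score depends only on which hospital the image
# came from, never on what the image shows.
scores_a = [0.30] * len(labels_a)
scores_b = [0.02] * len(labels_b)

auc_a = auc(labels_a, scores_a)   # evaluated inside hospital A only
auc_b = auc(labels_b, scores_b)   # evaluated inside hospital B only
pooled_auc = auc(labels_a + labels_b, scores_a + scores_b)

print(f"within A: {auc_a:.2f}, within B: {auc_b:.2f}, pooled: {pooled_auc:.2f}")
```

Within either hospital the shortcut model does no better than chance (0.50), yet on the pooled data it scores about 0.76, purely because one hospital sees far more pneumonia than the other. This is one reason pooled test results can overstate how well a model would work at a single new site.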

"Our findings should give pause to those considering rapid deployment of artificial intelligence platforms without rigorously assessing their performance in real-world clinical settings reflective of where they are being deployed," says senior author Eric Oermann, M.D., instructor in neurosurgery at the Icahn School of Medicine at Mount Sinai. "Deep learning models trained to perform medical diagnosis can generalize well, but this cannot be taken for granted since patient populations and imaging techniques differ significantly across institutions."

"If CNN systems are to be used for medical diagnosis, they must be tailored to carefully consider clinical questions, tested for a variety of real-world scenarios, and carefully assessed to determine how they impact accurate diagnosis," says first author John Zech, a medical student at the school.

The study was published in PLOS Medicine.