
Health Systems Unite to Publicly Test and Rank Best AI Models

Since the launch of ChatGPT in 2022, technology companies have rushed to bring generative AI tools to the healthcare market. However, providers face a dilemma over what – and whether – to buy.

As Google, Amazon, Microsoft and OpenAI rapidly expand their artificial intelligence offerings, providers say they are unsure how to compare product effectiveness or determine which tool might best meet their specific needs.

A group of health systems, led by Boston-based Mass General Brigham, is hoping to solve that problem.

On Wednesday, the health system launched the Healthcare AI Challenge collaborative, which will allow participating clinicians to test the latest AI offerings in simulated clinical environments. Clinicians will pit the models against each other in head-to-head comparisons and produce public rankings of the commercial tools by the end of the year.

Participating health systems say the ability to directly compare AI products is long overdue.

Despite the rapid proliferation of AI in healthcare, the industry has been slow to agree on how to assess quality. Industry groups have attempted to deploy assessment frameworks, but the guidelines remain in draft form.

Without standardized assessment measures, it’s difficult to compare even the most similar tools, said Richard Bruce, associate professor of radiology and vice chair of informatics at the University of Wisconsin School of Medicine and Public Health.

“Are there any (common) metrics that directly compare them? Currently, to my knowledge, aside from user surveys and anecdotes, the tools are not directly compared to each other,” he said. “There’s no easy way to compare apples to apples.”

So far, Emory Healthcare, the departments of radiology at the University of Wisconsin School of Medicine and Public Health and the University of Washington School of Medicine, and the American College of Radiology, an industry group, are participating in the collaboration. MGB has announced plans to expand the program.

Health systems will initially test nine models, according to an MGB spokesperson, including products from Microsoft, Google, Amazon Web Services, OpenAI and Harrison.AI.

Clinicians will evaluate the models on draft report generation, key findings, differential diagnosis and other factors, according to MGB.

The parameters for evaluating the models are still evolving, Bruce said, and may depend on a tool’s clinical use case. Model accuracy will always be heavily weighted, but in some cases, such as when a model is used to produce a written report, readability may be more important.

“Some of it will depend largely on subjective quality,” Bruce said. “Do I feel that the style in which this text is presented is more easily readable or more patient-friendly?”

Ultimately, the health systems will create a ranking of the tools, according to Dushyant Sahani, professor and chair of the department of radiology at the University of Washington.

The ranking will be used both to provide feedback to technology companies and to help health systems purchase technology, according to MGB.

Health systems that aren’t directly participating in the challenge might be able to use the rankings to decide which tools to purchase, according to Sahani — which he sees as a win for health equity.

In the race to implement AI, experts have expressed concern that resource-constrained providers, who may not have time to research new tools, could be left behind.

“Health systems can use transparent rankings to inform decision-making and establish benchmarks,” Sahani said. “The consortium’s ideas and best practices can be adopted by non-participating health systems.”

Google and Microsoft declined to comment for this article.