Diagnostic Challenge: AI vs Doctors reviews three different studies that compare the diagnostic performance of various large language models (LLMs) vs doctors.
The three studies are structured somewhat differently, and they arrive at different outcomes. Taken together, however, they suggest that powerful proprietary LLMs such as GPT-4 far outperform doctors. In some cases doctors benefit from consulting an LLM, yet their diagnostic performance with that help is still not as good as that of the LLM used alone.
INTRODUCTION
To evaluate how well large language models (LLMs) diagnose health conditions compared with doctors, I refer to three recent papers:
1. Hager paper using a “Clinical Decision Making” framework, January 26, 2024 [1]
2. Goh paper using a “Diagnostic Reasoning” framework, 2024. Test period November 29 to December 29, 2023 [2]
3. McDuff paper using an “Accurate Differential Diagnosis” framework, November 2023 [3]