AI Programs Pass U.S. Medical Licensing Exam: Implications for Medical Education and Research
ChatGPT and Flan-PaLM: The Future of AI in Medicine?
Two AI programs, including OpenAI's ChatGPT, have passed the U.S. Medical Licensing Examination (USMLE), according to recent papers. The USMLE comprises three exams: Step 1, Step 2 CK, and Step 3. One paper, published on medRxiv, investigated ChatGPT's performance on the USMLE without any specialized training or reinforcement prior to the exams.
The results showed that ChatGPT performed at or near the passing threshold for all three exams. Another paper, published on arXiv, evaluated the performance of another large language model, Flan-PaLM, on the USMLE.
Flan-PaLM was evaluated on MultiMedQA, a benchmark that combines several medical question-answering datasets. On the USMLE-style questions, the model achieved 67.6% accuracy, about 17 percentage points higher than the previous best performance.
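For context on what these accuracy figures mean: in benchmarks of this kind, each multiple-choice question is posed to the model, its chosen option is compared against the answer key, and accuracy is simply the fraction answered correctly. The sketch below illustrates that scoring loop; the answer_question stub and the sample items are hypothetical placeholders for illustration, not the benchmark itself or either paper's code.

```python
# Minimal sketch of how multiple-choice accuracy is typically scored in
# medical QA benchmarks. The answer_question() stub and the sample items
# below are placeholders for illustration, not real USMLE content.

def answer_question(question, options):
    """Hypothetical model call: return the letter of the chosen option."""
    # A real evaluation would prompt a large language model here;
    # this stub simply picks option "A" every time.
    return "A"

def accuracy(dataset):
    """Fraction of questions whose predicted letter matches the answer key."""
    correct = sum(
        answer_question(item["question"], item["options"]) == item["answer"]
        for item in dataset
    )
    return correct / len(dataset)

# Illustrative placeholder items (not actual exam questions).
sample_items = [
    {"question": "Placeholder question 1?",
     "options": {"A": "choice 1", "B": "choice 2"},
     "answer": "A"},
    {"question": "Placeholder question 2?",
     "options": {"A": "choice 1", "B": "choice 2"},
     "answer": "B"},
]

print(f"Accuracy: {accuracy(sample_items):.1%}")  # 50.0% with this stub
```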
The papers suggest that large language models may be able to assist with medical education and, eventually, clinical decision-making. However, some healthcare professionals have expressed concerns over these developments, especially where ChatGPT has been listed as an author on research papers.
Despite the promising results, concerns remain about the use of AI programs in medical research and education. One of the main concerns is whether AI programs are truly capable of making meaningful scholarly contributions to a paper. Some critics argue that AI tools cannot understand or interpret complex medical concepts and therefore may not provide accurate or meaningful insights.
Another concern is the ethical implication of listing AI programs as co-authors on research papers: because an AI program cannot consent to authorship, some argue that crediting it as a co-author is unethical.
Despite these concerns, researchers continue to explore the potential of AI programs in medical education, research, and clinical decision-making. The authors of both papers believe that large language models could become beneficial tools in medicine, but they also acknowledge the need for further research and for collaboration among AI researchers, clinicians, social scientists, ethicists, policymakers, and other interested parties to responsibly translate these early findings into improved medical practice.
The future of AI in medicine is still uncertain, but the recent success of ChatGPT and Flan-PaLM on the USMLE suggests that AI programs could become valuable tools in medical education and research. It remains important, however, to keep studying the limitations and ethical considerations of using AI in the medical field.