
Healthcare workers and patients perceive more warmth in AI-generated medical responses than in those from actual doctors, a surprising analysis of 15 studies shows. The largest study examined 2,164 patient interactions, and similar patterns emerged across smaller datasets.
ChatGPT and similar AI chatbots scored roughly two points higher than human healthcare professionals on 10-point empathy scales when responding to patient questions via text. In head-to-head comparisons, the AI had a 73% probability of being rated as more empathic than human practitioners.
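The 73% figure reads like a "probability of superiority" (also called the common-language effect size), which under standard assumptions converts from a pooled standardized mean difference d via Φ(d/√2). The article does not state how the meta-analysis computed it, so the sketch below is illustrative only; the 0.87 input is a back-calculated assumption, not a number reported in the study:

```python
from math import erf

def prob_superiority(d: float) -> float:
    # Common-language effect size: the probability that a randomly drawn
    # AI rating exceeds a randomly drawn human rating, given a standardized
    # mean difference d and normally distributed, equal-variance ratings.
    # P = Phi(d / sqrt(2)); since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    # this simplifies to 0.5 * (1 + erf(d / 2)).
    return 0.5 * (1.0 + erf(d / 2.0))

# A pooled difference of roughly 0.87 SD corresponds to ~73% (assumed input):
print(round(prob_superiority(0.87), 2))  # 0.73

# The thyroid result of 1.42 SD reported later would map to ~84%:
print(round(prob_superiority(1.42), 2))  # 0.84
```

The same conversion gives an intuitive reading of the per-specialty standardized differences quoted below: the larger the SD gap, the more often a randomly chosen AI response outranks a randomly chosen human one.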
“In text-only scenarios, AI chatbots are frequently perceived as more empathic than human HCPs [healthcare professionals],” study authors wrote. The meta-analysis from the Universities of Nottingham and Leicester pooled data from 13 of the 15 studies comparing AI chatbots to doctors, nurses, and other healthcare workers.
The results, published in the British Medical Bulletin, challenge long-held assumptions about human connection in medicine and run counter to a 2019 UK government report that called empathy an “essential human skill that AI cannot replicate.”
AI Shows Empathy Edge Across Medical Specialties
ChatGPT-4 outperformed human clinicians in nine separate studies spanning cancer care, thyroid conditions, mental health, autism, and general medical inquiries. For thyroid questions, the AI scored 1.42 standard deviations above human surgeons in empathy ratings (a standardized mean difference; by conventional benchmarks, anything above 0.8 counts as a large effect). Mental health queries showed similar patterns, with ChatGPT-4 scoring 0.97 standard deviations higher than licensed mental health professionals.
Patient complaints revealed the starkest gaps. When handling grievances across hospital departments, ChatGPT-4 scored 2.08 standard deviations higher than human patient relations officers.
The AI advantage appeared consistent regardless of who evaluated the responses. When both physicians and patients reviewed the same set of answers about systemic lupus, ChatGPT-4 received higher empathy ratings from physicians. For questions about multiple sclerosis, patient representatives using a validated empathy scale rated AI responses more favorably than neurologist responses.
Studies drawing from Reddit health forums and patient portals showed similar trends. Questions ranged from interpreting blood test results to managing chronic conditions to understanding cancer treatment options. Across this variety, AI responses were more likely to be rated as warm, understanding, and considerate of patient concerns.
Dermatology provided the sole exception. In both studies examining skin-related questions, dermatologists outperformed ChatGPT-3.5 and Med-PaLM 2, though researchers couldn’t explain this specialty-specific pattern.
The Text Message Caveat
All studies evaluated text-based interactions exclusively. Even when one study converted AI responses to audio, empathy ratings came from written transcripts alone.
A doctor’s nod, forward lean, or eye contact often conveys understanding as powerfully as words. Text-based healthcare interactions represent a small portion of patient care, though their share is growing with patient portals and telemedicine.
Studies also relied on proxy evaluators rather than patients receiving actual care. Healthcare professionals, medical students, patient representatives, and researchers rated empathy in responses to real patient questions. Direct patient feedback might differ, particularly since healthcare providers and patients often rate empathy differently.
Most studies used custom, unvalidated empathy scales. Raters typically scored responses on 1-5 or 1-10 scales ranging from “not empathetic” to “very empathetic.” Only one study employed the CARE scale, a validated 10-item instrument designed specifically for measuring therapeutic empathy in clinical consultations.
The studies couldn’t determine whether AI’s perceived empathy advantage translates to better health outcomes. While empathic communication has been linked to reduced patient pain and anxiety, improved medication adherence, and higher satisfaction with care, these studies measured perception rather than clinical impact.
Twenty Percent of UK Doctors Already Use ChatGPT
The research lands as AI adoption in healthcare accelerates. One in five UK general practitioners now uses generative AI tools for tasks like writing patient correspondence. Over 117,000 patients across 31 NHS mental health services have interacted with Wysa, an AI-powered digital therapist, according to Wysa’s website.
Study authors propose a collaborative model where doctors draft initial responses while AI enhances tone and empathic language, with clinicians ensuring medical accuracy. This approach could reduce physician workload while potentially improving patient satisfaction.
Empathic delivery means little if medical advice proves wrong. AI reliability concerns persist, and gains in perceived warmth could vanish if responses contain factual errors or incomplete guidance.
How the Research Was Conducted
Researchers searched seven databases for studies published through November 2024, identifying 15 qualifying studies from 2023-2024, most of which relied on the unvalidated single-item 1-5 or 1-10 empathy scales described above rather than a validated instrument.
Fourteen studies assessed ChatGPT variants (versions 3.5 or 4), while others examined Claude, Gemini Pro, Le Chat, ERNIE Bot, and Med-PaLM 2. Patient questions came from emails in private medical records, Reddit and public forums, real-time chat transcripts, and in-person reception interactions. The largest dataset included 2,164 live outpatient queries at a Chinese hospital.
Nine studies had moderate risk of bias; six showed serious risk. Common problems included curated patient queries potentially skewing results, reliance on Reddit communities where users may face barriers to formal care, and supervised AI designs where human experts reviewed outputs before release.
Source: https://studyfinds.org/empathy-chatgpt-more-human-than-doctors/