(AI in Healthcare – Research Project)
Author:
- Máté Lukács
Date: April 2025
This repository presents a research project analyzing the application of large language models (LLMs) in the diagnosis and management of heart failure (HF) – a major global health challenge affecting more than 64 million people worldwide.
The study critically evaluates three state-of-the-art LLMs — GPT-4x (OpenAI), Gemini (Google DeepMind), and DeepSeek — focusing on their clinical reasoning, technical and medical limitations, and ethical considerations in the context of cardiovascular care.
The project addresses three primary dimensions:
- Reasoning capabilities of LLMs compared to expert cardiologists.
- Technical and medical limitations, including accuracy, guideline adherence, and adaptability to clinical nuance.
- Ethical implications of deploying LLMs in healthcare, covering issues such as bias, transparency, accountability, and patient trust.

Methodology:
- Conducted a structured literature and guideline-informed evaluation of LLMs in heart failure care.
- Designed case-based prompts to simulate diagnostic and therapeutic decision-making.
- Compared model responses with ESC and ACC/AHA guidelines and with expert clinical reasoning.
- Assessed both strengths and limitations of each model.
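
The case-based prompting and guideline comparison described above could be sketched roughly as follows. This is a minimal illustration, not the procedure used in the paper: the prompt wording, the `Case` structure, the example vignette, and the term-matching score are all hypothetical, and the expected terms are a simplified stand-in for a guideline checklist.

```python
import re
from dataclasses import dataclass

# Hypothetical prompt template; the wording is an assumption, not taken from the paper.
PROMPT_TEMPLATE = (
    "You are assisting a cardiologist.\n"
    "Case: {vignette}\n"
    "Task: {task}\n"
    "State which guideline recommendation you rely on."
)


@dataclass
class Case:
    case_id: str
    vignette: str          # free-text patient description
    task: str              # diagnostic or therapeutic question
    expected_terms: tuple  # concepts a guideline-concordant answer should mention


def build_prompt(case: Case) -> str:
    """Fill the template with a single case."""
    return PROMPT_TEMPLATE.format(vignette=case.vignette, task=case.task)


def coverage(response: str, case: Case) -> float:
    """Fraction of expected terms that appear in the response.

    Matching is case-insensitive and anchored at the start of a word,
    so "beta-blockers" counts for "beta-blocker" but "warning" does not
    count for "ARNI". This is a crude proxy, not a clinical metric.
    """
    hits = 0
    for term in case.expected_terms:
        pattern = r"\b" + re.escape(term)
        if re.search(pattern, response, flags=re.IGNORECASE):
            hits += 1
    return hits / len(case.expected_terms)


# Illustrative case only; not drawn from the study.
example = Case(
    case_id="hfref-01",
    vignette="68-year-old with exertional dyspnoea, LVEF 30%, NYHA class II.",
    task="Recommend first-line pharmacological therapy.",
    expected_terms=(
        "beta-blocker",
        "mineralocorticoid receptor antagonist",
        "SGLT2 inhibitor",
        "ARNI",
    ),
)

if __name__ == "__main__":
    print(build_prompt(example))
    reply = "Start a beta-blocker and an SGLT2 inhibitor; a warning about hypotension applies."
    print(coverage(reply, example))
```

A score like this only checks whether expected concepts are mentioned; judging whether the reasoning is clinically sound would still need expert review, as in the comparison with cardiologists described above.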

- GPT-4x (OpenAI)
  - Strong in guideline-based reasoning and pharmacological pathways.
  - Limited by a static knowledge base, hallucinations, and lack of patient-specific adaptability.
- Gemini (Google DeepMind)
  - Excels in contextual integration (socio-demographics, comorbidities).
  - Opacity of training data and reliance on structured prompts reduce trustworthiness.
- DeepSeek
  - Highly structured and consistent in guideline summaries.
  - Outputs are often rigid, formulaic, and less clinically adaptive.

| Model | Strengths | Limitations | Best Use Case |
|---|---|---|---|
| GPT-4x | Accurate, guideline-based, versatile | Hallucinations, no patient context | Medical education, decision support |
| Gemini | Context-aware, integrates broader factors | Closed-source, prompt-dependent | Policy, medical planning, guideline synthesis |
| DeepSeek | Structured, factually consistent | Rigid, lacks nuance | Standardized workflows, information retrieval |
.
├── Comparing_LLMs_in_HeartFailure.pdf # Full research paper
├── README.md # Project overview
└── LICENSE # MIT License

- v1.0.0: Initial research release (comparative study).
- v1.1.0: Extend with quantitative evaluation of LLM outputs using clinical benchmarks.
- v2.0.0: Expand to other cardiovascular conditions (e.g., atrial fibrillation, coronary artery disease).
This project is licensed under the MIT License – see the LICENSE file for details.
This research project was officially submitted as part of a team assignment.
- Research, analysis, and methodology were conducted entirely by Máté Lukács.
- For assignment submission purposes, teammates' names were included:
  - Ádám Földvári
  - Héctor Carlos Flores Reynoso