mateluky/llm-heart-failure-analysis

Comparing LLMs for Heart Failure Diagnosis and Therapy

(AI in Healthcare – Research Project)

Author:

  • Máté Lukács

Date: April 2025

This repository presents a research project analyzing the application of large language models (LLMs) in the diagnosis and management of heart failure (HF) – a major global health challenge affecting more than 64 million people worldwide.

The study critically evaluates three state-of-the-art LLMs: GPT-4x (OpenAI), Gemini (Google DeepMind), and DeepSeek. It focuses on their clinical reasoning, technical and medical limitations, and ethical considerations in the context of cardiovascular care.

📑 Research Scope

The project addresses three primary dimensions:

  1. Reasoning capabilities of LLMs compared to expert cardiologists.
  2. Technical and medical limitations, including accuracy, guideline adherence, and adaptability to clinical nuance.
  3. Ethical implications of deploying LLMs in healthcare, covering issues such as bias, transparency, accountability, and patient trust.

🔬 Methodology

  • Conducted a structured literature and guideline-informed evaluation of LLMs in heart failure care.
  • Designed case-based prompts to simulate diagnostic and therapeutic decision-making.
  • Compared model responses with ESC and ACC/AHA guidelines and with expert clinical reasoning.
  • Assessed both strengths and limitations of each model.
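
As a minimal sketch of how a case-based prompt and a guideline comparison could be organized, the Python below builds a prompt from a vignette and checks a model response against a keyword checklist. It is not the rubric used in the paper: `CaseVignette`, `build_prompt`, `check_response`, `coverage`, the example vignette, and the example response are all hypothetical names and data introduced for illustration. The checklist is a simplified stand-in for a guideline-derived list.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CaseVignette:
    """A hypothetical case-based prompt plus the items a reviewer expects to see."""
    case_id: str
    presentation: str
    question: str
    # Maps a checklist item to the keywords that count as mentioning it.
    expected_items: Dict[str, List[str]] = field(default_factory=dict)


def build_prompt(case: CaseVignette) -> str:
    """Render the vignette as one prompt string that can be sent to any model."""
    return (
        "You are assisting a cardiology team.\n"
        f"Case: {case.presentation}\n"
        f"Question: {case.question}\n"
        "Answer with your reasoning and name the guideline you rely on."
    )


def check_response(case: CaseVignette, response: str) -> Dict[str, bool]:
    """Mark each checklist item as mentioned or not, using whole-word keyword matches."""
    text = response.lower()
    return {
        item: any(
            re.search(r"\b" + re.escape(keyword.lower()) + r"\b", text)
            for keyword in keywords
        )
        for item, keywords in case.expected_items.items()
    }


def coverage(checked: Dict[str, bool]) -> float:
    """Fraction of checklist items that the response mentioned."""
    return sum(checked.values()) / len(checked) if checked else 0.0


# Illustrative data only: not a case or a model output from the study.
example_case = CaseVignette(
    case_id="demo-01",
    presentation="Illustrative adult with symptomatic heart failure and reduced ejection fraction.",
    question="Which drug classes should be considered for long-term therapy?",
    expected_items={
        "RAS inhibition": ["ACE inhibitor", "ARNI", "sacubitril"],
        "beta-blocker": ["beta-blocker", "beta blocker", "bisoprolol"],
        "MRA": ["mineralocorticoid", "spironolactone", "eplerenone"],
        "SGLT2 inhibitor": ["SGLT2", "dapagliflozin", "empagliflozin"],
    },
)
example_response = "Start bisoprolol and dapagliflozin; consider adding spironolactone."

checked = check_response(example_case, example_response)
print(build_prompt(example_case))
print(checked, coverage(checked))
```

Keyword matching is deliberately crude: a response that says "avoid bisoprolol" would still count as mentioning a beta-blocker. A check like this can only pre-screen responses before expert review, not replace it.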

📊 Key Findings

  • GPT-4x (OpenAI)

    • Strong in guideline-based reasoning and pharmacological pathways.
    • Limited by a static knowledge base, hallucinations, and a lack of patient-specific adaptability.
  • Gemini (Google DeepMind)

    • Excels in contextual integration (socio-demographics, comorbidities).
    • Opacity of training data and reliance on structured prompts reduce trustworthiness.
  • DeepSeek

    • Highly structured and consistent in guideline summaries.
    • Outputs are often rigid, formulaic, and less clinically adaptive.

⚖️ Comparative Summary

| Model | Strengths | Limitations | Best Use Case |
| --- | --- | --- | --- |
| GPT-4x | Accurate, guideline-based, versatile | Hallucinations, no patient context | Medical education, decision support |
| Gemini | Context-aware, integrates broader factors | Closed-source, prompt-dependent | Policy, medical planning, guideline synthesis |
| DeepSeek | Structured, factually consistent | Rigid, lacks nuance | Standardized workflows, information retrieval |

📁 Repository Structure

.
├── Comparing_LLMs_in_HeartFailure.pdf # Full research paper
├── README.md # Project overview
└── LICENSE # MIT License

🧭 Roadmap

  • v1.0.0: Initial research release (comparative study).
  • v1.1.0: Extend with quantitative evaluation of LLM outputs using clinical benchmarks.
  • v2.0.0: Expand to other cardiovascular conditions (e.g., atrial fibrillation, coronary artery disease).
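
The quantitative evaluation planned for v1.1.0 is not specified in this README. One possible metric, shown here purely as an assumption, is chance-corrected agreement between a model's labels and an expert's labels, measured with Cohen's kappa. The function name `cohens_kappa` and the label lists are hypothetical; the labels are made-up placeholders, not results from the study.

```python
import math
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labelled the same cases.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label rates.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters need the same non-zero number of labels.")

    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Counter returns 0 for labels the second rater never used.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined.
        return math.nan
    return (observed - expected) / (1.0 - expected)


# Made-up placeholder labels, not data from the paper.
expert_labels = ["HFrEF", "HFpEF", "HFrEF", "HFrEF"]
model_labels = ["HFrEF", "HFpEF", "HFpEF", "HFrEF"]

print(cohens_kappa(expert_labels, model_labels))
```

Kappa is used in this sketch rather than raw accuracy because it discounts agreement that would occur by chance when one label dominates the case mix.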

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

👥 Credits

This research project was officially submitted as part of a team assignment.

  • Research, analysis, and methodology were conducted entirely by Máté Lukács.
  • For assignment submission purposes, teammates' names were included:
    • Ádám Földvári
    • Héctor Carlos Flores Reynoso
