KCI_RV_Framework

This repository provides a pilot implementation of a multi-agent evaluation framework for assessing the trustworthiness of Large Language Models (LLMs) in medical domains.

Citation

If you find this work useful in your research, please cite the following paper:

Shin, E., Ko, S., Yang, U., Won, W., Na, K., Woo, H., & Lee, Y. (2025). A RV Framework for Evaluating the Trustworthiness of Medical Large Language Models. Smart Media Journal, 14(12), 74-84.

@article{shin2025rv,
  title={A RV Framework for Evaluating the Trustworthiness of Medical Large Language Models},
  author={Shin, Eunji and Ko, Siyeon and Yang, Uijun and Won, Woohyung and Na, Kyungmin and Woo, Hyekyung and Lee, Youngho},
  journal={Smart Media Journal},
  volume={14},
  number={12},
  pages={74--84},
  year={2025}
}

Medical LLM Trustworthiness Evaluation – Pilot Code

🔎 Purpose

  • Demonstrate how diagnostic responses from LLMs can be evaluated using a multi-dimensional rubric (Accuracy, Explainability, Consistency, Safety).
  • Compare external evaluation and self-evaluation (LLM self-critique), showing their differences and complementarity (a minimal sketch follows this list).
  • Provide pilot results as a proof-of-concept for the proposed framework in our research paper.
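
A minimal sketch of how this comparison could be organized is shown below. The four dimension names come from the rubric above; the function name and score values are hypothetical illustrations, not the repository's actual API.

    # Hypothetical sketch (not the repository's actual API): compare external-evaluator
    # scores with LLM self-critique scores across the four rubric dimensions.
    RUBRIC = ["accuracy", "explainability", "consistency", "safety"]

    def score_gap(external: dict, self_eval: dict) -> dict:
        """Per-dimension gap: external score minus self-evaluation score."""
        return {dim: external[dim] - self_eval[dim] for dim in RUBRIC}

    # Example scores on a 1-5 scale (invented for illustration).
    external = {"accuracy": 4, "explainability": 3, "consistency": 4, "safety": 5}
    self_eval = {"accuracy": 5, "explainability": 4, "consistency": 4, "safety": 4}
    print(score_gap(external, self_eval))
    # -> {'accuracy': -1, 'explainability': -1, 'consistency': 0, 'safety': 1}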

📂 Dataset

  • The pilot uses open clinical QA datasets (e.g., MedQA) as substitutes for real-world diagnostic cases.
  • Each case is structured as JSON with the following fields (an illustrative example follows this list):
    • summary
    • evidence_list
    • criteria
    • final_judgment
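
An illustrative case record with these four fields is shown below. The field names follow the README; the clinical content is an invented placeholder, not an actual MedQA item.

    import json

    # Invented example case following the four-field structure described above.
    case = {
        "summary": "58-year-old male with acute chest pain radiating to the left arm.",
        "evidence_list": [
            "ECG shows ST-segment elevation in leads II, III, aVF",
            "Troponin I elevated",
        ],
        "criteria": {
            "accuracy": "Diagnosis matches the reference answer",
            "explainability": "Each conclusion is tied to cited evidence",
            "consistency": "No contradictions across repeated runs",
            "safety": "No harmful or out-of-scope recommendations",
        },
        "final_judgment": "Acute inferior myocardial infarction",
    }

    print(json.dumps(case, indent=2))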

⚙️ Usage

  1. Install dependencies
    pip install -r requirements.txt
    

Korean (README)

Medical LLM Trustworthiness Evaluation – Pilot Code

This repository is a pilot implementation of a multi-agent framework for evaluating the trustworthiness of Large Language Models (LLMs) in the medical domain.

🔎 Purpose

  • Demonstrate how clinical responses (diagnostic text) are evaluated with a multi-dimensional rubric (Accuracy, Explainability, Consistency, Safety).
  • Compare external-evaluator evaluation with LLM self-evaluation (self-critique) results and examine their differences.
  • Provide a proof-of-concept for the evaluation framework proposed in the research paper.

📂 Dataset

  • An open clinical QA dataset (MedQA) is used as a substitute for real clinical data.
  • Each case is structured in JSON format:
    • summary
    • evidence_list
    • criteria
    • final_judgment

⚙️ Usage

  1. Install dependencies
    pip install -r requirements.txt

