
Add DISGENET API integration as a new database tool#288

Open
MoiraClimentGispert wants to merge 2 commits into snap-stanford:main from
MoiraClimentGispert:feature/disgenet


@MoiraClimentGispert MoiraClimentGispert commented Mar 12, 2026

Summary

This PR by MedBioinformatics adds DISGENET as a new data source in Biomni, following the Adding New Data (web API) contribution guidelines.

DISGENET™ (https://disgenet.com/) is an integrated, evidence-scored knowledge layer that connects genes, variants, diseases, traits and therapeutics into a unified semantic framework designed for computational use, clinical interpretation and translational research.

What's included

Core tool (new data source via web API):

  • biomni/tool/database.py — New query_disgenet_api() function that translates natural-language prompts into DISGENET REST API calls. Includes automatic entity normalization (disease names → UMLS CUI, gene names → NCBI Gene ID), dynamic endpoint selection, and structured result parsing.
  • biomni/tool/literature.py — New query_disgenet_evidence() evidence retrieval tool for DISGENET literature-backed association queries.
  • biomni/tool/tool_description/database.py — Tool description for query_disgenet_api() following the existing format.
  • biomni/tool/tool_description/literature.py — Tool description for query_disgenet_evidence() following the existing format.
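
The entity normalization step described above can be pictured with a minimal sketch. This is not the code in this PR: the function names and the in-memory lookup tables are hypothetical stand-ins for whatever lookup query_disgenet_api() actually performs, and only a handful of well-known identifiers are included.

```python
from typing import Optional

# Tiny illustrative lookup tables (assumption: a real implementation would
# resolve identifiers through a service rather than hard-coded dictionaries).
GENE_SYMBOL_TO_NCBI_ID = {"TP53": "7157", "BRCA1": "672"}
DISEASE_NAME_TO_UMLS_CUI = {"parkinson disease": "C0030567"}


def normalize_gene(symbol: str) -> Optional[str]:
    """Return the NCBI Gene ID for a gene symbol, or None if unknown."""
    return GENE_SYMBOL_TO_NCBI_ID.get(symbol.strip().upper())


def normalize_disease(name: str) -> Optional[str]:
    """Return the UMLS CUI for a disease name, or None if unknown."""
    # Lowercase and drop the possessive so "Parkinson's disease" matches.
    key = name.strip().lower().replace("'s", "")
    return DISEASE_NAME_TO_UMLS_CUI.get(key)
```

Returning None for unknown entities lets the caller decide whether to fall back to a free-text search or report the failure.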

Agent integration:

  • biomni/agent/a1.py — Added DISGENET_API_KEY preflight check during agent initialization (warns early if key is missing; optionally prompts in interactive sessions). Moved load_dotenv call earlier to ensure env vars are available before imports.
  • biomni/config.py — Added DISGENET-related configuration options.
  • .env.example — Added DISGENET_API_KEY entry.
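
For reviewers, here is a rough sketch of what a preflight check of this kind might look like. It is an assumption-laden illustration rather than the method added to a1.py: the standalone function name, its parameters, and the warning text are all hypothetical.

```python
import os
import sys
import warnings
from getpass import getpass


def ensure_disgenet_api_key(interactive=None, env=None):
    """Return True if DISGENET_API_KEY is available, False otherwise.

    Hypothetical stand-in for the preflight check: warns when the key is
    missing and, in interactive sessions, offers to enter it.
    """
    env = os.environ if env is None else env
    if env.get("DISGENET_API_KEY", "").strip():
        return True

    warnings.warn("DISGENET_API_KEY is not set; DISGENET tools will be unavailable.")

    # Only prompt when a terminal is attached, unless the caller decides.
    if interactive is None:
        interactive = sys.stdin is not None and sys.stdin.isatty()
    if interactive:
        key = getpass("DISGENET API key (leave blank to continue without it): ").strip()
        if key:
            env["DISGENET_API_KEY"] = key
            return True
    return False
```

Passing env and interactive explicitly keeps the check testable without touching the real environment or blocking on input.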

Key features of the DISGENET tool

  • Natural language → API endpoint translation via LLM
  • Automatic entity normalization (disease names, gene symbols → standard identifiers)
  • Support for ordering/filtering by DISGENET score, DSI, DPI, EI, pLI, pmYear
  • GDA, VDA, and DDA association queries
  • Disease class queries
  • Verbose mode for debugging (shows normalization steps, resolved endpoint, full metadata)
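
To make the endpoint selection and ordering features concrete, the following sketch maps an association type to a request URL. The base URL, endpoint paths, and parameter names are placeholders chosen for illustration; they are not the DISGENET REST API and not the values used in this PR.

```python
from urllib.parse import urlencode

# Placeholder base URL and paths (assumptions, not the real DISGENET API).
BASE_URL = "https://api.example.org/v1"
ENDPOINTS = {"GDA": "/gda", "VDA": "/vda", "DDA": "/dda"}


def build_request_url(association, params, order_by=None):
    """Build a request URL for an association type (GDA, VDA, or DDA)."""
    try:
        path = ENDPOINTS[association.upper()]
    except KeyError:
        raise ValueError(f"Unsupported association type: {association!r}") from None

    query = dict(params)  # copy so the caller's dict is not modified
    if order_by is not None:
        query["order_by"] = order_by  # hypothetical parameter name
    return f"{BASE_URL}{path}?{urlencode(query)}"
```

In the PR the LLM chooses the endpoint and parameters from the natural-language prompt; a fixed mapping like this one only illustrates the final assembly step.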

Test prompt

from biomni.agent import A1

agent = A1(path='./data', llm='claude-sonnet-4-20250514')
agent.go("What are the top genes associated with Parkinson's disease ordered by DISGENET score?")
agent.go("Show variant-disease associations for BRCA1 filtered by evidence index > 0.8")
agent.go("Find diseases associated with TP53 in the Musculoskeletal Diseases class")

Requirements for using the DISGENET tool

  • A DISGENET API key (set as DISGENET_API_KEY environment variable)
  • No additional pip dependencies beyond what Biomni already requires

Test plan

  • Tested query_disgenet_api() with GDA, VDA, and DDA queries
  • Verified entity normalization (disease names, gene symbols)
  • Confirmed agent selects DISGENET tool appropriately via natural language
  • Ran benchmark evaluation (results included in biomni/eval/benchmark_results/)
  • Verified no conflicts with existing tools or upstream changes

@MoiraClimentGispert MoiraClimentGispert changed the title Add DisGeNET API integration as a new database tool Add DISGENET API integration as a new database tool Mar 12, 2026
Integrate DISGENET REST API endpoints as a new data source in Biomni,
enabling the agent to query gene-disease associations, variant-disease
associations, and related biomedical data directly through the DISGENET
API.

Main changes:
- Add query_disgenet functions to biomni/tool/database.py with
  API-based data retrieval, normalization, and result parsing
- Add DISGENET evidence tool to biomni/tool/literature.py
- Add tool descriptions for all new DISGENET tools
- Add DISGENET_API_KEY preflight check in agent initialization

Bug Fixes & Enhancements:
biomni/llm.py
	Line 38: replaced config.llm_model with config.llm (the attribute name used in config.py)
biomni/config.py
	Edited the default configuration so that all settings are centralized through the .env file
biomni/env_desc.py & env_desc_cm.py
	Removed all references to DisGeNET.parquet from the codebase
biomni/agent/a1.py
	Moved load_dotenv() to before the import of default_config to avoid missing environment variables
	Added self.disgenet_api_available = self._ensure_disgenet_api_key()
biomni/tool/database.py
	Line 87: query_llm_for_api now handles different response formats (fixes "Failed to parse LLM response, no attribute .strip")
	Line 67: uses string replacement instead of .format() to avoid issues with curly braces in JSON
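
The two database.py fixes can be illustrated with a small sketch. The template text, the placeholder name, and both helper functions are hypothetical; the response shapes handled (a plain string, or a list of strings and dicts with a "text" key) are an assumption about what "different formats" means here.

```python
# A prompt template that embeds literal JSON next to a placeholder.
PROMPT_TEMPLATE = 'Respond with JSON like {"endpoint": "gda"} for: {user_prompt}'

# str.format() would treat {"endpoint": "gda"} as a replacement field and
# raise KeyError, so the placeholder is substituted literally instead.


def fill_prompt(template: str, user_prompt: str) -> str:
    """Substitute the placeholder, leaving all other braces untouched."""
    return template.replace("{user_prompt}", user_prompt)


def response_text(content) -> str:
    """Coerce an LLM response payload to a stripped string.

    Handles a plain string, or a list whose items are strings or dicts
    carrying a "text" key; anything else is converted with str().
    """
    if isinstance(content, str):
        return content.strip()
    if isinstance(content, list):
        parts = []
        for block in content:
            if isinstance(block, str):
                parts.append(block)
            elif isinstance(block, dict) and "text" in block:
                parts.append(str(block["text"]))
        return "".join(parts).strip()
    return str(content).strip()
```

Doubling the braces in the template would also work with .format(), but literal replacement keeps the embedded JSON readable.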

Required .env variables:
DISGENET_API_KEY: required for DISGENET API access
If no API key is provided, a warning is shown along with the option to enter the key or continue without it.