🧠 Sheffield Researcher RAG UI

A custom fork of OpenWebUI modified to support Graph Retrieval-Augmented Generation (Graph RAG) across scraped researcher profiles from the University of Sheffield.

The project aims to enable natural language querying over academic profiles, providing contextual, AI-driven answers to research-related queries.

✨ Features

  • Graph RAG: Query academic knowledge via an LLM-powered LangGraph agent
  • Neo4j Knowledge Graph: Stores relationships between people, departments, and research interests
  • LangGraph Integration: Intelligent routing of queries with context-aware tool use
  • Chat Persistence: Async PostgreSQL-based chat history storage

🧰 Tech Stack

  • Frontend: Custom fork of OpenWebUI
  • Backend: FastAPI with LangGraph agent logic
  • LLM Interface: Supports OpenAI, Ollama, and other backends
  • GraphDB: Neo4j for graph-based retrieval and reasoning
  • Async Storage: PostgreSQL with PostgresSaver for chat memory

🚀 Getting Started

  1. Clone the repository

```sh
git clone git@github.com:RSE-Sheffield/uos-grants.git
cd uos-grants
```

  2. Set up the environment

```sh
cp .env.example .env
```

  3. Start with Docker Compose

```sh
docker compose up --build
```

Older versions of Docker Compose may require:

```sh
docker-compose up --build
```

You can also add the `-d` flag to run the stack in the background.

⚙️ Environment Variables

The following environment variables configure response generation, embedding, and the database connections.

```
# Response model variables
LLM_MODEL_PROVIDER=openai
LLM_MODEL=gpt-4.1-nano-2025-04-14
LLM_API_KEY=sk-...

# Neo4j graph RAG generation variables
GRAPH_LLM_PROVIDER=openai
GRAPH_LLM_MODEL=gpt-4.1-nano-2025-04-14
GRAPH_LLM_API_KEY=sk-...

# Embedding model variables
EMBEDDING_MODEL_PROVIDER=openai
EMBEDDING_MODEL_NAME=text-embedding-3-large
EMBEDDING_DIMENSIONS=3072
EMBEDDING_MODEL_API_KEY=sk-...
EMBEDDING_NODES=Research_Interest, Department, Person
```
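The `EMBEDDING_NODES` value is a comma-separated list of node labels to embed. As a hypothetical sketch (not the project's actual loader), such a value could be parsed on the backend like this:

```python
import os

def embedding_node_labels(env=os.environ) -> list[str]:
    """Split EMBEDDING_NODES into a clean list of node labels,
    tolerating spaces after the commas and an unset variable."""
    raw = env.get("EMBEDDING_NODES", "")
    return [label.strip() for label in raw.split(",") if label.strip()]

print(embedding_node_labels({"EMBEDDING_NODES": "Research_Interest, Department, Person"}))
# ['Research_Interest', 'Department', 'Person']
```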

The following environment variables are set in the docker-compose.yaml file for the open-webui container, and should match your PostgreSQL and Neo4j container configuration.

```yaml
# Database variables; should match the postgres container variables.
DATABASE_URL: postgresql://user:pass@postgres:5432/uos_grants
CHAT_MEMORY_DB_URI: postgresql://user:pass@postgres:5432/uos_grants

# Neo4j variables; should match the neo4j container variables.
NEO4J_URI: bolt://neo4j:7685
NEO4J_USERNAME: neo4j
NEO4J_PASSWORD: your_neo4j_password
```

🧪 Usage

🗃️ Database and Graph Population

How profiles are fetched, stored in PostgreSQL, and used to build the Neo4j graph.

The population of both the PostgreSQL and Neo4j databases is fully automated. This ensures that researcher information is regularly collected, structured, and updated without manual intervention.

🏗️ Initial Setup

Steps to scrape data, extract structured information, and build the graph.

  1. Sitemap Scraping
     The system begins by fetching the University of Sheffield sitemap. All URLs containing `/people/` are extracted as candidate staff profile pages.

  2. Profile Extraction and Storage
     Each profile page is scraped, and the following fields are collected where available:
     • Full name
     • Contact details (email, phone, address)
     • School or department
     • Research interests
     • Full profile text
     • Last modified date (from the sitemap XML)

     These are stored in a PostgreSQL database, with the `last_modified` timestamp used to track changes over time.

  3. Graph Construction in Neo4j
     After all profiles are scraped, the system builds a Neo4j graph:
     • A `Person` node is created for each staff member, with attributes such as name and URL.
     • Related entities (e.g. `School`, `Role`, `Email`, `Address`, `Telephone`) are created as individual nodes and connected to the person node.
     • Research interests (if present) are passed to a configurable LLM to extract individual topics.
     • Each unique research interest is stored as a `Research_Interest` node and linked to the corresponding staff member(s).
     • The graph ensures node reuse, so duplicate schools or shared interests are only created once and reused via relationships.

  4. Embedding Generation
     `Person`, `School`, and `Research_Interest` nodes are embedded using a configurable embedding model. These embeddings are used for semantic search and retrieval when querying the graph.

🔄 Ongoing Updates

How the system stays up-to-date with periodic re-scraping and graph updates.

To keep the data current:

  • The sitemap is periodically re-fetched.
  • For each `/people/` link, the `last_modified` value is compared to the stored value in PostgreSQL.
  • If the timestamps differ:
    • The profile is re-scraped.
    • The corresponding `Person` node and its direct relationships are deleted and rebuilt in Neo4j using the same logic as the initial setup.
  • Updates are triggered automatically:
    • On Docker Compose startup
    • Periodically while the stack is running

This ensures the graph remains accurate and up-to-date with minimal human intervention.
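The update rule above amounts to a timestamp comparison per URL. A minimal sketch, with illustrative names rather than the project's actual API:

```python
def plan_rescrapes(stored: dict, sitemap_entries) -> list:
    """Return URLs needing a re-scrape: entries whose sitemap
    last_modified differs from the value stored in PostgreSQL.
    URLs not seen before also qualify (stored.get returns None)."""
    return [url for url, lastmod in sitemap_entries if stored.get(url) != lastmod]

stored = {"https://example.org/people/a": "2025-01-01"}
entries = [
    ("https://example.org/people/a", "2025-01-01"),   # unchanged: skipped
    ("https://example.org/people/b", "2025-02-02"),   # new profile: re-scraped
]
print(plan_rescrapes(stored, entries))  # ['https://example.org/people/b']
```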

🧑‍💻 Using the UI

How to interact with the system via natural language queries.

  • Navigate to http://localhost or the URL of the host platform.
  • Enter a research-related query such as:

    "Name 5 researchers who work in sustainable energy?"

  • Responses are generated based on:
    • Matching research interests
    • Graph traversal over relationships in the Neo4j knowledge graph
    • Reasoning and response generation by the LangGraph agent, powered by the configured LLM

🛠️ Model Configuration

Explanation of configurable LLMs and embedding providers using LangChain.

The system supports fully configurable LLM and embedding model providers via LangChain integrations. This allows you to easily switch between providers and models depending on your use case, budget, or availability.

🧾 Supported Providers

Table of all compatible providers and their LangChain integration keys.

The following providers are currently supported:

| Provider Key | LangChain Integration |
| --- | --- |
| `openai` | `langchain-openai` |
| `anthropic` | `langchain-anthropic` |
| `azure_openai` | `langchain-openai` |
| `azure_ai` | `langchain-azure-ai` |
| `google_vertexai` | `langchain-google-vertexai` |
| `google_genai` | `langchain-google-genai` |
| `bedrock` | `langchain-aws` |
| `bedrock_converse` | `langchain-aws` |
| `cohere` | `langchain-cohere` |
| `fireworks` | `langchain-fireworks` |
| `together` | `langchain-together` |
| `mistralai` | `langchain-mistralai` |
| `huggingface` | `langchain-huggingface` |
| `groq` | `langchain-groq` |
| `ollama` | `langchain-ollama` |
| `google_anthropic_vertex` | `langchain-google-vertexai` |
| `deepseek` | `langchain-deepseek` |
| `ibm` | `langchain-ibm` |
| `nvidia` | `langchain-nvidia-ai-endpoints` |
| `xai` | `langchain-xai` |
| `perplexity` | `langchain-perplexity` |

🎯 Model Selection

How to choose specific models for your use case.

Each provider supports one or more models. You can configure models by setting the appropriate model name string. For example:

  • To use OpenAI’s GPT-4o Mini: `gpt-4o-mini`
  • To use Google’s Gemini 2.5 Pro: `gemini-2.5-pro`

Refer to the specific provider’s documentation for a full list of supported model variants.
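For example, switching the response model from OpenAI to a local Ollama model might look like this in `.env` (a hypothetical fragment; `llama3.1` stands in for whichever model your Ollama server actually provides):

```
LLM_MODEL_PROVIDER=ollama
LLM_MODEL=llama3.1
```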

🔐 Authentication

How to securely provide API keys for model access.

Your provider-specific API key should be supplied via the relevant `*_API_KEY` environment variable (`LLM_API_KEY`, `GRAPH_LLM_API_KEY`, or `EMBEDDING_MODEL_API_KEY`). This key is used to authenticate all model requests.

🧩 Model Roles

Three models can be configured, each with a distinct responsibility:

  1. LLM Response Model
    Used to generate responses to user queries.

  2. Embedding Model
    Generates embedding vectors for nodes and incoming queries to support semantic search and retrieval.

  3. Graph Generation Model
    Processes staff profile texts to extract research interests via an LLM, which are then structured into the Neo4j graph.

Each model can be independently configured to use different providers and model variants.
