This repository describes a web application to support the efforts of the SP-EU project. Social Prescribing is a way to bridge the gap between health service providers (for example general practitioners) and non-medical support services via so-called link workers. It’s based on the understanding that many health issues are related to social, emotional or practical needs such as loneliness, isolation, or problems with debt or housing.
This application caters to link workers. It can be used to insert, manage, and look up information about organizations, offers, events, and other resources relevant to the social prescribing effort. For a quick introduction to the app you can watch this short tutorial.
The project has a `flake.nix`, so you can enter the development environment with:

```
nix develop
```
Common commands (to be used from within the `wisen` subdirectory):

- Run backend server: `clj -M:backend`
- Run backend server with REPL: `clj -M:backend -r PORT`
- Run backend tests: `clj -T:build test-clj`
- Run frontend watch: `clj -T:build cljs-watch`
The frontend then runs on http://localhost:4321. The cljs test display is at http://localhost:9501/.
Ollama + Keycloak backing services can be started in a local QEMU VM via

```
nix run .#dev-vm
```

Note that on macOS (and only there) this requires a properly configured and booted linux-builder (`nix run nixpkgs#darwin.linux-builder`). Follow the instructions in the manual on how to configure your macOS system to automatically dispatch aarch64-linux builds to the linux-builder.
After you have updated dependencies in `deps.edn`, you have to update the corresponding lockfile as well:

```
nix run .#update-clj-lockfile
```

The live system is hosted at https://sp-eu.active-group.de. Deployment happens continuously via a CI pipeline.
We use agenix to manage secrets (mainly the DB password used by the Keycloak service). See this directory.
The project consists of a ClojureScript frontend and a Clojure backend. The backend application uses a git repository as its data store. The data model is a knowledge graph based on the Resource Description Framework (RDF). Knowledge graph data is stored as RDF serializations in the git repository. The application periodically reads from the git repository and presents the resulting knowledge graph to users of the frontend to query and update. Users can modify this knowledge graph however they see fit. When they issue a modification request, the backend packages the corresponding changes into a git commit and pushes this commit to the origin repository. This arrangement has several practical advantages over traditional data stores such as relational databases:
- Versioning by default: All changes are versioned and can be rolled back if the need arises.
- Use of external software: Since we store the knowledge graph data as RDF serializations (N-Triples, JSON-LD; an example follows below), we can leverage the existing ecosystem of the semantic web. For example, a power user may forgo our custom web frontend entirely and interact with the application’s data solely through a tool like Protégé.
- Bulk import: It may be worthwhile to convert existing data stores to RDF and then import this data into the SP-EU app in bulk. This can easily and transparently be achieved with the git data store model.
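For illustration, a tiny knowledge-graph fragment as it might be stored in the repository, serialized as N-Triples (the resource URI and property values are made-up example data):

```ntriples
<https://example.org/org/123> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Organization> .
<https://example.org/org/123> <http://schema.org/name> "Sports Club Tübingen" .
<https://example.org/org/123> <http://schema.org/address> "Hechinger Str. 12/1, 72072 Tübingen" .
```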
In addition to the git storage model and the RDF data model, there are a number of supporting components. The following diagram shows the relevant components and their interactions for a search query originating at a user of the web frontend.
```mermaid
flowchart TD
    U[User] -->|Search!| A
    A[Web frontend] -->|Search request| B(Backend handler)
    B --> C{Access module}
    C -->|lookup| D[Cache]
    D -->|manage| G{Jena data store}
    D -->|manage| H{Lucene index}
    D -->|populate from| E[Repository access]
    E -->|manage files| I{git repository}
```
We use JGit to talk to the git repository, mostly via the lower-level plumbing APIs.
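As a minimal sketch (not the project’s actual code) of what this looks like, here is how one could resolve the current head commit with JGit; the function name is made up:

```clojure
(import '[org.eclipse.jgit.storage.file FileRepositoryBuilder])

;; Resolve the commit hash that HEAD currently points to.
(defn head-commit-id [^java.io.File git-dir]
  (with-open [repo (-> (FileRepositoryBuilder.)
                       (.setGitDir git-dir) ; e.g. (io/file "/path/to/repo/.git")
                       (.build))]
    (some-> (.resolve repo "HEAD") (.name))))
```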
Knowledge graphs are managed with the help of the Apache Jena library ecosystem. Jena handles serialization, deserialization, and querying (with SPARQL) of knowledge graphs. We use Jena graphs in memory only, not as persistent storage; the single source of truth is always the set of serialized files in the git repository. Jena knowledge graphs are created on demand, created predictively, or kept in caches.
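A hedged sketch of how this looks with Jena (illustrative names; schema.org is used as an example vocabulary): parse an N-Triples serialization into an in-memory model and run a SPARQL SELECT against it:

```clojure
(import '[org.apache.jena.rdf.model ModelFactory]
        '[org.apache.jena.riot RDFDataMgr Lang]
        '[org.apache.jena.query QueryFactory QueryExecutionFactory])

;; Parse an N-Triples input stream into a fresh in-memory model.
(defn read-model [^java.io.InputStream in]
  (doto (ModelFactory/createDefaultModel)
    (RDFDataMgr/read in Lang/NTRIPLES)))

;; Find all resources and their names via SPARQL.
(defn resource-names [model]
  (let [q (QueryFactory/create
           "SELECT ?s ?name WHERE { ?s <http://schema.org/name> ?name }")]
    (with-open [qe (QueryExecutionFactory/create q model)]
      (mapv (fn [sol] [(str (.get sol "s")) (str (.get sol "name"))])
            (iterator-seq (.execSelect qe))))))
```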
SPARQL – the query language of the semantic web and therefore the language supported by Apache Jena – lacks expressivity for our purposes. Most of our queries are two-dimensional: users look for a fuzzy search term (“Sports for elderly”) within a specific map area. Both fuzzy semantic searches and geo-searches are poorly supported by SPARQL. We therefore use a two-layered system to resolve user queries: we first look up fuzzy semantic search terms and geo queries in a separate Lucene index. This yields a list of URIs (resource identifiers), which we then use in a SPARQL query passed to a Jena knowledge graph. Lucene indices are sometimes created on demand (which is very slow) but are mostly kept in caches.
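A hypothetical sketch of the second stage: splicing the URIs returned by the Lucene lookup into a SPARQL CONSTRUCT query via a VALUES clause (the query shape is illustrative, not necessarily the one the project uses):

```clojure
(require '[clojure.string :as str])

;; Build a CONSTRUCT query that returns all triples about the given resources.
(defn construct-for-uris [uris]
  (str "CONSTRUCT { ?s ?p ?o } WHERE { "
       "VALUES ?s { " (str/join " " (map #(str "<" % ">") uris)) " } "
       "?s ?p ?o }"))

;; (construct-for-uris ["https://example.org/org/123"])
;; => "CONSTRUCT { ?s ?p ?o } WHERE { VALUES ?s { <https://example.org/org/123> } ?s ?p ?o }"
```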
Lucene indices and Jena knowledge graphs are often cached for faster subsequent reads. All cached objects are keyed by the corresponding git commit hash. Currently we keep four pairs of Lucene index and Jena knowledge graph around in least-recently-used (LRU) fashion.
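One way to express such a commit-keyed LRU cache is with clojure.core.cache; this is an assumption about the shape, not the project’s actual implementation:

```clojure
(require '[clojure.core.cache.wrapped :as cache])

;; Keep at most four (Jena graph, Lucene index) pairs, evicting the
;; least-recently-used entry. Keys are git commit hashes.
(def graph+index-cache (cache/lru-cache-factory {} :threshold 4))

(defn graph+index-for-commit
  "Return the cached pair for `commit-hash`, computing and caching it on a miss."
  [commit-hash load-pair]
  (cache/lookup-or-miss graph+index-cache commit-hash load-pair))
```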
For each interaction the web frontend explicitly refers to a specific git commit hash. Currently, this commit hash is obtained by frequently issuing a request for the “head” of the data store. For reading, searching, and querying, this “head request” is the only impure interaction between frontend and backend. All subsequent read requests are pure.
The following sequence diagram shows a sample interaction between frontend, backend, and an external user directly accessing the git repository.
```mermaid
sequenceDiagram
    Frontend->>+Backend: head?
    Backend->>+Local Repository: head?
    Local Repository-->>-Backend: 7ba34c...
    Backend-->>-Frontend: 7ba34c...
    User-->>Local Repository: New commit!
    Frontend->>+Backend: head?
    Backend->>+Local Repository: head?
    Local Repository-->>-Backend: 8aaab...
    Backend-->>-Frontend: 8aaab...
```
Users can search for resources in the knowledge graph(s) via the web frontend. Most searches name a fuzzy search term (“Sports for elderly”) and a geographic area defined by longitude and latitude ranges. As described in the previous section, all read and search requests explicitly refer to a git commit hash and can be considered pure functions.
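A search request therefore carries roughly the following information (hypothetical field names, shown as EDN):

```clojure
{:commit "8aaab..."                       ; pin the query to a specific commit
 :search "Sports for elderly"             ; fuzzy semantic search term
 :area   {:min-lon 9.02  :max-lon 9.10    ; longitude range
          :min-lat 48.50 :max-lat 48.54}} ; latitude range
```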
The following sequence diagram shows a sample interaction between the relevant components handling a search query. In this scenario the corresponding cache is not yet populated.
```mermaid
sequenceDiagram
    Frontend->>Access: search for commit id 8aaab...
    Access->>Cache: get cached jena graph + lucene index for 8aaab...
    Cache->>Repository: get jena graph for 8aaab...
    Repository->>Cache: jena graph
    Cache->>Cache: Compute lucene index from jena graph
    Cache->>Access: jena graph + computed lucene index
    Access->>Lucene index: Lookup fuzzy search term and geographic area
    Lucene index->>Access: List of relevant URIs
    Access->>Jena graph: run SPARQL CONSTRUCT query
    Jena graph->>Access: return result graph
    Access->>Frontend: return result graph
```
Users can either use the web frontend to modify the knowledge graph or directly access the serialized RDF data in the git repository. The following sequence diagram shows a sample interaction between the relevant components handling a change request from a user of the web frontend.
```mermaid
sequenceDiagram
    Frontend->>Access: Change request based on 8aaab...
    Access->>Repository: Apply changeset based on 8aaab...
    Repository->>Access: Result commit id: 399be...
    Access->>Cache: Prepopulate cache for 399be...
    Access->>Frontend: Result commit id: 399be...
```
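For illustration, a simplified sketch of the write path; it uses JGit’s porcelain API for brevity (the project itself mostly uses the plumbing APIs), omits pinning to the base commit, and all names are made up:

```clojure
(import '[org.eclipse.jgit.api Git]
        '[java.io File])

(defn apply-changeset!
  "Re-serialize the changed graph, commit, push, and return the new commit hash."
  [^String repo-dir write-rdf!]
  (with-open [git (Git/open (File. repo-dir))]
    ;; `write-rdf!` writes the updated RDF serialization to the working tree.
    (write-rdf! (File. repo-dir "graph.nt"))
    (-> git (.add) (.addFilepattern "graph.nt") (.call))
    (let [commit (-> git (.commit) (.setMessage "Apply changeset") (.call))]
      (-> git (.push) (.call))
      (.name commit))))
```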
RDF supports so-called blank nodes. A blank node doesn’t have an explicit URI but can still appear in triples. Blank nodes are assigned temporary blank node IDs. However, the RDF standard does not specify the scope of these temporary IDs, and there is no explicit mechanism for introducing them. This makes it impossible to reliably work with blank nodes in a larger context. We therefore require all resources to be assigned explicit URIs whenever knowledge graph fragments or derived information move from the backend to the frontend. We call this process of assigning explicit URIs skolemization.
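A minimal skolemization sketch with Jena, assuming a strategy of minting a fresh urn:uuid URI per blank node (the project’s actual naming scheme may differ):

```clojure
(import '[org.apache.jena.rdf.model Model ModelFactory ResourceFactory])

(defn skolemize
  "Return a copy of `model` with every blank node replaced by a fresh URI.
  The same blank node is mapped to the same URI throughout."
  [^Model model]
  (let [mapping (atom {})
        fresh   (fn [node]
                  (if (.isAnon node)
                    (let [id (.getId node)]
                      (or (@mapping id)
                          (let [res (ResourceFactory/createResource
                                     (str "urn:uuid:" (java.util.UUID/randomUUID)))]
                            (swap! mapping assoc id res)
                            res)))
                    node))
        out     (ModelFactory/createDefaultModel)]
    (doseq [stmt (iterator-seq (.listStatements model))]
      (.add out
            (fresh (.getSubject stmt))
            (.getPredicate stmt)
            (fresh (.getObject stmt))))
    out))
```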
When a user creates resources in the frontend, they are allowed to use blank nodes during editing. A user can then commit their changes to the backend (“write”). Later they issue queries to the knowledge graph whose results may contain information that they previously committed (“query”). In order to fulfill the constraint that all knowledge graph fragments must be skolemized when they move from backend to frontend, we could skolemize eagerly on “write”, lazily on “read”, or at any point in between. Currently, we skolemize eagerly for writes originating at the web frontend, and lazily for writes made directly to the git repository.
Users can insert knowledge graph fragments however they see fit. In order to make the geo search work, however, we need geo information attached to resources. This could in theory be done by explicitly assigning longitude and latitude values, but that is cumbersome and error-prone in practice. We therefore provide a service that translates postal address information (e.g. “Hechinger Str. 12/1 in 72072 Tübingen, Germany”) into longitude and latitude. It’s not vital that this “geocoding” process happens eagerly on every write. Currently, we kick off geocoding asynchronously (see https://github.com/active-group/sp-eu/blob/main/wisen/src/main/wisen/backend/access.clj#L184). It can also be triggered manually.
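A hypothetical geocoding sketch using the public OSM Nominatim API (an assumption for illustration; the project’s actual geocoding backend may differ). Requires org.clojure/data.json on the classpath:

```clojure
(require '[clojure.data.json :as json])

(defn geocode
  "Translate a postal address into {:lat ... :lon ...}, or nil if not found."
  [address]
  (let [url     (str "https://nominatim.openstreetmap.org/search?format=json&q="
                     (java.net.URLEncoder/encode address "UTF-8"))
        results (json/read-str (slurp url) :key-fn keyword)]
    (when-let [hit (first results)]
      {:lat (Double/parseDouble (:lat hit))
       :lon (Double/parseDouble (:lon hit))})))

;; Kick geocoding off asynchronously so writes don't block on it:
(defn geocode-async [address on-done]
  (future (on-done (geocode address))))
```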