Add utility function to compute single-graph semantic similarity #41

bschilder · 2024-08-23T17:23:23Z

A very common thing users might want to do is to compute the semantic similarity between nodes in a graph and then store that data back in the edges of the graph (to use as edge weights later).

I've created the following function to automate this.

#' Add semantic similarity
#'
#' First computes semantic similarity between all pairs of nodes in a graph.
#' Then adds the continuous similarity score as an edge attribute.
#' @param sim_fun The similarity function to use.
#'  Default is link[igraph]{similarity}.
#' @param sim_col Name of the new edge attribute to store the similarity score.
#' @param ... Additional arguments passed to the similarity function
#'  (\code{sim_fun}).
#' @returns Graph object with similarity added as a new edge attribute.
#' @export
#' @examples
#' filename <- system.file("extdata", "eds_marfan_kg.tar.gz", package = "monarchr")
#' g <- file_engine(filename) |>
#'           fetch_nodes(query_ids = "MONDO:0007525") |>
#'           expand(predicates = "biolink:has_phenotype",
#'                  categories = "biolink:PhenotypicFeature")|>
#'           expand(categories = "biolink:Gene")
#' g <- graph_semsim(g)
#' edges(g)$similarity
graph_semsim <- function(graph,
					    sim_fun=igraph::similarity,
					    sim_col="similarity",
					    ...){
	from <- to <- NULL;
	message("Computing pairwise node similarity.")
	X <- sim_fun(graph, ...)
	rownames(X) <- colnames(X) <- igraph::V(graph)$name
	graph <- graph|>
		activate(edges)|>
		dplyr::mutate(!!sim_col:=purrr::map2_dbl(from, to, ~ X[.y, .x]))
	return(graph)
}

bschilder · 2024-08-23T17:27:27Z

Of course, this won't yield the "true" semantic similarity between the nodes unless your graph is sufficiently large and complete to accurately characterise the relationships between the nodes. But still, can come in handy.

bschilder · 2024-08-23T17:34:01Z

monarch_semsim does something a bit like this, but instead compares two graph to each other.
Is it possible to query the semantic similarity API for a single graph instead (so we can get accurate similarity scores between all edge-connected nodes within the graph)? Might be a nice complement to my graph_semsim function which only considers the local graph instance.

oneilsh · 2024-08-27T17:20:23Z

The Monarch semsim API does take two sets of node IDs, and computes the best match from each in set A to those in set B, and vice-versa (essentially designed to support https://monarchinitiative.org/explore#phenotype-explorer).

I suppose for a single graph we could just pass the nodes as both set A and set B, but the functionality giving just the best match means that each node will just be reported to match itself (I think). Perhaps this is a feature request for the API - all-vs-all semantic similarity queries. Could be intensive though given the O(n^2) nature of the result. Tagging @kevinschaper

bschilder · 2024-10-01T15:06:07Z

The Monarch semsim API does take two sets of node IDs, and computes the best match from each in set A to those in set B, and vice-versa (essentially designed to support https://monarchinitiative.org/explore#phenotype-explorer).

I suppose for a single graph we could just pass the nodes as both set A and set B, but the functionality giving just the best match means that each node will just be reported to match itself (I think). Perhaps this is a feature request for the API - all-vs-all semantic similarity queries. Could be intensive though given the O(n^2) nature of the result. Tagging @kevinschaper

Yeah, instead of returning only the top 1 similar node I'd want to return each node's similarities with each other node. I agree, for large graphs this would be a massive computation. Perhaps precomputing this and storing it as a separate database (version-controlled and regenerated for each KG release). Is this something that would be feasible @kevinschaper ?

bschilder changed the title ~~Add utility function to compute semantic similarity~~ Add utility function to compute single-graph semantic similarity Aug 23, 2024

bschilder mentioned this issue Jan 8, 2025

monarchr 1.5.1 #52

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add utility function to compute single-graph semantic similarity #41

Add utility function to compute single-graph semantic similarity #41

bschilder commented Aug 23, 2024 •

edited

Loading

bschilder commented Aug 23, 2024 •

edited

Loading

bschilder commented Aug 23, 2024 •

edited

Loading

oneilsh commented Aug 27, 2024

bschilder commented Oct 1, 2024

Add utility function to compute single-graph semantic similarity #41

Add utility function to compute single-graph semantic similarity #41

Comments

bschilder commented Aug 23, 2024 • edited Loading

bschilder commented Aug 23, 2024 • edited Loading

bschilder commented Aug 23, 2024 • edited Loading

oneilsh commented Aug 27, 2024

bschilder commented Oct 1, 2024

bschilder commented Aug 23, 2024 •

edited

Loading

bschilder commented Aug 23, 2024 •

edited

Loading

bschilder commented Aug 23, 2024 •

edited

Loading