Skip to content

Latest commit

 

History

History
95 lines (70 loc) · 5.91 KB

README.md

File metadata and controls

95 lines (70 loc) · 5.91 KB

Graph Merge

Notice

"""
THIS MATERIAL WAS PREPARED AS AN ACCOUNT OF WORK SPONSORED BY AN AGENCY OF THE UNITED STATES GOVERNMENT. NEITHER THE UNITED STATES GOVERNMENT NOR THE UNITED STATES DEPARTMENT OF ENERGY, NOR BATTELLE, NOR ANY OF THEIR EMPLOYEES, NOR ANY JURISDICTION OR ORGANIZATION THAT HAS COOPERATED IN THE DEVELOPMENT OF THESE MATERIALS, MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LEGAL LIABILITY OR RESPONSIBILITY FOR THE ACCURACY, COMPLETENESS, OR USEFULNESS OR ANY INFORMATION, APPARATUS, PRODUCT, SOFTWARE, OR PROCESS DISCLOSED, OR REPRESENTS THAT ITS USE WOULD NOT INFRINGE PRIVATELY OWNED RIGHTS. REFERENCE HEREIN TO ANY SPECIFIC COMMERCIAL PRODUCT, PROCESS, OR SERVICE BY TRADE NAME, TRADEMARK, MANUFACTURER, OR OTHERWISE DOES NOT NECESSARILY CONSTITUTE OR IMPLY ITS ENDORSEMENT, RECOMMENDATION, OR FAVORING BY THE UNITED STATES GOVERNMENT OR ANY AGENCY THEREOF, OR BATTELLE MEMORIAL INSTITUTE. THE VIEWS AND OPINIONS OF AUTHORS EXPRESSED HEREIN DO NOT NECESSARILY STATE OR REFLECT THOSE OF THE UNITED STATES GOVERNMENT OR ANY AGENCY THEREOF. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

PACIFIC NORTHWEST NATIONAL LABORATORY operated by BATTELLE for the UNITED STATES DEPARTMENT OF ENERGY under Contract DE-AC05-76RL01830
"""

Overview

The graph_merge package was designed to assist with analyzing two different but potentially similar graphs through node attributes. To utilize the package, you will likely want to create a new virtual environment and pip install the graph_merge package. The package comes with a visualiztion tool that requires graphviz, which must be installed separately. Alternatively, the graph can be saved as a .gml that can be imported into the user's prefered graph visualization tool.

The tool was originally created for comparing Bills of Materials (BOMs) but can compare any two graphs as long as the following is true:

  1. The graphs contain at least one node that user knows should be matching.
  2. Each node in both graphs contains the node attribute/s that is being used to match the nodes.

The tool will output a single Networkx graph object where nodes are labeled according to which graph they belong.

Create the Merged Graph

To merge the graphs, use the functionality found in the GraphMerger class. First you'll need to instantiate the class:

from graph_merge.merge_graphs import GraphMerger

merger = GraphMerger(graph1, graph2)

Generate DFS NodeMap

To merge the graphs you will need a NodeMap by running the GraphMerger.get_depth_first_search_map method. This method takes the list of node attributes you want to match on and what similarity function to use for the matching (along with the parameters for it). Currently, this package supports the following similarity functions (more information can be found in the graph_merge.similarity.similarity package):

  • exact_match: Exact string matching (case sensitive).
  • contains: If either string is a substring of the other.
  • jaro_match: Jaro-Winkler string similarity from the Levenshtein package.
  • ratio_match": Ratio match for the Levenshtein package.
  • string_similarity_cosine": TF-IDF cosine similarity using the SciKit-Learn package.

We encourage you to explore the node attribute matching functionality. It is able to represent a large variety of conditional logics depending on how you structure the attribute parameter.

To generate a NodeMap you will need a starting node for at least graph 1 to begin the search from. If you choose a starting node for graph 1 and graph 2 then they will assumed to be similar, even if the chosen similarity function would not match them.

Note: this method can take a long time to finish as it is calculating an N^2 similarity matrix!

An example call using exact string matching on just the name attribute:

node_map = merger.get_depth_first_search_map(["name"], match_function_args={"exact_match": {}}, g1_starting_node="starting_node")

get_depth_first_search_map returns a NodeMap object that is used in the next step and can also be cached.

Merge the Graphs

You will now merge the graphs by calling GraphMerger.create_merge_graph_from_mapping. To use this method you need to pass in the previously generated NodeMap object:

merged_graph = merger.create_merge_graph_from_mapping(node_map)

This method returns a NetworkX graph that both has any nodes that could be merged together, and all of the nodes that were unable to be merged.

Additionally, the property current_graph on the GraphMerger object is populated with the same merged graph. You can now explore that graph using whatever graph tool you would like. We have supplied some helper functions that can export the graph using a variety of tools. We have options for exporting as GML, PNG using NetworkX and matplotlib for rendering, and holoviews. Please explore the graph_merge.visual module to find out more about them. You will need to install the visual dependencies by installing the visual dependency group (ex: pip install merge_graph[visual]).

Creating Overlays

Lastly, we have provided a way to calculate an overlay of the merged graph. This allows you to the current graph with additional edges for what nodes could be potentially merged together given additional similarity metrics. Here is an example using the exact match merger we calculated previously that is then overlaid with a jaro similarity to detect nodes that are similar and could potentially be merged:

overlay_mappings = merger.overlay_node_map_on_graph(attributes=["name"], match_function_args={"jaro_match": {"threshold": 0.6}})

In this example overlay_mappings will contain a dictionary of the newly proposed node mappings. merger.current_graph will contain additional edges with an edge attribute called "graph" with a value of "Possible Match".