Skip to content

tddschn/easygraph-bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

title emoji colorFrom colorTo sdk sdk_version app_file pinned
Easygraph Bench
📉
yellow
indigo
gradio
3.23.0
app.py
false

easygraph-bench

Benchmark using all datasets

This repository includes code for benchmarking the performance of graph libraries including easygraph.

Objectives

Objective 1

Benchmarking code that compares the performance of the 2 graph libraries easygraph (with and without C++ binding) and networkx.

Objective 2

Benchmarking code that compares the performance of 6 graph libraries
easygraph, networkx, igraph, graphtool, networkit, snap-stanford.
on these methods loading, 2-hops, shortest path, k-core, page rank, strongly connected components.

Benchmarking method

timeit.Timer.autorange is used to run the specified methods on the graph objects.

If the method returns a Generator, the result will be exhausted.

See get_Timer_args() for more details.

Benchmarked methods

For Objective 1

See config.py for more details.

  • clustering_methods: ["average_clustering", "clustering"]
    (eg.average_clustering vs nx.average_clustering, ...)
  • shortest_path_methods: [('Dijkstra', 'single_source_dijkstra_path')]
    (eg.Dijkstra vs nx.single_source_dijkstra_path)
  • connected_components_methods: ["is_biconnected", "biconnected_components"]
  • mst_methods: ['minimum_spanning_tree']
    C++ binding not supported for this method yet.
  • other_methods: ['density', 'constraint'].

For Objective 2

Click to expand

Source: graph-benchmark-code.yaml

The data (except the easygraph related part) is this yaml file is extracted from timlrx's graph-benchmark repository with semgrep.

See the timlrx directory for more details.

easygraph:
  loading: '''read_edgelist(filename, delimiter="\t", nodetype=int, create_using=eg.DiGraph()).cpp()'''
  loading_undirected: '''read_edgelist(filename, delimiter="\t", nodetype=int, create_using=eg.Graph()).cpp()'''
  # page rank: "'pagerank(g)'"
  shortest path: f'Dijkstra(g, {nodeid})'
  strongly connected components: "'[i for i in strongly_connected_components(g)]'"
  graphtool:
  2-hops: '"shortest_distance(g, g.vertex(0), max_dist=2).a"'
  k-core: "'kcore_decomposition(g).a'"
  loading:
      '''''''load_graph_from_csv(filename, directed=True, csv_options={''delimiter'':
      ''\t'', ''quotechar'': ''"''})'''''''
  loading_undirected:
      '''''''load_graph_from_csv(filename, directed=False, csv_options={''delimiter'':
      ''\t'', ''quotechar'': ''"''})'''''''
  page rank: "'pagerank(g, damping=0.85, epsilon=1e-3, max_iter=10000000).a'"
  shortest path: '"shortest_distance(g, g.vertex(0)).a"'
  strongly connected components:
      "'cc, _ = label_components(g, vprop=None, directed=True,
      attractors=False); cc.a'"
  igraph:
  k-core: '"g.coreness(mode=''all'')"'
  loading: '"Graph.Read(filename, format=''edges'')"'
  loading_undirected: '"Graph.Read(filename, format=''edges'', directed=False)"'
  page rank: '"g.pagerank(damping=0.85)"'
  shortest path: '"g.shortest_paths([g.vs[0]])"'
  strongly connected components: '"[i for i in g.components(mode=STRONG)]"'
  networkit:
  k-core: '"nk.centrality.CoreDecomposition(g).run().scores()"'
  loading:
      '"nk.graphio.EdgeListReader(separator=''\t'', firstNode=0, continuous=True,
      directed =True).read(filename)"'
  loading_undirected: '"nk.graphio.EdgeListReader(separator=''\t'', firstNode=0, continuous=True).read(filename)"'
  page rank: '"nk.centrality.PageRank(g, damp=0.85, tol=1e-3).run().scores()"'
  shortest path: '"nk.distance.BFS(g, 0, storePaths=False).run().getDistances(False)"'
  strongly connected components: '"nk.components.StronglyConnectedComponents(g).run().getPartition().getVector()"'
  networkx:
  2-hops: f'single_source_shortest_path_length(g, {nodeid}, cutoff=2)'
  k-core: "'core.core_number(g)'"
  loading: '''read_edgelist(filename, delimiter="\t", nodetype=int, create_using=nx.DiGraph())'''
  loading_undirected: '''read_edgelist(filename, delimiter="\t", nodetype=int, create_using=nx.Graph())'''
  page rank: "'pagerank(g, alpha=0.85, tol=1e-3, max_iter=10000000)'"
  shortest path: f'shortest_path_length(g, {nodeid})'
  strongly connected components: "'[i for i in strongly_connected_components(g)]'"
  snap:
  2-hops: '"snap.GetNodesAtHop(g, 0, 2, NodeVec, True)"'
  k-core: '"snap.GetKCoreNodes(g, CoreIDSzV)"'
  loading: '"snap.LoadEdgeListStr(snap.PNGraph, filename, 0, 1)"'
  page rank: '"snap.GetPageRank(g, PRankH, 0.85, 1e-3, 10000000)"'
  shortest path: '"snap.GetShortPath(g, 0, NIdToDistH, True)"'
  strongly connected components: '"snap.GetSccs(g, Components)"'
  

Run

Run locally

Prerequisites

python >= 3.10 is required.

First, you need to download datasets manually or with a script like this one.

To run these scripts, you need to clone the repo and install the dependencies listed in requirements.txt.

To install easygraph:
As of 8/6/2022, wheel for python-easygraph is not available on PyPI, and you need to build it yourself and install the module by running the following code.

git clone https://github.com/easy-graph/Easy-Graph && cd Easy-Graph && git checkout pybind11
pip install pybind11
python3 setup.py install

To install other 5 graph libraries on conda, run

conda install -c conda-forge python-igraph graph-tool networkit snap-stanford -y
python3 -m pip install networkx

For Objective 1

You can run benchmarking on a single dataset with the ./bench_*.py scripts,
or run benchmarking on a set of datasets with the ./entrypoint_*.py scripts, or run all of them ./bench.sh.

For Objective 2

You can run benchmarking on a single dataset for a single library with the ./profile_*.py scripts, or use the convenience script ./profile_entrypoint.sh to profile a bunch of datasets for the 6 libraries (you may need to adjust the dataset locations for this script to work).

# Download datasets
cp scripts/download_data.sh . && bash download_data.sh

# Get LCC datasets from the downloaded datasets
./get_lcc_edgelist.py

# Generate the scripts, only bench what you want to bench
gen-scripts-20230328-directed-only:
	./gen_profile_scripts_with_suffix_wrapper.py '20230328-pagerank-scc-directed-only' --tools 'igraph' 'easygraph' --methods 'page rank' 'strongly connected components' --directed-datasets-only

Scripts usage

# For Objective 1
# the ./bench_*.py scripts are for benchmarking easygraph and networkx on a single or all datasets

$ ./bench_cheminfo.py --help

ENZYMES_g1: nodes: 37 edges: 84
usage: bench_cheminfo.py [-h] [-G {clustering,shortest-path,connected-components,mst} [{clustering,shortest-path,connected-components,mst} ...]] [-C]

EasyGraph & NetworkX side-by-side benchmarking

optional arguments:
  -h, --help            show this help message and exit
  -G {clustering,shortest-path,connected-components,mst} [{clustering,shortest-path,connected-components,mst} ...], --method-group {clustering,shortest-path,connected-components,mst} [{clustering,shortest-path,connected-components,mst} ...]
  -C, --skip-cpp-easygraph, --skip-ceg
                        Skip benchmarking cpp_easygraph methods (default: False)
# for Objective 2
# the ./profile_*.py scripts are for profiling the one graph library on dataset of your choice.

# examples:
# ./profile_igraph.py my_dataset/my_network.edgelist -n 1000
# run the benchmarked methods of igraph on my_network.edgelist dataset for 1000 times, the dataset will be read as a directed graph.
# ./profile_networkx_undirected.py bio.edgelist
# read bio.edgelist as an undirected graph.

$ ./profile_easygraph.py D
usage: profile_easygraph.py [-h] [-n INT] PATH

Benchmark easygraph

positional arguments:
  PATH                  path to the dataset file in tab-separated edgelist format

options:
  -h, --help            show this help message and exit
  -n INT, --iteration INT
                        iteration count when benchmarking, auto-determined if unspecified (default: None)

Run on GitHub Actions (for Objective 1 only)

Fork this repo, go to the Actions tab and click Run Workflow.

Result visualization (for Objective 1 only)

timeit results are saved in csv files, and seaborn is used to render and save the figures in the images/ directory.

Image generation is slow, use -D or --skip-draw when running ./bench_*.py to skip image generation.

Datasets

See dataset_loaders.py and dataset for details.

The er_* Erdos-Renyi random graphs are generated with eg.erdos_renyi_P(), available here.

Dataset Name nodes edges is_directed average_degree density type
cheminformatics 37 168 True 9.08108108108108 0.12612612612612611 easygraph.classes.directed_graph.DiGraph
cheminformatics_lcc 37 84 False 4.54054054054054 0.12612612612612611 networkx.classes.graph.Graph
eco 1258 7619 False 12.112877583465819 0.009636338570776308 networkx.classes.graph.Graph
eco_lcc 1258 7619 False 12.112877583465819 0.009636338570776308 networkx.classes.graph.Graph
bio 1458 1948 False 2.672153635116598 0.0018340107310340413 easygraph.classes.graph.Graph
bio_lcc 1458 1948 False 2.672153635116598 0.0018340107310340413 networkx.classes.graph.Graph
road_sampled 2075 1132 False 1.0910843373493977 0.0005260773082687548 networkx.classes.graph.Graph
facebook 4039 88234 True 43.69101262688784 0.0054099817517196435 networkx.classes.digraph.DiGraph
facebook_lcc 4039 88234 False 43.69101262688784 0.010819963503439287 networkx.classes.graph.Graph
coauthorship_sampled 4340 6398 False 2.9483870967741934 0.0006795084343798557 networkx.classes.graph.Graph
uspowergrid 4941 6594 False 2.66909532483303 0.0005403026973346214 networkx.classes.graph.Graph
uspowergrid_lcc 4941 6594 False 2.66909532483303 0.0005403026973346214 networkx.classes.graph.Graph
pgp_sampled 6465 18906 True 5.848723897911833 0.00045240747972709105 networkx.classes.digraph.DiGraph
wikivote_lcc 7066 100736 False 28.512878573450326 0.004035793145569756 networkx.classes.graph.Graph
wikivote 7115 100762 True 29.146591707659873 0.0020485375110809584 networkx.classes.digraph.DiGraph
hepth_lcc 8638 24827 False 5.748321370687659 0.0006655460658431931 networkx.classes.graph.Graph
pgp_undirected_sampled 8781 51939 False 11.829859924837718 0.0013473644561318586 networkx.classes.graph.Graph
enron_sampled 9301 79905 False 17.182023438339964 0.001847529401972039 networkx.classes.graph.Graph
hepth 9877 25998 False 5.264351523742027 0.000533044909248889 networkx.classes.graph.Graph
condmat_lcc 21363 91342 False 8.551420680616019 0.0004003099279382089 networkx.classes.graph.Graph
condmat 23133 93497 False 8.083430596982666 0.0003494479766981958 networkx.classes.graph.Graph
enron_lcc 33696 180811 False 10.731896961063628 0.0003185011711252004 networkx.classes.graph.Graph
enron 36692 183831 False 10.020222391802028 0.00027309755503535 networkx.classes.graph.Graph
pgp 39796 301498 True 15.15217609810031 0.00019037788790175037 networkx.classes.digraph.DiGraph
pgp_undirected 39796 197150 False 9.908030957885215 0.00024897677994434515 networkx.classes.graph.Graph
pgp_lcc 39796 197150 False 9.908030957885215 0.00024897677994434515 networkx.classes.graph.Graph
pgp_undirected_lcc 39796 197150 False 9.908030957885215 0.00024897677994434515 networkx.classes.graph.Graph
road_lcc 126146 161950 False 2.567659695907916 2.035482734874879e-05 networkx.classes.graph.Graph
road 129164 165435 False 2.5616270787525934 1.9832514564949666e-05 easygraph.classes.graph.Graph
amazon 262111 1234877 True 9.42254998836371 1.7974419114806206e-05 networkx.classes.digraph.DiGraph
amazon_lcc 262111 899792 False 6.86573245685988 2.6194088195261075e-05 networkx.classes.graph.Graph
amazon_sampled 262111 1234877 True 9.42254998836371 1.7974419114806206e-05 networkx.classes.digraph.DiGraph
coauthorship 402392 1234019 False 6.1334171653512 1.5242431280399412e-05 networkx.classes.graph.Graph
google_lcc 855802 4291352 False 10.028843120254452 1.1718662539836308e-05 networkx.classes.graph.Graph
google 875713 5105039 True 11.659160021605253 6.656960291514363e-06 networkx.classes.digraph.DiGraph
pokec 1632803 30622564 True 37.50919614919865 1.148614349725155e-05 networkx.classes.digraph.DiGraph
pokec_lcc 1632803 22301964 False 27.31739713854029 1.6730379518484355e-05 networkx.classes.graph.Graph
er_500 500 2511 False 10.044 0.020128256513026053 easygraph.classes.graph.Graph
er_1000 1000 4950 False 9.9 0.00990990990990991 easygraph.classes.graph.Graph
er_5000 5000 24920 False 9.968 0.001993998799759952 easygraph.classes.graph.Graph
er_10000 10000 50023 False 10.0046 0.0010005600560056005 easygraph.classes.graph.Graph
er_paper_20221213_50000 50000 60316 False 2.41264 4.8253765075301506e-05 easygraph.classes.graph.Graph
er_paper_20221213_500000 500000 70315 False 0.28126 5.625211250422501e-07 easygraph.classes.graph.Graph
er_paper_20221213_1000000 1000000 80266 False 0.160532 1.6053216053216054e-07 easygraph.classes.graph.Graph

Objective 1:

Bench results REST API

Results downloads

You can download the benchmarking results on the Releases page.

The server release contains benchmarking run on a powerful EC2 server (c5.2xlarge), for the paper.

FAQ

How do I use your benchmarking code on other datasets?

Simply write a function load_<dataset_name>() and add it to dataset_loaders.py. Checkout the examples in that file.

Then duplicate any of the benchmarking script, and replace the loading function with your own loader.