From ce6ed4c719799f74402a44f5bc8843cc8187ee0f Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 10 Jun 2024 13:29:47 -0500 Subject: [PATCH 1/6] Doc cleanup for nx-cugraph: fixed typos, cleaned up various descriptions, renamed notebook to match naming convention. --- docs/cugraph/source/nx_cugraph/nx_cugraph.md | 27 +++++++------ ...ching.ipynb => nx_cugraph_benchmark.ipynb} | 38 +++++++++---------- 2 files changed, 30 insertions(+), 35 deletions(-) rename notebooks/cugraph_benchmarks/{nx_cugraph_codeless_switching.ipynb => nx_cugraph_benchmark.ipynb} (74%) diff --git a/docs/cugraph/source/nx_cugraph/nx_cugraph.md b/docs/cugraph/source/nx_cugraph/nx_cugraph.md index 854f755cb6..ece7221099 100644 --- a/docs/cugraph/source/nx_cugraph/nx_cugraph.md +++ b/docs/cugraph/source/nx_cugraph/nx_cugraph.md @@ -1,18 +1,17 @@ ### nx_cugraph -Whereas previous versions of cuGraph have included mechanisms to make it -trivial to plug in cuGraph algorithm calls. Beginning with version 24.02, nx-cuGraph -is now a [networkX backend](). -The user now need only [install nx-cugraph]() -to experience GPU speedups. +nx-cugraph is a [NetworkX +backend]() that provides GPU acceleration to many popular NetworkX algorithms. -Lets look at some examples of algorithm speedups comparing CPU based NetworkX to dispatched versions run on GPU with nx_cugraph. +By simply [installing and enabling nx-cugraph](), users can see significant speedup on workflows where performance is hindered by the default NetworkX implementation. With nx-cugraph, users can have GPU-based, large-scale performance without changing their familiar and easy-to-use NetworkX code. + +Let's look at some examples of algorithm speedups comparing NetworkX with and without GPU accelration using nx-cugraph. Each chart has three measurements. -* NX - running the algorithm natively with networkX on CPU. -* nx-cugraph - running with GPU accelerated networkX achieved by simply calling the cugraph backend. 
This pays the overhead of building the GPU resident object for each algorithm called. This achieves significant improvement but stil isn't compleltely optimum. -* nx-cugraph (preconvert) - This is a bit more complicated since it involves building (precomputing) the GPU resident graph ahead and reusing it for each algorithm. +* NX - default NetworkX, no GPU acceleration +* nx-cugraph - GPU-accelerated NetworkX using nx-cugraph. This involves an internal conversion/transfer of graph data from CPU to GPU memory +* nx-cugraph (preconvert) - GPU-accelerated NetworkX using nx-cugraph with the graph data pre-converted/transfered to GPU ![Ancestors](../images/ancestors.png) @@ -44,7 +43,7 @@ user@machine:/# ipython bc_demo.ipy You will observe a run time of approximately 7 minutes...more or less depending on your cpu. -Run the command again, this time specifiying cugraph as the NetworkX backend of choice. +Run the command again, this time specifiying cugraph as the NetworkX backend. ``` user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy ``` @@ -52,12 +51,12 @@ This run will be much faster, typically around 20 seconds depending on your GPU. ``` user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy ``` -There is also an option to add caching. This will dramatically help performance when running multiple algorithms on the same graph. +There is also an option to cache the graph conversion to GPU. This can dramatically improve performance when running multiple algorithms on the same graph. ``` -NETWORKX_BACKEND_PRIORITY=cugraph CACHE_CONVERTED_GRAPH=True ipython bc_demo.ipy +NETWORKX_BACKEND_PRIORITY=cugraph CACHE_CONVERTED_GRAPHS=True ipython bc_demo.ipy ``` -When running Python interactively, cugraph backend can be specified as an argument in the algorithm call. +When running Python interactively, the cugraph backend can be specified as an argument in the algorithm call. 
For example: ``` @@ -65,4 +64,4 @@ nx.betweenness_centrality(cit_patents_graph, k=k, backend="cugraph") ``` -The latest list of algorithms that can be dispatched to nx-cuGraph for acceleration is found [here](https://github.com/rapidsai/cugraph/blob/main/python/nx-cugraph/README.md#algorithms). +The latest list of algorithms supported by nx-cuGraph can be found [here](https://github.com/rapidsai/cugraph/blob/main/python/nx-cugraph/README.md#algorithms). diff --git a/notebooks/cugraph_benchmarks/nx_cugraph_codeless_switching.ipynb b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb similarity index 74% rename from notebooks/cugraph_benchmarks/nx_cugraph_codeless_switching.ipynb rename to notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb index e05544448b..f7f39286dd 100644 --- a/notebooks/cugraph_benchmarks/nx_cugraph_codeless_switching.ipynb +++ b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb @@ -4,13 +4,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Benchmarking Performance of NetworkX with Rapids GPU-based nx_cugraph backend vs on cpu\n", + "# Benchmarking Performance of NetworkX with and without the RAPIDS GPU-based nx_cugraph backend\n", "# Skip notebook test\n", - "This notebook demonstrates compares the performance of nx_cugraph as a dispatcher for NetworkX algorithms. \n", + "This notebook collects the run times with and without the nx_cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", "\n", - "We do this by executing Betweenness Centrality, Breadth First Search and Louvain Community Detection, collecting run times with and without nx_cugraph backend and graph caching enabled. nx_cugraph is a registered NetworkX backend. 
Using it is a zero code change solution.\n", - "\n", - "In the notebook switching to the nx-cugraph backend is done via variables set using the [NetworkX config package](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) **which requires networkX 3.3 or later !!**\n", + "In this notebook, enabling the nx-cugraph backend will be done via using the [NetworkX config API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", "\n", "\n", "They can be set at the command line as well.\n", @@ -19,7 +17,7 @@ "\n", "\n", "\n", - "Here is a sample minimal script to demonstrate No-code-change GPU acceleration using nx-cugraph.\n", + "Here is a sample minimal script to demonstrate no-code-change GPU acceleration using nx-cugraph.\n", "\n", "----\n", "bc_demo.ipy:\n", @@ -71,7 +69,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This installs the NetworkX cuGraph dispatcher if not already present." + "This installs nx-cugraph if not already present." ] }, { @@ -92,11 +90,10 @@ "source": [ "This is boiler plate NetworkX code to run:\n", "* betweenness Centrality\n", - "* Bredth first Search\n", + "* Breadth first Search\n", "* Louvain community detection\n", "\n", - "and report times. it is completely unaware of cugraph or GPU-based tools.\n", - "[NetworkX configurations](https://networkx.org/documentation/stable/reference/utils.html#backends) can determine how they are run." + "and report times. 
This code does not require modification to use with nx-cugraph.\n" ] }, { @@ -106,15 +103,15 @@ "outputs": [], "source": [ "def run_algos(G):\n", - " runtime = time.time()\n", + " starttime = time.time()\n", " result = nx.betweenness_centrality(G, k=10)\n", - " print (\"Betweenness Centrality time: \" + str(round(time.time() - runtime))+ \" seconds\")\n", - " runtime = time.time()\n", + " print (\"Betweenness Centrality time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", + " starttime = time.time()\n", " result = nx.bfs_tree(G,source=1)\n", - " print (\"Breadth First Search time: \" + str(round(time.time() - runtime))+ \" seconds\")\n", - " runtime = time.time()\n", + " print (\"Breadth First Search time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", + " starttime = time.time()\n", " result = nx.community.louvain_communities(G,threshold=1e-04)\n", - " print (\"Louvain time: \" + str(round(time.time() - runtime))+ \" seconds\")\n", + " print (\"Louvain time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", " return" ] }, @@ -146,10 +143,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Setting the NetworkX dispatcher with an environment variable or in code using NetworkX config package which is new to [NetworkX 3.3 config](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig).\n", + "Setting the NetworkX dispatcher with an environment variable or in code using the NetworkX config API ([NetworkX 3.3+](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig)).\n", "\n", - "These convenience settinge allow turning off caching and cugraph dispatching if you want to see how long cpu-only takes.\n", - "This example using an AMD Ryzen Threadripper PRO 3975WX 32-Cores cpu completed in slightly over 40 minutes." + "This example using an AMD Ryzen Threadripper PRO 3975WX 32-Cores CPU completed in slightly over 40 minutes." 
] }, { @@ -171,12 +167,12 @@ "if use_cugraph:\n", " nx.config[\"backend_priority\"]=['cugraph']\n", "else:\n", - " # Use this setting to turn off the cugraph dispatcher running in legacy cpu mode.\n", + " # Use this setting to use the default NetworkX implementation.\n", " nx.config[\"backend_priority\"]=[]\n", "if cache_graph:\n", " nx.config[\"cache_converted_graphs\"]= True\n", "else:\n", - " # Use this setting to turn off graph caching which will convertthe NetworkX to a gpu-resident graph each time an algorithm is run.\n", + " # Use this setting to disable caching of graph conversions. This will require nx-cugraph to convert and transfer the native CPU-based graph object to the GPU each time an algorithm is run.\n", " nx.config[\"cache_converted_graphs\"]= False\n" ] }, From 7b6eb7406ed57eade413d6bf856c89e91487eea3 Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 10 Jun 2024 23:06:58 -0500 Subject: [PATCH 2/6] Removed redundant word in description. --- notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb index f7f39286dd..c4ec9c0131 100644 --- a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb +++ b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb @@ -8,7 +8,7 @@ "# Skip notebook test\n", "This notebook collects the run times with and without the nx_cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", "\n", - "In this notebook, enabling the nx-cugraph backend will be done via using the [NetworkX config API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", + "In this notebook, enabling the nx-cugraph backend will be done using the [NetworkX config 
API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", "\n", "\n", "They can be set at the command line as well.\n", From 4b222483ce90550790399b4657000a49ffdbadf7 Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Tue, 11 Jun 2024 14:55:03 -0500 Subject: [PATCH 3/6] Fixes several typos. --- docs/cugraph/source/nx_cugraph/nx_cugraph.md | 10 +++++----- .../cugraph_benchmarks/nx_cugraph_benchmark.ipynb | 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/cugraph/source/nx_cugraph/nx_cugraph.md b/docs/cugraph/source/nx_cugraph/nx_cugraph.md index ece7221099..75a30b0be5 100644 --- a/docs/cugraph/source/nx_cugraph/nx_cugraph.md +++ b/docs/cugraph/source/nx_cugraph/nx_cugraph.md @@ -6,12 +6,12 @@ backend](), users can see significant speedup on workflows where performance is hindered by the default NetworkX implementation. With nx-cugraph, users can have GPU-based, large-scale performance without changing their familiar and easy-to-use NetworkX code. -Let's look at some examples of algorithm speedups comparing NetworkX with and without GPU accelration using nx-cugraph. +Let's look at some examples of algorithm speedups comparing NetworkX with and without GPU acceleration using nx-cugraph. Each chart has three measurements. * NX - default NetworkX, no GPU acceleration * nx-cugraph - GPU-accelerated NetworkX using nx-cugraph. This involves an internal conversion/transfer of graph data from CPU to GPU memory -* nx-cugraph (preconvert) - GPU-accelerated NetworkX using nx-cugraph with the graph data pre-converted/transfered to GPU +* nx-cugraph (preconvert) - GPU-accelerated NetworkX using nx-cugraph with the graph data pre-converted/transferred to GPU ![Ancestors](../images/ancestors.png) @@ -43,7 +43,7 @@ user@machine:/# ipython bc_demo.ipy You will observe a run time of approximately 7 minutes...more or less depending on your cpu. 
-Run the command again, this time specifiying cugraph as the NetworkX backend. +Run the command again, this time specifying cugraph as the NetworkX backend. ``` user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy ``` @@ -53,7 +53,7 @@ user@machine:/# NETWORKX_BACKEND_PRIORITY=cugraph ipython bc_demo.ipy ``` There is also an option to cache the graph conversion to GPU. This can dramatically improve performance when running multiple algorithms on the same graph. ``` -NETWORKX_BACKEND_PRIORITY=cugraph CACHE_CONVERTED_GRAPHS=True ipython bc_demo.ipy +NETWORKX_BACKEND_PRIORITY=cugraph NETWORKX_CACHE_CONVERTED_GRAPHS=True ipython bc_demo.ipy ``` When running Python interactively, the cugraph backend can be specified as an argument in the algorithm call. @@ -64,4 +64,4 @@ nx.betweenness_centrality(cit_patents_graph, k=k, backend="cugraph") ``` -The latest list of algorithms supported by nx-cuGraph can be found [here](https://github.com/rapidsai/cugraph/blob/main/python/nx-cugraph/README.md#algorithms). +The latest list of algorithms supported by nx-cugraph can be found [here](https://github.com/rapidsai/cugraph/blob/main/python/nx-cugraph/README.md#algorithms). 
diff --git a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb index c4ec9c0131..b1f643403f 100644 --- a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb +++ b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb @@ -4,9 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Benchmarking Performance of NetworkX with and without the RAPIDS GPU-based nx_cugraph backend\n", + "# Benchmarking Performance of NetworkX with and without the RAPIDS GPU-based nx-cugraph backend\n", "# Skip notebook test\n", - "This notebook collects the run times with and without the nx_cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", + "This notebook collects the run times with and without the nx-cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", "\n", "In this notebook, enabling the nx-cugraph backend will be done using the [NetworkX config API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", "\n", From 1bf6eba9cf77dd7f0f4c599fcf68faa325d932da Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 1 Jul 2024 12:47:58 -0500 Subject: [PATCH 4/6] Moved thriftpy2 pin to anchor. 
--- dependencies.yaml | 8 ++++---- python/cugraph-service/client/pyproject.toml | 2 +- python/cugraph-service/server/pyproject.toml | 2 +- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/dependencies.yaml b/dependencies.yaml index c37d208077..135e7ccd1c 100644 --- a/dependencies.yaml +++ b/dependencies.yaml @@ -495,7 +495,9 @@ dependencies: common: - output_types: [conda, pyproject] packages: - - &thrift thriftpy2 + # this thriftpy2 entry can be removed entirely (or switched to a '!=') + # once a new release of that project resolves https://github.com/Thriftpy/thriftpy2/issues/281 + - &thrift thriftpy2<=0.5.0 python_run_cugraph_service_server: common: - output_types: [conda, pyproject] @@ -545,9 +547,7 @@ dependencies: - output_types: [conda] packages: - pylibwholegraph==24.8.* - # this thriftpy2 entry can be removed entirely (or switched to a '!=') - # once a new release of that project resolves https://github.com/Thriftpy/thriftpy2/issues/281 - - thriftpy2<=0.5.0 + - *thrift test_python_pylibcugraph: common: - output_types: [conda, pyproject] diff --git a/python/cugraph-service/client/pyproject.toml b/python/cugraph-service/client/pyproject.toml index d8261c38b2..7a0e38728f 100644 --- a/python/cugraph-service/client/pyproject.toml +++ b/python/cugraph-service/client/pyproject.toml @@ -19,7 +19,7 @@ authors = [ license = { text = "Apache 2.0" } requires-python = ">=3.9" dependencies = [ - "thriftpy2", + "thriftpy2<=0.5.0", ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../../dependencies.yaml and run `rapids-dependency-file-generator`. 
classifiers = [ "Intended Audience :: Developers", diff --git a/python/cugraph-service/server/pyproject.toml b/python/cugraph-service/server/pyproject.toml index d953d263af..fa89d6db43 100644 --- a/python/cugraph-service/server/pyproject.toml +++ b/python/cugraph-service/server/pyproject.toml @@ -29,7 +29,7 @@ dependencies = [ "numpy>=1.23,<2.0a0", "rapids-dask-dependency==24.8.*", "rmm==24.8.*", - "thriftpy2", + "thriftpy2<=0.5.0", "ucx-py==0.39.*", ] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../../dependencies.yaml and run `rapids-dependency-file-generator`. classifiers = [ From a6fdb26d8cf8dfecabc4d55069a61a831b682943 Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 1 Jul 2024 13:19:26 -0500 Subject: [PATCH 5/6] Fixes more inconsistencies, adds links to NX docs. --- .../nx_cugraph_benchmark.ipynb | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb index b1f643403f..5611365aa6 100644 --- a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb +++ b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb @@ -6,16 +6,12 @@ "source": [ "# Benchmarking Performance of NetworkX with and without the RAPIDS GPU-based nx-cugraph backend\n", "# Skip notebook test\n", - "This notebook collects the run times with and without the nx-cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", + "This notebook collects the run-times with and without the nx-cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", "\n", "In this notebook, enabling the nx-cugraph backend will be done using the [NetworkX config 
API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", "\n", "\n", - "They can be set at the command line as well.\n", - "\n", - "### See this example from GTC Spring 2024\n", - "\n", - "\n", + "They can be set at the command-line as well.\n", "\n", "Here is a sample minimal script to demonstrate no-code-change GPU acceleration using nx-cugraph.\n", "\n", @@ -88,10 +84,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This is boiler plate NetworkX code to run:\n", - "* betweenness Centrality\n", - "* Breadth first Search\n", - "* Louvain community detection\n", + "This is idiomatic NetworkX code to run:\n", + "* [Betweenness Centrality](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html)\n", + "* [Breadth First Search](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.traversal.breadth_first_search.bfs_tree.html)\n", + "* [Louvain Community Detection](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.louvain.louvain_communities.html)\n", "\n", "and report times. This code does not require modification to use with nx-cugraph.\n" ] @@ -199,7 +195,7 @@ "source": [ "%%time\n", "run_algos(G)\n", - "print (\"Total Algorithm run time\")" + "print (\"Total Algorithm run-time\")" ] }, { From 173bac37b9814f9e6ee5a4ea754f17684755f2a4 Mon Sep 17 00:00:00 2001 From: Rick Ratzel Date: Mon, 1 Jul 2024 22:27:08 -0500 Subject: [PATCH 6/6] Refactors notebook for clarity: cleans up code cells, more changes for consistency, includes cell outputs to show benchmark comparison, rewrites explanations in markdown cells. 
--- .../nx_cugraph_benchmark.ipynb | 271 +++++++++++++----- 1 file changed, 200 insertions(+), 71 deletions(-) diff --git a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb index 5611365aa6..bc57947f20 100644 --- a/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb +++ b/notebooks/cugraph_benchmarks/nx_cugraph_benchmark.ipynb @@ -4,14 +4,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Benchmarking Performance of NetworkX with and without the RAPIDS GPU-based nx-cugraph backend\n", - "# Skip notebook test\n", - "This notebook collects the run-times with and without the nx-cugraph backend and graph caching enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", + "# Benchmarking Performance of NetworkX without and with the RAPIDS GPU-based nx-cugraph backend\n", "\n", - "In this notebook, enabling the nx-cugraph backend will be done using the [NetworkX config API](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) (which requires NetworkX 3.3 or later)\n", - "\n", - "\n", - "They can be set at the command-line as well.\n", + "This notebook collects the run-times without and with the nx-cugraph backend enabled for three popular NetworkX algorithms: Betweenness Centrality, Breadth First Search, and Louvain Community Detection.\n", "\n", "Here is a sample minimal script to demonstrate no-code-change GPU acceleration using nx-cugraph.\n", "\n", @@ -51,14 +46,13 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ + "import os\n", "import pandas as pd\n", - "import networkx as nx\n", - "import time\n", - "import os" + "import networkx as nx" ] }, { @@ -70,7 +64,7 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -84,124 
+78,259 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "This is idiomatic NetworkX code to run:\n", + "Download a patent citation dataset containing 3774768 nodes and 16518948 edges and load it into a NetworkX graph." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File ./data/cit-Patents.csv not found, downloading https://data.rapids.ai/cugraph/datasets/cit-Patents.csv\n" ] } ], "source": [ "filepath = \"./data/cit-Patents.csv\"\n", "\n", "if os.path.exists(filepath):\n", " url = filepath\n", "else:\n", " url = \"https://data.rapids.ai/cugraph/datasets/cit-Patents.csv\"\n", " print(f\"File {filepath} not found, downloading {url}\")\n", "\n", "df = pd.read_csv(url, sep=\" \", names=[\"src\", \"dst\"], dtype=\"int32\")\n", "G = nx.from_pandas_edgelist(df, source=\"src\", target=\"dst\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Define a function that can be used to run various NetworkX algorithms on the Graph created above. This can be used to compare run-times for NetworkX both without `nx-cugraph` and with `nx-cugraph` enabled.\n", "\n", "The following NetworkX calls will be run:\n", "* [Betweenness Centrality](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.centrality.betweenness_centrality.html)\n", "* [Breadth First Search](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.traversal.breadth_first_search.bfs_tree.html)\n", "* [Louvain Community Detection](https://networkx.org/documentation/stable/reference/algorithms/generated/networkx.algorithms.community.louvain.louvain_communities.html)\n", "\n", "and report times. 
This code does not require modification to use with nx-cugraph and can be used with NetworkX as-is even when no backends are installed." ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ - "def run_algos(G):\n", - " starttime = time.time()\n", - " result = nx.betweenness_centrality(G, k=10)\n", - " print (\"Betweenness Centrality time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", - " starttime = time.time()\n", - " result = nx.bfs_tree(G,source=1)\n", - " print (\"Breadth First Search time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", - " starttime = time.time()\n", - " result = nx.community.louvain_communities(G,threshold=1e-04)\n", - " print (\"Louvain time: \" + str(round(time.time() - starttime))+ \" seconds\")\n", - " return" + "def run_algos():\n", + " print(\"\\nRunning Betweenness Centrality...\")\n", + " %time nx.betweenness_centrality(G, k=10)\n", + "\n", + " print(\"\\nRunning Breadth First Search (bfs_edges)...\")\n", + " %time list(nx.bfs_edges(G, source=1)) # yields individual edges, use list() to force the full computation\n", + "\n", + " print(\"\\nRunning Louvain...\")\n", + " %time nx.community.louvain_communities(G, threshold=1e-04)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Downloads a patent citation dataset containing 3774768 nodes and 16518948 edges and loads it into a NetworkX graph." + "## NetworkX (no backend) Benchmark Runs\n", + "**_NOTE: NetworkX benchmarks without a backend for the graph used in this notebook can take a very long time. 
Using an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz with 45GB of memory, the three algo runs took approximately 50 minutes._**" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Running Betweenness Centrality...\n", + "CPU times: user 7min 47s, sys: 5.61 s, total: 7min 53s\n", + "Wall time: 7min 52s\n", + "\n", + "Running Breadth First Search (bfs_edges)...\n", + "CPU times: user 28.9 s, sys: 336 ms, total: 29.2 s\n", + "Wall time: 29.1 s\n", + "\n", + "Running Louvain...\n", + "CPU times: user 42min 46s, sys: 4.8 s, total: 42min 51s\n", + "Wall time: 42min 50s\n" + ] + } + ], "source": [ - "filepath = \"./data/cit-Patents.csv\"\n", - "\n", - "if os.path.exists(filepath):\n", - " print(\"File found\")\n", - " url = filepath\n", - "else:\n", - " url = \"https://data.rapids.ai/cugraph/datasets/cit-Patents.csv\"\n", - "df = pd.read_csv(url, sep=\" \", names=[\"src\", \"dst\"], dtype=\"int32\")\n", - "G = nx.from_pandas_edgelist(df, source=\"src\", target=\"dst\")" + "run_algos()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Setting the NetworkX dispatcher with an environment variable or in code using the NetworkX config API ([NetworkX 3.3+](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig)).\n", - "\n", - "This example using an AMD Ryzen Threadripper PRO 3975WX 32-Cores CPU completed in slightly over 40 minutes." + "## NetworkX with `nx-cugraph` Benchmark Runs\n", + "Use the `nx.config` API introduced in [NetworkX 3.3](https://networkx.org/documentation/stable/reference/backends.html#networkx.utils.configs.NetworkXConfig) to configure NetworkX to use nx-cugraph. Both options used below can also be set using environment variables."
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 6, "metadata": {}, "outputs": [], "source": [ - "use_cugraph = True\n", - "cache_graph = True" + "# Set the prioritized list of backends to automatically try. If none of the backends in the list\n", + "# support the algorithm, NetworkX will use the default implementation.\n", + "#\n", + "# This can also be set using the environment variable NETWORKX_BACKEND_PRIORITY which accepts a\n", + "# comma-separated list.\n", + "nx.config.backend_priority = [\"cugraph\"] # Try the \"cugraph\" (nx-cugraph) backend first, then\n", + " # fall back to NetworkX\n", + "#nx.config.backend_priority = [] # Do not use any backends\n", + "\n", + "# Enable caching of graph conversions. When set to False (the default) nx-cugraph will convert\n", + "# the CPU-based NetworkX graph object to an nx-cugraph GPU-based graph object each time an algorithm\n", + "# is run. When True, the conversion will happen once and be saved for future use *if* the graph has\n", + "# not been modified via a supported method such as G.add_edge(u, v, weight=val)\n", + "#\n", + "# This can also be set using the environment variable NETWORKX_CACHE_CONVERTED_GRAPHS\n", + "nx.config.cache_converted_graphs = True\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note the warning message NetworkX generates to remind us that a cached graph should not be manually mutated. 
This is shown because caching was enabled, and the initial call resulted in a cached graph conversion for use with subsequent nx-cugraph calls.**" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Running Betweenness Centrality...\n", + "CPU times: user 17.9 s, sys: 1.5 s, total: 19.4 s\n", + "Wall time: 19.1 s\n", + "\n", + "Running Breadth First Search (bfs_edges)...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/conda/lib/python3.10/site-packages/networkx/utils/backends.py:1101: UserWarning: Using cached graph for 'cugraph' backend in call to bfs_edges.\n", + "\n", + "For the cache to be consistent (i.e., correct), the input graph must not have been manually mutated since the cached graph was created. Examples of manually mutating the graph data structures resulting in an inconsistent cache include:\n", + "\n", + " >>> G[u][v][key] = val\n", + "\n", + "and\n", + "\n", + " >>> for u, v, d in G.edges(data=True):\n", + " ... d[key] = val\n", + "\n", + "Using methods such as `G.add_edge(u, v, weight=val)` will correctly clear the cache to keep it consistent. You may also use `G.__networkx_cache__.clear()` to manually clear the cache, or set `G.__networkx_cache__` to None to disable caching for G. 
Enable or disable caching via `nx.config.cache_converted_graphs` config.\n", " warnings.warn(warning_message)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 50.5 s, sys: 589 ms, total: 51 s\n", "Wall time: 50.7 s\n", "\n", "Running Louvain...\n", "CPU times: user 27.4 s, sys: 3.36 s, total: 30.7 s\n", "Wall time: 30.6 s\n" ] } ], "source": [ - "if use_cugraph:\n", - " nx.config[\"backend_priority\"]=['cugraph']\n", - "else:\n", - " # Use this setting to use the default NetworkX implementation.\n", - " nx.config[\"backend_priority\"]=[]\n", - "if cache_graph:\n", - " nx.config[\"cache_converted_graphs\"]= True\n", - "else:\n", - " # Use this setting to disable caching of graph conversions. This will require nx-cugraph to convert and transfer the native CPU-based graph object to the GPU each time an algorithm is run.\n", - " nx.config[\"cache_converted_graphs\"]= False\n" + "run_algos()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Run the algorithms on GPU. \n", - "\n", - "**Note the messages NetworkX generates to remind us cached graph shouldn't be modified.**\n", - "\n", - "```\n", - "For the cache to be consistent (i.e., correct), the input graph must not have been manually mutated since the cached graph was created.\n", - "\n", - "Using cached graph for 'cugraph' backend in call to bfs_edges.\n", - "```" + "The Betweenness Centrality call above resulted in a conversion from a NetworkX Graph to an nx-cugraph Graph due to it being the first call to use nx-cugraph. However, since caching was enabled, a second call will show the run-time for Betweenness Centrality without the need to convert the graph."
] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": {}, - "outputs": [], + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Running Betweenness Centrality (again)...\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/opt/conda/lib/python3.10/site-packages/networkx/utils/backends.py:1128: UserWarning: Using cached graph for 'cugraph' backend in call to betweenness_centrality.\n", + "\n", + "For the cache to be consistent (i.e., correct), the input graph must not have been manually mutated since the cached graph was created. Examples of manually mutating the graph data structures resulting in an inconsistent cache include:\n", + "\n", + " >>> G[u][v][key] = val\n", + "\n", + "and\n", + "\n", + " >>> for u, v, d in G.edges(data=True):\n", + " ... d[key] = val\n", + "\n", + "Using methods such as `G.add_edge(u, v, weight=val)` will correctly clear the cache to keep it consistent. You may also use `G.__networkx_cache__.clear()` to manually clear the cache, or set `G.__networkx_cache__` to None to disable caching for G. Enable or disable caching via `nx.config.cache_converted_graphs` config.\n", + " warnings.warn(warning_message)\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "CPU times: user 1.84 s, sys: 312 ms, total: 2.15 s\n", + "Wall time: 2.12 s\n" + ] + } + ], "source": [ - "%%time\n", - "run_algos(G)\n", - "print (\"Total Algorithm run-time\")" + "print(\"\\nRunning Betweenness Centrality (again)...\")\n", + "%time result = nx.betweenness_centrality(G, k=10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "___\n", + "Each user is responsible for checking the content of datasets and the applicable licenses and determining if suitable for the intended use.\n", + "\n", + "Information on the U.S. 
Patent Citation Network dataset used in this notebook is as follows:\n", + "Authors: Jure Leskovec and Andrej Krevl\n", + "Title: SNAP Datasets, Stanford Large Network Dataset Collection\n", + "URL: http://snap.stanford.edu/data\n", + "Date: June 2014 \n", "___\n", "Copyright (c) 2024, NVIDIA CORPORATION.\n", "\n", @@ -228,7 +357,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.8" + "version": "3.10.14" } }, "nbformat": 4,
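The `run_algos` timing pattern used throughout the patched notebook (run a NetworkX call, report elapsed wall-clock seconds) can be sketched as a small standalone helper. This is a minimal sketch only: `benchmark` and `fake_algo` are hypothetical names introduced here for illustration, with `fake_algo` standing in for a real call such as `nx.betweenness_centrality(G, k=10)`; neither is part of NetworkX, nx-cugraph, or the notebook itself.

```python
import time

def benchmark(label, fn, *args, **kwargs):
    # Run fn once and report wall-clock time, mirroring the notebook's
    # time.time() / %time bookkeeping around each algorithm call.
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label} time: {round(elapsed)} seconds")
    return result

# Placeholder workload (hypothetical); in the notebook this slot is filled by
# calls like nx.betweenness_centrality(G, k=10) or nx.community.louvain_communities(G).
def fake_algo(n):
    return sum(range(n))

total = benchmark("Betweenness Centrality", fake_algo, 1000)
```

Because the backend dispatch is transparent, the same harness times both the default NetworkX implementation and the nx-cugraph-accelerated one without any change to the timed code.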
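The notebook's config-cell comments note that `NETWORKX_BACKEND_PRIORITY` accepts a comma-separated list of backend names. As a rough illustration of that convention (this is not NetworkX's actual parsing code — the library handles the environment variable internally — and `backend_priority_from_env` is a hypothetical helper), splitting such a variable into a priority list might look like:

```python
import os

def backend_priority_from_env(environ=None):
    # NETWORKX_BACKEND_PRIORITY holds a comma-separated list of backend
    # names, e.g. "cugraph" or "cugraph,parallel"; empty or unset means
    # "use the default NetworkX implementation".
    environ = os.environ if environ is None else environ
    raw = environ.get("NETWORKX_BACKEND_PRIORITY", "")
    return [name.strip() for name in raw.split(",") if name.strip()]

print(backend_priority_from_env({"NETWORKX_BACKEND_PRIORITY": "cugraph"}))
```

The resulting list plays the same role as `nx.config.backend_priority = ["cugraph"]` set in code: backends are tried in order, falling back to NetworkX's own implementation when none supports the algorithm.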