From b4a62350d27eaf3e07086866d7bf7dd8599cbac8 Mon Sep 17 00:00:00 2001
From: dzier
Date: Tue, 20 Oct 2020 14:09:03 -0700
Subject: [PATCH] Update README and NGC versions post-20.10 release

---
 README.rst | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 101 insertions(+), 3 deletions(-)

diff --git a/README.rst b/README.rst
index 9969359..e3399eb 100644
--- a/README.rst
+++ b/README.rst
@@ -18,13 +18,59 @@
 PyProf - PyTorch Profiling tool
 ===============================
 
- **NOTE: You are currently on the r20.10 branch which tracks stabilization
- towards the release. This branch is not usable during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
+models. PyProf aggregates kernel performance from `Nsight Systems
+`_ or `NvProf
+`_.
+
+What's New in 3.5.0
+-------------------
+* Nsight Systems database lookup has been improved, speeding up runtime
+  profile analysis by 50x.
+
+* Node names now include class information and can be linked back to the
+  original Python source.
+
+Known Issues
+------------
+* Forward-backward kernel correlation heuristics do not work correctly with
+  PyTorch 1.6. Recommended workarounds include:
+
+  * Use PyProf with PyTorch 1.5
+  * Use DLProf in the `20.10 NGC PyTorch container `_
+
+Features
+--------
+
+* Identifies the layer that launched a kernel: e.g. the association of
+  ``ComputeOffsetsKernel`` with a concrete PyTorch layer or API is not obvious.
+
+* Identifies the tensor dimensions and precision: without knowing the tensor
+  dimensions and precision, it's impossible to reason about whether the actual
+  (silicon) kernel time is close to the maximum performance of such a kernel on
+  the GPU. Knowing the tensor dimensions and precision, we can figure out the
+  FLOPs and bandwidth required by a layer, and then determine how close to
+  maximum performance the kernel is for that operation.
+
+* Forward-backward correlation: PyProf determines which forward-pass step
+  produced a particular weight or data gradient (wgrad, dgrad). This makes it
+  possible to determine the tensor dimensions used by these backprop steps and
+  to assess their performance.
+
+* Determines Tensor Core usage: PyProf can highlight the kernels that use
+  `Tensor Cores `_.
+
+* Correlates the line in the user's code that launched a particular kernel (program trace).
+
 .. overview-end-marker-do-not-remove
 
+The current release of PyProf is 3.5.0 and is available in the 20.10 release of
+the PyTorch container on `NVIDIA GPU Cloud (NGC) `_. The
+branch for this release is `r20.10
+`_.
+
 Quick Installation Instructions
 -------------------------------
 
@@ -75,5 +121,57 @@ Quick Start Instructions
 
 .. quick-start-end-marker-do-not-remove
 
+Documentation
+-------------
+
+The User Guide, available in the
+`documentation for the current release
+`_,
+provides instructions on how to install and profile with PyProf.
+
+A complete `Quick Start Guide `_
+provides step-by-step instructions to help you get started with PyProf quickly.
+
+An `FAQ `_ provides
+answers to frequently asked questions.
+
+The `Release Notes
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also describe
+which GPUs are supported by PyProf.
+
+Presentation and Papers
+^^^^^^^^^^^^^^^^^^^^^^^
+
+* `Automating End-to-End PyTorch Profiling `_.
+  * `Presentation slides `_.
+
+Contributing
+------------
+
+Contributions to PyProf are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing `_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow guide to creating a minimal, reproducible example
+(https://stackoverflow.com/help/mcve). Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems, the more time we have to
+  fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0
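The Features list says that once a layer's tensor dimensions and precision are known, you can work out the FLOPs and bandwidth the layer needs and compare them with the measured kernel time. The snippet below is a rough sketch of that arithmetic for a single GEMM, ``C[m, n] = A[m, k] @ B[k, n]``. It is not PyProf's own implementation. Several pieces are hypothetical and chosen only for illustration: the helper name ``gemm_stats``, the bytes-per-element table, the assumption that each operand crosses memory exactly once, and the kernel time and peak TFLOPS figures in the usage example.

```python
from dataclasses import dataclass

# Hypothetical bytes-per-element table for a few common precisions.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "bf16": 2}


@dataclass
class GemmStats:
    flops: int                   # floating-point operations performed
    bytes_moved: int             # lower bound on memory traffic, in bytes
    arithmetic_intensity: float  # FLOPs per byte
    achieved_tflops: float       # measured compute throughput
    achieved_gbps: float         # measured memory throughput (lower bound)
    fraction_of_peak: float      # achieved TFLOPS / peak TFLOPS


def gemm_stats(m, n, k, precision, kernel_time_s, peak_tflops):
    """Estimate the work done by C[m, n] = A[m, k] @ B[k, n] and how close
    the measured kernel time comes to the GPU's peak compute throughput.

    Illustrative assumptions:
    * each multiply-add counts as 2 FLOPs;
    * A and B are read once and C is written once, so the byte count is a
      lower bound on the real memory traffic.
    """
    elem = BYTES_PER_ELEMENT[precision]
    flops = 2 * m * n * k
    bytes_moved = (m * k + k * n + m * n) * elem
    achieved_tflops = flops / kernel_time_s / 1e12
    achieved_gbps = bytes_moved / kernel_time_s / 1e9
    return GemmStats(
        flops=flops,
        bytes_moved=bytes_moved,
        arithmetic_intensity=flops / bytes_moved,
        achieved_tflops=achieved_tflops,
        achieved_gbps=achieved_gbps,
        fraction_of_peak=achieved_tflops / peak_tflops,
    )


if __name__ == "__main__":
    # Hypothetical measurement: a 4096^3 FP16 GEMM taking 2 ms on a GPU
    # with an assumed peak of 125 TFLOPS.
    stats = gemm_stats(4096, 4096, 4096, "fp16", kernel_time_s=2e-3, peak_tflops=125.0)
    print(f"{stats.flops:.3e} FLOPs, {stats.bytes_moved / 2**20:.0f} MiB moved")
    print(f"{stats.achieved_tflops:.1f} TFLOPS ({stats.fraction_of_peak:.0%} of peak)")
```

A high arithmetic intensity, as in this large square GEMM, suggests the kernel should be compared against peak compute. A low intensity suggests comparing ``achieved_gbps`` against the GPU's memory bandwidth instead.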
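The forward-backward correlation feature relies on the fact that a backprop step's tensor dimensions follow from its forward pass. The sketch below shows this for one case only: a fully connected layer ``y = x @ W.T`` with the bias ignored. It derives the GEMM sizes of the data gradient (dgrad) and weight gradient (wgrad) from the forward shapes. The helper names ``linear_gemm_shapes`` and ``gemm_flops`` are hypothetical. The code shows only the shape bookkeeping and is not how PyProf performs the correlation.

```python
def linear_gemm_shapes(batch, in_features, out_features):
    """Map a fully connected layer's forward shapes to the (m, n, k) of the
    three GEMMs it runs, where each GEMM computes C[m, n] = A[m, k] @ B[k, n].

    Forward: y = x @ W.T with x: [batch, in], W: [out, in], y: [batch, out].
    The bias and its gradient are ignored here.
    """
    return {
        # fprop: y[batch, out] = x[batch, in] @ W.T[in, out]
        "fprop": (batch, out_features, in_features),
        # dgrad: dx[batch, in] = dy[batch, out] @ W[out, in]
        "dgrad": (batch, in_features, out_features),
        # wgrad: dW[out, in] = dy.T[out, batch] @ x[batch, in]
        "wgrad": (out_features, in_features, batch),
    }


def gemm_flops(m, n, k):
    # One multiply-add per (row, column, reduction) index triple, counted as 2 FLOPs.
    return 2 * m * n * k


if __name__ == "__main__":
    # Hypothetical layer: batch of 32, 1024 input features, 4096 output features.
    for step, (m, n, k) in linear_gemm_shapes(32, 1024, 4096).items():
        print(f"{step}: m={m} n={n} k={k} flops={gemm_flops(m, n, k)}")
```

All three GEMMs perform the same number of FLOPs but have different shapes. This is why each backprop kernel has to be tied back to its forward layer before its efficiency can be judged.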