From a775248809da829108ecf80cf422212404314fba Mon Sep 17 00:00:00 2001
From: dzier
Date: Thu, 1 Oct 2020 10:38:09 -0700
Subject: [PATCH] Update README and NGC versions post-20.09 release

---
 README.rst | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 142 insertions(+), 4 deletions(-)

diff --git a/README.rst b/README.rst
index 34a4bd5..6080f3c 100644
--- a/README.rst
+++ b/README.rst
@@ -17,22 +17,160 @@
 PyProf - PyTorch Profiling tool
 ===============================
 
-
- **NOTE: You are currently on the r20.09 branch which tracks
- stabilization towards the release. This branch is not usable
- during stabilization.**
 
 .. overview-begin-marker-do-not-remove
 
+PyProf is a tool that profiles and analyzes the GPU performance of PyTorch
+models. PyProf aggregates kernel performance from `Nsight Systems
+`_ or `NvProf
+`_.
+
+What's New in 3.4.0
+-------------------
+
+* README and User Guide documentation has been updated with more installation
+  options and pointers
+
+Known Issues
+------------
+
+* Forward-Backward kernel correlation heuristics do not work correctly with
+  PyTorch 1.6. Recommended workarounds include:
+
+  * Use with PyTorch 1.5
+  * Use DLProf in the `20.09 NGC PyTorch container `_
+
+Features
+--------
+
+* Identifies the layer that launched a kernel: e.g. the association of
+  `ComputeOffsetsKernel` with a concrete PyTorch layer or API is not obvious.
+
+* Identifies the tensor dimensions and precision: without knowing the tensor
+  dimensions and precision, it's impossible to reason about whether the actual
+  (silicon) kernel time is close to maximum performance of such a kernel on
+  the GPU. Knowing the tensor dimensions and precision, we can figure out the
+  FLOPs and bandwidth required by a layer, and then determine how close to
+  maximum performance the kernel is for that operation.
+
+* Forward-backward correlation: PyProf determines which forward pass step
+  resulted in the particular weight and data gradients (wgrad, dgrad),
+  which makes it possible to determine the tensor dimensions required by these
+  backprop steps to assess their performance.
+
+* Determines Tensor Core usage: PyProf can highlight the kernels that use
+  `Tensor Cores `_.
+
+* Correlates each kernel with the line in the user's code that launched it (program trace).
+
 .. overview-end-marker-do-not-remove
 
+The current release of PyProf is 3.4.0 and is available in the 20.09 release of
+the PyTorch container on `NVIDIA GPU Cloud (NGC) `_. The
+branch for this release is `r20.09
+`_.
+
+Quick Installation Instructions
+-------------------------------
+
 .. quick-install-start-marker-do-not-remove
 
+* Clone the git repository ::
+
+    $ git clone https://github.com/NVIDIA/PyProf.git
+
+* Navigate to the top level PyProf directory
+
+* Install PyProf ::
+
+    $ pip install .
+
+* Verify installation is complete with pip list ::
+
+    $ pip list | grep pyprof
+
+* The output should display ::
+
+    pyprof 3.4.0
+
 .. quick-install-end-marker-do-not-remove
 
+Quick Start Instructions
+------------------------
+
 .. quick-start-start-marker-do-not-remove
 
+* Add the following lines to the PyTorch network you want to profile: ::
+
+    import torch.cuda.profiler as profiler
+    import pyprof
+    pyprof.init()
+
+* Profile with NVProf or Nsight Systems to generate a SQL file. ::
+
+    $ nsys profile -f true -o net --export sqlite python net.py
+
+* Run the parse.py script to generate the dictionary. ::
+
+    $ python -m pyprof.parse net.sqlite > net.dict
+
+* Run the prof.py script to generate the reports. ::
+
+    $ python -m pyprof.prof --csv net.dict
+
 .. quick-start-end-marker-do-not-remove
 
+Documentation
+-------------
+
+The User Guide can be found in the
+`documentation for current release
+`_, and
+provides instructions on how to install and profile with PyProf.
+
+A complete `Quick Start Guide `_
+provides step-by-step instructions to get you quickly started using PyProf.
+
+An `FAQ `_ provides
+answers for frequently asked questions.
+
+The `Release Notes
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also describe
+which GPUs are supported by PyProf.
+
+Presentations and Papers
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+* `Automating End-to-End PyTorch Profiling `_.
+  * `Presentation slides `_.
+
+Contributing
+------------
+
+Contributions to PyProf are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing `_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-Apache2-green.svg
    :target: http://www.apache.org/licenses/LICENSE-2.0