Commit b59bc33

caic99 authored and wanghan-iapcm committed
docs: add PyTorch Profiler support details to TensorBoard documentation (#4615)
This PR adds instructions on how to profile with the PyTorch backend.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- **Documentation**
  - Added a new section detailing the integration of PyTorch Profiler with TensorBoard.
  - Provided clear instructions on package installation, configuration adjustments, and how to visualize profiling data.
  - Enhanced the readability of commands and the overall formatting of the training documentation.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Han Wang <[email protected]>
(cherry picked from commit 80d445b)
1 parent 6d6c3fd commit b59bc33

1 file changed: +34 -25 lines changed

doc/train/tensorboard.md

Lines changed: 34 additions & 25 deletions
@@ -26,42 +26,51 @@ Before running TensorBoard, make sure you have generated summary data in a log
 directory by modifying the input script, setting {ref}`tensorboard <training/tensorboard>` to true in the training subsection will enable the TensorBoard data analysis. eg. **water_se_a.json**.
 
 ```json
-"training" : {
-"systems": ["../data/"],
-"stop_batch": 1000000,
-"batch_size": 1,
-
-"seed": 1,
-
-"_comment": " display and restart",
-"_comment": " frequencies counted in batch",
-"disp_file": "lcurve.out",
-"disp_freq": 100,
-"numb_test": 10,
-"save_freq": 1000,
-"save_ckpt": "model.ckpt",
-
-"disp_training":true,
-"time_training":true,
-"tensorboard": true,
-"tensorboard_log_dir":"log",
-"tensorboard_freq": 1000,
-"profiling": false,
-"profiling_file":"timeline.json",
-"_comment": "that's all"
-}
+"training": {
+"systems": ["../data/"],
+"stop_batch": 1000000,
+"batch_size": 1,
+
+"seed": 1,
+"_comment": " display and restart",
+"_comment": " frequencies counted in batch",
+"disp_file": "lcurve.out",
+"disp_freq": 100,
+"numb_test": 10,
+"save_freq": 1000,
+"save_ckpt": "model.ckpt",
+
+"disp_training": true,
+"time_training": true,
+"tensorboard": true,
+"tensorboard_log_dir": "log",
+"tensorboard_freq": 1000,
+"profiling": false,
+"profiling_file": "timeline.json",
+"_comment": "that's all"
+}
 ```
 
 Once you have event files, run TensorBoard and provide the log directory. This
 should print that TensorBoard has started. Next, connect to http://tensorboard_server_ip:6006.
 
-TensorBoard requires a logdir to read logs from. For info on configuring TensorBoard, run TensorBoard --help.
+TensorBoard requires a logdir to read logs from. For info on configuring TensorBoard, run `tensorboard --help`.
 One can easily change the log name with "tensorboard_log_dir" and the sampling frequency with "tensorboard_freq".
 
 ```bash
 tensorboard --logdir path/to/logs
 ```
 
+## PyTorch Profiler With TensorBoard {{ pytorch_icon }}
+
+DeePMD-kit has a built-in support for [PyTorch Profiler](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#use-profiler-to-record-execution-events).
+The profiler requires extra packages for recording and visualization:
+`pip install tensorboard torch-tb-profiler`
+Set `"enable_profiler": true` in the training section of the input script, and launch a training task with 10 steps, since the default setting of the profiler scheduler is `wait=1, warmup=1, active=3, repeat=1`.
+The profiler will generate recording files in `tensorboard_log_dir`.
+
+To [visualize the profiling data](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#use-tensorboard-to-view-results-and-analyze-model-performance), launch TensorBoard (see above) and navigate to the "pytorch_profiler" tab.
+
 ## Examples
 
 ### Tracking and visualizing loss metrics(red:train, blue:test)
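
Reviewer note on the added profiler section: the text recommends a 10-step run because of the default schedule `wait=1, warmup=1, active=3, repeat=1`. The following is a minimal pure-Python sketch of how such a schedule maps step numbers to phases. It assumes the semantics documented for `torch.profiler.schedule` (idle for `wait` steps, warm up for `warmup` steps, record for `active` steps, stop after `repeat` cycles) and does not call PyTorch. The function `profiler_phase` is hypothetical, and how DeePMD-kit numbers its steps internally is not stated in the diff.

```python
def profiler_phase(step, wait=1, warmup=1, active=3, repeat=1):
    """Return the phase assigned to a 0-based step by a wait/warmup/active schedule.

    Hypothetical helper that mimics the documented behaviour of
    torch.profiler.schedule (with no skipped leading steps).
    It is an illustration only and does not import PyTorch.
    """
    cycle = wait + warmup + active
    # With repeat > 0, the schedule goes idle after `repeat` full cycles.
    if repeat > 0 and step >= cycle * repeat:
        return "idle"
    position = step % cycle
    if position < wait:
        return "idle"
    if position < wait + warmup:
        return "warmup"
    return "record"


if __name__ == "__main__":
    # Phases for a 10-step run with the default values quoted in the diff.
    for step in range(10):
        print(step, profiler_phase(step))
```

Under these assumptions one full cycle takes `1 + 1 + 3 = 5` steps, so a 10-step run leaves enough room for the single recording cycle to finish.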

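
Reviewer note on the configuration: every setting this diff mentions lives in the `training` section of the input script. Below is a minimal sketch of switching them on programmatically with the standard `json` module. The key names `tensorboard`, `tensorboard_log_dir`, and `enable_profiler` are taken from the documentation in the diff; the helper `enable_profiling` and the sample dictionary are hypothetical.

```python
import copy
import json


def enable_profiling(config, log_dir="log"):
    """Return a copy of an input-script dict with TensorBoard and the profiler on.

    Hypothetical helper for illustration; only the key names come from
    the documentation shown in the diff.
    """
    updated = copy.deepcopy(config)  # leave the caller's dict untouched
    training = updated.setdefault("training", {})
    training["tensorboard"] = True
    training["tensorboard_log_dir"] = log_dir
    training["enable_profiler"] = True
    return updated


if __name__ == "__main__":
    # A made-up, heavily trimmed input script.
    base = {"training": {"systems": ["../data/"], "batch_size": 1}}
    print(json.dumps(enable_profiling(base), indent=2))
```

The step count is deliberately left alone: the JSON example in the diff uses `stop_batch`, and the diff does not say which key the PyTorch backend expects for the 10-step run.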