docs: add PyTorch Profiler support details to TensorBoard documentation (#4615)
This PR adds instructions on how to profile with the PyTorch backend.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **Documentation**
  - Added a new section detailing the integration of PyTorch Profiler with
    TensorBoard.
  - Provided clear instructions on package installation, configuration
    adjustments, and how to visualize profiling data.
  - Enhanced the readability of commands and the overall formatting of the
    training documentation.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Co-authored-by: Han Wang <[email protected]>
(cherry picked from commit 80d445b)
doc/train/tensorboard.md (34 additions, 25 deletions)
@@ -26,42 +26,51 @@ Before running TensorBoard, make sure you have generated summary data in a log
 directory by modifying the input script, setting {ref}`tensorboard <training/tensorboard>` to true in the training subsection will enable the TensorBoard data analysis. eg. **water_se_a.json**.

 ```json
-  "training" : {
-    "systems": ["../data/"],
-    "stop_batch": 1000000,
-    "batch_size": 1,
-
-    "seed": 1,
-
-    "_comment": " display and restart",
-    "_comment": " frequencies counted in batch",
-    "disp_file": "lcurve.out",
-    "disp_freq": 100,
-    "numb_test": 10,
-    "save_freq": 1000,
-    "save_ckpt": "model.ckpt",
-
-    "disp_training":true,
-    "time_training":true,
-    "tensorboard": true,
-    "tensorboard_log_dir":"log",
-    "tensorboard_freq": 1000,
-    "profiling": false,
-    "profiling_file":"timeline.json",
-    "_comment": "that's all"
-  }
+  "training": {
+    "systems": ["../data/"],
+    "stop_batch": 1000000,
+    "batch_size": 1,
+
+    "seed": 1,
+    "_comment": " display and restart",
+    "_comment": " frequencies counted in batch",
+    "disp_file": "lcurve.out",
+    "disp_freq": 100,
+    "numb_test": 10,
+    "save_freq": 1000,
+    "save_ckpt": "model.ckpt",
+
+    "disp_training": true,
+    "time_training": true,
+    "tensorboard": true,
+    "tensorboard_log_dir": "log",
+    "tensorboard_freq": 1000,
+    "profiling": false,
+    "profiling_file": "timeline.json",
+    "_comment": "that's all"
+  }
 ```

 Once you have event files, run TensorBoard and provide the log directory. This
 should print that TensorBoard has started. Next, connect to http://tensorboard_server_ip:6006.

-TensorBoard requires a logdir to read logs from. For info on configuring TensorBoard, run TensorBoard --help.
+TensorBoard requires a logdir to read logs from. For info on configuring TensorBoard, run `tensorboard --help`.
 One can easily change the log name with "tensorboard_log_dir" and the sampling frequency with "tensorboard_freq".

 ```bash
 tensorboard --logdir path/to/logs
 ```

+## PyTorch Profiler With TensorBoard {{ pytorch_icon }}
+
+DeePMD-kit has a built-in support for [PyTorch Profiler](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#use-profiler-to-record-execution-events).
+The profiler requires extra packages for recording and visualization:
+`pip install tensorboard torch-tb-profiler`
+Set `"enable_profiler": true` in the training section of the input script, and launch a training task with 10 steps, since the default setting of the profiler scheduler is `wait=1, warmup=1, active=3, repeat=1`.
+The profiler will generate recording files in `tensorboard_log_dir`.
+
+To [visualize the profiling data](https://pytorch.org/tutorials/intermediate/tensorboard_profiler_tutorial.html#use-tensorboard-to-view-results-and-analyze-model-performance), launch TensorBoard (see above) and navigate to the "pytorch_profiler" tab.
+
 ## Examples

 ### Tracking and visualizing loss metrics(red:train, blue:test)
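
The profiler instructions added by this change can be illustrated with a small Python sketch. It shows two things: switching on `"enable_profiler"` in the training section of an input script, and why a 10-step run is enough for the stated scheduler setting `wait=1, warmup=1, active=3, repeat=1`. The helper names `enable_profiler`, `profiler_phase`, and `steps_needed` are hypothetical and are not part of DeePMD-kit or PyTorch. The phase function is a simplified model of a wait/warmup/active schedule with steps counted from zero, not PyTorch's implementation. Whether `"tensorboard": true` must also be set is not stated in the change, so the sketch leaves that key untouched.

```python
import copy


def enable_profiler(config, log_dir="log"):
    """Return a copy of an input script with the profiler switched on.

    Hypothetical helper: only the keys named in the documentation change
    ("enable_profiler", "tensorboard_log_dir") are set.
    """
    updated = copy.deepcopy(config)
    training = updated.setdefault("training", {})
    training["enable_profiler"] = True
    training["tensorboard_log_dir"] = log_dir
    return updated


def steps_needed(wait=1, warmup=1, active=3, repeat=1):
    """Number of training steps required to finish every profiling cycle."""
    return (wait + warmup + active) * repeat


def profiler_phase(step, wait=1, warmup=1, active=3, repeat=1):
    """Simplified model of the schedule: which phase a zero-based step is in."""
    cycle = wait + warmup + active
    if repeat > 0 and step >= cycle * repeat:
        return "idle"  # all requested cycles have finished
    position = step % cycle
    if position < wait:
        return "wait"
    if position < wait + warmup:
        return "warmup"
    return "record"


# Assumed minimal input script; a real one carries many more keys.
config = enable_profiler({"training": {"systems": ["../data/"]}})

# Phases for the 10-step run suggested in the documentation.
phases = [profiler_phase(step) for step in range(10)]
print(steps_needed())  # 5
print(phases)
```

Under this model one cycle takes 1 + 1 + 3 = 5 steps, so a 10-step task completes the single requested cycle with room to spare, and the remaining steps run without recording.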