add `ggml_backend_sched_dump_dot` #10825

foldl · 2024-12-14T10:27:07Z

This PR adds a DOT dump function to the scheduler (ggml_backend_sched). Compared to the existing ggml_graph_dump_dot, this function:

Color-codes the backend and buffer type (buft) into the background of each node.
Shows graph splits.
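To illustrate the two points above, here is a minimal sketch of the idea in plain C++. It does not use ggml: `DotNode`, `dump_dot`, and the choice of the Graphviz `pastel19` Brewer scheme are assumptions made for illustration, not the code in this PR. Each split becomes a DOT `cluster` subgraph, and the node background is picked from the backend index.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for scheduler state; not a ggml type.
struct DotNode {
    std::string name;
    int backend_id;         // index of the backend the node is assigned to
    int split_id;           // index of the graph split that contains the node
    std::vector<int> srcs;  // indices of the source nodes
};

// Emit a DOT graph: one cluster per split, node fill color chosen by backend index.
std::string dump_dot(const std::vector<DotNode> & nodes, int n_splits) {
    std::ostringstream out;
    out << "digraph G {\n";
    out << "  node [style=filled, colorscheme=pastel19];\n";
    for (int s = 0; s < n_splits; ++s) {
        out << "  subgraph cluster_" << s << " {\n";
        out << "    label=\"split " << s << "\";\n";
        for (std::size_t i = 0; i < nodes.size(); ++i) {
            if (nodes[i].split_id != s) {
                continue;
            }
            // pastel19 has 9 colors, numbered 1..9
            out << "    n" << i << " [label=\"" << nodes[i].name
                << "\", fillcolor=" << (nodes[i].backend_id % 9 + 1) << "];\n";
        }
        out << "  }\n";
    }
    // Edges are emitted outside the clusters so they can cross split boundaries.
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        for (int src : nodes[i].srcs) {
            out << "  n" << src << " -> n" << i << ";\n";
        }
    }
    out << "}\n";
    return out.str();
}
```

The returned string can be written to a file and rendered with the Graphviz `dot` tool.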

A demo of its usefulness: in this graph, something abnormal is visible at a glance. It is caused by ggml_rms_norm_inplace in the input layer norm, which is probably an error in the scheduler.

Note:

ggml_graph_get_grad is also fixed to handle the case where cgraph->grads is NULL.
Define GGML_DOT_FULL_COLOR for full color; otherwise, a color scheme is used.
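The kind of guard meant by the first note can be sketched as follows. This is an illustration with hypothetical types (`SketchTensor`, `SketchGraph`, `sketch_get_grad`), not the actual ggml implementation: the lookup returns a null pointer instead of dereferencing a gradient array that was never allocated.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real ggml structures differ.
struct SketchTensor {
    int id;
};

struct SketchGraph {
    std::vector<const SketchTensor *> nodes;
    SketchTensor ** grads;  // nullptr when no gradients were allocated
};

// Return the gradient of `node`, or nullptr if the graph has no gradients
// or the node is not part of the graph.
SketchTensor * sketch_get_grad(const SketchGraph & g, const SketchTensor * node) {
    if (g.grads == nullptr) {
        return nullptr;  // guard: never index into a missing array
    }
    for (std::size_t i = 0; i < g.nodes.size(); ++i) {
        if (g.nodes[i] == node) {
            return g.grads[i];
        }
    }
    return nullptr;
}
```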

PS: Someone might say that the dumped graph is too large to be rendered. In my case (chatllm.cpp), I use --layer_spec to load only 2 or 3 layers.

ggml/src/ggml-backend.cpp

slaren · 2024-12-15T01:20:04Z

The discrepancy between buffer type and backend may have several causes. ggml_backend_sched ignores non-executable view ops, so they may end up with arbitrary assignments that are not relevant while running the graph. Copies of tensors in the splits do not have assignments in the hash table at all, since they are implicitly allocated in the split backend.

Fundamentally, ggml_backend_sched does not work on the graph directly; it works on the list of nodes that is the topological representation of the graph. Thus, I do not think that trying to reason about it by representing what it does as a graph is going to be useful. The best way to understand what ggml_backend_sched is doing is to look at the list of nodes, which can be obtained by setting the environment variable GGML_SCHED_DEBUG to 2. I am afraid that this is just going to result in misunderstandings about what is actually happening in ggml_backend_sched, and lead to false bug reports like the supposed problem that you found.
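As a rough illustration of reasoning about a node list rather than a graph, the sketch below walks a list in order and records where a new split would start. It rests on a deliberately simplified assumption (a boundary is any change of backend between consecutive nodes); the real scheduler applies further conditions, and `ListNode` and `split_starts` are hypothetical names, not ggml APIs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry of a topologically ordered node list.
struct ListNode {
    std::string name;
    int backend_id;  // backend the node is assigned to
};

// Return the indices at which a new split starts, assuming (as a
// simplification) that a split boundary is a change of backend between
// consecutive nodes in the list.
std::vector<std::size_t> split_starts(const std::vector<ListNode> & nodes) {
    std::vector<std::size_t> starts;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        if (i == 0 || nodes[i].backend_id != nodes[i - 1].backend_id) {
            starts.push_back(i);
        }
    }
    return starts;
}
```

Because the result depends only on list order, two nodes that are far apart in a drawn graph can still end up in the same split, which is one reason a graph rendering can be misleading.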

foldl · 2024-12-15T11:03:36Z

Update:

Color is derived from the index;
Hue is used when GGML_DOT_FULL_COLOR is defined.
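One way to derive a color from an index in both modes is sketched below. The function names, the fixed saturation and value, and the 9-color Brewer scheme are assumptions for illustration and are not taken from the PR. Graphviz accepts a color written as three numbers "H S V", each between 0 and 1, and accepts a small integer as a color when a `colorscheme` attribute is set.

```cpp
#include <cstdio>
#include <string>

// Full-color mode: spread indices evenly over the hue circle.
// Assumes index >= 0; returns a Graphviz "H S V" color string.
std::string hue_color(int index, int count) {
    double hue = count > 0 ? static_cast<double>(index % count) / count : 0.0;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%.3f 0.300 1.000", hue);
    return std::string(buf);
}

// Color-scheme mode: map the index onto a 9-color scheme (entries 1..9),
// to be used together with e.g. colorscheme=pastel19 in the DOT output.
int scheme_color(int index) {
    return index % 9 + 1;
}
```

With a color scheme, indices beyond the palette size wrap around, so distinct backends can share a color; the hue-based variant avoids that at the cost of less curated colors.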

foldl · 2024-12-15T11:13:03Z

... lead to false bug reports like the supposed problem that you found.

The issue is:

ggml API ggml_backend_sched_set_tensor_backend is not used correctly, or
xxx_inplace, view, or reshape operators are not handled properly.

I used this to identify the issue, updated my code (chatllm.cpp), and things worked.

This proposed function is not for debugging the scheduler, but for visualization of graph splits and backends. It may also help for debugging.

add ggml_backend_sched_dump_dot

12d8cd6

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) Dec 14, 2024

fix warnings; remove ggml_backend_sched_splits_fdump_dot.

504121e

ngxson reviewed Dec 14, 2024


use id for color; simple_hash removed.

39f8347

Judd and others added 2 commits December 15, 2024 20:01

use std::string instead of static char

6ee7599

Merge branch 'master' into add_sched_dot_dump

d7de64b
