-
I am implementing knowledge-distillation-based DNN training, as illustrated in the figure below, and want to run the teacher and student models (the blue and green blocks) in parallel on the same data batch. I've checked some popular repos such as NervanaSystems/distiller and peterliht/knowledge-distillation-pytorch: they execute the forward passes of the student and teacher models in sequence (line by line), not in parallel on different devices (GPU or CPU). I am trying to speed up training by running the two models at the same time on multiple devices, e.g., loading the small, inference-only model on the CPU so it does not interrupt the GPU training of the heavy model. What is the proper way to run two models in parallel with the Module() API of MXNet 1.x? Should I use Python threading or multiprocessing for this?
-
Note that the Python line-by-line "execution" just refers to telling the multi-threaded backend that it shall execute an operation. If there are no dependencies between the two operations, the backend will execute them in parallel automatically.
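A minimal sketch of what this means with the MXNet 1.x Module API (symbols, shapes, and context placement are hypothetical, not the original training code): the two `forward()` calls below are issued one after the other in Python, but they only enqueue work on the asynchronous engine, so the backend can overlap the CPU and GPU graphs.

```python
# Minimal sketch: two independent Modules bound to different devices. The
# Python calls only enqueue operations on MXNet's async engine, which can
# then execute the two graphs concurrently.
import mxnet as mx

data = mx.sym.Variable('data')
teacher_sym = mx.sym.FullyConnected(data, num_hidden=1024, name='teacher_fc')
student_sym = mx.sym.FullyConnected(data, num_hidden=128, name='student_fc')

# Bind the two graphs to different devices (e.g. the frozen teacher on CPU,
# the student on GPU); both are forward-only here for brevity.
teacher = mx.mod.Module(teacher_sym, data_names=['data'], label_names=None,
                        context=mx.cpu())
student = mx.mod.Module(student_sym, data_names=['data'], label_names=None,
                        context=mx.gpu(0))

batch_shape = (32, 512)
for m in (teacher, student):
    m.bind(data_shapes=[('data', batch_shape)], for_training=False)
    m.init_params()

batch = mx.io.DataBatch(data=[mx.nd.random.uniform(shape=batch_shape)])

# Both calls return immediately: they only submit work to the engine. Since
# neither graph reads the other's outputs, the engine is free to run them
# in parallel on their respective devices.
teacher.forward(batch, is_train=False)
student.forward(batch, is_train=False)

# Synchronization happens only here, when the results are actually read.
teacher_out = teacher.get_outputs()[0].asnumpy()
student_out = student.get_outputs()[0].asnumpy()
```

If the two forward passes still serialize in practice, the usual culprit is a shared resource such as the temporary workspace discussed in the next reply.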
-
One very common reason for serialization between two models is workspace usage, which acts as a hidden dependency. There is an environment variable that controls this.
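Assuming the variable in question is MXNET_EXEC_NUM_TEMP (which caps the number of temporary workspaces allocated per device, default 1), a minimal way to raise the limit so the two models do not contend for the same workspace might look like this:

```python
# Assumption: the relevant variable is MXNET_EXEC_NUM_TEMP, which limits how
# many temporary workspaces each device allocates (default 1). With a single
# workspace, two executors that both request temp space end up sharing it and
# are forced to run one after the other.
import os
os.environ['MXNET_EXEC_NUM_TEMP'] = '2'  # set before mxnet is imported so it
                                         # is visible when executors are built

import mxnet as mx
```

Raising the limit trades extra memory for more parallelism, so it is worth checking device memory headroom before increasing it.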