What's the difference between `active_bytes` and `reserved_bytes`? #47

Comments
PyTorch caches CUDA memory to prevent repeated memory allocation cost; you can get more information here: https://pytorch.org/docs/stable/notes/cuda.html#cuda-memory-management. In your case, the reserved bytes should be the peak memory usage before …
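The caching behaviour described above can be sketched with a toy allocator. This is a pure-Python illustration of the idea, not PyTorch's actual CUDA caching allocator: freed blocks go into a reuse pool instead of back to the device, so `reserved_bytes` only ever grows while `active_bytes` tracks live tensors.

```python
class CachingAllocator:
    """Toy model of a caching allocator (illustration only, not torch's)."""

    def __init__(self):
        self.active_bytes = 0    # bytes currently backing live tensors
        self.reserved_bytes = 0  # bytes ever requested from the device
        self.free_pool = []      # cached blocks kept around for reuse

    def malloc(self, size):
        # Reuse a cached block if one is large enough, splitting off the
        # remainder; otherwise "cudaMalloc" a new block from the device.
        for i, block in enumerate(self.free_pool):
            if block >= size:
                self.free_pool.pop(i)
                if block > size:
                    self.free_pool.append(block - size)  # split the block
                self.active_bytes += size
                return size
        self.reserved_bytes += size  # new device allocation
        self.active_bytes += size
        return size

    def free(self, size):
        # Freed blocks go back into the cache; reserved_bytes never shrinks
        # here (the real allocator only releases on empty_cache()/pressure).
        self.active_bytes -= size
        self.free_pool.append(size)


alloc = CachingAllocator()
a = alloc.malloc(8 << 30)   # e.g. forward activations, 8 GiB
alloc.free(a)               # tensor freed, but the block is only cached
b = alloc.malloc(2 << 30)   # served from the cached block, no new cudaMalloc
print(alloc.active_bytes >> 30, "GiB active")     # 2 GiB live
print(alloc.reserved_bytes >> 30, "GiB reserved") # still 8 GiB reserved
```

This is why reserved bytes can sit far above active bytes after large intermediate tensors are freed: the memory is held in the cache for fast reuse rather than returned to the device.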
```
## VGG.forward

active_bytes reserved_bytes line  code
         all            all
        peak           peak
       5.71G         10.80G   50  @profile
                              51  def forward(self, x):
       3.86G          8.77G   52      out = self.features(x)
       2.19G          8.77G   53      out = self.classifier(out)
       2.19G          8.77G   54      return out
```

@Stonesjtu Could you help me re-check the code above: I checkpointed the …

Q1: Do you know how to explain this: if I keep the same batch size, but change how I partition the …

I also have two additional lines printed before the stats above, which are generated by the code appended below.

Q2: So how to explain the …

```python
# compute output
if i < 1:
    torch.cuda.reset_peak_memory_stats()
output = model(images)
loss = criterion(output, target)
if i < 1:
    print('Max CUDA memory allocated on forward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))

# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.detach().item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))

# compute gradient and do SGD step
if i < 1:
    torch.cuda.reset_peak_memory_stats()
optimizer.zero_grad()
loss.backward()
optimizer.step()
if i < 1:
    print('Max CUDA memory allocated on backward: ',
          utils.readable_size(torch.cuda.max_memory_allocated()))
```
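The `reset_peak_memory_stats()` / `max_memory_allocated()` bracketing in the snippet above can be illustrated with a small stand-in counter. The class below is hypothetical (it only mimics the semantics of the two torch calls, it is not a torch API): resetting the peak restarts it from the *current* usage, so each print reports the peak of one phase only.

```python
class PeakTracker:
    """Hypothetical stand-in for torch.cuda's peak-memory bookkeeping."""

    def __init__(self):
        self.current = 0  # like memory_allocated()
        self.peak = 0     # like max_memory_allocated()

    def alloc(self, n):
        self.current += n
        self.peak = max(self.peak, self.current)

    def free(self, n):
        self.current -= n

    def reset_peak(self):
        # Like torch.cuda.reset_peak_memory_stats(): the peak restarts
        # from the current usage, not from zero.
        self.peak = self.current


mem = PeakTracker()
mem.alloc(4)           # parameters stay resident throughout
mem.reset_peak()       # bracket the "forward" phase
mem.alloc(6)           # activations allocated during forward
assert mem.peak == 10  # peak observed during forward
mem.free(6)            # activations released
mem.reset_peak()       # bracket the "backward" phase
assert mem.peak == 4   # only the resident parameters remain at the reset
```

This is why the two printed values measure the forward and backward phases independently rather than a single running maximum.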
The column (or metric), e.g. you have 4 …
According to the PyTorch documentation: actually it needs the cached memory at a certain point of execution, but at the time of your …
I need to show that a technique called gradient checkpointing can really save GPU memory usage during backward propagation. When I see the result, there are two columns on the left showing `active_bytes` and `reserved_bytes`. In my testing, while active bytes read 3.83G, the reserved bytes read 9.35G. So why does PyTorch still reserve that much GPU memory?
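A back-of-the-envelope sketch of why checkpointing lowers *active* bytes (all layer counts and sizes below are made-up illustration values): with segments of k layers, only the segment-boundary activations stay resident for backward, and one segment's activations are recomputed at a time. Reserved bytes can nevertheless stay high, for the caching-allocator reason explained earlier in the thread.

```python
def activations_stored(n_layers, act_bytes, segment=None):
    """Bytes of activations kept alive for backward (toy accounting)."""
    if segment is None:
        # vanilla backprop: every layer's activation is kept until backward
        return n_layers * act_bytes
    # checkpointing: keep one activation per segment boundary, plus the
    # activations of a single segment recomputed at a time during backward
    n_segments = n_layers // segment
    return (n_segments + segment) * act_bytes


full = activations_stored(16, 100 << 20)             # 16 layers, 100 MiB each
ckpt = activations_stored(16, 100 << 20, segment=4)  # 4 segments of 4 layers
print(full >> 20, "MiB without checkpointing")  # 1600 MiB
print(ckpt >> 20, "MiB with checkpointing")     # 800 MiB
```

Choosing the segment length near sqrt(n_layers), as above, roughly minimises the total, which is why checkpointing trades O(n) activation memory for O(sqrt(n)) at the cost of one extra forward pass.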