You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While working on #699 I found that increasing the size of the instruction buffer causes performance loss on a benchmark.
Here's what happened. Increased size of the instruction buffer caused higher utilization of backend (in particular ROB). Thus, it is more likely that two instructions will be ready to announce at the same time and marked as done in ROB. However, currently we collect finished instructions from FUs with no specific order. And this is the reason why we get the slowdown in crc32.
Specifically, at some point two instructions are ready to be collected and the newer (with higher ROB id) is selected and marked as done in ROB. Therefore in the next cycle we cannot retire the older instruction yet (because it is not marked as done yet) and we lose one cycle. This happens every program loop iteration and in total results in many wasted cycles.
With lower backend utilization, the instruction causing the problem wouldn't be ready to execute yet and we would retire the oldest instruction as soon as possible.
Choosing the order of the instructions we want to announce doesn't seem to be a trivial task -- sounds like a scheduling problem and we don't have full information about future instructions. I think we should simply choose the oldest instruction. By doing that we are releasing the resources as soon as possible and making space for further instructions.
The only small issue is that currently we cannot reliably tell which ROB id is older - ROB id is a circular pointer. Either we add one bit to every ROB id or we do some heuristics like a < b iff (b - a) % rob_size < rob_size / 2 (which will work with the assumption that instructions with ROB ids different more than rob_size / 2 will never be ready to announce at the same time).
What do you think about this problem?
The text was updated successfully, but these errors were encountered:
You can reliably tell which rob id is older if you know the start and end indices of its circular buffer. A nonexclusive method to get them, get_indices, already exists in the rob.
While working on #699 I found that increasing the size of the instruction buffer causes performance loss on a benchmark.
Here's what happened. Increased size of the instruction buffer caused higher utilization of backend (in particular ROB). Thus, it is more likely that two instructions will be ready to announce at the same time and marked as done in ROB. However, currently we collect finished instructions from FUs with no specific order. And this is the reason why we get the slowdown in crc32.
Specifically, at some point two instructions are ready to be collected and the newer (with higher ROB id) is selected and marked as done in ROB. Therefore in the next cycle we cannot retire the older instruction yet (because it is not marked as done yet) and we lose one cycle. This happens every program loop iteration and in total results in many wasted cycles.
With lower backend utilization, the instruction causing the problem wouldn't be ready to execute yet and we would retire the oldest instruction as soon as possible.
Choosing the order of the instructions we want to announce doesn't seem to be a trivial task -- sounds like a scheduling problem and we don't have full information about future instructions. I think we should simply choose the oldest instruction. By doing that we are releasing the resources as soon as possible and making space for further instructions.
The only small issue is that currently we cannot reliably tell which ROB id is older - ROB id is a circular pointer. Either we add one bit to every ROB id or we do some heuristics like
a < b
iff(b - a) % rob_size < rob_size / 2
(which will work with the assumption that instructions with ROB ids different more thanrob_size / 2
will never be ready to announce at the same time).What do you think about this problem?
The text was updated successfully, but these errors were encountered: