Order of announcing executed instructions #702

xThaid · 2024-05-10T16:00:50Z

While working on #699 I found that increasing the size of the instruction buffer causes performance loss on a benchmark.

Here's what happened. The larger instruction buffer led to higher utilization of the backend (in particular the ROB). As a result, it is more likely that two instructions are ready to be announced, and marked as done in the ROB, at the same time. However, we currently collect finished instructions from the FUs in no specific order, and this is why we get the slowdown in crc32.

Specifically, at some point two instructions are ready to be collected, and the newer one (with the higher ROB id) is selected and marked as done in the ROB. In the next cycle we therefore cannot retire the older instruction (it is not marked as done yet), and we lose one cycle. This happens in every iteration of the program loop and adds up to many wasted cycles.
With lower backend utilization, the instruction causing the problem wouldn't be ready to execute yet, and we would retire the oldest instruction as soon as possible.
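To make the lost cycle concrete, here is a toy model in Python. It is only a sketch under simplifying assumptions that are not taken from the actual core: one announcement and one in-order retirement per cycle, ROB ids that do not wrap, and a hypothetical helper name `cycles_to_retire`.

```python
def cycles_to_retire(ready, pick):
    """Toy model: one announcement and one in-order retirement per cycle.

    ready: contiguous, non-wrapping ROB ids (lower = older) whose results
           are waiting in the FUs at the start.
    pick:  policy choosing which waiting instruction to announce.
    Returns the number of cycles until every instruction is retired.
    """
    waiting = set(ready)
    done = set()
    head = min(ready)   # oldest instruction, the next one to retire
    last = max(ready)
    cycle = 0
    while head <= last:
        cycle += 1
        # Retirement only sees instructions marked done in earlier cycles.
        retire_now = head in done
        if waiting:
            chosen = pick(waiting)
            waiting.remove(chosen)
            done.add(chosen)
        if retire_now:
            head += 1
    return cycle


newest_first = cycles_to_retire([0, 1], max)  # announce the newer one first
oldest_first = cycles_to_retire([0, 1], min)  # announce the older one first
print(newest_first - oldest_first)            # cycles lost per occurrence
```

In this model, announcing the newer instruction first delays retirement of the older one by exactly one cycle, which matches the effect described above.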

Choosing the order in which we announce instructions doesn't seem to be a trivial task -- it sounds like a scheduling problem, and we don't have full information about future instructions. I think we should simply choose the oldest instruction. By doing that we release resources as soon as possible and make room for further instructions.

The only small issue is that we currently cannot reliably tell which ROB id is older, because the ROB id is a circular pointer. Either we add one bit to every ROB id, or we use a heuristic like a < b iff (b - a) % rob_size < rob_size / 2 (which works under the assumption that instructions whose ROB ids differ by more than rob_size / 2 will never be ready to announce at the same time).
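Both options can be sketched as plain Python functions. This is only an illustration, not the core's code: the function names are hypothetical, the `a != b` guard is my addition (without it the formula would report an id as older than itself), and the extra-bit variant assumes ids are counted modulo 2 * rob_size with at most rob_size instructions in flight.

```python
def is_older_heuristic(a: int, b: int, rob_size: int) -> bool:
    """Heuristic: `a` is older than `b` if `b` is less than half a ROB ahead.

    Only correct when the compared ids are less than rob_size / 2 apart.
    """
    return a != b and (b - a) % rob_size < rob_size // 2


def is_older_extra_bit(a: int, b: int, rob_size: int) -> bool:
    """Exact comparison for ids extended by one wrap bit.

    Assumes ids live in range(2 * rob_size) and at most rob_size
    instructions are in flight, so the forward distance from an older id
    to a newer one is always in 1 .. rob_size - 1.
    """
    return 0 < (b - a) % (2 * rob_size) < rob_size


# Wrapped case in an 8-entry ROB: id 6 was allocated before id 1.
print(is_older_heuristic(6, 1, rob_size=8))
# Ids 7 apart: the heuristic's assumption is violated, the extra bit still works.
print(is_older_heuristic(0, 7, rob_size=8), is_older_extra_bit(0, 7, rob_size=8))
```

The extra bit costs one more wire per ROB id but needs no assumption about how far apart two ready instructions can be.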

What do you think about this problem?


Kristopher38 · 2024-05-11T09:12:17Z

You can reliably tell which ROB id is older if you know the start and end indices of its circular buffer. A nonexclusive method that returns them, get_indices, already exists in the ROB.
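Roughly, the idea is to measure each id's distance from the oldest entry. The sketch below is plain Python and does not call get_indices; `start` stands for the index of the oldest ROB entry, and `age` and `pick_oldest` are hypothetical names used only for illustration.

```python
def age(rob_id: int, start: int, rob_size: int) -> int:
    """Distance of `rob_id` from the oldest entry; 0 means oldest."""
    return (rob_id - start) % rob_size


def pick_oldest(ready_ids, start: int, rob_size: int) -> int:
    """Return the ready ROB id that was allocated earliest."""
    return min(ready_ids, key=lambda rob_id: age(rob_id, start, rob_size))


# 8-entry ROB whose oldest entry sits at index 5: id 6 is older than id 1.
print(pick_oldest([1, 6], start=5, rob_size=8))
```

Unlike the half-range heuristic, this comparison is exact for any two valid ids, at the cost of routing the start index to wherever the comparison happens.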

tilk · 2024-05-13T09:03:31Z

Two thoughts:

- This might be partially related to the RS select order - a bad order of selecting instructions from RS can impact performance.
- Announcement needs to be reworked later for superscalarity support. Superscalarity might fix this performance issue in some cases.

All in all, I think this shouldn't be our priority now.

xThaid added the "bug" label (Something isn't working) on May 10, 2024

xThaid mentioned this issue May 10, 2024

Move CoreCounter behind the frontend #699

Merged

tilk added the "optimization" label (This is *just* an optimization!) and removed the "bug" label (Something isn't working) on May 13, 2024
