8320379: C2: Sort spilling/unspilling sequence for better ld/st merging into ldp/stp on AArch64

The macro-assembler on AArch64 can merge adjacent loads or stores
into ldp/stp [1]. For example, it can merge:
```
str     w20, [sp, #16]
str     w10, [sp, #20]
```
into
```
stp     w20, w10, [sp, #16]
```

But C2 may generate a sequence like:
```
str     x21, [sp, #8]
str     w20, [sp, #16]
str     x19, [sp, #24] <---
str     w10, [sp, #20] <--- Before sorting
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```
The macro-assembler cannot merge loads or stores that are not adjacent.

This patch sorts the spilling or unspilling sequence by stack
offset during the instruction scheduling and bundling phase.
The result is a new sequence:
```
str     x21, [sp, #8]
str     w20, [sp, #16]
str     w10, [sp, #20] <---
str     x19, [sp, #24] <--- After sorting
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```

The macro-assembler can then merge the adjacent stores:
```
str     x21, [sp, #8]
stp     w20, w10, [sp, #16] <--- Merged
str     x19, [sp, #24]
str     x11, [sp, #40]
str     w13, [sp, #48]
str     x16, [sp, #56]
```
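
The effect of the reordering can be checked with a small standalone model. This is a minimal sketch, not HotSpot code: `SpillStore`, `can_pair`, `count_merges`, `sorted_by_offset` and `example_sequence` are hypothetical names, and the pairing rule is a simplification (same access width, second store starting exactly where the first ends). The real macro-assembler applies further conditions that are not modelled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one spill store; not a HotSpot type.
struct SpillStore {
  int offset;  // byte offset from sp
  int size;    // access width in bytes: 4 for a w register, 8 for an x register
};

// Simplified pairing rule (an assumption): two neighbouring stores can merge
// when they have the same width and the second starts where the first ends.
bool can_pair(const SpillStore& a, const SpillStore& b) {
  assert(a.size == 4 || a.size == 8);
  return a.size == b.size && b.offset == a.offset + a.size;
}

// Count merges in one left-to-right pass; a merged pair is consumed whole.
int count_merges(const std::vector<SpillStore>& seq) {
  int merges = 0;
  for (std::size_t i = 0; i + 1 < seq.size();) {
    if (can_pair(seq[i], seq[i + 1])) {
      ++merges;
      i += 2;
    } else {
      ++i;
    }
  }
  return merges;
}

// Return a copy of the sequence ordered by ascending stack offset.
std::vector<SpillStore> sorted_by_offset(std::vector<SpillStore> seq) {
  std::stable_sort(seq.begin(), seq.end(),
                   [](const SpillStore& a, const SpillStore& b) {
                     return a.offset < b.offset;
                   });
  return seq;
}

// The store sequence from the commit message, in its pre-patch order.
std::vector<SpillStore> example_sequence() {
  return {{8, 8}, {16, 4}, {24, 8}, {20, 4}, {40, 8}, {48, 4}, {56, 8}};
}
```

Under this model, `count_merges(example_sequence())` finds no pair, while `count_merges(sorted_by_offset(example_sequence()))` finds the single `w20`/`w10` pair at offsets 16 and 20.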

To justify the patch, we run `HelloWorld.java`
```
public class HelloWorld {
    public static void main(String [] args) {
        System.out.println("Hello World!");
    }
}
```
with `java -Xcomp -XX:-TieredCompilation HelloWorld`.

Before the patch, the macro-assembler performs ld/st merging
3688 times. After the patch, the count rises to 3871, an
increase of 183 merges (~5%).

Tested tier1~3 on x86 and AArch64.

[1] https://github.com/openjdk/jdk/blob/a95062b39a431b4937ab6e9e73de4d2b8ea1ac49/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L2079
Faye Gao authored and Fei Gao committed Nov 21, 2023
1 parent 179f505 commit 96646f7
Showing 1 changed file with 41 additions and 3 deletions.
44 changes: 41 additions & 3 deletions src/hotspot/share/opto/output.cpp
@@ -169,6 +169,12 @@ class Scheduling {
   // Add a node to the current bundle
   void AddNodeToBundle(Node *n, const Block *bb);

+  // Return true only when the stack offset of the first spill node is
+  // greater than the stack offset of the second one. Otherwise, return false.
+  // When compare_two_spill_nodes(first, second) returns true, we think that
+  // "second" should be scheduled before "first" in the final basic block.
+  bool compare_two_spill_nodes(Node* first, Node* second);
+
   // Add a node to the list of available nodes
   void AddNodeToAvailableList(Node *n);

@@ -2271,6 +2277,29 @@ Node * Scheduling::ChooseNodeToBundle() {
   return _available[0];
 }

+bool Scheduling::compare_two_spill_nodes(Node* first, Node* second) {
+  assert(first->is_MachSpillCopy() && second->is_MachSpillCopy(), "");
+
+  OptoReg::Name first_src_lo = _regalloc->get_reg_first(first->in(1));
+  OptoReg::Name first_dst_lo = _regalloc->get_reg_first(first);
+  OptoReg::Name second_src_lo = _regalloc->get_reg_first(second->in(1));
+  OptoReg::Name second_dst_lo = _regalloc->get_reg_first(second);
+
+  // Comparison between stack -> reg and stack -> reg
+  if (OptoReg::is_stack(first_src_lo) && OptoReg::is_stack(second_src_lo) &&
+      OptoReg::is_reg(first_dst_lo) && OptoReg::is_reg(second_dst_lo)) {
+    return _regalloc->reg2offset(first_src_lo) > _regalloc->reg2offset(second_src_lo);
+  }
+
+  // Comparison between reg -> stack and reg -> stack
+  if (OptoReg::is_stack(first_dst_lo) && OptoReg::is_stack(second_dst_lo) &&
+      OptoReg::is_reg(first_src_lo) && OptoReg::is_reg(second_src_lo)) {
+    return _regalloc->reg2offset(first_dst_lo) > _regalloc->reg2offset(second_dst_lo);
+  }
+
+  return false;
+}
+
 void Scheduling::AddNodeToAvailableList(Node *n) {
   assert( !n->is_Proj(), "projections never directly made available" );
 #ifndef PRODUCT
@@ -2282,11 +2311,20 @@ void Scheduling::AddNodeToAvailableList(Node *n) {

int latency = _current_latency[n->_idx];

-  // Insert in latency order (insertion sort)
+  // Insert in latency order (insertion sort). If two MachSpillCopyNodes
+  // for stack spilling or unspilling have the same latency, we sort
+  // them in the order of stack offset. Some backends (aarch64) may also
+  // have more opportunities to do ld/st merging
   uint i;
-  for ( i=0; i < _available.size(); i++ )
-    if (_current_latency[_available[i]->_idx] > latency)
+  for (i = 0; i < _available.size(); i++) {
+    if (_current_latency[_available[i]->_idx] > latency) {
       break;
+    } else if (_current_latency[_available[i]->_idx] == latency &&
+               n->is_MachSpillCopy() && _available[i]->is_MachSpillCopy() &&
+               compare_two_spill_nodes(n, _available[i])) {
+      break;
+    }
+  }

// Special Check for compares following branches
if( n->is_Mach() && _scheduled.size() > 0 ) {
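
The insertion loop in this hunk can be modelled outside HotSpot to see what order it produces. The following is a minimal sketch under stated assumptions: `SchedNode`, `compare_two_spills`, `add_to_available` and `offsets_after_inserting` are hypothetical stand-ins for the real `Node`, `compare_two_spill_nodes` and `AddNodeToAvailableList`, each node carries a single precomputed stack offset, and all spill copies are assumed to go in the same direction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scheduler node; not a HotSpot type.
struct SchedNode {
  int latency;
  bool is_spill;     // plays the role of is_MachSpillCopy()
  int stack_offset;  // assumed precomputed; only meaningful for spill copies
};

// Mirrors the commit's comparison: true when first has the larger offset.
bool compare_two_spills(const SchedNode& first, const SchedNode& second) {
  assert(first.is_spill && second.is_spill);
  return first.stack_offset > second.stack_offset;
}

// Insertion sort step: ascending latency, ties between spill copies broken
// so that the node with the larger stack offset sits earlier in the list.
void add_to_available(std::vector<SchedNode>& available, const SchedNode& n) {
  std::size_t i = 0;
  for (; i < available.size(); ++i) {
    if (available[i].latency > n.latency) {
      break;
    }
    if (available[i].latency == n.latency && n.is_spill &&
        available[i].is_spill && compare_two_spills(n, available[i])) {
      break;
    }
  }
  available.insert(available.begin() + static_cast<std::ptrdiff_t>(i), n);
}

// Insert equal-latency spill copies with the given offsets, in that order,
// and report the resulting order of the available list.
std::vector<int> offsets_after_inserting(const std::vector<int>& offsets) {
  std::vector<SchedNode> available;
  for (int off : offsets) {
    add_to_available(available, {1, true, off});
  }
  std::vector<int> result;
  for (const SchedNode& n : available) {
    result.push_back(n.stack_offset);
  }
  return result;
}
```

Inserting offsets 16, 24 and 20 in that order leaves the list as 24, 20, 16, that is, descending by offset. This matches the header comment, which says "second" should come before "first" in the final basic block when the comparison returns true. It would hold if the scheduler places nodes taken from the front of the list later in the block, though that is an inference from the comment and not something this diff shows.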