Add L1 Benchmarking #1

Open
wants to merge 24 commits into
base: master
Changes from 18 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@ target/
/.bsp/
/.idea/
/test_run_dir/
benchmarking/
/.out/
106 changes: 106 additions & 0 deletions benchmarking/L1-benchmarking.py
@@ -0,0 +1,106 @@
import sys
abejgonzalez marked this conversation as resolved.

def main():
    with open(sys.argv[1]) as f:

Can you try using Python's argparse module? https://docs.python.org/3/library/argparse.html

Notably, it lets you specify a "help" message, and makes it easier to add more arguments down the line.
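For illustration, a minimal sketch of what an argparse-based entry point could look like. The function name `parse_args` and the argument names `prefetch_log` and `no_prefetch_log` are placeholders chosen here, not names from the PR:

```python
import argparse


def parse_args(argv=None):
    """Parse the two simulation log paths (argument names are illustrative)."""
    parser = argparse.ArgumentParser(
        description="Compare L1 cache logs with and without prefetching."
    )
    parser.add_argument(
        "prefetch_log",
        help="simulation output produced with the prefetcher enabled",
    )
    parser.add_argument(
        "no_prefetch_log",
        help="simulation output produced without prefetching",
    )
    # Passing argv explicitly makes the parser easy to exercise from tests;
    # with argv=None, argparse falls back to sys.argv[1:].
    return parser.parse_args(argv)
```

`main()` could then open `args.prefetch_log` and `args.no_prefetch_log` instead of indexing `sys.argv`, and `python L1-benchmarking.py -h` would print the help text.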

        prefetch_lines = f.readlines()

    with open(sys.argv[2]) as f:
        no_prefetch_lines = f.readlines()

    misses_prevented = 0
    prefetch_queue={}
    prefetches_sent=[]

    no_prefetch = classify_accesses(no_prefetch_lines)
    no_prefetch_misses = no_prefetch['misses']
    no_prefetch_hits = no_prefetch['hits']
    with_prefetch = classify_accesses(prefetch_lines)
    with_prefetch_hits = with_prefetch['hits']
    with_prefetch_misses = with_prefetch['misses']

    prefetch_hits_only = list(with_prefetch_hits)
    no_prefetch_misses_only = list(no_prefetch_misses)

    for addr in no_prefetch_hits:
        if addr in prefetch_hits_only:
            prefetch_hits_only.remove(addr) #get only new hits, blind to duplicates
    for addr in with_prefetch_misses:
        if addr in no_prefetch_misses_only:
            no_prefetch_misses_only.remove(addr)

    useful_prefetches=[] #prefetches that actually prevent a miss
    num_prefetch_resps = 0
    delta_sum = 0
    num_prefetches_accessed = 0

    for line in prefetch_lines:
        if "Prefetch Addr" in line:
            pref = line.split()
            prefetches_sent.append(pref[4]) #add new prefetch address
        elif "Prefetch Resp" in line:
            pref_resp = line.split()
            pref_resp_addr = pref_resp[5]
            pref_resp_cycles = int(pref_resp[1])
            if pref_resp_addr in prefetches_sent:
                prefetch_queue[pref_resp_addr] = pref_resp_cycles #only interested in most recent response timing
                num_prefetch_resps += 1
        elif "Snoop" in line:
            snoop = line.split()
            addr = snoop[4]
            cycles = int(snoop[1])
            if (addr in prefetch_queue):
                delta_sum += (cycles - prefetch_queue[addr])
                num_prefetches_accessed += 1
                if ((addr in no_prefetch_misses_only) and (addr in prefetch_hits_only)):
                    no_prefetch_misses_only.remove(addr) # make sure miss isn't counted twice
                    prefetch_hits_only.remove(addr)
                    misses_prevented += 1
                    useful_prefetches.append(addr)

    #Accuracy Calculations
    num_no_resp_prefetches=len(prefetches_sent)-num_prefetch_resps
    num_unused_prefetches=num_prefetch_resps-len(useful_prefetches)
    useless_prefetches = num_no_resp_prefetches + num_unused_prefetches

    print("misses prevented: " + str(misses_prevented))

    coverage = (misses_prevented + 0.0) / (misses_prevented + len(with_prefetch_misses)) * 100

I think it's easier to convert to float with float(misses_prevented).
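As a rough sketch of the same coverage and accuracy ratios written with `float()`: the helper names `percentage` and `summarize` are made up for this example, and the zero-denominator guard is an extra assumption that the diff does not contain.

```python
def percentage(part, whole):
    """Return part/whole as a percentage; 0.0 when whole is zero (assumed policy)."""
    if whole == 0:
        return 0.0
    return float(part) / whole * 100


def summarize(misses_prevented, remaining_misses, useless_prefetches):
    """Compute (coverage, accuracy) using the same ratios as the diff."""
    # coverage: prevented misses over all misses that would otherwise occur
    coverage = percentage(misses_prevented, misses_prevented + remaining_misses)
    # accuracy: prevented misses over all prefetches, useful or not
    accuracy = percentage(misses_prevented, misses_prevented + useless_prefetches)
    return coverage, accuracy
```

For example, 10 prevented misses with 30 remaining misses and 10 useless prefetches gives 25% coverage and 50% accuracy. Returning 0.0 for an empty denominator is only one possible choice; raising an error may be preferable if an empty log should be treated as a failure.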

    print("coverage: " + str(coverage) + "%")

    accuracy = (misses_prevented + 0.0) / (useless_prefetches + misses_prevented) * 100
    print("accuracy: " + str(accuracy) + "%")

    timeliness = (delta_sum + 0.0) / num_prefetches_accessed
    print("timeliness: " + str(timeliness) + " cycles")



def classify_accesses(lines):
    snoops = {}
    all_addr = []
    accesses = {"hits": [], "misses": []}
    last_resp_cycle = 0
    for line in lines:
        if 'Snoop' in line:

Nit: This is relatively brittle. What happens if the config you build has other printfs that contain Resp/Snoop? IMO you should use Python's re module and match each line against a regex.
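One possible shape for such a match, as a sketch only. It assumes the lines look like the printf format strings in this PR (`Cycle: <decimal>`, a tab, a tag, then a hex address); the exact padding and whether the address carries a `0x` prefix depend on how the simulator renders them, so the pattern tolerates both. `LINE_RE` and `parse_line` are names invented here:

```python
import re

# Anchored at both ends so unrelated printfs that merely contain "Resp" or
# "Snoop" somewhere in the line do not match.
LINE_RE = re.compile(
    r"^Cycle:\s*(?P<cycle>\d+)\s+"
    r"(?P<kind>Prefetch Resp|Prefetch|Snoop|Resp) Addr:\s*"
    r"(?P<addr>(?:0x)?[0-9a-fA-F]+)\s*$"
)


def parse_line(line):
    """Return (kind, cycle, addr) for a prefetcher log line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    return match.group("kind"), int(match.group("cycle")), match.group("addr")
```

Because the tag is captured as a group, the caller can dispatch on `kind` instead of testing substrings, and a `Prefetch Resp` line can no longer be mistaken for a plain `Resp` line.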

            snoop = line.split()
            snoop_cycles = snoop[1]
            addr = snoop[4]
            snoops[addr] = snoop_cycles
            all_addr.append(addr)
        elif 'Resp' in line:
            #check against snoops
            resp = line.split()
            resp_cycles = resp[1]
            resp_addr = resp[4]
            if (resp_addr in snoops):
                if (((int(resp_cycles) - int(snoops[resp_addr])) >= 5) and (int(resp_cycles) - int(last_resp_cycle) > 3)):

Nit: I would shift the int() casting to be right when you access the string (i.e. resp_cycles = int(resp[1]))

How did you determine 5/3 as the consts here? I would add a small comment here for other code readers.
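Both nits could be addressed along these lines. This is a sketch: the constant names and the comments describing them are guesses at intent, since the PR does not say how 5 and 3 were chosen, and `is_miss` and `parse_cycle_and_addr` are hypothetical helpers.

```python
# Thresholds copied from the diff. The names and the interpretations below
# are assumptions, not something the PR states.
MIN_SNOOP_TO_RESP_CYCLES = 5  # response at least this long after its snoop
MIN_RESP_GAP_CYCLES = 3       # response must trail the previous one by more than this


def parse_cycle_and_addr(line):
    """Split a log line and cast the cycle field to int where it is read."""
    fields = line.split()
    return int(fields[1]), fields[4]


def is_miss(snoop_cycle, resp_cycle, last_resp_cycle):
    """Mirror the miss test from the diff, operating on ints only."""
    waited = resp_cycle - snoop_cycle
    gap = resp_cycle - last_resp_cycle
    return waited >= MIN_SNOOP_TO_RESP_CYCLES and gap > MIN_RESP_GAP_CYCLES
```

With the casts done once in `parse_cycle_and_addr`, the comparison no longer needs nested `int()` calls, and the named constants give a natural place for the requested explanatory comment.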

                    accesses["misses"].append(resp_addr) #add snoop addr to misses
                else:
                    accesses["hits"].append(resp_addr)
                snoops.pop(resp_addr)
            last_resp_cycle = resp[1]
    return accesses



main()

Make this pythonic:

if __name__ == "__main__":
  main()

8 changes: 8 additions & 0 deletions benchmarking/README.md
@@ -0,0 +1,8 @@
# Prefetcher Benchmarking

abejgonzalez marked this conversation as resolved.
This benchmarking test suite tests for prefetcher coverage, accuracy and timeliness.

To run the L1 prefetching benchmark tests on a single-core Saturn config, run
```
source benchmarkingL1.sh
```
15 changes: 15 additions & 0 deletions benchmarking/benchmarkingL1.sh
@@ -0,0 +1,15 @@
#!/bin/bash
# Run L1 prefetcher benchmark tests
# TODO: Add parameterization for other cores

cd ../../..
source env.sh
cd sims/vcs

Nit: Depending on where this script is run from, the ../../.. may not lead where you expect. Here is an example that works in all cases:

https://github.com/ucb-bar/chipyard/blob/dcf8da4b2d3a4deead95462fce36a6db5693ed45/scripts/build-toolchains.sh#L9-L18

make CONFIG=Prefetch2SaturnConfig
make run-binary CONFIG=Prefetch2SaturnConfig BINARY=$RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/vvadd.riscv
cp output/chipyard.TestHarness.Prefetch2SaturnConfig/vvadd.out ../../generators/bar-prefetchers/benchmarking/prefetchL1-vvadd.out
make CONFIG=PassthroughPrefetchSaturnConfig
make run-binary CONFIG=PassthroughPrefetchSaturnConfig BINARY=$RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/vvadd.riscv
cp output/chipyard.TestHarness.PassthroughPrefetchSaturnConfig/vvadd.out ../../generators/bar-prefetchers/benchmarking/no-prefetchL1-vvadd.out
cd ../../generators/bar-prefetchers/benchmarking
python L1-benchmarking.py "prefetchL1-vvadd.out" "no-prefetchL1-vvadd.out"

I don't think this file should be committed to this repo, as Prefetch2SaturnConfig and PassthroughPrefetchSaturnConfig aren't defined for most people.
Really, it's just wrapping
python3 L1-benchmarking.py <path-to-prefetch.out> <path-to-no-prefetch.out>.

I think one option is to take two configs as arguments and compare them against one another. Then it is up to the script caller to supply two configs that are roughly equivalent.
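A hedged sketch of how a config-parameterized driver might assemble its steps. `build_steps` and `sim_dir` are hypothetical names; the make targets and the `output/chipyard.TestHarness.<CONFIG>/` log location are taken from the commands in benchmarkingL1.sh, and deriving the log name from the binary name is inferred from the `vvadd.riscv` / `vvadd.out` pair there:

```python
from pathlib import Path


def build_steps(config, binary, sim_dir):
    """Return (commands, log_path) for building and running one config.

    Nothing is executed here; the commands are returned as argument lists.
    """
    commands = [
        ["make", f"CONFIG={config}"],
        ["make", "run-binary", f"CONFIG={config}", f"BINARY={binary}"],
    ]
    # Assumed naming: vvadd.riscv produces vvadd.out under the config's
    # output directory, as the cp lines in the shell script suggest.
    log_name = Path(binary).stem + ".out"
    log_path = Path(sim_dir) / "output" / f"chipyard.TestHarness.{config}" / log_name
    return commands, log_path
```

Each command could then be run with `subprocess.run(cmd, cwd=sim_dir, check=True)`, once per config, before handing the two resulting log paths to L1-benchmarking.py. Computing `sim_dir` from the script's own location rather than the caller's working directory would also address the `../../..` nit.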

26 changes: 25 additions & 1 deletion src/main/scala/HellaCachePrefetcher.scala
@@ -33,6 +33,9 @@ class HellaCachePrefetchWrapperModule(pP: CanInstantiatePrefetcher, outer: Hella
outer.cache.module.io <> io
val cache = outer.cache.module

val cycle_counter = RegInit(0.U(32.W))
cycle_counter := cycle_counter + 1.U

abejgonzalez marked this conversation as resolved.
// Intercept and no-op prefetch requests generated by the core
val core_prefetch = io.cpu.req.valid && isPrefetch(io.cpu.req.bits.cmd)
when (io.cpu.req.valid && isPrefetch(io.cpu.req.bits.cmd)) {
@@ -84,7 +87,28 @@ class HellaCachePrefetchWrapperModule(pP: CanInstantiatePrefetcher, outer: Hella
cache.io.cpu.req.bits.phys := false.B
cache.io.cpu.req.bits.no_alloc := false.B

Instead of using a statically assigned boolean to control printing, this should be enabled by a plusArg.

  val enable_print_stats = PlusArg("prefetcher_print_stats", width=1, default=0)(0)
  when (enable_print_stats) { 
    // your print statements
  }

Then when running the sim just set EXTRA_SIM_FLAGS=+prefetcher_print_stats=1

cache.io.cpu.req.bits.no_xcpt := false.B
when (cache.io.cpu.req.fire()) { in_flight := true.B }
when (cache.io.cpu.req.fire()) {

Leave the original block, and add a separate block for the prefetch print statements, gated by a config option.

when (cache.io.cpu.req.fire()) { in_flight := true.B }

if (printPrefetcherStats) {
  when (cache.io.cpu.req.fire()) {
    ...
  }
  when (prefetcher.io.snoop.valid) {
    ...
  }
  <etc>
}

You'll need to add a new parameter printPrefetcherStats to the config class WithHellaCachePrefetcher, HellaCachePrefetchWrapperFactory.apply, and HellaCachePrefetchWrapper

in_flight := true.B
//print prefetch
val last_prefetch_addr = req.bits.block_address
printf(p"Cycle: ${Decimal(cycle_counter)}\tPrefetch Addr: ${Hexadecimal(req.bits.block_address)}\n")
}
}

//print snoop
when (prefetcher.io.snoop.valid) {
val last_snoop_addr = prefetcher.io.snoop.bits.address

Instead of having Prefetch Addr, Snoop Addr, Resp Addr, Prefetch Resp Addr, I suggest you remove the spaces to have PrefetchAddr, SnoopAddr, RespAddr, PrefetchRespAddr. This makes your parsing in the python script uniform for all the cases.

This also (marginally) speeds up simulation since prints are semi-costly.

printf(p"Cycle: ${Decimal(cycle_counter)}\tSnoop Addr: ${Hexadecimal(prefetcher.io.snoop.bits.address)}\n")
}

//print response
when (cache.io.cpu.resp.valid && !isPrefetch(cache.io.cpu.resp.bits.cmd)) {
printf(p"Cycle: ${Decimal(cycle_counter)}\tResp Addr: ${Hexadecimal(cache.io.cpu.resp.bits.addr)}\n")

If this output is only meant to be parsed by the script rather than read by a person, you can simplify the format so it prints a bit faster (e.g. a schema like "type, addr, cycle").
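On the script side, a comma-separated schema like that could be consumed roughly as follows. This is a sketch: the `Event` record, `KNOWN_KINDS`, and `parse_events` are invented for the example, and the tag names reuse the space-free tags suggested earlier in this review (PrefetchAddr, SnoopAddr, RespAddr, PrefetchRespAddr).

```python
from collections import namedtuple

Event = namedtuple("Event", ["kind", "addr", "cycle"])

# Space-free tags as suggested in the review; the hardware side would need
# to print exactly these for the filter below to accept a line.
KNOWN_KINDS = {"PrefetchAddr", "SnoopAddr", "RespAddr", "PrefetchRespAddr"}


def parse_events(lines):
    """Yield Event records from 'type,addr,cycle' lines, skipping anything else."""
    for line in lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3 or parts[0] not in KNOWN_KINDS:
            continue  # some other simulator output
        kind, addr, cycle = parts
        if not cycle.isdigit():
            continue  # malformed cycle field
        yield Event(kind, addr, int(cycle))
```

Every record then has the same three fields in the same positions, so the per-tag index differences in the current `split()`-based parsing (index 4 versus index 5) go away.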

}

//print prefetch response
when (cache.io.cpu.resp.valid && isPrefetch(cache.io.cpu.resp.bits.cmd)) {
printf(p"Cycle: ${Decimal(cycle_counter)}\tPrefetch Resp Addr: ${Hexadecimal(cache.io.cpu.resp.bits.addr)}\n")
}

val prefetch_fire = cache.io.cpu.req.fire() && isPrefetch(cache.io.cpu.req.bits.cmd)