Add L1 Benchmarking #1
base: master
Conversation
This is a neat way to get metrics on simple bare-metal benchmarks, and I think it's useful for that use case, but I have concerns over how this approach will scale to larger workloads. There are several edge cases which I'm not sure this approach can handle accurately.
- What happens when the same address is accessed multiple times across different phases of the program? I see this is blind to duplicates. Does that not matter for simple benchmarks?
- What happens when there is timing-dependent execution? For simple benchmarks a non-prefetch config might execute the same code path as a prefetch config, but on a large Linux workload timing will affect the code paths taken.
Also, how does this prefetcher evaluation approach compare to the evaluation method in DPC3/DPC2?
benchmarking/L1-benchmarking.py
Outdated
main()
Make this pythonic:
if __name__ == "__main__":
main()
benchmarking/L1-benchmarking.py
Outdated
import sys

def main():
    with open(sys.argv[1]) as f:
Can you try to use Python's argparse module? https://docs.python.org/3/library/argparse.html
Notably, it lets you specify a "help" message, and it makes it easier to add more arguments down the line.
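A minimal sketch of what that could look like here (argument names are illustrative, not from the PR):

```python
import argparse

def main():
    parser = argparse.ArgumentParser(
        description="Compare L1 prefetcher metrics between two RTL sim logs")
    parser.add_argument("prefetch_out",
                        help="path to the .out file from the prefetch-enabled config")
    parser.add_argument("no_prefetch_out",
                        help="path to the .out file from the baseline (non-prefetch) config")
    args = parser.parse_args()

    with open(args.prefetch_out) as f:
        prefetch_lines = f.readlines()
    with open(args.no_prefetch_out) as f:
        no_prefetch_lines = f.readlines()
```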
@@ -84,7 +87,28 @@ class HellaCachePrefetchWrapperModule(pP: CanInstantiatePrefetcher, outer: Hella
cache.io.cpu.req.bits.phys := false.B
cache.io.cpu.req.bits.no_alloc := false.B
cache.io.cpu.req.bits.no_xcpt := false.B
when (cache.io.cpu.req.fire()) { in_flight := true.B }
when (cache.io.cpu.req.fire()) {
Leave the original block, and add a separate block for the prefetch print statements, gated by a config option:
when (cache.io.cpu.req.fire()) { in_flight := true.B }
if (printPrefetchingStats) {
  when (cache.io.cpu.req.fire()) {
    ...
  }
  when (prefetcher.io.snoop.valid) {
    ...
  }
  <etc>
}
You'll need to add a new parameter printPrefetcherStats to the config class WithHellaCachePrefetcher, to HellaCachePrefetchWrapperFactory.apply, and to HellaCachePrefetchWrapper.
benchmarking/benchmarkingL1.sh
Outdated
make run-binary CONFIG=PassthroughPrefetchSaturnConfig BINARY=$RISCV/riscv64-unknown-elf/share/riscv-tests/benchmarks/vvadd.riscv
cp output/chipyard.TestHarness.PassthroughPrefetchSaturnConfig/vvadd.out ../../generators/bar-prefetchers/benchmarking/no-prefetchL1-vvadd.out
cd ../../generators/bar-prefetchers/benchmarking
python L1-benchmarking.py "prefetchL1-vvadd.out" "no-prefetchL1-vvadd.out"
I don't think this file should be committed to this repo, as Prefetch2SaturnConfig and PassthroughPrefetchSaturnConfig aren't defined for most people. Really it's just wrapping python3 L1-benchmarking.py <path-to-prefetch.out> <path-to-no-prefetch.out>.
I think an option is to take the two configs to compare against one another as arguments. Then it is up to the script caller to give two configs that are roughly equivalent.
benchmarking/L1-benchmarking.py
Outdated
print("misses prevented: " + str(misses_prevented))

coverage = (misses_prevented + 0.0) / (misses_prevented + len(with_prefetch_misses)) * 100
I think it's easier to convert to float with float(misses_prevented).
//print snoop
when (prefetcher.io.snoop.valid) {
  val last_snoop_addr = prefetcher.io.snoop.bits.address
Instead of having Prefetch Addr, Snoop Addr, Resp Addr, Prefetch Resp Addr, I suggest you remove the spaces to have PrefetchAddr, SnoopAddr, RespAddr, PrefetchRespAddr. This makes your parsing in the python script uniform for all the cases.
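With the spaces removed, a single pattern can pull out any record type. A rough sketch (label set assumed from this thread; the longest alternative goes first so PrefetchResp isn't swallowed by Prefetch):

```python
import re

# Matches e.g. "Cycle: 10423    SnoopAddr: 80001abc"
record_regex = re.compile(r"Cycle:\s+(\d+)\s+(PrefetchResp|Prefetch|Snoop|Resp)Addr:\s+([\da-f]+)")

def parse_record(line):
    m = record_regex.search(line)
    if m is None:
        return None  # not a stats line
    cycle, kind, addr = int(m.group(1)), m.group(2), int(m.group(3), 16)
    return kind, cycle, addr
```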
This also (marginally) speeds up simulation since prints are semi-costly.
I actually had a bit of difficulty trying to figure out how to handle the same address being accessed more than once and was not able to come up with a good solution. It wouldn't be so difficult if the snoop stream were the same for both the prefetch and non-prefetch configs, but since it's not, I can't compare them that way. I'm not super familiar with the DPC2/DPC3 benchmarking system, but from looking through it quickly it seems like they're more concerned with overall performance (i.e., improvement in total cycles, CPI) than specific cache metrics. It looks like it's probably possible to do cache metric calculations from what is provided, but these can vary depending on who's doing the calculating, since the metrics aren't standardized.
Small nits to the code itself. Jerry hit most of the important ones already.
As for the timing-dependent execution, I figure for simple baremetal bmarks this isn't an issue. This solution should never scale to Linux booting/workloads because printing is probably too slow to get meaningful numbers for large bmarks (I don't think we want to enable this for something like SPEC). Instead, we should transition to adding HW measurements (maybe even a perf counter / autocounter to measure things).
benchmarking/benchmarkingL1.sh
Outdated
cd ../../..
source env.sh
cd sims/vcs
Nit: Depending on where this script is run, the ../../.. may not lead where you expect. Here is an example of having this work in all cases:
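A common pattern (sketch; it assumes the script stays under generators/bar-prefetchers/benchmarking/ so the same relative hops apply) is to resolve paths from the script's own location:

```bash
# Directory containing this script, regardless of the caller's working directory
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

cd "$SCRIPT_DIR/../../.."
source env.sh
cd sims/vcs
```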
benchmarking/L1-benchmarking.py
Outdated
resp_cycles = resp[1]
resp_addr = resp[4]
if (resp_addr in snoops):
    if (((int(resp_cycles) - int(snoops[resp_addr])) >= 5) and (int(resp_cycles) - int(last_resp_cycle) > 3)):
Nit: I would shift the int() casting to be right when you access the string (i.e. resp_cycles = int(resp[1])).
//print response
when (cache.io.cpu.resp.valid && !isPrefetch(cache.io.cpu.resp.bits.cmd)) {
  printf(p"Cycle: ${Decimal(cycle_counter)}\tResp Addr: ${Hexadecimal(cache.io.cpu.resp.bits.addr)}\n")
If you don't want a person to read this output and instead only want the script to parse/understand it, you can simplify this to print a bit faster (i.e. a schema like "type, addr, cycle").
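The script side of such a schema would then be a plain split (sketch; the exact field order is an assumption):

```python
def parse_csv_record(line):
    # Expected schema: "<type>,<addr_hex>,<cycle_dec>", e.g. "resp,80001abc,10423"
    kind, addr, cycle = line.strip().split(",")
    return kind, int(addr, 16), int(cycle)
```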
benchmarking/L1-benchmarking.py
Outdated
accesses = {"hits": [], "misses": []}
last_resp_cycle = 0
for line in lines:
    if 'Snoop' in line:
Nit: This is relatively brittle. What happens if the config you build has other printfs in it that contain Resp/Snoop? IMO you should use Python's re module to create a regex match.
benchmarking/L1-benchmarking.py
Outdated
resp_cycles = resp[1]
resp_addr = resp[4]
if (resp_addr in snoops):
    if (((int(resp_cycles) - int(snoops[resp_addr])) >= 5) and (int(resp_cycles) - int(last_resp_cycle) > 3)):
How did you determine 5/3 as the consts here? I would add a small comment here for other code readers.
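For example (sketch; the rationale comments are placeholders, since only the author knows why 5 and 3 were chosen):

```python
# Placeholder rationale -- replace with the real reasoning behind these values
MIN_MISS_LATENCY = 5  # snoop-to-resp gap (cycles) for an access to count as a miss
MIN_RESP_GAP = 3      # gap (cycles) since the previous resp, to skip back-to-back resps

def is_miss(resp_cycles, snoop_cycles, last_resp_cycle):
    return (resp_cycles - snoop_cycles >= MIN_MISS_LATENCY
            and resp_cycles - last_resp_cycle > MIN_RESP_GAP)
```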
Some more small changes but I think this can be merged in soon (once they are addressed).
benchmarking/L1-benchmarking.py
Outdated
Cycle: decimal_int PrefetchRespAddr: hexadecimal_int
"""

snoop_regex = re.compile("^Cycle:\s*(\d+)\s*SnoopAddr:\s*([\da-f]+)\s*SnoopBlock:\s*([\da-f]+)\s*")
A couple minor changes:
- \s* can probably be reduced to \s+ since you are just doing spaces. If you have 1 space then I would just simplify and match a single literal space.
- IIRC re.match already starts the search at the start of the line so you don't need the ^.
- I think you don't need to match the spaces at the end (you need to verify this). In any case, I think the better option would be to do .*.
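Applied to the snoop pattern, those changes might look like (untested sketch):

```python
import re

# "^" dropped (re.match already anchors at the start of the string),
# "\s*" tightened to "\s+", and the trailing whitespace match dropped
snoop_regex = re.compile(r"Cycle:\s+(\d+)\s+SnoopAddr:\s+([\da-f]+)\s+SnoopBlock:\s+([\da-f]+)")
```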
benchmarking/L1-benchmarking.py
Outdated
@@ -1,12 +1,34 @@
#!/usr/bin/python
Nit: Point to Python3 or 2
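For example, to pin it to Python 3:

```python
#!/usr/bin/env python3
```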
benchmarking/README.md
Outdated
```
source benchmarkingL1.sh
source benchmarkingL1.sh [prefetch config] [non-prefetch config]
Suggested change:
- source benchmarkingL1.sh [prefetch config] [non-prefetch config]
+ ./benchmarking/L1-benchmarking.sh [prefetch config] [non-prefetch config]
@@ -1,15 +1,25 @@
#!/bin/bash
# Run L1 prefetcher benchmark tests
# TODO: Add parameterization for other cores
# Run L1 prefetcher benchmark test
Add a small description of what $1 and $2 are supposed to be pointing to.
Additionally, this is running the test on vvadd. If we are going to add this, I think the benchmark should also be abstracted out.
if (printPrefetchingStats) {
  when (cache.io.cpu.req.fire()) {
    //print prefetch
    val last_prefetch_addr = req.bits.block_address
Should this be in the if or outside of it?
Seems like you can just delete this?
}
if (printPrefetchingStats) {
  when (prefetcher.io.snoop.valid) {
    val last_snoop_addr = prefetcher.io.snoop.bits.address
Same as comment above.
benchmarking/L1-benchmarking.py
Outdated
accuracy_resp = float(misses_prevented) / (num_unique_prefetch_resps - len(useful_prefetches) + misses_prevented) * 100
print("accuracy (acknowledged): " + str(accuracy_resp) + "%")
if (num_prefetches_accessed != 0):
    timeliness = (delta_sum + 0.0) / num_prefetches_accessed
Suggested change:
- timeliness = (delta_sum + 0.0) / num_prefetches_accessed
+ timeliness = float(delta_sum) / num_prefetches_accessed
@@ -85,6 +88,31 @@ class HellaCachePrefetchWrapperModule(pP: CanInstantiatePrefetcher, outer: Hella
cache.io.cpu.req.bits.no_alloc := false.B
Instead of using a statically assigned boolean to control printing, this should be enabled by a PlusArg:
val enable_print_stats = PlusArg("prefetcher_print_stats", width=1, default=0)(0)
when (enable_print_stats) {
  // your print statements
}
Then when running the sim, just set EXTRA_SIM_FLAGS=+prefetcher_print_stats=1.
The main changes in this PR are adding a Python script (L1-benchmarking.py) to measure coverage, accuracy, and timeliness, and adding a bash script (benchmarkingL1.sh) to run the tests automatically. The benchmarking script reads the values printed into the RTL sim .out files to compute the prefetcher metrics, so the printfs in the HellaCache prefetcher wrapper are necessary.
The way to calculate cache metrics is not standardized, so I'm open to suggestions on possible ways to make the calculations more useful/accurate.
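For reference, the three metrics from the quoted diffs boil down to the following (condensed sketch; variable names come from the snippets above, and the reading of delta_sum as the sum of prefetch-to-use cycle gaps is an assumption):

```python
def report_metrics(misses_prevented, with_prefetch_misses,
                   num_unique_prefetch_resps, useful_prefetches,
                   delta_sum, num_prefetches_accessed):
    # coverage: share of baseline misses that the prefetcher eliminated
    coverage = float(misses_prevented) / (misses_prevented + len(with_prefetch_misses)) * 100
    print("coverage: " + str(coverage) + "%")

    # accuracy (acknowledged): prevented misses over all unique acknowledged
    # prefetches, excluding the ones already counted as useful
    accuracy_resp = float(misses_prevented) / (
        num_unique_prefetch_resps - len(useful_prefetches) + misses_prevented) * 100
    print("accuracy (acknowledged): " + str(accuracy_resp) + "%")

    # timeliness: average cycle gap between a prefetch and its first demand use
    if num_prefetches_accessed != 0:
        timeliness = float(delta_sum) / num_prefetches_accessed
        print("timeliness: " + str(timeliness))
```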