This repo serves as the report and codebase for Lab 2/3 for the Advanced Computer Architecture course, on the Electrical and Computer Engineering school of Aristotle University of Thessaloniki

Ανδρονίκου Δημήτρης, 9836

Αλεξανδρίδης Φώτιος, 9953

Advaced Computer Architecture, Lab 02

Question 1

Subquestion a

L1 instruction => size=32768 ~> 32KB

associativity assoc=2

L1 Data caches => size=65536 ~> 65 KB

associativity assoc=2

L2 cache size=2097152 ~> 2MB

associativity assoc=8

cache line cache_line_size=64

Subquestion b

Specbzip

execution time sim_seconds = 0.083982
CPI system.cpu.cpi = 1.679650
total miss rates for L1 Instruction cache system.cpu.icache.overall_miss_rate::total 0.000077
total miss rates for L1 Data cache system.cpu.dcache.overall_miss_rate::total 0.014798
total miss rates for L2 cache

system.l2.overall_miss_rate::total 0.282163

Specmcf

execution time sim_seconds = 0.064955
CPI system.cpu.cpi = 1.299095
total miss rates for L1 Instruction cache system.cpu.icache.overall_miss_rate::total 0.023612
total miss rates for L1 Data cache system.cpu.dcache.overall_miss_rate::total 0.002108
total miss rates for L2 cache

system.l2.overall_miss_rate::total 0.055046

Spechmmer

execution time sim_seconds = 0.059396
CPI system.cpu.cpi = 1.187917
total miss rates for L1 Instruction cache system.cpu.icache.overall_miss_rate::total 0.000221
total miss rates for L1 Data cache system.cpu.dcache.overall_miss_rate::total 0.001637
total miss rates for L2 cache

system.l2.overall_miss_rate::total 0.077760

Specsjeng

execution time: sim_seconds = 0.513528
CPI: system.cpu.cpi = 10.270554
total miss rates for L1 Instruction cache: system.cpu.icache.overall_miss_rate::total 0.000020
total miss rates for L1 Data cache: system.cpu.dcache.overall_miss_rate::total 0.121831
total miss rates for L2 cache:

system.l2.overall_miss_rate::total 0.999972

Speclibm

execution time: sim_seconds = 0.174671
CPI: system.cpu.cpi = 3.493415
total rates for L1 Instruction cache: system.cpu.icache.overall_miss_rate::total 0.000094
total rates for L1 Data cache: system.cpu.dcache.overall_miss_rate::total 0.060972
total rates for L2 cache:

system.l2.overall_miss_rate::total 0.999944

	Specbzip	Specmcf	Spechmmer	Specsjeng	Speclibm
χρόνος εκτέλεσης	0.083982	0.064955	0.059396	0.513528	0.174671

	Specbzip	Specmcf	Spechmmer	Specsjeng	Speclibm
CPI	1.679650	1.299095	1.187917	10.270554	3.493415

	Specbzip	Specmcf	Spechmmer	Specsjeng	Speclibm
Icache_miss_rate::total	0.000077	0.023612	0.000221	0.000020	0.000094

	Specbzip	Specmcf	Spechmmer	Specsjeng	Speclibm
dcache_miss_rate::total	0.014798	0.002108	0.001637	0.121831	0.060972

	Specbzip	Specmcf	Spechmmer	Specsjeng	Speclibm
l2_miss_rate::total	0.282163	0.055046	0.077760	0.999972	0.999944

execution time of specsjeng is much bigger than the others

CPI of specsjeng is much bigger than the others

But miss rate of instructions of specmcf is much bigger than the others

miss rate of data of specsjeng is much bigger than the others

miss rate in L2 of specsjeng and of speclibm is much bigger than the others

Due to 5 and 4 it is reasonable to have a high CPI for specsjeng, hence a long runtime.

Subquestion c

system.clk_domain.clock entry

	401.bzip2	429.mcf	456.hmmer	458.sjeng	470.lbm
default	1000	1000	1000	1000	1000
1GHz	1000	1000	1000	1000	1000
3GHz	1000	1000	1000	1000	1000

cpu_cluster.clk_domain.clock entry

	401.bzip2	429.mcf	456.hmmer	458.sjeng	470.lbm
default	500	500	500	500	500
1GHz	1000	1000	1000	1000	1000
3GHz	333	333	333	333	333

To answer the theretical questions, we will focus on the first benchmark, but the same information is appliccable for all benchmarks. We observe that the system clock entry defines the clock rate for the total system, the components of the motherboard. The CPU clock is responsible for defining the clock rate for the different CPU components. Because a CPU usually has to perform more operations per unit of time compared to the other components, the CPU clock is usually defined to be at least as fast as the system clock, but usually faster. For performance and syncronization reasons, we actually want the system clock rate in ticks to be an integral product of the CPU clock rate in ticks. By inspecting the config.json file for a 1GHz benchmark, we can observe that the different CPU components are clocked according to the CPU clock rate. For a MinorCPU model (defined in the commands we use to run the benchmarks), these components are:

L1 data cache (dcache)
L1 instruction cache (icache)
L2 cache
Instruction walker cache
Data walker cache
Data busses that connect the components mentioned above

If we add another CPU, meaning we add another core to the CPU cluster, its clock rate will be the defined CPU clock rate, always according to its architecture.

For the scaling we have the following execution times for the 1GHz and 3GHz simulations:

Simulation time (in seconds)

	401.bzip2	429.mcf	456.hmmer	458.sjeng	470.lbm
1GHz	0.165228	0.124137	0.118530	0.329087	0.246976
3GHz	0.061589	0.043909	0.039646	0.138401	0.135953

As we can observe from the table, in some cases the scaling is better than the others (specifically, in benchmark 456.hmmer we can see that the rate is closer to 3 than in 470.lbm). Achieving perfect scaling is close to impossible, because the system performance depends on a lot of parameters other than CPU clock (for example: cache line size).

Subquestion d

Specbzip

execution time: sim_seconds = 0.083609
CPI: system.cpu.cpi = 1.672175
total miss rates for L1 Instruction cache: system.cpu.icache.overall_miss_rate::total 0.000077
total miss rates for L1 Data cache: system.cpu.dcache.overall_miss_rate::total 0.014795
total miss rates for L2 cache:

system.l2.overall_miss_rate::total 0.282159

	Specbzip	Specbzip(fast DDR)	%
χρόνος εκτέλεσης	0.083982	0.083609	-0.4441
CPI	1.679650	1.672175	-0.4465
Icache_miss_rate::total	0.000077	0.000077	0
dcache_miss_rate::total	0.014798	0.014795	-0.02
l2_miss_rate::total	0.282163	0.282159	-0.0014

We observe that with increasing the frequency of memory the miss rate for L2 and L1 data cache decreases. This makes sense because these caches will receive data faster, so we will have fewer misses in the same amount of time. This has an effect on CPI, which is reasonable, and of course CPI as shown before affects runtime almost proportionally.

Questions 2

In set associative cache, block size does not affect cache tag anyhow.
A smaller cache tag ensures a lower cache hit time.
A smaller cache block incurs a lower cache miss penalty.

Also

Increase cache line size
- Reduces compulsory misses
- Reduces the miss rate
- Increases capacity and conflict misses
Increase size of the cache
- Increases hit time, increases power consumption
- Reduces the miss rate (it can load more Bytes on caches)
Higher associativity
- Reduces conflict misses
- Reduces the miss rate
- Increases hit time, increases power consumption
Higher number of cache levels
- Reduces overall memory access time
- Reduces the miss penalty

So our run tests are these:

Run #	L1 dcache size	L1 icache size	L2 cache size	L1 icache associat.	L1 dcache associat.	L2 cache associat.	Cache line size
1	64	64	1	1	1	2	128
2	64	128	2	1	1	2	32
3	128	64	2	1	1	2	32
4	128	128	4	1	1	2	128
5	128	128	4	2	1	2	128
6	128	128	4	4	1	2	128
7	128	128	4	4	2	2	128
8	128	128	4	4	4	2	64
9	128	128	4	4	4	4	64
10	128	128	4	4	4	8	128

The results:

Specbzip

	cpi	dcache_miss_rate	icache _miss_rate	L2 _miss_rate
run1	1.551542	0.009703	0.000046	0.204023
run2	1.790627	0.018864	0.000069	0.384215
run3	1.756071	0.015868	0.000070	0.465377
run4	1.574223	0.011891	0.000047	0.166513
run5	1.574144	0.011891	0.000046	0.166506
run6	1.574144	0.011891	0.000046	0.166510
run7	1.556731	0.010273	0.000046	0.192764
run8	1.597712	0.010896	0.000057	0.312559
run9	1.597876	0.010896	0.000057	0.312993
run10	1.551542	0.009703	0.000046	0.204023

Specmcf

	cpi	dcache_miss_rate	icache _miss_rate	L2 _miss_rate
run1	1.181221	0.001966	0.000016	0.758297
run2	1.335364	0.018105	0.000055	0.316779
run3	2.143559	0.005244	0.103163	0.042156
run4	1.184956	0.002541	0.000036	0.578237
run5	1.184904	0.002541	0.000023	0.581209
run6	1.184816	0.002541	0.000016	0.583029
run7	1.182096	0.002097	0.000016	0.705200
run8	1.203137	0.003132	0.000022	0.863710
run9	1.203047	0.003132	0.000022	0.863262
run10	1.181221	0.001966	0.000016	0.758297

Spechmmer

	cpi	dcache_miss_rate	icache _miss_rate	L2 _miss_rate
run1	1,178110	0,000367	0,000056	0,200911
run2	1,210302	0,004346	0,000401	0.054981
run3	1,194226	0,002259	0,000420	0,102704
run4	1,186298	0,001133	0,000334	0,056418
run5	1,185588	0,001133	0,000057	0,061388
run6	1,185588	0,001133	0,000056	0,061400
run7	1,178295	0,000387	0,000056	0,187128
run8	1,182888	0,000662	0,000078	0,208062
run9	1,182888	0,000662	0,000078	0,208062
run10	1,178110	0,000367	0,000056	0,200911

Specsjeng

	cpi	dcache_miss_rate	icache _miss_rate	L2 _miss_rate
run1	3.348858	0.248364	0.011330	0.044292
run2	5.605595	0.393710	0.003321	0.290642
run3	5.684481	0.392898	0.012986	0.123246
run4	3.261793	0.243757	0.002443	0.166720
run5	3.251312	0.243745	0.000578	0.288207
run6	3.247843	0.243744	0.000115	0.352064
run7	3.240343	0.242518	0.000114	0.702941
run8	3.173673	0.243469	0.000108	0.894077
run9	3.173692	0.243468	0.000108	0.894172
run10	3.239587	0.242393	0.000114	0.780786

Speclibm

	cpi	dcache_miss_rate	icache _miss_rate	L2 _miss_rate
run1	1.883187	0.032168	0.000102	0.944590
run2	3.648170	0.123694	0.000071	0.997075
run3	3.648098	0.123577	0.000075	0.998511
run4	1.874379	0.031517	0.000089	0.971545
run5	1.874379	0.031517	0.000087	0.971559
run6	1.874379	0.031517	0.000086	0.971562
run7	1.865695	0.030867	0.000086	0.999997
run8	2.467162	0.061731	0.000085	0.999999
run9	2.467162	0.061731	0.000085	0.999999
run10	1.865695	0.030867	0.000086	0.999997

Overall, we can see that we achieve the lowest CPI consistently across all benchmarks on runs 1, 7 and 10. We can definitely conclude that a higher cache line size is vital to lowering CPI, since for the runs with the smaller sizes (32) we can consistently see the highest CPI there along every benchmark. Apart from that we can see that increasing associativity has a minimal effect on CPI, by checking the differences between the three runs. Cache size doesn’t impact the CPI much.

As such, we can conclude that the CPI is largely impacted by cache line size (higher -> better CPI), impacted somewhat by associativity (higher -> better CPI) and is marginally impacted by cache sizes (higher -> better CPI in some cases).

Question 3

We want a function like Performance/Cost, where Performance = 1/CPI

So F = 1CPICost=1CPI*Cost

We want to maximize rhis function, thus to minimize the CPI*Cost

Cost is:

Cost = 10*cost_L1_data + 10*cost_L1_instr + cost_L2 + 10*cost_L1_data_asso + 10*cost_L1_inst_asso + cost_L2_asso + cost_line_s

where cost_L1_data = μέγεθος της L1 data

where cost_L1_instr = μέγεθος της L1 instruction

where cost_L2 = μέγεθος της L2 (σε kB, άρα x1000)

where cost_L1_data_asso = μέγεθος του L1 data associativity

where cost_L1_inst_asso = μέγεθος του L1 instruction associativity

where cost_L2_asso = μέγεθος του L2 associativity

where cost_line_s = μέγεθος του cache line

we use x10 because the cost of L1 is much bigger than the cost of L2

Cost(run-i)

run1	2430
run2	3974
run3	3974
run4	6710
run5	6720
run6	6740
run7	6750
run8	6706
run9	6708
run10	6776

And Cost = Cost /1000 (to be of the same order of magnitude as cpi)

Specbzip

	F
run1	0.265234
run2	0.140529
run3	0.143295
run4	0.09467
run5	0.094534
run6	0.094253
run7	0.095166
run8	0.093333
run9	0.093296
run10	0.095119

Specmcf

	F
run1	0.34839
run2	0.18844
run3	0.11739
run4	0.12577
run5	0.12559
run6	0.12522
run7	0.12533
run8	0.12394
run9	0.12392
run10	0.12494

Spechmmer

	F
run1	0.34931
run2	0.20791
run3	0.21071
run4	0.12563
run5	0.12552
run6	0.12514
run7	0.12573
run8	0.12606
run9	0.12603
run10	0.12527

Specsjeng

	F
run1	0.12288
run2	0.04489
run3	0.04427
run4	0.04569
run5	0.04577
run6	0.04568
run7	0.04572
run8	0.04699
run9	0.04697
run10	0.04556

Speclibm

	F
run1	0.21852
run2	0.06898
run3	0.06898
run4	0.07951
run5	0.07939
run6	0.07916
run7	0.07941
run8	0.06044
run9	0.06042
run10	0.0791

So best choice for all run 1 mainly because of very low cost and good cpi, although the other tests have more cache memory they don't have better cpi.

Bibliography

https://www.gatevidyalay.com/cache-line-cache-line-size-cache-memory/

http://ece-research.unm.edu/jimp/611/slides/chap5_4.html

Computer Architecture John L. Hennesy and A. Patterson, 4th edition

Run #	L1 dcache size	L1 icache size	L2 cache size	L1 icache associat.	L1 dcache associat.	L2 cache associat.	Cache line size
1	64	64	1	1	1	2	128
2	64	128	2	1	1	2	32
3	128	64	2	1	1	2	32
4	128	128	4	1	1	2	128
5	128	128	4	2	1	2	128
6	128	128	4	4	1	2	128
7	128	128	4	4	2	2	128
8	128	128	4	4	4	2	64
9	128	128	4	4	4	4	64
10	128	128	4	4	4	8	128

Run #	L1 dcache size	L1 icache size	L2 cache size	L1 icache associat.	L1 dcache associat.	L2 cache associat.	Cache line size
1	64	64	1	1	1	2	128
2	64	128	2	1	1	2	32
3	128	64	2	1	1	2	32
4	128	128	4	1	1	2	128
5	128	128	4	2	1	2	128
6	128	128	4	4	1	2	128
7	128	128	4	4	2	2	128
8	128	128	4	4	4	2	64
9	128	128	4	4	4	4	64
10	128	128	4	4	4	8	128

Files

README.md

Latest commit

History

README.md

File metadata and controls

This repo serves as the report and codebase for Lab 2/3 for the Advanced Computer Architecture course, on the Electrical and Computer Engineering school of Aristotle University of Thessaloniki

Advaced Computer Architecture, Lab 02

Question 1

Subquestion a

Subquestion b

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Specbzip

Specmcf

Spechmmer

Specsjeng

Speclibm

Subquestion c

Subquestion d

Specbzip

Specbzip

Specbzip(fast DDR)

%

Questions 2

Specbzip

cpi

dcache_miss_rate

icache _miss_rate

L2 _miss_rate

Specmcf

cpi

dcache_miss_rate

icache _miss_rate

L2 _miss_rate

Spechmmer

cpi

dcache_miss_rate

icache _miss_rate

L2 _miss_rate

Specsjeng

cpi

dcache_miss_rate

icache _miss_rate

L2 _miss_rate

Speclibm

cpi

dcache_miss_rate

icache _miss_rate

L2 _miss_rate

Question 3

Cost(run-i)

Specbzip

F

Specmcf

F

Spechmmer

Run #	L1 dcache size	L1 icache size	L2 cache size	L1 icache associat.	L1 dcache associat.	L2 cache associat.	Cache line size
1	64	64	1	1	1	2	128
2	64	128	2	1	1	2	32
3	128	64	2	1	1	2	32
4	128	128	4	1	1	2	128
5	128	128	4	2	1	2	128
6	128	128	4	4	1	2	128
7	128	128	4	4	2	2	128
8	128	128	4	4	4	2	64
9	128	128	4	4	4	4	64
10	128	128	4	4	4	8	128