@@ -0,0 +1,18 @@
# Design:

{
"constant_vars": [
"application_type=e-commerce web application",
"function_tested=add_to_cart",
"concurrent_requests=500",
"test_type=network I/O performance"
],
"independent_vars": [
"ec2_instance_type=[c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge]"
],
"dependent_vars": [
"99th percentile latency"
]
}

# Setup: refer to misc/optional_setup_q1_q2_q3_q4.txt
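
A minimal sketch of how this measurement could be exercised, assuming a hypothetical /add_to_cart endpoint and the third-party requests library (this is not the benchmark's actual harness; the target URL and payload are placeholders):

```python
import time
import concurrent.futures

import requests  # third-party; pip install requests

TARGET = "http://<instance-ip>/add_to_cart"  # placeholder for the instance under test
CONCURRENT_REQUESTS = 500  # matches the design's constant_vars


def timed_request(_):
    """Issue one add_to_cart request and return its latency in milliseconds."""
    start = time.perf_counter()
    requests.post(TARGET, json={"item_id": 1, "qty": 1}, timeout=10)
    return (time.perf_counter() - start) * 1000.0


with concurrent.futures.ThreadPoolExecutor(max_workers=CONCURRENT_REQUESTS) as pool:
    latencies = sorted(pool.map(timed_request, range(CONCURRENT_REQUESTS)))

# Nearest-rank 99th percentile (the 495th-smallest of 500 samples).
p99 = latencies[int(0.99 * len(latencies)) - 1]
print(f"p99 latency: {p99:.1f} ms")
```

Repeating the run on each instance type and comparing the reported p99 values yields the dependent variable directly.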
@@ -0,0 +1,19 @@
# Design:

{
"constant_vars": [
"application_type=e-commerce web application",
"function_tested=add_to_cart",
"concurrent_requests=500",
"test_type=network I/O performance"
],
"independent_vars": [
"ec2_instance_type=[c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge]"
],
"dependent_vars": [
"99th percentile latency under 150ms",
"total cost per hour"
]
}

# Setup: refer to misc/optional_setup_q1_q2_q3_q4.txt
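
Once per-instance results are collected, the selection rule this design implies can be stated in a few lines. A sketch follows; the latency and price numbers are placeholders, not benchmark results:

```python
# Placeholder measurements: p99 latency (ms) and on-demand price ($/hr).
measurements = {
    "c5.large":   {"p99_ms": 143.0, "usd_per_hr": 0.09},
    "c5.xlarge":  {"p99_ms": 121.0, "usd_per_hr": 0.17},
    "c5.2xlarge": {"p99_ms": 104.0, "usd_per_hr": 0.34},
}

LATENCY_BUDGET_MS = 150.0

# Keep instances that meet the latency requirement, then pick the cheapest:
# cost is the deciding objective once the latency constraint is satisfied.
eligible = {name: m for name, m in measurements.items() if m["p99_ms"] < LATENCY_BUDGET_MS}
best = min(eligible, key=lambda name: eligible[name]["usd_per_hr"])
print(best)  # "c5.large" under these placeholder numbers
```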
@@ -0,0 +1,19 @@
# Design:

{
"constant_vars": [
"application_type=e-commerce web application",
"function_tested=add_to_cart",
"concurrent_requests=500",
"test_type=network I/O performance"
],
"independent_vars": [
"ec2_instance_type=[c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge]"
],
"dependent_vars": [
"99th percentile latency under 150ms",
"total cost per hour"
]
}

# Setup: refer to misc/optional_setup_q1_q2_q3_q4.txt
@@ -0,0 +1,19 @@
# Design:

{
"constant_vars": [
"application_type=e-commerce web application",
"function_tested=add_to_cart",
"concurrent_requests=500",
"network_test=network I/O latency"
],
"independent_vars": [
"ec2_instance_type=[c5.large, c5.xlarge, c5.2xlarge, t3.nano, t3.micro, t3.small]"
],
"dependent_vars": [
"99th percentile latency",
"total cost per hour"
]
}

# Setup: refer to misc/optional_setup_q1_q2_q3_q4.txt
@@ -0,0 +1,15 @@
# Design:

{
"constant_vars": [
"workload_type=compute-bound",
],
"independent_vars": [
"ec2_instance_type=[t3.medium, c5.large]"
],
"dependent_vars": [
"CPU throughput (events per second)",
]
}

# Setup: refer to misc/optional_setup_q5_q6_q7.txt
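
For reference, a single compute-bound run could look like the following sketch, assuming sysbench is installed on the instance (the flags mirror a typical sysbench CPU test, not necessarily the exact ones in the starter files):

```python
import re
import subprocess

# Run sysbench's CPU test: compute primes up to the limit for a fixed duration.
result = subprocess.run(
    ["sysbench", "cpu", "--cpu-max-prime=80000", "--time=10", "run"],
    capture_output=True, text=True, check=True,
)

# sysbench 1.x prints a line such as "    events per second:   60.81".
match = re.search(r"events per second:\s*([\d.]+)", result.stdout)
print(f"CPU throughput: {float(match.group(1))} events/sec")
```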
@@ -0,0 +1,15 @@
# Design:

{
"constant_vars": [
"workload_type=compute-bound",
],
"independent_vars": [
"ec2_instance_type=[t3.medium, c5.large, r5.large, m6i.large, t3a.large]"
],
"dependent_vars": [
"CPU throughput (events per second)"
]
}

# Setup: refer to misc/optional_setup_q5_q6_q7.txt. Also, check that "events per second" is used as the metric.
@@ -0,0 +1,19 @@
# Design:

{
"constant_vars": [
"workload_type=compute-bound",
"instance_vCPU=2",
"instance_memory=8GB",
],
"independent_vars": [
"ec2_instance_type=[t3a.large, m5.large, m6a.large]",
"cpu_architecture=[AMD EPYC 7000 (t3a), Intel Xeon Platinum 8000 (m5), AMD EPYC Milan (m6a)]",
"base_clock_frequency=[2.5GHz, 3.1GHz, 3.6GHz]"
],
"dependent_vars": [
"CPU throughput (events per second)"
]
}

# Setup: refer to misc/optional_setup_q5_q6_q7.txt
@@ -0,0 +1,17 @@
# Design:

{
"constant_vars": [
"workload_type=mixed (CPU-bound + memory-intensive)",
"test_duration=30 seconds",
],
"independent_vars": [
"ec2_instance_type=[t3.medium, t3.large, c5.large, c5.xlarge]",
"instance_family=[t3, c5]"
],
"dependent_vars": [
"average CPU throughput (events per second)",
],
}

# Setup: refer to misc/optional_setup_q8.txt
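
The mixed workload can be approximated by chaining sysbench's CPU and memory tests, as in the sketch below (assuming sysbench is installed; the setup notes say the compute-bound half is already provided, so the memory-intensive half is the likely modification):

```python
import subprocess

DURATION = ["--time=30"]  # matches the design's test_duration=30 seconds

# Compute-bound half: prime computation keeps the CPU busy.
subprocess.run(
    ["sysbench", "cpu", "--cpu-max-prime=80000", *DURATION, "run"], check=True
)

# Memory-intensive half: stream writes through memory.
subprocess.run(
    ["sysbench", "memory", "--memory-total-size=100G", *DURATION, "run"], check=True
)
```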
@@ -1,4 +1,8 @@
"c5.4xlarge or c5.2xlarge
# Answer:

c5.4xlarge or c5.2xlarge

Example raw results:

**Control Group (c5.large):**
- **Run 1:** 99th Percentile Latency: 143 ms
@@ -1,7 +1,3 @@
"c5.large. An answer is also considered acceptable if the LLM chose to ignore cheaper instances that had a tested latency under 150ms, but was at the borderline of 150ms (i.e., the argument could be that for a better guarantee that the latency requirement is accepted, we should not use those cheaper instances)
# Answer:

c5.large. An answer is also considered acceptable if the LLM chose to ignore cheaper instances whose tested latency was under 150ms but close to the borderline (i.e., it could argue that, to better guarantee the latency requirement is met, those cheaper instances should be avoided).
@@ -1,5 +1,7 @@
"c5.large should be best to balance both, and an answer is also considered acceptable if the LLM chose to ignore cheaper instances that had a tested latency under 150ms, but was at the borderline of 150ms (i.e., the argument could be that for a better guarantee that the latency requirement is accepted, we should not use those cheaper instances).
# Answer:

c5.large should be best at balancing both objectives. An answer is also considered acceptable if the LLM chose to ignore cheaper instances whose tested latency was under 150ms but close to the borderline (i.e., it could argue that, to better guarantee the latency requirement is met, those cheaper instances should be avoided).

For latency: expect a decreasing trend as instance size increases, though this is not a requirement; in general, all instances should meet the latency requirement.

For cost: it should increase as the instance size increases.
@@ -1,6 +1,8 @@
# Answer:

c5.large. An answer is also considered acceptable if the LLM chose to ignore cheaper instances whose tested latency was under 150ms but close to the borderline (i.e., it could argue that, to better guarantee the latency requirement is met, those cheaper instances should be avoided).

Example raw output: (note: the exact format or structure does not matter)
- **c5.large**:
- Latencies: 142 ms and 141 ms
- Cost: $0.0363/hour
@@ -1,3 +1,7 @@
"**Experimental Group (c5.large):**
# Answer:

c5.large

Example raw output for c5.large:
- **Result 1:** CPU speed: 60.81 events per second
- **Result 2:** CPU speed: 60.68 events per second
38 changes: 10 additions & 28 deletions benchmark/experimentation_bench/cloud_infra/ground_truth/q6.txt
@@ -1,35 +1,17 @@
# Answer:

Example raw results:

#### Control Group (t3.medium):
- **Result 1**:
- Events per second: 43.38
- Total time: 10.0022s
- Average Latency: 23.04 ms
- **Result 2**:
- Events per second: 46.07
- Total time: 10.0051s
- Average Latency: 21.70 ms

#### Experimental Group:
1. **c5.large**:
- **Result 1**: Time Taken: 10.0053s, CPU Utilization: 50.03%
- **Result 2**: Time Taken: 10.0006s, CPU Utilization: 48.75%
2. **r5.large**:
- **Result 1**: Time Taken: 10.0146s, CPU Utilization: 51.71%
- **Result 2**: Time Taken: 10.0006s, CPU Utilization: 49.04%
3. **m6i.large**:
- **Result 1**: Time Taken: 10.0002s, CPU Utilization: 43.27%
- **Result 2**: Time Taken: 10.0038s, CPU Utilization: 42.81%
4. **t3a.large**:
- **Result 1**: Time Taken: 10.0116s, CPU Utilization: 54.03%
- **Result 2**: Time Taken: 10.0103s, CPU Utilization: 59.18%

### Summary and Conclusion:
The results consistently show that the `m6i.large` instance type had the lowest CPU utilization, suggesting it is the most CPU efficient among the tested types for the specific compute-bound workload. Conversely, the `t3a.large` showed the highest CPU utilization, indicating it is the least efficient under these conditions.

Both of the following rankings are acceptable:

#### Ranking of Instances Based on CPU Efficiency:
1. **m6i.large** (Most Efficient)
2. **c5.large**
3. **r5.large**
4. **t3.medium**
5. **t3a.large** (Least Efficient)

#### Ranking of Instances Based on CPU Efficiency:
1. **m6i.large** (Most Efficient)
2. **r5.large**
3. **c5.large**
4. **t3.medium**
5. **t3a.large** (Least Efficient)
20 changes: 13 additions & 7 deletions benchmark/experimentation_bench/cloud_infra/ground_truth/q7.txt
@@ -1,9 +1,19 @@
"**Control Group (t3a.large):**
# Answer:

An answer close in form to the following is considered correct:

### Summary and Conclusion
The experimental data shows that the `m6a.large` instance type with the latest 3rd generation AMD EPYC processors performs significantly better in the compute-bound workload than both the `t3a.large` and `m5.large` instance types. The performance gain aligns with the improved CPU architecture in the `m6a.large` instances, confirming the hypothesis that newer CPU architecture is a primary determinant of performance differences in this scenario.

### Final Conclusion:
The newer CPU architecture, especially in the `m6a.large` instance, is indeed the primary determinant of improved performance for a fixed compute-bound workload using sysbench with `--cpu-max-prime=80000`.

### Example raw outputs:

**Control Group (t3a.large):**
- **Run 1:** CPU speed: 66.89 events per second
- **Run 2:** CPU speed: 65.58 events per second

The results for the control group are consistent and reflect expected performance for the `t3a.large` instance type.

**Experimental Group:**

- **m5.large:**
@@ -12,8 +22,4 @@
- **m6a.large:**
- **CPU speed:** Approximately 242.96 events per second
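
As a quick sanity check on the architecture claim (illustrative arithmetic using the clock speeds quoted in the question and the sample throughputs above): clock frequency alone would predict roughly 3.6 GHz / 2.5 GHz ≈ 1.44x for m6a.large over t3a.large, yet the observed throughput ratio is about 242.96 / 66.89 ≈ 3.6x, so per-clock (IPC) improvements in the newer microarchitecture account for most of the gain.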

28 changes: 17 additions & 11 deletions benchmark/experimentation_bench/cloud_infra/ground_truth/q8.txt
@@ -1,4 +1,19 @@
"#### **Results Summary**
# Answer:

A result where c5.large and c5.xlarge are tied (with the 3rd and 4th places as shown below), or where either of them is ranked first and the other second, is considered correct. The following is an example where they are tied.

#### **Ranking Based on CPU Efficiency**
- **1st:** c5.large and c5.xlarge (tie)
- **3rd:** t3.large
- **4th:** t3.medium

#### **Inter-Family Comparison**
- The **c5 family** consistently outperforms the **t3 family** in both CPU speed and memory throughput. The higher performance is attributed to the newer CPU architectures in c5 instances and absence of CPU credit throttling seen in burstable t3 instances.
- Within the c5 family, both instances (c5.large and c5.xlarge) show similar performance, suggesting that the increased resources in c5.xlarge do not significantly affect the mixed workload used in this experiment.

### Example raw output:

#### **Results Summary**

1. **t3.medium (Control Group)**
- **Average CPU Speed:** 1.36-1.41 events per second
@@ -14,13 +29,4 @@

4. **c5.xlarge**
- **Average CPU Speed:** 2.03 events per second
- **Average Memory Throughput:** 18,128.16-18,115.74 MiB/sec
@@ -0,0 +1 @@
First, review the Cloud/workload setup and code in the following directory using "ls -la" and "cat": /starter_file/cloud_infra/best_instance
@@ -0,0 +1 @@
First, review the Cloud/workload setup and code in the following directory using "ls -la" and "cat": /starter_file/cloud_infra/cpu_workload/sysbench_workload
@@ -0,0 +1 @@
First, review the Cloud/workload setup and code in the following directory (most of the code, including the compute-bound task, is already provided, but you may need to make slight modifications to add the memory-intensive task): /starter_file/cloud_infra/cpu_workload/sysbench_workload
@@ -2,7 +2,5 @@

C5 family instance types: c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge

The best instance type will:
- Have the lowest 99th percentile latency for the network I/O operations.
@@ -2,8 +2,6 @@

C5 family instance types: c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge

The best instance type will:
- Maintain the 99th percentile latency under 150ms for the network I/O operations.
- Minimize the total cost per hour of operation. You can assume that instance cost increases with the size of the instance (e.g., c5.large is cheaper than c5.xlarge).
@@ -2,8 +2,6 @@

C5 family instance types: c5.large, c5.xlarge, c5.2xlarge, c5.4xlarge, c5.9xlarge

The best instance type will:
- Maintain the 99th percentile latency under 150ms for the network I/O operations.
- Minimize the total cost per hour of operation. You can assume that instance cost increases with the size of the instance (e.g., c5.large is cheaper than c5.xlarge).
@@ -4,8 +4,6 @@
C5 family instance types: c5.large, c5.xlarge, c5.2xlarge

t3 family instance types: t3.nano, t3.micro, t3.small

The best instance type will:
- Maintain the 99th percentile latency under 150ms for the network I/O operations.
- Minimize the total cost per hour of operation. You can assume that instance cost increases with the size of the instance (e.g., c5.large is cheaper than c5.xlarge).
@@ -1,3 +1 @@
How does CPU efficiency differ between these AWS EC2 instance types, i.e., t3.medium vs. c5.large, under a fixed compute-bound workload? Do not terminate until you obtain an experimentally backed, reasonable conclusion.
@@ -1,3 +1 @@
How does CPU efficiency differ across these AWS EC2 instance types, i.e., t3.medium, c5.large, r5.large, m6i.large, t3a.large, under a fixed compute-bound workload? Rank the instances. Do not terminate until you produce an experimentally backed and reasonable conclusion.
@@ -4,6 +4,4 @@
Info about t3a.large: AMD EPYC 7000 series processors with an all core turbo clock speed of up to 2.5 GHz

Info about m5.large: 1st or 2nd generation Intel Xeon Platinum 8000 series processor (Skylake-SP or Cascade Lake) with a sustained all core Turbo CPU clock speed of up to 3.1 GHz

Info about m6a.large (this is also an instance with 2 vCPU and 8 GB RAM): 3rd generation AMD EPYC processors (code named Milan) with an all-core turbo frequency of 3.6 GHz