Create a load testing command #135

Open
StoneDot opened this issue Jun 18, 2023 · 7 comments

@StoneDot
Contributor

StoneDot commented Jun 18, 2023

Load testing command

Background

A load testing command is useful for understanding DynamoDB behaviors such as throttling, auto scaling, and metrics. It also helps users investigate how an application behaves when throttling happens.

Proposed design

The key implementation decisions are as follows:

  • The amount of request traffic is controlled by a leaky bucket algorithm with a feedback loop that adjusts the amount acquired for the next request based on the capacity actually consumed.
  • The current consumed capacity is updated and presented in real time. In the first implementation, however, we will omit visualizations such as graphs.
  • To prevent consuming capacity unintentionally, the RCU and WCU targets must be provided by the user.
  • The internal request manager controls the maximum number of parallel requests to DynamoDB. It is responsible for scaling the number of parallel requests in or out, and it scales exponentially with base 2.
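
As a rough illustration of the first and last decisions, here is a minimal sketch and not the actual implementation. It assumes a bucket that holds at most one second of target capacity, a feedback rule that simply reserves the last observed cost for the next request, and a doubling/halving rule driven by throttling and achieved throughput. The names `LeakyBucket`, `next_parallelism`, and `simulate` are hypothetical.

```python
class LeakyBucket:
    """Leaky bucket used as a meter: the level rises with each request's
    capacity cost and drains at the target rate (capacity units per second)."""

    def __init__(self, target_units_per_sec, initial_estimate=1.0):
        self.leak_rate = float(target_units_per_sec)
        # Assumption: the bucket holds at most one second of target capacity.
        self.capacity = float(target_units_per_sec)
        self.level = 0.0
        # Units to reserve for the next request; corrected by feedback.
        self.estimate = float(initial_estimate)

    def leak(self, elapsed_sec):
        """Drain the bucket for the time that has passed."""
        self.level = max(0.0, self.level - self.leak_rate * elapsed_sec)

    def try_acquire(self):
        """Reserve the estimated cost, or return None if the bucket is full."""
        if self.level + self.estimate > self.capacity:
            return None
        self.level += self.estimate
        return self.estimate

    def report(self, reserved, consumed):
        """Feedback loop: settle the reservation against the capacity that was
        actually consumed, and reserve that amount for the next request."""
        self.level = max(0.0, self.level + consumed - reserved)
        self.estimate = consumed


def next_parallelism(current, achieved, target, throttled=False, max_parallel=1024):
    """Scale the number of parallel requests by a factor of 2.
    Assumed policy: halve when throttled, double while below target."""
    if throttled:
        return max(1, current // 2)
    if achieved < target:
        return min(max_parallel, current * 2)
    return current


def simulate(target=10, cost_per_request=2.0, ticks=100, tick_sec=0.1):
    """Drive the bucket with a fake request that always consumes
    cost_per_request units; return the total capacity consumed."""
    bucket = LeakyBucket(target)
    total = 0.0
    for _ in range(ticks):
        bucket.leak(tick_sec)
        while True:
            reserved = bucket.try_acquire()
            if reserved is None:
                break
            # A real client would read ConsumedCapacity from the response here.
            bucket.report(reserved, cost_per_request)
            total += cost_per_request
    return total
```

In this sketch the first request is under-reserved (1 unit instead of 2), and the feedback step corrects both the bucket level and the next reservation, so the total consumed stays within the target rate times the duration plus one bucket of burst.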

Interface

In the first implementation, load testing is provided by the command dy bench run (or dy benchmark run), with the following options:

  • --rcu <number>: Specify target RCU when reading items. This is a required argument.
  • --wcu <number>: Specify target WCU when writing items. This is a required argument unless you provide --skip-item-creation.
  • --size <number>: The preferred size of an attribute in bytes. The default value is 500.
  • --skip-item-creation: By default, dynein first creates items as the write test and then runs the read test against the created items. This option skips the write (WCU) test and uses the data already stored in the table.
  • --partition-key-variations <number>: The maximum number of primary key variations of items. The default value is 1000.
  • --sort-key-variations <number>: The maximum number of sort key variations of items. The default value is 100.
  • --duration-write <number>: The duration of the write testing. The default value is five minutes.
  • --duration-read <number>: The duration of the read testing. The default value is five minutes.

Common options such as --table and --region are supported in the same way as in other commands.
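
To make the required/optional rules and defaults above concrete, here is a small mock of the option set. It is only an illustration, not dynein's real argument parser. It assumes durations are given in seconds (the proposal only says "five minutes"), and `build_parser` and `parse_args` are hypothetical names.

```python
import argparse


def build_parser():
    """Mock parser for the proposed `dy bench run` options."""
    parser = argparse.ArgumentParser(prog="dy bench run")
    parser.add_argument("--rcu", type=int, required=True,
                        help="target RCU when reading items")
    parser.add_argument("--wcu", type=int,
                        help="target WCU when writing items")
    parser.add_argument("--size", type=int, default=500,
                        help="preferred attribute size in bytes")
    parser.add_argument("--skip-item-creation", action="store_true",
                        help="skip the write test and read existing data")
    parser.add_argument("--partition-key-variations", type=int, default=1000)
    parser.add_argument("--sort-key-variations", type=int, default=100)
    # Assumption: durations are expressed in seconds (five minutes = 300).
    parser.add_argument("--duration-write", type=int, default=300)
    parser.add_argument("--duration-read", type=int, default=300)
    return parser


def parse_args(argv):
    """Parse argv and enforce that --wcu is required unless the write
    test is skipped."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.wcu is None and not args.skip_item_creation:
        parser.error("--wcu is required unless --skip-item-creation is given")
    return args
```

For example, `parse_args(["--rcu", "100", "--wcu", "5"])` is accepted, while `parse_args(["--rcu", "100"])` exits with a usage error because neither --wcu nor --skip-item-creation is present.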

We use the bench run subcommand for the initial implementation, which leaves room for future enhancements. For example, dy bench run -s <scenario-file> could run scenario-based tests, and dy bench report <report-file> could show the result of a test.

The workflow

The workflow of the load test is, schematically, as follows:

  1. Based on the --partition-key-variations and --sort-key-variations arguments, create a list of primary keys to use in the test. If --skip-item-creation is provided, the Scan API is invoked to list the primary keys instead. We must use parallel scans because a sequential scan creates a hot partition.
  2. Based on the --wcu argument, PutItem requests are issued with the primary keys from the first step for the duration given by --duration-write. Each created item has an additional string attribute of --size bytes.
  3. Based on the --rcu argument, GetItem requests are issued with the primary keys from the first step for the duration given by --duration-read.
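
The data-preparation parts of this workflow can be sketched as pure functions, under a few assumptions that are not part of the proposal: the attribute names `pk`, `sk`, and `payload`, the key format, and the helper names are all hypothetical, and every key combination is enumerated up front. `Segment`, `TotalSegments`, and `ProjectionExpression` are the standard DynamoDB Scan parameters used for a parallel scan.

```python
def generate_keys(partition_key_variations, sort_key_variations):
    """Step 1: enumerate the (partition key, sort key) pairs used in the test,
    in DynamoDB's low-level attribute-value format."""
    return [
        {"pk": {"S": f"pk-{p}"}, "sk": {"S": f"sk-{s}"}}
        for p in range(partition_key_variations)
        for s in range(sort_key_variations)
    ]


def build_item(key, size):
    """Step 2: a PutItem item is the key plus one string attribute whose
    value is `size` bytes long (ASCII, so one byte per character)."""
    item = dict(key)
    item["payload"] = {"S": "x" * size}
    return item


def parallel_scan_requests(table_name, total_segments):
    """Step 1 with --skip-item-creation: one Scan request per segment, so the
    segments can be read concurrently and only the key attributes come back."""
    return [
        {
            "TableName": table_name,
            "Segment": segment,
            "TotalSegments": total_segments,
            "ProjectionExpression": "pk, sk",
        }
        for segment in range(total_segments)
    ]
```

Each dictionary from `parallel_scan_requests` could be passed as keyword arguments to a low-level Scan call, and the read phase would then issue GetItem for the collected keys.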
@StoneDot StoneDot self-assigned this Jun 18, 2023
@ryota-sakamoto
Contributor

Thank you for proposing this great feature.

I think other commands generally follow the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?

@StoneDot
Contributor Author

> I think other commands generally follow the format dy <verb> or dy <command> <verb>. What other subcommands do you have in mind besides simple?

I have some ideas regarding scenario-based benchmarking, which I suppose would be invoked by a dy benchmark scenario command. The command style is the same as dy admin create table. I understand that it is a little awkward as an English phrase, but I feel dy benchmark table simply is a little verbose. I am open to good suggestions for the command name.

@StoneDot
Contributor Author

I would also mention the YCSB command style as an option. If we adopted that style in dynein, it would be dy benchmark load to load the data and dy benchmark run to run the workload. The pro is compatibility with YCSB; the con is that loading and testing must be run separately. Still, I prefer dy benchmark simple.

@ryota-sakamoto
Contributor

ryota-sakamoto commented Jun 23, 2023

I think we need to provide a command that shows the result of a load test.
I'm not sure yet how a scenario-based test should be run, but I have two ideas for providing both a simple test and a scenario-based test.

all in one

The idea is to run both the simple test and the scenario-based test with one command. A scenario-based run is selected by specifying a scenario file, so I imagine commands like the following. It is just a simple interface.

# simple test
$ dy load run --rcu 100 --wcu 5

# scenario base test
$ dy load run -s <scenario-file>

# show result of load test
$ dy load report <report-file>

split command

The idea is to provide two commands, load and benchmark, so that the role of each command is clear.

# simple test
$ dy load run --rcu 100 --wcu 5
$ dy load report <report-file>
# scenario base test
$ dy benchmark run <scenario-file>
$ dy benchmark report <report-file>

@StoneDot
Contributor Author

I personally find the -s option to be a clear and effective way of specifying scenario-based testing. Also, it makes sense to split the run and report commands. Thank you for your suggestion. However, I'm a bit concerned that the load argument might confuse users since it has multiple meanings. In other words, I worry that users might mix up loading the data and loading DynamoDB for stress testing.

In my opinion, using the term benchmark (maybe even a shorter version like bench) would be clearer than load. What do you think?

Additionally, I would like to propose the following commands:

# Perform a simple test
$ dy bench run --rcu 100 --wcu 5

# Conduct a scenario-based test (not implemented in the initial phase)
$ dy bench run -s <scenario-file>

# Generate a report for the load test (not implemented in the initial phase)
$ dy bench report <report-file>

Please let me know what you think about these suggestions and proposed commands.

@ryota-sakamoto
Contributor

I agree with you. Using benchmark or bench instead of load is clearer and easier to understand.

@StoneDot StoneDot added the enhancement New feature or request label Aug 24, 2023
@StoneDot
Contributor Author

Based on an internal discussion with a Solutions Architect, the following features are preferable:

  • Specify the maximum number of concurrent requests instead of RCU and WCU.
  • Specify the primary keys to use in load testing.

He wants functionality similar to what the following project provides.
https://github.com/aws-samples/dynamodb-consumed-capacity-check-tool
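
For the first request, capping concurrency rather than capacity could look roughly like the sketch below. It is an assumption-laden illustration, not how dynein or the linked tool works: `run_with_limit` and `fake_get_item` are hypothetical, and the fake request stands in for a real GetItem call on a user-supplied key.

```python
import asyncio


async def run_with_limit(keys, max_concurrency, send_request):
    """Send one request per key, never running more than max_concurrency
    at a time. Returns the peak number of in-flight requests observed."""
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def worker(key):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                await send_request(key)
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker(key) for key in keys))
    return peak


async def fake_get_item(key):
    """Placeholder for a real GetItem call on the given primary key."""
    await asyncio.sleep(0)
```

Calling `asyncio.run(run_with_limit(keys, 4, fake_get_item))` with a user-supplied key list processes every key while keeping at most four requests in flight.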

@StoneDot StoneDot added this to the v0.4.0 milestone Sep 30, 2023