Rewrite documentation (#233)
nck-mlcnv committed Jun 10, 2024
1 parent 4a6a146 commit 318b96a
Showing 13 changed files with 1,241 additions and 202 deletions.
94 changes: 37 additions & 57 deletions README.md
@@ -1,82 +1,62 @@
# IGUANA

[![ci](https://github.com/dice-group/IGUANA/actions/workflows/ci.yml/badge.svg)](https://github.com/dice-group/IGUANA/actions/workflows/ci.yml)

<p align="center">
<img src="https://github.com/dice-group/IGUANA/raw/develop/images/IGUANA_logo.png" alt="IGUANA Logo" width="200">
</p>
Iguana is an integrated suite for benchmarking the read/write performance of HTTP endpoints and CLI applications.

It provides an environment which ...

* is highly configurable
* provides a realistic scenario benchmark
* works on every dataset
* works on SPARQL HTTP endpoints
* works on HTTP Get & Post endpoints
* works on CLI applications
* and is easily extendable

For further information visit:
- [iguana-benchmark.eu](http://iguana-benchmark.eu)
- [Documentation](http://iguana-benchmark.eu/docs/3.3/)

### Available metrics
# IGUANA
Iguana is a benchmarking framework for testing the read performance of HTTP endpoints.
It is primarily designed for benchmarking triplestores using the SPARQL protocol.
Iguana stress-tests endpoints by simulating users that send sets of queries independently of each other.

Per run metrics:
* Query Mixes Per Hour (QMPH)
* Number of Queries Per Hour (NoQPH)
* Number of Queries (NoQ)
* Average Queries Per Second (AvgQPS)
* Penalized Average Queries Per Second (PAvgQPS)
Benchmarks are configured with a YAML file, which makes them easy to repeat and adjust.
Results are stored as RDF files and can also be exported as CSV files.

Per query metrics:
* Queries Per Second (QPS)
* Penalized Queries Per Second (PQPS)
* Number of successful and failed queries
* result size
* queries per second
* sum of execution times
## Features
- Benchmarking of (SPARQL) HTTP endpoints
- Reusable configuration
- Calculation of various metrics for better comparisons
- Processing of HTTP responses (e.g., results counting)

## Setup Iguana
## Setup

### Prerequisites
You need to have `Java 17` or higher installed.
On Ubuntu it can be installed by executing the following command:

In order to run Iguana, you need to have `Java 17`, or greater, installed on your system.
```bash
sudo apt install openjdk-17-jre
```

### Download
Download the newest release of Iguana [here](https://github.com/dice-group/IGUANA/releases/latest), or run on a unix shell:

```sh
wget https://github.com/dice-group/IGUANA/releases/download/v4.0.0/iguana-4.0.0.zip
unzip iguana-4.0.0.zip
```
The latest release can be downloaded at https://github.com/dice-group/IGUANA/releases/latest.
The zip file contains three files:

The zip file contains the following files:

* `iguana-X.Y.Z.jar`
* `start-iguana.sh`
* `iguana-4.0.0.jar`
* `example-suite.yml`
* `start-iguana.sh`

### Create a Configuration

You can use the provided example configuration and modify it to your needs.
For further information please visit our [configuration](http://iguana-benchmark.eu/docs/3.2/usage/configuration/) and [Stresstest](http://iguana-benchmark.eu/docs/3.0/usage/stresstest/) wiki pages.

For a detailed, step-by-step walkthrough of a benchmarking example, please visit our [tutorial](http://iguana-benchmark.eu/docs/3.2/usage/tutorial/).

### Execute the Benchmark
### Configuration
The `example-suite.yml` file contains an extensive configuration for a benchmark suite.
It can be used as a starting point for your own benchmark suite.
For a detailed explanation of the configuration, see the [configuration](./configuration/overview.md) documentation.

Start Iguana with a benchmark suite (e.g. the example-suite.yml) either by using the start script:
## Usage
Start Iguana with a benchmark suite (e.g., the `example-suite.yml`) either by using the start script:

```bash
./start-iguana.sh example-suite.yml
```

or by directly executing the jar-file:

```bash
java -jar iguana-4.0.0.jar example-suite.yml
```

If you're using the script, you can use JVM arguments by setting the environment variable `IGUANA_JVM`.
For example, to let Iguana use 4 GB of RAM, you can set `IGUANA_JVM` as follows:

```bash
export IGUANA_JVM=-Xmx4g
```

# How to Cite
88 changes: 88 additions & 0 deletions docs_new/README.md
@@ -0,0 +1,88 @@
<p align="center">
<img src="https://github.com/dice-group/IGUANA/raw/develop/images/IGUANA_logo.png" alt="IGUANA Logo" width="200">
</p>

# IGUANA
Iguana is a benchmarking framework for testing the read performance of HTTP endpoints.
It is primarily designed for benchmarking triplestores using the SPARQL protocol.
Iguana stress-tests endpoints by simulating users that send sets of queries independently of each other.

Benchmarks are configured with a YAML file, which makes them easy to repeat and adjust.
Results are stored as RDF files and can also be exported as CSV files.

## Features
- Benchmarking of (SPARQL) HTTP endpoints
- Reusable configuration
- Calculation of various metrics for better comparisons
- Processing of HTTP responses (e.g., results counting)

## Setup

### Prerequisites
You need to have `Java 17` or higher installed.
On Ubuntu it can be installed by executing the following command:

```bash
sudo apt install openjdk-17-jre
```

### Download
The latest release can be downloaded at https://github.com/dice-group/IGUANA/releases/latest.
The zip file contains three files:

* `iguana-4.0.0.jar`
* `example-suite.yml`
* `start-iguana.sh`

### Configuration
The `example-suite.yml` file contains an extensive configuration for a benchmark suite.
It can be used as a starting point for your own benchmark suite.
For a detailed explanation of the configuration, see the [configuration](./configuration/overview.md) documentation.
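
For orientation, a heavily trimmed suite could look roughly like the sketch below. This is an illustrative sketch only: the endpoint URL and query path are placeholders, and key names such as `SPARQLProtocolWorker` and `completionTarget` are assumptions based on `example-suite.yml`, so always check the shipped example and the configuration documentation for the authoritative structure.

```yaml
# Hypothetical, trimmed-down benchmark suite (all values are placeholders).
connections:
  - name: "my-triplestore"
    endpoint: "http://localhost:3030/ds/sparql"  # placeholder SPARQL endpoint

tasks:
  - type: stresstest
    workers:
      - type: SPARQLProtocolWorker   # worker type name assumed from the example suite
        number: 2                    # two simulated users sending queries independently
        queries:
          path: "./queries.txt"      # placeholder query file
        completionTarget:
          duration: "10m"            # stop each worker after 10 minutes (key name assumed)

metrics:
  - type: "QPS"
  - type: "AvgQPS"
```

With such a file in place, Iguana is started as described in the Usage section below and reports the selected metrics once the stresstest finishes.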

## Usage
Start Iguana with a benchmark suite (e.g., the `example-suite.yml`) either by using the start script:

```bash
./start-iguana.sh example-suite.yml
```

or by directly executing the jar-file:

```bash
java -jar iguana-4.0.0.jar example-suite.yml
```

If you're using the script, you can use JVM arguments by setting the environment variable `IGUANA_JVM`.
For example, to let Iguana use 4 GB of RAM, you can set `IGUANA_JVM` as follows:

```bash
export IGUANA_JVM=-Xmx4g
```

# How to Cite

```bibtex
@InProceedings{10.1007/978-3-319-68204-4_5,
author="Conrads, Lixi
and Lehmann, Jens
and Saleem, Muhammad
and Morsey, Mohamed
and Ngonga Ngomo, Axel-Cyrille",
editor="d'Amato, Claudia
and Fernandez, Miriam
and Tamma, Valentina
and Lecue, Freddy
and Cudr{\'e}-Mauroux, Philippe
and Sequeda, Juan
and Lange, Christoph
and Heflin, Jeff",
title="Iguana: A Generic Framework for Benchmarking the Read-Write Performance of Triple Stores",
booktitle="The Semantic Web -- ISWC 2017",
year="2017",
publisher="Springer International Publishing",
address="Cham",
pages="48--65",
abstract="The performance of triples stores is crucial for applications driven by RDF. Several benchmarks have been proposed that assess the performance of triple stores. However, no integrated benchmark-independent execution framework for these benchmarks has yet been provided. We propose a novel SPARQL benchmark execution framework called Iguana. Our framework complements benchmarks by providing an execution environment which can measure the performance of triple stores during data loading, data updates as well as under different loads and parallel requests. Moreover, it allows a uniform comparison of results on different benchmarks. We execute the FEASIBLE and DBPSB benchmarks using the Iguana framework and measure the performance of popular triple stores under updates and parallel user requests. We compare our results (See https://doi.org/10.6084/m9.figshare.c.3767501.v1) with state-of-the-art benchmarking results and show that our benchmark execution framework can unveil new insights pertaining to the performance of triple stores.",
isbn="978-3-319-68204-4"
}
```
15 changes: 15 additions & 0 deletions docs_new/configuration/language_processor.md
@@ -0,0 +1,15 @@
# Language Processor

Language processors are used to process the response bodies of the HTTP requests that are executed by the workers.
The processing is done to extract relevant information from the responses and store it in the results.

Language processors are defined by the content type of the response body they process.
They cannot be configured directly in the configuration file, but are used by the response body processors.

Currently, only the `SaxSparqlJsonResultCountingParser` language processor is supported; it handles the `application/sparql-results+json` content type.

## SaxSparqlJsonResultCountingParser

The `SaxSparqlJsonResultCountingParser` is a language processor used to extract simple information from the responses of SPARQL endpoints that are in the `application/sparql-results+json` format.
It counts the number of results, the number of variables,
and the number of bindings from the response of a `SELECT` or `ASK` query.
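
As an illustration, this parser is pulled in by declaring a response body processor for its content type in the suite file. The snippet below is only a sketch; the key names (`responseBodyProcessors`, `contentType`, `threads`) are assumed from the example suite and may differ in your Iguana version.

```yaml
# Hypothetical sketch: process SPARQL JSON results with the counting parser.
responseBodyProcessors:
  - contentType: "application/sparql-results+json"  # selects the SaxSparqlJsonResultCountingParser
    threads: 1                                      # number of processing threads (assumed)
```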
84 changes: 84 additions & 0 deletions docs_new/configuration/metrics.md
@@ -0,0 +1,84 @@
# Metrics

Metrics are used to measure and compare the performance of the system during the stresstest.
They are divided into task metrics, worker metrics, and query metrics.

Task metrics are calculated over every query execution of the whole task.
Worker metrics are calculated over every query execution of a single worker.
Query metrics are calculated over every execution of a single query, both per worker and across all workers.

For a detailed description of how results for tasks, workers and queries are reported in the RDF result file, please refer to the section [RDF results](rdf_results.md).

## Configuration

The metrics are configured in the `metrics` section of the configuration file.
To enable a metric, add an entry to the `metrics` list with the `type` of the metric.
Some metrics (`PQPS`, `PAvgQPS`) require the configuration of a `penalty` value,
which is the time in milliseconds that a failed query will be penalized with.

```yaml
metrics:
- type: "QPS"
- type: "AvgQPS"
- type: "PQPS"
penalty: 180000 # in milliseconds
```

If the `metrics` section is not present in the configuration file, the following **default** configuration is used:
```yaml
metrics:
- type: "AES"
- type: "EachQuery"
- type: "QPS"
- type: "AvgQPS"
- type: "NoQ"
- type: "NoQPH"
- type: "QMPH"
```

## Available metrics

| Name | Configuration type | Additional parameters | Scope | Description |
|--------------------------------------|--------------------|-----------------------------|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Queries per second | `QPS` | | query | The number of successfully executed queries per second. It is calculated by dividing the number of successfully executed queries by the total time (in seconds) it took to execute them. |
| Average queries per second | `AvgQPS` | | task, worker | The average number of queries successfully executed per second. It is calculated by dividing the sum of the QPS values of every query the task or worker has executed by the number of queries. |
| Number of queries | `NoQ` | | task, worker | The number of successfully executed queries. This metric is calculated for each worker and for the whole task. |
| Number of queries per hour | `NoQPH` | | task, worker | The number of successfully executed queries per hour. It is calculated by dividing the number of successfully executed queries by the total time (in hours) it took to execute them. The metric value for the task is the sum of the metric values of its workers. |
| Query mixes per hour | `QMPH` | | task, worker | The number of query mixes executed per hour. A query mix is the set of queries executed by a worker or by the whole task. This metric is calculated for each worker and for the whole task. It is calculated by dividing the number of successfully executed queries by the number of queries inside the query mix and by the total time (in hours) it took to execute them. |
| Penalized queries per second | `PQPS` | `penalty` (in milliseconds) | query | The number of queries executed per second, penalized by failed queries. It is calculated by dividing the number of successful and failed query executions by the total time (in seconds) it took to execute them. If a query execution fails, its execution time is replaced by the given `penalty` value. |
| Penalized average queries per second | `PAvgQPS` | `penalty` (in milliseconds) | task, worker | The average number of queries executed per second, penalized by failed queries. It is calculated by dividing the sum of the PQPS values of every query the task or worker has executed by the number of queries. |
| Aggregated execution statistics | `AES` | | task, worker | _see below_ |
| Each execution statistic | `EachQuery` | | query | _see below_ |
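
Restating the prose descriptions above as formulas (this is a paraphrase of the table, not an excerpt from the implementation): with $n_s$ successful and $n_f$ failed executions of a query, $t_i$ the time of the $i$-th successful execution in seconds, $p$ the configured penalty converted from milliseconds to seconds, and $Q$ the set of queries of a task or worker,

$$
\mathrm{QPS} = \frac{n_s}{\sum_i t_i}, \qquad
\mathrm{PQPS} = \frac{n_s + n_f}{\sum_i t_i + n_f \cdot p}, \qquad
\mathrm{AvgQPS} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{QPS}_q .
$$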

## Other metrics

### Aggregated Execution Statistics (AES)
For each query that belongs to a worker or a task, this metric collects a number of statistics
that are aggregated over all of its executions.

| Name | Description |
|---------------------|--------------------------------------------------------------|
| `succeeded` | The number of successful executions. |
| `failed` | The number of failed executions. |
| `resultSize` | The size of the HTTP response. (only stores the last result) |
| `timeOuts` | The number of executions that resulted in a timeout. |
| `wrongCodes` | The number of HTTP status codes received that were not 200. |
| `unknownExceptions` | The number of unknown exceptions during execution. |
| `totalTime` | The total time it took to execute the queries. |

The `resultSize` is the size of the HTTP response in bytes and is an exception to the aggregation.

### Each Execution Statistic (EachQuery)
This metric collects statistics for each execution of a query.

| Name | Description |
|----------------|-----------------------------------------------------------------------------------------------------------|
| `run` | The number of the execution. |
| `startTime` | The time stamp at which the execution started. |
| `time` | The time it took to execute the query. |
| `success` | Whether the execution was successful. |
| `code` | Numerical value of the end state of the execution. (success=0, timeout=110, http_error=111, exception=1) |
| `resultSize` | The size of the HTTP response. |
| `exception` | The exception that occurred during execution. (if any occurred) |
| `httpCode` | The HTTP status code received. (if any was received) |
| `responseBody` | The hash of the HTTP response body. (only if `parseResults` inside the stresstest has been set to `true`) |