This repository benchmarks the performance of various dense (fully connected) layer implementations in Java using JMH (Java Microbenchmark Harness).
It evaluates and compares the following implementations:

- Scalar (naive) implementation
- EJML matrix multiplication
- DeepLearning4j (DL4J)
- Java Vector API
- Java Vector API with Fused Multiply-Add (FMA)

across varying input and output sizes and in multi-threaded environments.
| Implementation | Description |
|---|---|
| Scalar | Plain nested-loop multiplication with bias addition (see the sketch below) |
| EJML | Matrix multiplication using EJML's FMatrixRMaj and CommonOps_FDRM |
| DL4J | Dense layer using DL4J with identity activation |
| Vector | SIMD-based computation using Java Vector API |
| FMA | SIMD-based computation using the Java Vector API with Fused Multiply-Add (see the sketch below) |
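For reference, here is a minimal sketch of what the Scalar and FMA variants can look like. This is not the repository's exact code: the `DenseKernels` class name is hypothetical, and it assumes row-major weights `w` of length `out * in` and the incubating `jdk.incubator.vector` API.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public final class DenseKernels {

    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Scalar (naive): y[o] = bias[o] + sum_i w[o][i] * x[i], weights row-major.
    public static void denseScalar(float[] x, float[] w, float[] bias, float[] y, int in, int out) {
        for (int o = 0; o < out; o++) {
            float acc = bias[o];
            int base = o * in;
            for (int i = 0; i < in; i++) {
                acc += w[base + i] * x[i];
            }
            y[o] = acc;
        }
    }

    // Vector API + FMA: accumulate in SIMD lanes, reduce once per output neuron.
    public static void denseFma(float[] x, float[] w, float[] bias, float[] y, int in, int out) {
        for (int o = 0; o < out; o++) {
            int base = o * in;
            FloatVector acc = FloatVector.zero(SPECIES);
            int i = 0;
            int upper = SPECIES.loopBound(in);
            for (; i < upper; i += SPECIES.length()) {
                FloatVector xv = FloatVector.fromArray(SPECIES, x, i);
                FloatVector wv = FloatVector.fromArray(SPECIES, w, base + i);
                acc = wv.fma(xv, acc); // acc += wv * xv in one fused step
            }
            float sum = acc.reduceLanes(VectorOperators.ADD);
            for (; i < in; i++) {          // scalar tail for leftover elements
                sum += w[base + i] * x[i];
            }
            y[o] = bias[o] + sum;
        }
    }
}
```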
Two benchmark modes are evaluated (a JMH sketch follows this list):

- AverageTime: time per operation (µs)
- Throughput: operations per microsecond (ops/µs)
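A benchmark class covering both modes might look like the following sketch, reusing the hypothetical `DenseKernels` above. The class name, parameter sizes, and setup are illustrative, not the repository's actual benchmark.

```java
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode({Mode.AverageTime, Mode.Throughput})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@State(Scope.Thread)
public class DenseLayerBenchmark {

    @Param({"128", "512", "2048"}) // hypothetical layer sizes
    int size;

    float[] x, w, bias, y;

    @Setup
    public void setup() {
        Random rnd = new Random(42);
        x = new float[size];
        w = new float[size * size];
        bias = new float[size];
        y = new float[size];
        for (int i = 0; i < w.length; i++) w[i] = rnd.nextFloat() - 0.5f;
        for (int i = 0; i < size; i++) {
            x[i] = rnd.nextFloat() - 0.5f;
            bias[i] = rnd.nextFloat() - 0.5f;
        }
    }

    @Benchmark
    public float[] scalar() {
        DenseKernels.denseScalar(x, w, bias, y, size, size);
        return y; // returning the result keeps JMH from dead-code-eliminating it
    }

    @Benchmark
    public float[] fma() {
        DenseKernels.denseFma(x, w, bias, y, size, size);
        return y;
    }
}
```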
A sanity test is included to ensure correctness across all implementations. It verifies that outputs are numerically consistent (within float tolerance).
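As a sketch of what such a test can look like (using JUnit 5 and the hypothetical `DenseKernels` above; the tolerance value is illustrative):

```java
import static org.junit.jupiter.api.Assertions.assertArrayEquals;

import java.util.Random;

import org.junit.jupiter.api.Test;

class DenseLayerSanityTest {

    private static final float EPS = 1e-4f; // illustrative float tolerance

    @Test
    void scalarAndFmaAgree() {
        int in = 64, out = 32;
        Random rnd = new Random(7);
        float[] x = randomFloats(rnd, in);
        float[] w = randomFloats(rnd, in * out);
        float[] bias = randomFloats(rnd, out);
        float[] expected = new float[out];
        float[] actual = new float[out];

        DenseKernels.denseScalar(x, w, bias, expected, in, out);
        DenseKernels.denseFma(x, w, bias, actual, in, out);

        // Outputs should be numerically consistent within float tolerance.
        assertArrayEquals(expected, actual, EPS);
    }

    private static float[] randomFloats(Random rnd, int n) {
        float[] a = new float[n];
        for (int i = 0; i < n; i++) a[i] = rnd.nextFloat() - 0.5f;
        return a;
    }
}
```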
Build the project:

```bash
mvn clean install
```

Run the benchmarks (here with 8 threads, writing CSV results):

```bash
java --enable-preview --add-modules jdk.incubator.vector -jar target/benchmark.jar \
    -t 8 -rf csv -rff results_threads8.csv
```

You can also customize the thread count, benchmark mode, and output format, as in the example below.
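For example, using standard JMH command-line flags (`-bm` for benchmark mode, `-tu` for time unit, `-rf`/`-rff` for result format and file):

```bash
# Average-time mode, microsecond units, JSON results, 4 benchmark threads
java --enable-preview --add-modules jdk.incubator.vector -jar target/benchmark.jar \
    -t 4 -bm avgt -tu us -rf json -rff results_threads4.json
```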