Release 1.1.2 Release · coreylammie/MemTorch

Added

C++ and CUDA bindings for memtorch.bh.crossbar.Tile.tile_matmul.

Using an NVIDIA GeForce GTX 1080, a tile shape of (25, 25), and two tensors of size (500, 500), the runtime of tile_matmul without quantization support is reduced by factors of 2.45 and 5.48 for CPU-bound and GPU-bound operation, respectively, relative to the previous pure Python implementation. With an ADC resolution of 4 bits and an overflow rate of 0.0, the runtime of tile_matmul with quantization support is reduced by factors of 2.31 and 105.27 for CPU-bound and GPU-bound operation, respectively.

Implementation	Runtime Without Quantization Support (s)	Runtime With Quantization Support (s)
Pure Python (Previous)	6.917784	27.099764
C++ (CPU-bound)	2.822265	11.736974
CUDA (GPU-bound)	1.262861	0.2574267
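
Each speedup factor is the pure Python runtime divided by the runtime of the corresponding new binding. The following is a minimal sketch of that arithmetic using the values from the table; the dictionary layout and the helper name `speedups` are illustrative assumptions and are not part of MemTorch.

```python
# Runtimes in seconds, copied from the benchmark table:
# (without quantization support, with quantization support).
runtimes = {
    "Pure Python (Previous)": (6.917784, 27.099764),
    "C++ (CPU-bound)": (2.822265, 11.736974),
    "CUDA (GPU-bound)": (1.262861, 0.2574267),
}


def speedups(runtimes, baseline="Pure Python (Previous)"):
    """Return {implementation: (speedup without quantization, speedup with quantization)}.

    Hypothetical helper for illustration only; not a MemTorch function.
    """
    base_plain, base_quant = runtimes[baseline]
    return {
        name: (base_plain / plain, base_quant / quant)
        for name, (plain, quant) in runtimes.items()
        if name != baseline
    }


for name, (plain, quant) in speedups(runtimes).items():
    print(f"{name}: {plain:.2f}x without quantization, {quant:.2f}x with quantization")
```

The printed ratios are rounded to two decimal places.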

Eigen integration with C++ and CUDA bindings.
Additional unit tests.
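
For readers unfamiliar with the operation that the new tile_matmul bindings accelerate, the sketch below shows the general idea of a tiled matrix product: the second operand is split into fixed-size tiles, each tile is multiplied by the matching slice of the first operand, and the partial results are accumulated. This is a pure Python illustration under simplifying assumptions (list-of-lists matrices, no quantization, no device modelling). It is not MemTorch's implementation, and the function names `matmul` and `tiled_matmul` are hypothetical.

```python
def matmul(a, b):
    """Plain dense product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def tiled_matmul(a, b, tile_shape):
    """Multiply a (m x k) by b (k x n) one tile of b at a time.

    tile_shape = (rows, cols) is the size of each tile cut from b;
    tiles at the right and bottom edges may be smaller.
    """
    tile_rows, tile_cols = tile_shape
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, tile_rows):  # tiles along the shared dimension
        a_block = [row[k0:k0 + tile_rows] for row in a]
        for n0 in range(0, n, tile_cols):  # tiles along the output columns
            b_tile = [row[n0:n0 + tile_cols] for row in b[k0:k0 + tile_rows]]
            partial = matmul(a_block, b_tile)  # contribution of this tile
            for i in range(m):
                for j, value in enumerate(partial[i]):
                    out[i][n0 + j] += value  # accumulate partial sums
    return out


# Small usage example: the tiled result matches the dense product.
a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
assert tiled_matmul(a, b, (2, 1)) == matmul(a, b)
```

Because each tile's contribution is independent until the final accumulation, the per-tile products are natural candidates for parallel execution, which is consistent with the larger gains reported for GPU-bound operation.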

Enhanced

Modularized C++ and CUDA quantize bindings.
Enhanced functionality of naive_program and added input arguments to dictate logic for stuck devices.

Fixed

Removed debugging code from naive_program.
