A bare-bones ML framework that mimics how PyTorch handles neural network architectures. It is implemented with custom CUDA kernels, which will be profiled and optimized using NVIDIA's Nsight Compute and Nsight Systems profilers.
git clone --recursive <repo_url>
./install_mnist.sh
Then from the project root:
mkdir build && cd build
cmake ..
make
cd ..
./build/my_conv_app
If the build succeeded, the app should now run. From here, you can experiment by changing the network pipeline defined in main.cu.
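To make the idea of a PyTorch-style pipeline concrete, here is a minimal host-side C++ sketch of composing layers in sequence. Every name in it (Tensor, Layer, Linear, ReLU, Sequential, build_demo_net) is hypothetical and chosen for illustration; none of them is taken from this repository, and the actual main.cu may be organized differently. The loops run on the CPU only to keep the sketch self-contained; in a CUDA build each forward() would presumably launch a kernel instead.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device tensor: a flat vector of floats.
using Tensor = std::vector<float>;

// Base class: every layer maps an input tensor to an output tensor.
struct Layer {
    virtual ~Layer() = default;
    virtual Tensor forward(const Tensor& x) const = 0;
};

// Fully connected layer: y = W x + b, with W stored row-major (out x in).
struct Linear : Layer {
    std::size_t in, out;
    std::vector<float> w, b;

    Linear(std::size_t in_, std::size_t out_,
           std::vector<float> w_, std::vector<float> b_)
        : in(in_), out(out_), w(std::move(w_)), b(std::move(b_)) {
        assert(w.size() == in * out && b.size() == out);
    }

    Tensor forward(const Tensor& x) const override {
        assert(x.size() == in);
        Tensor y(b);  // start from the bias
        for (std::size_t o = 0; o < out; ++o)
            for (std::size_t i = 0; i < in; ++i)
                y[o] += w[o * in + i] * x[i];
        return y;
    }
};

// Element-wise max(0, x).
struct ReLU : Layer {
    Tensor forward(const Tensor& x) const override {
        Tensor y(x.size());
        for (std::size_t i = 0; i < x.size(); ++i)
            y[i] = x[i] > 0.0f ? x[i] : 0.0f;
        return y;
    }
};

// Runs its layers in order, similar in spirit to torch.nn.Sequential.
struct Sequential : Layer {
    std::vector<std::unique_ptr<Layer>> layers;

    void add(std::unique_ptr<Layer> layer) { layers.push_back(std::move(layer)); }

    Tensor forward(const Tensor& x) const override {
        Tensor h = x;
        for (const auto& layer : layers) h = layer->forward(h);
        return h;
    }
};

// Tiny demo network: identity weights plus a 0.5 bias, followed by ReLU.
inline Sequential build_demo_net() {
    Sequential net;
    net.add(std::make_unique<Linear>(2, 2,
                                     std::vector<float>{1.0f, 0.0f, 0.0f, 1.0f},
                                     std::vector<float>{0.5f, 0.5f}));
    net.add(std::make_unique<ReLU>());
    return net;
}
```

With this layout, changing the pipeline amounts to adding, removing, or reordering the net.add(...) calls, which is the kind of edit the note above has in mind for main.cu.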