Fastest CPU (AVX/SSE) implementation of a 128-pixel Box Blur.

For even more speed, see the CUDA version: github.com/komrad36/CUDABoxBlur

All functionality is contained in BoxBlur.h. 'main.cpp' is a demo and test harness.
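
For readers unfamiliar with the operation, here is a minimal scalar sketch of what a box blur computes: each output pixel is the rounded mean of the pixels in a square window around it. This is not the code in BoxBlur.h and does not use AVX/SSE. The function name `boxBlurReference`, the generic `radius` parameter, the 8-bit grayscale layout, and the clamp-at-border edge policy are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference function; NOT part of BoxBlur.h.
// Blurs a row-major 8-bit grayscale image with a square box of side
// (2 * radius + 1). Coordinates outside the image are clamped to the
// nearest edge pixel (an assumed border policy).
std::vector<std::uint8_t> boxBlurReference(const std::vector<std::uint8_t>& src,
                                           int width, int height, int radius) {
    assert(width > 0 && height > 0 && radius >= 0);
    assert(src.size() == static_cast<std::size_t>(width) * height);

    std::vector<std::uint8_t> dst(src.size());
    const int side = 2 * radius + 1;
    const int area = side * side;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0;
            for (int dy = -radius; dy <= radius; ++dy) {
                // Clamp the row index into [0, height - 1].
                const int yy = std::min(std::max(y + dy, 0), height - 1);
                for (int dx = -radius; dx <= radius; ++dx) {
                    // Clamp the column index into [0, width - 1].
                    const int xx = std::min(std::max(x + dx, 0), width - 1);
                    sum += src[static_cast<std::size_t>(yy) * width + xx];
                }
            }
            // Integer mean, rounded to nearest.
            dst[static_cast<std::size_t>(y) * width + x] =
                static_cast<std::uint8_t>((sum + area / 2) / area);
        }
    }
    return dst;
}
```

A slow reference like this can be handy for checking the output of an optimized implementation on small images, provided both use the same border handling and rounding.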