Currently working on optimizing the CUDA implementation to beat CuPy performance on an RTX 3080.
High-performance implementations of matrix multiplication using different parallelization techniques, including OpenMP, MPI, and CUDA. The project focuses on optimizing computational efficiency, memory usage, and scalability across single-threaded, multi-threaded, distributed, and GPU-accelerated environments.
Optimized Matrix Multiplication (illustrative sketches of each technique follow the list):
- Baseline implementation using a standard three-loop algorithm.
- Optimized cache-aware single-threaded version.
- SIMD vectorization using AVX (x86) and NEON (ARM) instructions for efficient CPU execution.
- Parallelized versions leveraging OpenMP and multi-threading.
- Distributed computation using MPI.
- GPU acceleration with CUDA.
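
For reference, here is a minimal sketch of the baseline triple loop and a cache-blocked variant. Row-major `float` storage, the 64×64 tile size, and the `gemm_naive`/`gemm_blocked` names are illustrative assumptions, not the repository's exact code:

```cpp
// Baseline: the standard i-j-k triple loop with a scalar accumulator.
void gemm_naive(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
}

// Cache-aware variant: loop blocking keeps tiles of A, B, and C resident
// in cache. Assumes C is zero-initialized and N is divisible by T;
// the tile size is a tunable assumption, not a measured optimum.
void gemm_blocked(const float* A, const float* B, float* C, int N) {
    const int T = 64;
    for (int ii = 0; ii < N; ii += T)
        for (int kk = 0; kk < N; kk += T)
            for (int jj = 0; jj < N; jj += T)
                for (int i = ii; i < ii + T; ++i)
                    for (int k = kk; k < kk + T; ++k) {
                        float a = A[i * N + k];
                        for (int j = jj; j < jj + T; ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```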
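The AVX path broadcasts one element of A and streams across a row of B with fused multiply-adds. A sketch under the same assumptions (C zero-initialized, N divisible by 8, FMA available; compile with `-march=native`):

```cpp
#include <immintrin.h>

// AVX/FMA micro-kernel: C[i][j] += A[i][k] * B[k][j], row-major floats.
void gemm_avx(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k) {
            __m256 a = _mm256_set1_ps(A[i * N + k]);       // broadcast A[i][k]
            for (int j = 0; j < N; j += 8) {
                __m256 b = _mm256_loadu_ps(&B[k * N + j]); // 8 contiguous B values
                __m256 c = _mm256_loadu_ps(&C[i * N + j]);
                c = _mm256_fmadd_ps(a, b, c);              // c += a * b
                _mm256_storeu_ps(&C[i * N + j], c);
            }
        }
}
```

The NEON version follows the same shape with `float32x4_t` and `vfmaq_f32` in place of the AVX intrinsics.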
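OpenMP parallelism falls out naturally because each thread can own a disjoint block of C rows, so no synchronization is needed. A sketch (again assuming C is zero-initialized):

```cpp
#include <omp.h>

// Parallelize over rows of C; iterations are independent.
void gemm_omp(const float* A, const float* B, float* C, int N) {
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k) {
            float a = A[i * N + k];
            for (int j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
}
```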
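The MPI version can distribute row blocks of A across ranks while replicating B. A self-contained sketch, assuming N is divisible by the rank count (the matrix size and the all-ones fill are placeholders for demonstration):

```cpp
#include <mpi.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1024;                 // assumes N % size == 0
    const int rows = N / size;
    std::vector<float> A, B(N * N), C;
    std::vector<float> Aloc(rows * N), Cloc(rows * N, 0.0f);
    if (rank == 0) { A.assign(N * N, 1.0f); B.assign(N * N, 1.0f); C.resize(N * N); }

    // Each rank gets a block of A's rows; B is broadcast to everyone.
    MPI_Scatter(A.data(), rows * N, MPI_FLOAT, Aloc.data(), rows * N,
                MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Bcast(B.data(), N * N, MPI_FLOAT, 0, MPI_COMM_WORLD);

    for (int i = 0; i < rows; ++i)      // local GEMM on the row block
        for (int k = 0; k < N; ++k)
            for (int j = 0; j < N; ++j)
                Cloc[i * N + j] += Aloc[i * N + k] * B[k * N + j];

    MPI_Gather(Cloc.data(), rows * N, MPI_FLOAT, C.data(), rows * N,
               MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
}
```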
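On the GPU, a shared-memory tiled kernel is the usual starting point before chasing CuPy/cuBLAS-level performance. A sketch with 16×16 tiles, assuming N is divisible by the tile size:

```cuda
#include <cuda_runtime.h>

#define TILE 16

// Each block computes a TILE x TILE patch of C, staging tiles of A and B
// through shared memory to cut global-memory traffic.
__global__ void gemm_tiled(const float* A, const float* B, float* C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;
    for (int t = 0; t < N / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}

// Launch: dim3 block(TILE, TILE); dim3 grid(N / TILE, N / TILE);
// gemm_tiled<<<grid, block>>>(dA, dB, dC, N);
```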
Performance Benchmarks:
- Detailed comparison of execution times and optimizations.
- Analysis of speedup across different architectures.
- Scaling behavior from single-core to multi-core and distributed execution.
Tech Stack:
- C++ – Optimized CPU implementations
- Clang – Recommended compiler (LLVM 18+)
- AVX & NEON – SIMD optimizations for x86 and ARM architectures
- OpenMP – Multithreading support
- MPI (MPICH/OpenMPI) – Distributed computing across multiple nodes
- CUDA – GPU-accelerated computation with NVIDIA GPUs
Ensure you have the following installed:
- Clang Compiler (LLVM 18+)
- OpenMP (for CPU parallelization)
- MPI (MPICH or OpenMPI)
- CUDA Toolkit (for GPU execution)
OpenMP Version
clang++ -O3 -march=native -fopenmp gemm.cpp -o compiled/gemm
./compiled/gemm

MPI Version
mpic++ -O3 -march=native mpi_matrix_mul.cpp -o mpi_matmul
mpirun -np 4 ./mpi_matmul
CUDA Version
nvcc -O3 gemm.cu -o compiled/cuda_matmul
./compiled/cuda_matmul