- R5SEBA – An N-way out-of-order RISC-V processor with speculative execution, branch recovery, and associative caches
- ConvoLite – A CUDA-accelerated deep learning engine with shared memory tiling and pipelined convolution
- μTracker – A microarchitectural profiling framework with integrated testbenches and RTL debug automation
📞 Live walkthroughs available by request.
Most of my work is private to protect intellectual property and maintain academic integrity.
I do not share sensitive source code publicly, especially coursework or competitive designs. However, I'm happy to:
- Share proof of authorship and commit history
- Walk through projects live over a verified call
- Discuss design, test, and debug methodology
🛡️ Contact me for verified access or demonstrations.
🔬 Architecture & RTL Design
- Superscalar, out-of-order, speculative execution pipelines
- Tomasulo's algorithm, register renaming, CDB, PRF, ROB
- Associative and non-blocking cache design (write-back, write-allocate)
- RTL: SystemVerilog, Verilog, Chisel (exploring)
🛠️ Hardware Design & Verification
- Synthesis, P&R, timing closure, static timing analysis
- ASIC & FPGA workflows — Vivado, Quartus, ModelSim, Verdi
- Formal verification via JasperGold, plus UVM-style validation
- CMOS and VLSI design techniques
⚡ High-Performance Computing & Parallelism
- CUDA (shared memory tiling, coalesced access, INT8 quantization)
- OpenMP, multithreading, low-latency pipelines
- Real-time image/video processing on FPGAs (30 FPS @ high-res)
🧠 Software & Simulation
- C/C++, Python, Bash, MATLAB, TCL, Perl
- Embedded systems (Arduino, STM32, custom SoCs)
- Web development (HTML/CSS/JS, Flask, Node, Firebase)
- Linux internals, kernel modules, custom drivers, networking stacks
- Database design, file system tweaking, cross-arch simulation
🧰 Tools & Platforms
- Cadence Virtuoso, Synopsys Verdi, Vivado, Quartus, SLURM, Makefiles
- Git, GitHub, GitOps workflows, CI/CD
- OS-native scripting (Linux, macOS, Windows PowerShell)
🧩 AI/ML Hardware Optimization
- ResNet-50 inference acceleration with TensorRT
- INT8 quantization, CUDA tensor cores
- 4x+ speedup over CPU baselines for real-time inference
🧑‍🔬 Meta-Skills
- 🧠 Extreme learning agility — I pick up new stacks like I’ve been doing them for years
- 🔧 Builder’s instinct — from fixing engines to optimizing memory subsystems
- 💡 Hyper-resourceful — nothing gets blocked, everything gets solved
- 🔥 Full-stack to full-system — I don't need a framework. I am the framework
📧 Email: [email protected]
· [email protected]
If you treat it like a video game, where the daily and weekly "bonuses" are fewer bugs and less trouble, it becomes much easier to make even one small improvement each day, and those improvements compound over time!