Projects

A selection of projects I have worked on during my research and degree

LOW-BIT OPS FOR BLOCK FLOATING POINT

  • Designed a bit-accurate hardware–software (C++/SystemVerilog) framework for compensated accumulation in adder trees using MX-like formats.
  • Built an automated FPGA design-space exploration (DSE) tool to suggest Pareto-optimal MX-like accumulators.

TRANSFORMER NEURAL NETWORK ON FPGA

  • Developed a Transformer Neural Network on GPU (PyTorch) and FPGA (HLS) for High Energy Physics experiments at CERN, integrated into the hls4ml framework.
  • The FPGA solution outperformed state-of-the-art GPU models by ~1000 times. Experimented with multiple quantisation-aware training schemes and developed novel hardware-aware optimisations and post-training quantisation techniques.

QUBIT VISION TRANSFORMER ON FPGA

  • Designed a low-latency FPGA pipeline (CameraLink) for real-time qubit-state classification in collaboration with experimental quantum physicists.
  • Optimised the ViT architecture to achieve millisecond-scale end-to-end detection with up to ~120 times lower latency than a GPU baseline.

LIGHTWEIGHT ZERO-SHOT LEARNING

  • Developed a lightweight zero-shot learning framework with attribute knowledge graphs, reducing parameters by ~100 times while retaining accuracy.
  • Designed an accelerator on FPGA for CNN feature extraction and attribute recognition, achieving ~67 times speedup over a software-only baseline.

POLYHEDRAL SYSTOLIC ARRAY COMPILER

  • Developed an MLIR-based HLS framework that maps C/C++ and PyTorch programs to systolic-array accelerators using polyhedral optimisations.
  • Achieved state-of-the-art performance while enabling MLIR pass interchangeability.

C90 LANGUAGE COMPILER AND TRANSLATOR

  • Created a C-to-MIPS compiler in C++ supporting advanced features such as structures, mutually recursive functions, N-d arrays, strings and pointer arithmetic.
  • The compiler also translates C90 to Python.

FPGA COMPUTATION ACCELERATION

  • Applied dedicated hardware to general-purpose digital systems to accelerate computations by >99% and investigated resource-versus-latency trade-offs.
  • Used optimisation techniques in software (C, C++) and hardware (Verilog), including custom bit widths, pipelining and direct memory access (DMA).

SOFTWARE OPTIMISATION FOR GBP ALGORITHM

  • Reduced the execution time of a C++ implementation of the Gaussian Belief Propagation (GBP) algorithm by 77% after profiling with Intel VTune Profiler.
  • Optimisation methods included parallelisation, vectorisation, loop/array tiling, unrolling, adjustments, type simplification and memoisation.

HARDWARE DESIGN AND VERIFICATION

  • Created and modified AHB-compliant modules in SystemVerilog.
  • Designed unit-level (SystemVerilog), formal (JasperGold) and top-level integration (ARM assembly) testbenches along with a verification plan.

DEEP LEARNING ARCHITECTURES LANDSCAPE

  • Created building blocks and implemented CNN (ResNet), generative (VAE, GAN) and RNN (LSTM, GRU, bidirectional) architectures using PyTorch.
  • Achieved high accuracy in image classification and speech recognition; generated realistic images from random latent vectors.

POOL SHOT PREDICTION SYSTEM ON FPGA

  • Designed an image-processing system that analyses real-time 1080p video on custom hardware, built using high-level synthesis in Vivado HLS.
  • Derived formulae for calculating trajectories and rebounds in Python.

PRICE PREDICTION WITH NEURAL NETWORKS

  • Created a neural network library in Python implementing preprocessing, backpropagation, various activation functions, training and evaluation.
  • Used that library to design an architecture for predicting house prices.

PROBABILISTIC ROBOTICS

  • Worked with Lua in the CoppeliaSim robot simulator to verify the accuracy and speed of robot designs in combination with different algorithms.
  • Implemented local and global Monte Carlo Localization algorithms that allowed a robot to navigate a maze based solely on sonar readings.

COVID-19 TRACING SYSTEM

  • Designed the architecture of a P2P system for tracking virus transmission, based on NHS requirements and real-world scale.

MIPS CPU SIMULATOR

  • Implemented a 32-bit MIPS CPU architecture simulator in C++ capable of instruction parsing, memory management and data-path simulation.
  • Automated compilation, linking and execution of binaries using the GNU toolchain and Bash scripting.

FAKE NEWS DETECTOR

  • Created a Chrome extension that uses Natural Language Processing to analyse an article and checks it against web-scraped news sources (Python).

EVOLUTIONARY ALGORITHM FOR MASTERMIND

  • Surveyed research papers on evolutionary algorithms and selected Particle Swarm Optimisation as the most suitable approach; implemented the solver in C++.