About

I am a PhD student at Imperial College London, advised by Prof. Wayne Luk and Prof. Paul Kelly. My research focuses on reconfigurable accelerators (FPGAs), AI acceleration, and compilers. I'm particularly interested in co-designing hardware and software — from low-precision number formats and quantisation, to MLIR-based compiler flows for systolic arrays.

I have previously worked at Fractile, Jump Trading, and Arm, and have presented my work at FCCM, Flatiron Institute, REACH, ScalPerf, ACACES, FDF, NANDA, FPT, and CERN FastML workshops.

Education

MEng Computer Engineering

Imperial College London — equivalent to summa cum laude

  • Distinguished Final Year Project (scored 88%).
  • Year 4: Ranked #1 of 65 — 1st Class Honours (84%).
  • Year 3: Ranked #4 of 65 — 1st Class Honours (80%).
  • Year 2: Ranked #1 of 74 — 1st Class Honours (83%).
  • Year 1: Ranked #1 of 74 — 1st Class Honours (80%).

Industrial Experience

Low-latency Researcher

Fractile

  • Evaluated latency and area trade-offs of block floating-point formats (Microscaling) for an analog in-memory-compute AI-accelerator ASIC.
  • Implemented and benchmarked low-latency paths as RISC-V (RVV) assembly kernels and SystemVerilog blocks to guide HW/SW partitioning.

Software & Hardware Engineer

Jump Trading

  • Created a Python library for configuring, connecting, and graphing multi-device (ASIC, FPGA, CPU, GPU) hardware systems and exploring their topology.
  • Built an efficient, distributed arbitrage trading C++ application composed of several parallel processes running on ASICs, FPGAs, and x86 machines.
  • Configured and benchmarked formal verification using SystemVerilog and a novel custom Python tool for use in ASIC and FPGA development.

Software & Hardware Engineer

Jump Trading

  • Built an ultra-low-latency ASIC validation test platform for floating-point calculations on x86 and RISC-V architectures using C++, C, and Python.

Systems Architect

Arm

  • Improved an autonomous driving platform's verification in SystemVerilog, overhauled its documentation, and added support for formal verification (JasperGold).

Research Projects

MX Attention on FPGA

Paper [FCCM] · Code

  • Built a parametrised SystemVerilog FPGA attention architecture with operator-wise MX-style block floating-point formats and configurable accumulation.
  • Developed a Python design-space exploration framework that found designs with 29% fewer LUTs, 7% fewer FFs, and 0.42 lower perplexity than the baseline.

Quantised TNN on FPGA

Paper [FPT] · Code

  • Developed a novel TNN architecture for GPUs (PyTorch) and FPGAs (HLS) for High Energy Physics experiments in collaboration with CERN.
  • FPGA solution outperformed SoTA models on GPU by ~1000× thanks to software/hardware-aware optimisations, without accuracy loss.
  • Experimented with quantisation-aware training (QAT) and developed a quick FPGA-friendly post-training quantisation (PTQ) scheme for HLS4ML.

Qubit Vision TNN on FPGA

Paper pending [IEEE TQE]

  • Designed a low-latency FPGA pipeline (CameraLink) for real-time qubit-state classification in collaboration with experimental quantum physicists.
  • Optimised the ViT architecture to achieve millisecond-scale end-to-end detection with up to ~120× lower latency than a GPU baseline.

Zero-shot Learning

Paper pending [ACM TECS]

  • Developed a lightweight zero-shot learning framework with attribute knowledge graphs, reducing parameters by ~100× while retaining accuracy.
  • Designed an FPGA accelerator for CNN feature extraction and attribute recognition, achieving ~67× speedup over a software-only baseline.

Systolic Array Compiler

Paper pending [ICCD] · Code

  • Developed an MLIR-based HLS framework for C/C++ and PyTorch programs targeting systolic-array accelerators using polyhedral optimisations.
  • Achieved state-of-the-art performance while enabling MLIR pass interchangeability.

Research Experience

Talks & Presentations

I have presented my work at FCCM, Flatiron Institute, REACH, ScalPerf, ACACES, FDF, NANDA, FPT, and CERN FastML workshops.

Reviewer

I have served as a reviewer for:

Teaching Experience

  • Develop and maintain labs and coursework on building a C90-to-RISC-V compiler in C++.
  • Built automation for testing, benchmarking, reviewing, and environment deployment in Python — with an experience report planned for SIGCSE TS '27.
  • Provide support during lab sessions, mark coursework, and offer 1-to-1 tutorials for various modules.
  • Awarded the Top Undergraduate Teaching Assistant prize.

Skills

Programming

  • Python
  • C++
  • SystemVerilog
  • RISC-V Assembly

Tools & Technologies

  • ML frameworks: PyTorch, HLS4ML
  • Quantisation-aware training: Brevitas, QPyTorch
  • Hardware design & verification: Vivado, Quartus
  • Compiler infrastructure: MLIR, LLVM
  • Linux & version control: Bash, Git
  • Software profiling: Intel VTune Profiler, perf