About
I am a PhD student at Imperial College London, advised by Prof. Wayne Luk and Prof. Paul Kelly.
My research focuses on reconfigurable accelerators (FPGAs), AI acceleration, and compilers. I'm particularly interested in co-designing hardware and software, from low-precision number formats and quantisation to MLIR-based compiler flows for systolic arrays.
I have previously worked at Fractile, Jump Trading, and Arm, and have presented my work at FCCM, Flatiron Institute, REACH, ScalPerf, ACACES, FDF, NANDA, FPT, and CERN FastML workshops.
Education
- Distinguished Final Year Project (scored 88%).
- Year 4: Ranked #1 of 65 — 1st Class Honours (84%).
- Year 3: Ranked #4 of 65 — 1st Class Honours (80%).
- Year 2: Ranked #1 of 74 — 1st Class Honours (83%).
- Year 1: Ranked #1 of 74 — 1st Class Honours (80%).
Industrial Experience
- Evaluated latency and area trade-offs of block floating-point formats (Microscaling) for an analog in-memory-compute AI-accelerator ASIC.
- Implemented and benchmarked low-latency paths as RISC-V (RVV) assembly kernels and SystemVerilog blocks to guide HW/SW partitioning.
- Created a Python library for exploring the topology of multi-device (ASIC, FPGA, CPU, GPU) hardware systems and for configuring, connecting, and graphing them.
- Built an efficient, distributed arbitrage trading C++ application composed of several parallel processes running on ASICs, FPGAs, and x86 machines.
- Configured and benchmarked formal verification using SystemVerilog and a novel custom Python tool for use in ASIC and FPGA development.
- Built an ultra-low-latency ASIC validation test platform for floating-point calculations on x86 and RISC-V architectures using C++, C, and Python.
- Improved an autonomous driving platform's verification in SystemVerilog, overhauled its documentation, and added support for formal verification (JasperGold).
Research Projects
- Built a parametrised SystemVerilog FPGA attention architecture with operator-wise MX-style block floating-point formats and configurable accumulation.
- Developed a Python design-space exploration framework, finding designs with 29% fewer LUTs, 7% fewer FFs, and 0.42 lower perplexity than baseline.
- Developed a novel TNN architecture for GPUs (PyTorch) and FPGAs (HLS) for High Energy Physics experiments in collaboration with CERN.
- FPGA solution outperformed SoTA models on GPU by ~1000× thanks to software/hardware-aware optimisations, without accuracy loss.
- Experimented with quantisation-aware training (QAT) and developed a quick FPGA-friendly post-training quantisation (PTQ) scheme for HLS4ML.
- Designed a low-latency FPGA pipeline (CameraLink) for real-time qubit-state classification in collaboration with experimental quantum physicists.
- Optimised the ViT architecture to achieve millisecond-scale end-to-end detection with up to ~120× lower latency than a GPU baseline.
- Developed a lightweight zero-shot learning framework with attribute knowledge graphs, reducing parameters by ~100× while retaining accuracy.
- Designed an FPGA accelerator for CNN feature extraction and attribute recognition, achieving ~67× speedup over a software-only baseline.
- Developed an MLIR-based HLS framework for C/C++ and PyTorch programs targeting systolic-array accelerators using polyhedral optimisations.
- Achieved state-of-the-art performance while enabling MLIR pass interchangeability.
Research Experience
Talks & Presentations
I have presented my work at:
Reviewer
I have served as a reviewer for:
Teaching Experience
- Develop and maintain labs and coursework on building a C90-to-RISC-V compiler in C++.
- Built automation for testing, benchmarking, reviewing, and environment deployment in Python, with an experience report planned for SIGCSE TS '27.
- Provide support during lab sessions, mark coursework, and offer 1-to-1 tutorials for various modules.
- Awarded the Top Undergraduate Teaching Assistant prize.
Skills
Programming
- Python
- C++
- SystemVerilog
- RISC-V Assembly
Tools & Technologies
- ML frameworks: PyTorch, HLS4ML
- Quantisation-aware training: Brevitas, QPyTorch
- Hardware design & verification: Vivado, Quartus
- Compiler infrastructure: MLIR, LLVM
- Linux & version control: Bash, Git
- Software profiling: Intel VTune Profiler, perf