Zhiqiang (Walkie) Que

Huxley Building, Imperial College London, London, UK, SW7 2BZ
Email: z.que [ at ] imperial.ac.uk

I am a Research Associate in the Custom Computing Research Group in the Department of Computing at Imperial, focusing on domain-specific hardware architecture and design automation, particularly for Machine Learning (ML).

With over 12 years of experience spanning academia and industry, I previously worked as a Senior Engineer at Marvell Semiconductor, specializing in CPU microarchitecture design and verification, and later as an FPGA Specialist at China Financial Futures Exchange, where I developed low-latency FinTech computing systems.

I obtained my PhD from Imperial College London under the supervision of Prof. Wayne Luk. I received my B.S. in Microelectronics and M.S. in Computer Science from Shanghai Jiao Tong University (SJTU) in 2008 and 2011 respectively.

My research centers on domain-specific hardware architectures and design automation for AI/ML workloads, leading to numerous high-impact publications in top journals such as TECS, TVLSI, TRETS, TPDS, and TCAD, as well as premier conferences including DAC, ICCAD, ASP-DAC, FCCM, FPL, ASAP, and FPT. I have received Best Paper Award Nominations at leading conferences, including FCCM’20, ASAP’19, FPT’19, and FPT’18, and have authored 48 peer-reviewed publications (12 journal articles and 36 conference papers) in AI/ML hardware architecture and automation.

I also lead collaborative research projects, including the LL-GNN project with CERN, ETH Zurich, UCSD, and AMD, and the RENOWN project, which develops a Neural Processing Unit (NPU) in collaboration with Tokyo Tech and Intel, advancing interdisciplinary research in ML hardware and system optimization. In addition, I actively contribute to the research community as a Technical Program Committee (TPC) member for leading conferences, including DAC, DATE, and FPT.

Some News

  • Mar. 2025 - Accepted by HEART 2025: Trustworthy Deep Learning Acceleration with Customizable Design Flow Automation

  • Mar. 2025 - Invited to serve as an Associate Editor for the ACM Transactions on Reconfigurable Technology and Systems (TRETS) Journal Track for FPT 2025, welcome to submit!

  • Mar. 2025 - Invited to serve on the FPT 2025 TPC, welcome to submit!

  • Dec. 2024 - FPT 2024: Optimizing DNN Accelerator Compression Using Bayesian Guided Tolerable Accuracy Loss

  • Nov. 2024 - Invited to serve on the DAC 2025 TPC, welcome to submit!

  • Oct. 2024 - Accepted by IEEE 17th International Conference on Solid-State & Integrated Circuit Technology: Deep Learning Design-Flow with Static and Dynamic Optimizations
    [DOI]

  • Sept. 2024 - Invited to serve on the DATE 2025 TPC, welcome to submit!

  • Aug. 2024 - ITC-Asia 2024: Trustworthy Codesign by Verifiable Transformations
    [DOI]

  • Jul. 2024 - Invited to serve on the FPT 2024 TPC, welcome to submit!

  • Jul. 2024 - Accepted by Machine Learning: Science and Technology (MLST): Ultrafast jet classification at the HL-LHC
    [DOI] [PDF]

  • Apr. 2024 - Accepted by IEEE Journal on Emerging and Selected Topics in Circuits and Systems (JETCAS): Low Latency Variational Autoencoder on FPGAs
    [Link] [PDF]

  • Jan. 2024 - Accepted by ACM TECS: LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics
    [PDF] [GitHub] [PDF-arXiv] [Indico] [Slides] [VIDEO]

  • Dec. 2023 - FPT 2023: Efficiently Removing Sparsity for High-Throughput Stream Processing
    [PDF] [Src]

  • Jun. 2023 - Successfully defended my PhD.

  • May 2023 - FPL 2023: MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration, 33rd International Conference on Field-Programmable Logic and Applications (FPL). Our first MetaML work, codifying optimizations for DL accelerators.
    [PDF]

  • Apr. 2023 - Invited to serve on the FPT 2023 TPC, welcome to submit!

  • Dec. 2022 - FPT 2022: Accelerating Transformer Neural Networks on FPGAs for High Energy Physics Experiments, 2022 International Conference on Field-Programmable Technology. Our first work on low-latency transformer networks.
    [Link] [PDF]

  • Oct. 2022 - ARC 2022: Hardware-Aware Optimizations for Deep Learning Inference on Edge Devices
    [Link] [PDF]

  • Oct. 2022 - A talk for Compute Accelerator Forum based on our latest draft: LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors
    [PDF-ArXiv] [Indico] [Slides] [VIDEO]

  • August 2022 - FPL 2022: Optimizing Graph Neural Networks for Jet Tagging in Particle Physics on FPGAs
    [PDF] [VIDEO]

  • April 2022 - AICAS 2022: Reconfigurable Acceleration of Graph Neural Networks for Jet Identification in Particle Physics
    [PDF]

  • April 2022 - ACM TRETS: Remarn: A Reconfigurable Multi-threaded Multi-core Accelerator for Recurrent Neural Networks. An extension of our FPT'20 paper.
    [Link]

  • March 2022 - IEEE TCAD: FPGA-based Acceleration for Bayesian Convolutional Neural Networks (co-author)
    [Link] [PDF]

  • March 2022 - IEEE TPDS: Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations (co-author)
    [Link] [PDF]

  • January 2022 - IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Recurrent Neural Networks With Column-Wise Matrix-Vector Multiplication on FPGAs. An extension of our FCCM'20 paper.
    [Link] [PDF]

  • October 2021 - To appear at FPT'21: Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator (co-first author). This paper accelerates Bayesian LSTMs via a co-design framework.
    [PDF]

  • September 2021 - A talk for the FastML group on initiation interval (II) balancing for multi-layer LSTM acceleration on FPGAs.
    [Slides]

  • June 2021 - ASAP'21: Accelerating Recurrent Neural Networks for Gravitational Wave Experiments. This paper presents novel reconfigurable architectures with balanced initiation intervals (IIs) that reduce the latency of a multi-layer LSTM-based autoencoder used for detecting gravitational waves.
    [PDF] [Github]

  • May 2021 - Journal of Systems Architecture (JSA): In-Circuit Tuning of Deep Learning Designs. An extension of our ICCAD'19 paper on in-circuit tuning.
    [PDF]

  • March 2021 - FCCM'21: Instrumentation for Live On-Chip Debug of Machine Learning Training on FPGAs (co-author).
    [PDF]

  • November 2020 - FPT'20: A Reconfigurable Multithreaded Accelerator for Recurrent Neural Networks, 2020 International Conference on Field-Programmable Technologies. Acceptance Rate: 24.7%.
    [VIDEO] [PDF]

  • October 2020 - ICCD'20, short paper: Optimizing FPGA-based CNN Accelerator using Differentiable Neural Architecture Search, the 38th IEEE International Conference on Computer Design (co-author).

  • July 2020 - Journal of Signal Processing Systems (JSPS) paper: Mapping Large LSTMs to FPGAs with Weight Reuse. An extension of our ASAP'19 paper on weight reuse for LSTMs with a blocking-and-batching strategy.
    [Link] [PDF]

  • May 2020 - FCCM'20 paper: Optimizing Reconfigurable Recurrent Neural Networks. The conventional design of matrix-vector multiplication (MVM) for RNNs is row-wise, but it stalls the system due to data dependencies. To eliminate these dependencies, this paper proposes column-wise MVM for RNNs.
    [Link] [PDF]
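The row-wise vs. column-wise schedules can be sketched in a few lines of NumPy. This is only a hedged illustration of the dataflow idea, not the paper's hardware design; the function names are my own:

```python
import numpy as np

def rowwise_mvm(W, x):
    # Row-wise: each output element is a full dot product, so it
    # cannot finish until the entire input vector x is available
    # (in an RNN, x includes the previous hidden state, causing stalls).
    return np.array([W[i, :] @ x for i in range(W.shape[0])])

def columnwise_mvm(W, x):
    # Column-wise: accumulate W[:, j] * x[j] as each input element
    # arrives, so computation can start before all of x is ready,
    # breaking the recurrent data dependency in hardware.
    y = np.zeros(W.shape[0])
    for j in range(W.shape[1]):
        y += W[:, j] * x[j]
    return y

rng = np.random.default_rng(0)
W, x = rng.standard_normal((4, 3)), rng.standard_normal(3)
assert np.allclose(rowwise_mvm(W, x), columnwise_mvm(W, x))
```

Both schedules compute the same W @ x; only the order of partial-sum accumulation differs, which is what matters for a pipelined FPGA implementation.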

  • May 2020 - Best Paper Nomination at FCCM'20: High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression. This paper proposes a customized JPEG+CNN pipeline to address the data-transfer bandwidth bottleneck of cloud-based FPGAs.
    [Link] [PDF]

  • December 2019 - FPT'19 paper: Real-time Anomaly Detection for Flight Testing using AutoEncoder and LSTM. This work proposes a novel timestep (TS) buffer that avoids redundant LSTM gate calculations, reducing system latency.
    [Link] [PDF]

  • November 2019 - ICCAD'19 paper: Towards In-Circuit Tuning of Deep Learning Designs.
    [Link] [PDF]

  • July 2019 - ASAP'19 paper: Efficient Weight Reuse for Large LSTMs. This paper proposes a blocking-and-batching strategy to reuse LSTM weights.
    [Link] [PDF]
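The blocking-and-batching idea can be sketched as follows. This is my own illustrative Python over a plain MVM workload, under the assumption that each weight block is fetched once from off-chip memory; the paper's actual tiling and buffering are hardware-specific:

```python
import numpy as np

def blocked_batched_mvm(W, X, block_rows=2):
    # W: (n_out, n_in) weight matrix; X: (batch, n_in) input vectors.
    # Each block of weight rows is loaded once (modeling one off-chip
    # fetch) and then reused across every input in the batch, amortizing
    # the weight-transfer cost over the whole batch.
    n_out = W.shape[0]
    Y = np.zeros((X.shape[0], n_out))
    for r in range(0, n_out, block_rows):
        W_block = W[r:r + block_rows, :]   # one load of this block
        for b in range(X.shape[0]):        # reused batch-size times
            Y[b, r:r + block_rows] = W_block @ X[b]
    return Y

rng = np.random.default_rng(0)
W, X = rng.standard_normal((6, 4)), rng.standard_normal((3, 4))
assert np.allclose(blocked_batched_mvm(W, X), X @ W.T)
```

With a batch of B inputs, each weight block crosses the memory interface once instead of B times, which is the source of the reuse benefit for large LSTMs whose weights exceed on-chip storage.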