Zhiqiang (Walkie) Que

Huxley Building, Imperial College London, London, UK, SW7 2BZ
Email: z.que [ at ] imperial.ac.uk

I am a research associate in the Custom Computing Research Group in the Department of Computing at Imperial College London. Before joining Imperial, I worked on microarchitecture design and verification of ARM CPUs at Marvell Semiconductor (2011-2015) and on low-latency FPGA systems at CFFEX (2015-2018). I obtained my PhD from Imperial College London under the supervision of Prof. Wayne Luk. I received my B.S. in Microelectronics and M.S. in Computer Science from Shanghai Jiao Tong University (SJTU) in 2008 and 2011, respectively.
I have served as a peer reviewer for many conferences and journals, including FPGA, FCCM, FPL, FPT, TECS, JSA, TCAS, JPDC, FGCS, MICPRO, TETC, and IEICE. Our research has received best paper awards at SmartCloud'18 and CSE'10, and best paper nominations at FCCM'20, ASAP'19, FPT'19, and FPT'18.
My research interests include computer architecture, embedded systems, high-performance computing and computer-aided design (CAD) tools for hardware design optimization.

Some News

  • Jan. 2024 - Accepted by ACM TECS: LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics
    [PDF] [GitHub] [PDF-arXiv] [Indico] [Slides] [VIDEO]

  • Dec. 2023 - FPT 2023: Efficiently Removing Sparsity for High-Throughput Stream Processing
    [PDF] [Src]

  • June 2023 - Successfully defended my PhD.

  • May 2023 - FPL 2023: MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration, 33rd International Conference on Field-Programmable Logic and Applications (FPL). Our first work on MetaML, which codifies optimizations for deep learning accelerators.
    [PDF]

  • April 2023 - Invited to serve on the FPT 2023 TPC; submissions are welcome!

  • Dec. 2022 - FPT 2022: Accelerating Transformer Neural Networks on FPGAs for High Energy Physics Experiments, 2022 International Conference on Field-Programmable Technology. Our first work on low-latency transformer networks.
    [Link] [PDF]

  • Oct. 2022 - ARC 2022: Hardware-Aware Optimizations for Deep Learning Inference on Edge Devices
    [Link] [PDF]

  • Oct. 2022 - A talk for Compute Accelerator Forum based on our latest draft: LL-GNN: Low Latency Graph Neural Networks on FPGAs for Particle Detectors
    [PDF-ArXiv] [Indico] [Slides] [VIDEO]

  • August 2022 - FPL 2022: Optimizing Graph Neural Networks for Jet Tagging in Particle Physics on FPGAs
    [PDF] [VIDEO]

  • April 2022 - AICAS 2022: Reconfigurable Acceleration of Graph Neural Networks for Jet Identification in Particle Physics
    [PDF]

  • April 2022 - ACM TRETS: Remarn: A Reconfigurable Multi-threaded Multi-core Accelerator for Recurrent Neural Networks. An extension of our FPT'20 paper.
    [Link]

  • March 2022 - IEEE TCAD: FPGA-based Acceleration for Bayesian Convolutional Neural Networks. Co-author.
    [Link] [PDF]

  • March 2022 - IEEE TPDS: Accelerating Bayesian Neural Networks via Algorithmic and Hardware Optimizations. Co-author.
    [Link] [PDF]

  • January 2022 - IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Recurrent Neural Networks With Column-Wise Matrix-Vector Multiplication on FPGAs. An extension of our FCCM'20 paper.
    [Link] [PDF]

  • October 2021 - To appear at FPT'21: Optimizing Bayesian Recurrent Neural Networks on an FPGA-based Accelerator. Co-first author. This paper accelerates Bayesian LSTMs via a co-design framework.
    [PDF]

  • September 2021 - A talk for the FastML group on initiation interval (II) balancing for multi-layer LSTM acceleration on FPGAs.
    [Slides]

  • June 2021 - ASAP'21: Accelerating Recurrent Neural Networks for Gravitational Wave Experiments. This paper presents novel reconfigurable architectures with balanced initiation intervals (IIs) to reduce the latency of a multi-layer LSTM-based autoencoder used for detecting gravitational waves; a small sketch of the II-balancing idea follows below.
    [PDF] [Github]
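
    A minimal sketch of the II-balancing idea, under simple assumptions (the function names, operation counts, and unit budget below are illustrative, not the paper's actual architecture):

    ```python
    import math

    def layer_ii(num_ops, parallelism):
        # Initiation interval (II): cycles between accepting successive inputs,
        # approximated here as a layer's work divided by its parallel units.
        return math.ceil(num_ops / parallelism)

    def balance_parallelism(layer_ops, total_units):
        # Assign units roughly in proportion to each layer's work so that the
        # per-layer IIs of the coarse-grained pipeline become nearly equal;
        # overall throughput is limited by the largest (worst) II.
        total_ops = sum(layer_ops)
        return [max(1, round(total_units * ops / total_ops)) for ops in layer_ops]

    # Illustrative multi-layer LSTM workload (operation counts are made up).
    layer_ops = [4096, 2048, 1024]
    naive = [8, 8, 8]                                 # same parallelism everywhere
    balanced = balance_parallelism(layer_ops, 24)     # same total budget of 24 units

    print([layer_ii(o, p) for o, p in zip(layer_ops, naive)])     # [512, 256, 128] -> worst II 512
    print([layer_ii(o, p) for o, p in zip(layer_ops, balanced)])  # roughly equal IIs -> lower worst II
    ```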

  • May 2021 - Journal of Systems Architecture (JSA): In-Circuit Tuning of Deep Learning Designs. An extension of our ICCAD'19 paper on in-circuit tuning.
    [PDF]

  • March 2021 - FCCM'21: Instrumentation for Live On-Chip Debug of Machine Learning Training on FPGAs. Co-author.
    [PDF]

  • November 2020 - FPT'20: A Reconfigurable Multithreaded Accelerator for Recurrent Neural Networks, 2020 International Conference on Field-Programmable Technology. Acceptance rate: 24.7%.
    [VIDEO] [PDF]

  • October 2020 - ICCD'20, short paper: Optimizing FPGA-based CNN Accelerator using Differentiable Neural Architecture Search, the 38th IEEE International Conference on Computer Design. Co-author.

  • July 2020 - Journal of Signal Processing Systems (JSPS) paper: Mapping Large LSTMs to FPGAs with Weight Reuse. An extension of our ASAP'19 paper on weight reuse for LSTMs with a blocking-and-batching strategy.
    [Link] [PDF]

  • May 2020 - FCCM'20 paper: Optimizing Reconfigurable Recurrent Neural Networks. Conventional matrix-vector multiplication (MVM) for RNNs is computed row-wise, but this causes system stalls due to the data dependency in the recurrence. To eliminate this data dependency, we propose column-wise MVM for RNNs in this paper; a brief sketch of the idea follows below.
    [Link] [PDF]
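
    A rough illustration of the difference in computation order (a minimal sketch of the general idea, not the paper's hardware design; the streaming interface below is an assumption for illustration):

    ```python
    import numpy as np

    def mvm_row_wise(W, h_prev):
        # Row-wise: every output element is a dot product over the FULL h_prev,
        # so the next timestep's MVM cannot start until the whole previous
        # hidden-state vector has been produced.
        return np.array([np.dot(W[i, :], h_prev) for i in range(W.shape[0])])

    def mvm_column_wise(W, h_prev_stream):
        # Column-wise: each element h_prev[j] scales column j and is accumulated
        # as soon as it arrives, so the MVM can begin with the first element of
        # the previous hidden state instead of waiting for the complete vector.
        y = np.zeros(W.shape[0])
        for j, hj in enumerate(h_prev_stream):   # elements arrive one by one
            y += W[:, j] * hj
        return y

    W = np.random.rand(4, 4)
    h_prev = np.random.rand(4)
    assert np.allclose(mvm_row_wise(W, h_prev), mvm_column_wise(W, iter(h_prev)))
    ```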

  • May 2020 - Best paper nomination at FCCM'20: High-Throughput Convolutional Neural Network on an FPGA by Customized JPEG Compression. In this paper, we propose customized JPEG compression combined with a CNN accelerator to address the data transfer bandwidth problem for cloud-based FPGAs.
    [Link] [PDF]

  • December 2019 - FPT'19 paper: Real-time Anomaly Detection for Flight Testing using AutoEncoder and LSTM. In this work, a novel timestep (TS) buffer is proposed to avoid redundant LSTM gate computations and reduce system latency.
    [Link] [PDF]

  • November 2019 - ICCAD'19 paper: Towards In-Circuit Tuning of Deep Learning Designs.
    [Link] [PDF]

  • July 2019 - ASAP'19 paper: Efficient Weight Reuse for Large LSTMs. In this paper, we propose a blocking-and-batching strategy to reuse LSTM weights across a batch of inputs; a brief sketch of the idea follows below.
    [Link] [PDF]
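
    A minimal sketch of the general blocking-and-batching idea (illustrative only; the function, block sizes, and shapes below are assumptions, not the paper's design):

    ```python
    import numpy as np

    def blocked_batched_mvm(W, X, block_rows=2, block_cols=2):
        # W: (M, N) weight matrix, assumed too large to keep on chip.
        # X: (B, N) batch of B input vectors.
        # Each weight block is fetched once and reused by every vector in the
        # batch, amortizing off-chip memory traffic over the batch size.
        M, N = W.shape
        B = X.shape[0]
        Y = np.zeros((B, M))
        for r0 in range(0, M, block_rows):
            for c0 in range(0, N, block_cols):
                W_blk = W[r0:r0 + block_rows, c0:c0 + block_cols]  # one block fetch
                for b in range(B):                                 # reuse across the batch
                    Y[b, r0:r0 + block_rows] += W_blk @ X[b, c0:c0 + block_cols]
        return Y

    W = np.random.rand(6, 4)
    X = np.random.rand(3, 4)
    assert np.allclose(blocked_batched_mvm(W, X), X @ W.T)   # same result as a plain MVM per input
    ```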