Ruokai Yin

PhD candidate, Electrical Engineering, Yale University

E-mail: ruokai.yin@yale.edu

Bio

Ruokai is currently a final-year Ph.D. student in the Department of Electrical Engineering at Yale University, where he is advised by Prof. Priyadarshini Panda.

His research focuses on designing energy-efficient computer architectures, systems, and algorithms for AI workloads, particularly those involving asymmetric operand precision or sparsity. He is also interested in neuromorphic computing as an enabler of bio-plausible and energy-efficient deep learning (spiking neural networks).

Prior to joining Yale, he earned his B.S. from the University of Wisconsin-Madison, where he majored in Electrical Engineering, Computer Science, and Mathematics. During his undergraduate studies, he worked with Prof. Joshua San Miguel on designing computer architectures for stochastic computing.

Albuquerque, NM, Dec 2024

(CV latest update: June 2025)


News

  • 2025/09:   »   NeurIPS 2025 workshop paper accepted (Microsoft internship project).
  • 2025/09:   »   NeurIPS 2025 paper accepted.
  • 2025/03:   »   I will join Microsoft Azure as a Research Intern, working with the AI System Architecture team (June - August).
  • 2025/02:   »   DAC 2025 paper accepted.
  • 2024/07:   »   MICRO 2024 paper accepted.
  • 2024/03:   »   I will join Cerebras Systems as a Research Intern, working with the ASIC team (June - August).
  • 2024/03:   »   IEEE TETCI paper accepted.
  • 2023/10:   »   MINT nominated for best paper award at ASP-DAC 2024.
  • 2023/09:   »   ASP-DAC 2024 paper accepted.

Selected Publications

(Please see the full publication list on my Google Scholar.)

DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs
Ruokai Yin, Yuhang Li, Donghyun Lee, Priyadarshini Panda
39th Conference on Neural Information Processing Systems (NeurIPS), 2025
Learning to Shard: RL for Co-optimizing the Parallelism Degrees and Per-operator Sharding Dimensions in Distributed LLM Inference
Ruokai Yin, Sattwik Deb Mishra, Xuan Zuo, Hokchhay Tann, Preyas Shah, Apala Guha
Workshop on ML for Systems, NeurIPS, 2025
PacQ: A SIMT Microarchitecture for Efficient Dataflow in Hyper-asymmetric GEMMs
Ruokai Yin, Yuhang Li, Priyadarshini Panda
62nd ACM/IEEE Design Automation Conference (DAC), 2025
LoAS: Fully Temporal-Parallel Dataflow for Dual-Sparse Spiking Neural Networks
Ruokai Yin, Youngeun Kim, Di Wu, Priyadarshini Panda
57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2024
MINT: Multiplier-less INTeger Quantization for Energy Efficient Spiking Neural Networks
Ruokai Yin, Yuhang Li, Abhishek Moitra, Priyadarshini Panda
29th Asia and South Pacific Design Automation Conference (ASP-DAC), 2024
Best Paper Nomination
Workload-Balanced Pruning for Sparse Spiking Neural Networks
Ruokai Yin, Youngeun Kim, Yuhang Li, Abhishek Moitra, Nitin Satpute, Anna Hambitzer, Priyadarshini Panda
IEEE Transactions on Emerging Topics in Computational Intelligence, 2024
Wearable-based Human Activity Recognition with Spatio-Temporal Spiking Neural Networks
Yuhang Li, Ruokai Yin, Hyoungseob Park, Youngeun Kim, Priyadarshini Panda
Workshop on Learning from Time Series for Health, NeurIPS, 2022
Spotlight Paper
SATA: Sparsity-Aware Training Accelerator for Spiking Neural Networks
Ruokai Yin, Abhishek Moitra, Abhiroop Bhattacharjee, Youngeun Kim, Priyadarshini Panda
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022
uGEMM: Unary Computing Architecture for GEMM Applications
Di Wu, Jingjie Li, Ruokai Yin, Hsuan Hsiao, Younghyun Kim, and Joshua San Miguel
47th Annual International Symposium on Computer Architecture (ISCA), 2020
IEEE 2020 Top Pick

Experience

May 2025-Aug 2025

Research Intern

Microsoft, Azure AI, Architecture and Systems for AI team, Mentors: Apala Guha & Xuan Zuo.

Autonomous sharding planning for large-scale distributed LLM inference systems.

Paper: https://arxiv.org/abs/2509.00217 (Accepted to Workshop on ML for Systems, NeurIPS 2025)

May 2024-Aug 2024

Research Intern

Cerebras Systems, ASIC team, Mentor: Vipin Sharma.

Architecture design and modeling for Cerebras’s next-generation wafer-scale engine.

July 2021-Present

Graduate Research Assistant

Yale University, Intelligent Computing Lab, PI: Prof. Priyadarshini Panda.

Working on projects that improve the energy efficiency of neural networks, in particular spiking neural networks.

June 2019-May 2021

Undergraduate Research Assistant

University of Wisconsin-Madison, STACS Lab, PI: Prof. Joshua San Miguel.

Worked on projects applying unary computing to deep neural networks. Constructed a PyTorch-based library for unary computing.


Teaching

2023 Fall

Teaching Fellow, EENG 439 Neural Networks and Learning Systems

Yale University

Instructor: Prof. Priya Panda. Course Description.

2023 Spring

Teaching Fellow, EENG 348/CPSC 338: Digital Systems

Yale University

Instructor: Prof. Rajit Manohar. Course Description.