SOAR Associate Professor (Tenured)
Director of Future System Architecture (FSA) Lab
School of Computer Science
The University of Sydney
Affiliated Professor
Electrical & Computer Engineering Department
University of Washington, Seattle
Affiliated Faculty, Sydney Quantum Academy
2023/3/20, I have been elevated to associate editor for TPDS.
2022/12/15, I have been promoted. Thank you USYD!
2022/12/6, I have received the IEEE mid-career award for scalable computing. Thanks IEEE!
2022/10/4, Google Brain collaboration award. Thanks Google!
2022/9/5, our paper has been accepted to EuroSys 2023. Congrats everyone!
2022/8/23, serving PC for ISCA and NIPS 2023.
2022/5/31, I gave an invited talk at Microsoft on multi-dimensional and multi-scale machine learning system design with compilation techniques. Thanks for all the great discussion!
2022/4/10, our FSA lab website goes live at https://www.fsa-lab.org/ !
2022/3/25, I will be serving MICRO'22 PC. Please submit your best work!
2022/2/26, Congrats to Donglin on receiving an ASPLOS'22 student travel award.
2022/2/25, I have received an Alibaba AIR faculty award to support my work on building a multi-dimensional optimization framework for extremely large models on complex enterprise system architectures. Thanks Alibaba!
2022/1/15, our collaborative paper with Google Brain has been accepted to MLSys 2022. Congrats to the team! It is Donglin's first lead-author paper.
2021/12/4, I have been awarded the SOAR faculty fellowship. Thank you USYD for the great support of my research.
2021/12/1, interview with ABC Radio on Metaverse.
2021/11/18, work with my visiting student on temporal graph processing has been accepted to ICDE'22. Congrats to everyone!
2021/11/16, lossy compression for DNN paper has been accepted to VLDB'22.
2021/11/15, our paper on designing compiler optimization for large-scale memory-intensive computation in emerging ML models has been accepted to ASPLOS'22. Congrats to Dr. Zheng and the team!
2021/9/20, attending NYU entrepreneurship training workshop with my collaborator.
2021/7/24, congrats to my PhD student Alan Robertson on his first service as ASPLOS'22 ERC.
2021/7/15, I am elected to serve as the next area chair for Supercomputing 2022 (IEEE/ACM SC'22). Please submit your best work!
2021/7/15, our paper on exploring the design of general, robust probabilistic neural network patterns and optimization strategies for safety-critical applications has been accepted to MICRO'21. Congrats everyone!
2021/7/11, my student Donglin is invited to give a talk at Google Brain research (hosted by Anna Goldie) on our recent collaborative work with Sara Hooker@Google!
2021/7/10, two papers on high performance computing are accepted to SC'21! Congrats everyone!
2021/5/22, our collaborative paper on the performance discrepancy between Python and native libraries has been accepted to FSE 2021. Congrats to Xu and his team!
2021/5/18, Google GCP award for Google Brain collaboration. Thanks Google!
2021/3/31, Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning is accepted to ICS'21. Nice job everyone!
2021/3/24, giving a guest lecture on future XR system design and optimizations for UW CSE548 on April 23rd. Come join in the discussion!
2021/3/6, our large LSTM training software-hardware design paper has been accepted to ISCA'21! Congrats to all the folks involved from USYD FSA lab and UW Bespoke Group.
2021/3/3, I am excited to serve as ERC for MICRO21. Please submit your work!
2021/1/30, our HPCA'20 paper has been included among the architecture research highlights for 2019-2020 by IEEE Transactions on Computers (TC). Congrats everyone!
2021/1/20, I am excited to become an ACM Distinguished Speaker.
2020/12/15, I have received a Facebook Faculty Award. Thank you FB for supporting my research!
2020/12/8, invited to speak at AMD research Asia Tech talk. Super excited!
2020/11/19, Collaborative VR system design paper is accepted at ASPLOS'21!
2020/11/18, kick-start project with Google Brain!
2020/11/13, received Australian Research Council's (Australian NSF) discovery project for 3 years. Thanks ARC for supporting my research!
2020/11/05, PC @ ISCA'21. Please submit your best work!
2020/09/20, invited to speak at Monash University Engineering Event on Oct 13th.
2020/09/17, PC @ SC'21. Please submit your best work!
2020/09/17, PC @ HPDC'21, machine learning track. Please submit your best work!
2020/7/1, I am awarded the 2020 Australia's Most Innovative Engineers award. Thanks for USYD's tremendous support!
2020/7/1, chairing architecture track@IPDPS'21
2020/6/4, ERC@HPCA-27
2020/5/1, Associate editor@ IEEE transactions on Sustainable Computing
2020/4/6, ERC@MICRO-53
2020/3/25, IPDRM workshop proposal accepted @SC'20!
2020/3/15, Awarded for GCP and TPU Pod V2 and V3 resource for my VR research! Thank you Google!
2020/3/04, panelist for MLPerf workshop at System ML conference in Austin, Texas.
2020/2/28, presenting VR research work @ Google Platform research.
2020/2/15, review board@TPDS.
2019/11/10, visiting CS@UNSW.
2019/11/2, visiting virtual reality lab@ UNSW art school.
2019/11/1, ERC@ISCA'20.
2019/10/15, CapsuleNet PIM design paper accepted to HPCA-26.
2019/10/1, Area chair@ICPP'20.
2019/9/20, PC@ICDCS'20, HPDC'20.
2019/9/28, starting my academic life @ U of Sydney. Love our beautiful downtown campus in beautiful Sydney!
2019/8/15, visiting Professor Yan at ECE@Rice and giving a talk.
2019/8/1, soft tensorcore for approximate neural nets is accepted to Supercomputing'19.
2019/6/10, ISCA paper is presented by my postdoc Chenhao@FRC.
2019/05/01, formally affiliated with UW ECE department as affiliated professor.
2019/4/2, Panelist for Berkeley lab AI workshop.
2019/3/28, PC@PPoPP'20.
2019/3/15, future cloud server design for VR services is accepted to ISCA'19.
2019/2/14, granted $350k as PI from DoD/DoE HPDA project to develop highly scalable BLAS library on many-accelerator systems.
2019/1/9, PC@PACT'19.
2018/11/8, co-design for enabling motion-anomaly free virtual reality devices accepted to HPCA-25.
2018/9/13, Tartan benchmark for multi-GPU evaluation nominated for best paper finalists at IISWC'18.
2018/6/29, serving 2018 IEEE TCHPC early career award selection committee.
2018/5/16, R&D 100 award judge.
2018/4/10, WarpConsolidation model accepted to ICS'18@Beijing.
2018/3/14, U.S. DOE research highlight: Unlocking On-Package Memory’s Effects on High-Performance Computing’s Scientific Kernels
2018/2/8, invited to serve on review board for Concurrency and Computation, Practice and Experience (CCPE) journal.
2018/2/6, Four papers presented at HPCA-PPoPP-CGO 2018: SuperNeuron (PPoPP), CUDAAdvisor (CGO), Low-cost real-time memory profiling (CGO), and Efficient Approximate design for 3D rendering architecture (HPCA).
2018/1/24, serving PC for PPoPP'19.
2018/1/24, participating DOE ASCR Heterogeneous workshop to help ASCR draft strategies for beyond exascale computing.
2018/1/13, CSE@UW project meeting with Michael's group.
2017/11/15, receiving IEEE early career award for HPC @Supercomputing'17.
2017/11/15, presenting our paper@Supercomputing'17 in best paper session.
2017/10/27, giving a talk @ MSR distributed computing group.
2017/10/02, giving a talk @ Intel Research Portland.
2017/8/15, invited talk @ SIAM PP18 in Tokyo.
2017/8/3, ASPLOS'17 paper receives HiPEAC paper award.
2017/7/2, paper accepted to MICRO-50.
"It is never too late to become what you might have been." -- George Eliot
I am a SOAR Associate Professor (tenured) at the School of Computer Science at the University of Sydney, where I direct the Future System Architecture (FSA) Lab. I am also a Senior Principal Scientist at Microsoft, leading the DeepSpeed4Science initiative and other pathfinding projects at DeepSpeed. I hold an Affiliated Professor position with the University of Washington. Prior to my appointment at the University of Sydney, I worked at a U.S. Department of Energy lab for five years as a senior staff scientist and technical lead. In 2017 and 2022, I received the IEEE HPC early career award and the IEEE mid-career award for scalable computing, respectively. I have also received the 2022 Alibaba Global Faculty Award (AIR), the 2022 SOAR Fellowship, the 2022/2021 Google Brain Collaboration Award, the 2021 Facebook faculty award, and the 2020 Australia's Most Innovative Engineer Award, and I am an ACM Distinguished Speaker. I am also a Lawrence Scholar and a recipient of the Paul E. Torgersen Excellent Research Award, a 2018 DOE pathway to excellence research award, the 2015 and 2017 DOE PNNL lab outstanding research awards, two Supercomputing (IEEE/ACM SC) best paper runner-up recognitions (2015 and 2017), and the 2017 HiPEAC paper award. I have published in the top HPC and computer architecture conferences, including ISCA, HPCA, ASPLOS, MICRO, and Supercomputing (SC). My past and current research has been supported by Microsoft, Google, NVIDIA, Intel, U.S. government agencies including the DOE Office of Science (ASCR), DoD, DARPA and DoE Lab LDRD, and the Australian Research Council (ARC). During my tenure at PNNL, I led two DOE lab LDRD projects on AI-driven architecture design and large-scale data analytics acceleration. At the University of Sydney, I run the Future System Architecture (FSA) Lab with my wonderful students. Currently, we are actively working with our collaborators from UW Seattle, UT Austin, NYU, Google Brain, Facebook Reality Lab and Alibaba Research. In my spare time, I also consult for tech startups.
I do research at the boundary of system software and hardware, breaking down abstraction barriers and rethinking the hardware–software interface. I have a particular interest in holistic system design and software-hardware co-design. More broadly, my expertise lies in the general areas of computer system architecture and high performance computing (HPC). I hold the strong belief that future beyond-Moore system architectures will become increasingly heterogeneous, which demands new software (programming system, compiler, runtime) and hardware design paradigms to accommodate such complex many-accelerator integrated systems. As a computer system researcher, I am inspired to push the concept of co-design to create efficient and scalable solutions for emerging systems and applications, including future planet-scale Extended-Reality (XR) systems, System ML and AI-driven system designs, and even future quantum accelerator based heterogeneous architectures. In recent years, my amazing students, collaborators and I have published some of the first papers (HPCA'17, HPCA'18, HPCA'19, ISCA'19, ASPLOS'21, HPCA'23) in the field of computer architecture on future VR system characterizations and system-level design and optimizations (including both multi-accelerator based HMD SoC and cloud server designs). Additionally, my recent industry research on machine learning system optimization and scalability is being deployed in real-world large-scale enterprise settings serving millions of users.
[OSDI'24] "MonoInfer: Enabling a New Monolithic Optimization Space for Neural Network Inference Tasks on Modern GPU-Centric Architectures" Donglin Zhuang, Zhen Zheng, Haojun Xia, Xiafei Qiu, Junjie Bai, Wei Lin, Shuaiwen Leon Song, accepted to appear in OSDI 2024.
|
[VLDB'24] "Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity" Haojun Xia, Zhen Zheng, Yuchao Li, Donglin Zhuang, Zhongzhu Zhou, Xiafei Qiu, Yong Li, Wei Lin, Shuaiwen Leon Song, accepted to appear in VLDB 2024.
|
[Arxiv] "Chiplet Cloud: Building AI Supercomputers for Serving Large Generative Language Models" Huwan Peng, Scott Davidson, Richard Shi, Shuaiwen Leon Song, Michael Taylor.
|
[Arxiv] "DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales" Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He.
|
[SC'23] "Mitigating Coupling Map Constrained Correlated Measurement Errors on Quantum Devices" Alan Robertson, Shuaiwen Leon Song, accepted to appear in IEEE/ACM Supercomputing'23.
|
[HPCA'23] "Post0-VR: Enabling Universal Realistic Rendering for Modern VR via Exploiting Architectural Similarity and Data Sharing" In the 29th IEEE International Symposium on High-Performance Computer Architecture.
|
[EuroSys'23] "TEA: A General-Purpose Temporal Graph Random Walk Engine" In European Conference on Computer Systems (2023).
|
[MLSys'22] "Randomness In Neural Network Training: Characterizing The Impact of Tooling" In Machine Learning and Systems (MLSys) 2022. (With Google Brain, Acceptance rate 20.6%) Artifact.
|
[ASPLOS'22] "AStitch: Enabling A New Multi-Dimensional Optimization Space for Memory-Intensive ML Training and Inference on Modern SIMT Architectures" In the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'22, with Alibaba Research).
|
[VLDB'22] "COMET: A Novel Memory-Efficient Deep Learning Training Framework by Using Error-Bounded Lossy Compression" the 48th International Conference on Very Large Data Bases (VLDB'22).
|
[PPoPP'22] "Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications" ACM Principles and Practice of Parallel Programming (PPoPP). The journal version is published in TPDS.
|
[ICS'22] "Bring Orders into Uncertainty: Enabling Efficient Uncertain Graph Processing via Novel Path Sampling on Multi-Accelerator Systems" The International Conference on Supercomputing (ICS).
|
[ICDE'22] "TeGraph: A Novel General-Purpose Temporal Graph Computing Engine" IEEE International Conference on Data Engineering (ICDE).
|
[Supercomputing'21] "Dr. Top-k: Delegate-Centric Top-k on Heterogenous HPC Architectures" The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21). Code Reproducibility
|
[Supercomputing'21] "MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers" The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'21). Code Reproducibility
|
[MICRO'21] "Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving" IEEE/ACM International Symposium on Microarchitecture 2021 (MICRO'21). Artifact
|
[ESEC/FSE] "Toward Efficient Interactions between Python and Native Libraries" The ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2021.
|
[TC] "MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors" IEEE Transactions on Computers.
|
[ICS'21] "ClickTrain: Efficient and Accurate End-to-End Deep Learning Training via Fine-Grained Architecture-Preserving Pruning" The International Conference on Supercomputing (ICS).
|
[ASPLOS YArch'21] "Quantum Von Neumann Architectural Modeling for Algorithm Analysis" The third Young Architect Workshop (YArch) of ASPLOS'21. talk video.
|
[ISCA'21] "η-LSTM: Co-Designing Highly-Efficient Large LSTM Training via Exploiting Memory-Saving and Architectural Design Opportunities" Xingyao Zhang, Haojun Xia, Donglin Zhuang, Hao Sun, Xin Fu, Michael Taylor, Shuaiwen Leon Song, The International Symposium on Computer Architecture [ISCA'21]. (Acceptance rate: 18.7%). talk
|
[TC] "Enabling Highly Efficient Capsule Networks Processing Through Software-Hardware Co-Design" Xingyao Zhang, Xin Fu, Donglin Zhuang, Chenhao Xie, Shuaiwen Leon Song, IEEE Transactions on Computers (TC), [A special issue on research highlights of Computer Architecture for 2019-2020.]
|
[ASPLOS'21] "Q-VR: System-Level Design for Future Collaborative Virtual Reality" Chenhao Xie, Xie Li, Yang Hu, Huwan Peng, Michael B. Taylor, Shuaiwen Leon Song, The 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'21). (Acceptance rate: 17.5%). This is the first work in the computer architecture community on a possible design for future planet-scale mobile VR systems with low latency and high quality. We transform a complex multi-dimensional problem into a system design problem, providing real-world observations, practical insights and a software-hardware co-design solution.
|
[HPCA'20] "Enabling Highly Efficient Capsule Networks Processing Through A PIM-Based Architecture Design" 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA), (Acceptance rate: 19.4%; This work tries to start a conversation on how we deal with these new types of hybrid network structures, where 2D or 3D convolution is not the only significant efficiency bottleneck. We are helping CapsuleNet actually scale and become hardware-friendly (via Processing-in-Memory).) Slides
|
[ISCA'19] "OO-VR: NUMA friendly object-oriented VR rendering framework for future NUMA-based multi-GPU systems" Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song. In Proceedings of the 46th International Symposium on Computer Architecture (ISCA '19). (Acceptance rate: 16.9%; Investigating what future chiplet-based cloud server systems will look like to reach scalable parallel rendering efficiency via a new hardware-software co-design framework.) Video
|
[HPCA'19] "PIM-VR: Erasing Motion Anomalies In Highly-Interactive Virtual Reality World with Customized Memory Cube" Chenhao Xie, Xingyao Zhang, Ang Li, Xin Fu, Shuaiwen Leon Song. In 2019 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 19.7%; One of the first papers in the community identifying that state-of-the-art software-level reprojection optimization for commercial VR devices involuntarily causes significant hardware-level bottlenecks and motion anomalies.) Video
|
[Supercomputing'19] "BSTC: a novel binarized-soft-tensor-core design for accelerating bit-based approximated neural nets" Ang Li, Tong Geng, Tianqi Wang, Martin C. Herbordt, Shuaiwen Leon Song, Kevin J. Barker. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '19). (Acceptance rate: 20.9%; Introducing the concept of software-based tensorcore for bit-level parallelism on manycore accelerators.) Video
|
[PPoPP'18] "Superneurons: dynamic GPU memory management for training deep neural networks" Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Tim Kraska. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '18). (Acceptance rate: 20%; GPU memory is too small to balance the powerful cores when training large non-linear networks. SuperNeurons helps users tackle this issue.) Video
|
[HPCA'18] "Perception-Oriented 3D Rendering Approximation for Modern Graphics Processors." Chenhao Xie, Xin Fu, Shuaiwen Leon Song. In 2018 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 19.7%; Connecting architectural bit stream and its manipulation to user perception impact.) Video
|
[CGO'18] "CUDAAdvisor: LLVM-based Runtime Profiling for Modern GPUs." Du Shen, Shuaiwen Leon Song, Ang Li, Xu Liu. In ACM International Symposium on Code Generation and Optimization (CGO).
|
[ICS'18] "Warp-Consolidation: A Novel Execution Model for GPUs." Ang Li, Weifeng Liu, Linnan Wang, Kevin Barker, Shuaiwen Leon Song. In 32nd ACM International Conference on Supercomputing (ICS'18). (A new GPU programming model for synchronization-critical applications).
|
[TACO] "NUMA-Caffe: NUMA-Aware Deep Learning Neural Networks." Probir Roy, Shuaiwen Leon Song, Sriram Krishnamoorthy, Dipanjan Sengupta, Xu Liu. In ACM Transactions on Architecture and Code Optimization (TACO).
|
[TPDS] "Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect." Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker. In IEEE Transactions on Parallel and Distributed Systems (TPDS).
|
[IISWC'18, Best Paper Finalist] "Tartan: Evaluating Modern GPU Interconnect via a Multi-GPU Benchmark Suite." Ang Li, Shuaiwen Leon Song, Jieyang Chen, Jiajia Li, Xu Liu, Nathan R. Tallent, Kevin J. Barker. In 2018 IEEE International Symposium on Workload Characterization (IISWC).
|
[Supercomputing'17, Best Paper Finalist, DoE Research Highlight article] "Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels." Ang Li, Weifeng Liu, Mads Ruben Burgdorff Kristensen, Brian Vinter, Hao Wang, Kaixi Hou, Andres Marquez, Shuaiwen Leon Song. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '17). (Acceptance rate: 61/327=18.6%; The first research investigating the performance impact of introducing on-package memory into the memory hierarchy across a large spectrum of fundamental HPC scientific kernels, providing a formalized modeling scheme for easy performance analysis at large scale. It was an SC'17 best paper finalist and a DoE research highlight.)
|
[MICRO-50] "BVF: enabling significant on-chip power savings via bit-value-favor for throughput processors." Ang Li, Wenfeng Zhao, Shuaiwen Leon Song. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO '17). (Acceptance rate: 18.6%).
|
[ASPLOS-17, HiPEAC paper award] "Locality-Aware CTA Clustering for Modern GPUs." Ang Li, Shuaiwen Leon Song, Weifeng Liu, Xu Liu, Akash Kumar, Henk Corporaal. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '17). (Acceptance rate: 17.4%) Video
|
[HPCA'17] "Processing-in-Memory Enabled Graphics Processors for 3D Rendering." Chenhao Xie, Shuaiwen Leon Song, Xin Fu. In 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA). (Acceptance rate: 21.6%; Using processing-in-memory technique along with graphics software pipeline re-design to accelerate modern rendering efficiency.)
|
[ICS'16] "SFU-Driven Transparent Approximation Acceleration on GPUs." Ang Li, Shuaiwen Leon Song, Mark Wijtvliet, Akash Kumar, Henk Corporaal. In 30th ACM International Conference on Supercomputing (ICS). (Interesting concept about approximation units on modern many-core architectures and how they collaborate together in system to achieve dynamic performance-accuracy trade-offs.)
|
[HPDC'16] "SMT-Aware Instantaneous Footprint Optimization." Probir Roy, Xu Liu, Shuaiwen Leon Song. In 25th ACM International Symposium on High-Performance and Distributed Computing (HPDC). (Identifying, debugging and fixing false sharing on multicore processors.)
|
[ICS'15] "Locality-Driven Dynamic GPU Cache Bypassing." Chao Li, Shuaiwen Leon Song, Hongwen Dai, Albert Sidelnik, Siva Hari, Huiyang Zhou. In 29th ACM International Conference on Supercomputing (ICS).
|
[Supercomputing'15, Best Paper Finalist] "GraphReduce: processing large-scale graphs on accelerator-based systems." Dipanjan Sengupta, Shuaiwen Leon Song, Kapil Agarwal, Karsten Schwan. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '15). (Acceptance rate: 18.5%)
|
Room 438, School of Computing
The University of Sydney
Camperdown NSW 2006, Australia
shuaiwen.song |at| sydney | dot | edu | dot | au
+61 2 8627 9613