Co-design of systems and formal verification methods 

Complex systems, such as distributed systems and operating systems, need formal verification to guarantee the correctness. However, formal verification is often considered sophisticated and time-consuming and thus it is not usually applied to most of production code and performance-optimized code that tends to have complex code structures. I am exploring ways to co-design the system and formal verification so that code verification can be simplified and verified code runs fast.

Atomic Distributed Objects (ADO)

The ADO is an atomic object model that captures the core properties of strongly consistent distributed systems, such as multi-Paxos, Raft, and Chain Replication. This abstract model hides concurrent and interleaved network-level behaviors and exposes a coalesced view of the distributed system in a simplified fashion. Still, its non-deterministic atomic APIs and tree-based state structure faithfully model network and node failures and transient states from failures, respectively, making it useful for verifying distributed systems. 

The ADO model yields two benefits: 1) the simple model facilitates compositional reasoning of multiple distributed systems and 2) the abstract model can be easily instantiated to different protocols and implementations that are verified to observe the same semantics through layered refinements. Through the ADO, verification of sophisticated compositional systems like key-value stores built with multiple distributed components and two-phase commit over multi-Paxos becomes trivial. The ADO connects to various implementations of strongly consistent distributed systems and demonstrates its practicality. 

ADO vs. multi-Paxos

Layered refinement of ADO

Distributed systems and serverless computing

I am interested in designing, building, and reasoning about distributed systems, such as consensus protocols and systems with different consistency semantics. My current research focuses on the interactions and compositions of multiple distributed systems in a serverless computing context. As part of this research, side by side with the theoretical exploration, new system designs are explored and implemented. 

Serverless Computing for General Applications/Systems

Today’s Serverless frameworks are suited for short-lived stateless tasks, and their nature limits the type of applications that can build on the framework. Stateful operations, such as a state transfer to a successor function or network communications among function instances, are not well supported. Sophisticated distributed operations and long-term state managements are delegated to serverful backend services. Theoretically, serverless frameworks should be able to host almost all programs. However, high latency (e.g., due to cold start and persistent state access), lack of low-level system support (e.g., for communication, fault tolerance, and concurrency control), and the financial cost (i.e., certain applications are cheaper on serverful models) limit their practicality. This project envisions that serverless computing will become more general, and developers will use the serverless paradigm even to build cloud backend services, which today’s serverless applications rely on. As part of this more generic serverless computing vision, this project explores shortcomings of serverless frameworks and improves them. Specifically, designs of applications/systems that are not supported well by serverless platforms are explored in the serverless environment, and new designs of serverless framework and the applications/systems are investigated.

Cloud storage systems 

Cloud storage systems can be considered as distributed systems but there are specialized concerns regarding how to store and mutate the data and how the system should interact with lower-level storage devices. More specifically, my research topics include the use of various transactional mechanisms, consistency semantics, and new storage media technologies in the cloud and distributed environment.

Ongoing projects include how to best utilize emerging hardware devices, such as persistent memory, high-performance SSDs, and RDMA enabled network interface cards, to build local and distributed storage systems.

System support for deep neural network 

New form of applications always need a system support to run efficiently. I am investigating ways to accelerate and optimize control and data flows of deep neural network (DNN) from a systems viewpoint. 

An ongoing project explores optimizing the data augmentation pipeline for the training. Data augmentation enriches the input dataset so that the DNN training accuracy can be enhanced at a relatively lower cost. However, the data augmentation pipeline, if the augmentation approach is dynamic, often becomes the bottleneck. The project explores system designs that can optimize the pipeline for non-stop training.