2008-11-18

ADE: A Decade of Distributed Computing

The Asynchronous Distributed Environment (ADE) is a message-based framework for the creation of distributed grid computing systems, including the Bycast StorageGRID. With over 40,000 node-years of production runtime, this framework represents one of the more mature distributed computing environments.

I originally developed ADE as a testbed for distributed computing concepts, and published a paper describing the system, titled "The ADE Environment: A Macroscopic Object Architecture for Software Componentization and Distributed Computing" at the MacHack conference back in 1998. As an environment for experimentation, it was very successful, allowing the rapidly exploration of many patterns for creating massively concurrent distributed systems, and for refining the environment to improve the programming model and associated infrastructure.

The ADE Process Model

ADE takes a rather unique approach to the traditional CSP process model: processes are messages. Thus, instantiating a process is as easy as sending a message that includes executable code. Each message has a unique "capability" identifier that allows messages to be sent to itself. When a message is received by a process, the executable code corresponding to the received message is run, which allows the process to manipulate its state (the process message) and to optionally generate additional messages.



An example of a more complex process trace can be found as part of this Graphviz .dot file.

This approach has several major advantages:
  1. Parallelism is implicit and automatic, as dependencies are expressed as message exchange relationships. All forms of serial and parallel computing can thus expressed as directed acyclic graphs.
  2. Processes can easily be migrated from one system to another, as they are just messages. Lightweight and automated migration permits experimentation with alternate approaches to distributed processing, such as migrating processes to a data source or resource.
  3. As processes are just messages, the entire execution state of a system can be captured by checkpointing all message across a given cut line, and by retaining historical message states, execution can be run backwards, and "anti-messages" can undo processing.
  4. Execution state can easily be logged and inspected for debugging and visualization. By inspecting the real-time message graph, deadlock and livelock can be automatically detected, as can the critical path for performance optimization.
And as one can imagine, this is just the tip of the iceberg.

A Maturing Environment

Much has changed over the last decade, and ADE has transformed from a research system into an industry proven technology. As ADE originally was based on ATM networking, we re-wrote the networking and node-to-node messaging to use TCP/IP, and the process model was changed to allow statically linked code to be associated with processes, instead of including it with the message. As storage systems require extremely high execution performance, we optimized for message processing speed and efficiency, and some of the less-used features, such as process migration, were retired from our production builds.

The result has been a high-performance, yet flexible system that facilitates the rapid development and testing of distributed systems. By focusing on allowing distributed software to be easily developed, visualized and tested, we've been able to rapidly innovate and add functionality to our system. And ultimately, this has been the most significant advantage of ADE.

No comments: