2008-12-23

Concurrency and Complexity

HP's Russ Daniels understands a critical issue facing the next generation of software development:

"If you think about whatever those skills are that are necessary to be able to program a computer, we don't understand what they are very well, but we know that they're not very widely distributed. So you end up with a relatively small portion of the population who can write a program successfully. Then you start introducing certain kinds of abstractions: if you introduce recursion, you lose some people; if you introduce pointers, you lose some people; if you introduce event-driven models, you lose some people; when you introduce concurrency, you lose virtually everybody.

So we create these programming languages that support concurrency and then you find that there's a lock and an unlock around every method, because otherwise people hurt themselves. It presents an interesting dilemma—we need to be able to get more stuff done, we need to figure out how to do stuff in parallel, but we're pretty bad at doing that."

One of the keys to solving this complexity problem is to make the mechanics of parallelism safe, automatic, and transparent to the programmer. Ultimately, this approach needs to extend far beyond just parallelism, as we see the same failings in the areas of security, fault tolerance, fault recovery, dynamic scaling, and resource economics.

An ADE Example

In order to see how this could work, let's take a quick look at an example of how ADE transparently provides concurrency and protection of resources that must be accessed serially:

Let's imagine that we're tasked with building a simple LAN switch, and we want to provide an IP address resolution for connected Ethernet hosts. If we want to send an IP packet to a given computer, we first need to find what the MAC address of the computer with a given IP address is. This is performed using the Address Resolution Protocol (ARP), which was originally documented in IETF RFC 826.

At a simplified level, the protocol works as follows:
  1. A host which wants to talk with a specific IP address sends an ARP broadcast request asking for the MAC address that traffic should be addressed.
  2. The switch intercepts the broadcast, and looks up the IP address in its local cache. If the IP to MAC address mapping is found, it responds to the originally requesting host with the ARP response.
  3. If the mapping is not found in the local cache, the switch broadcasts the ARP request to all other ports.
  4. If the switch receives a ARP response, it adds the IP to MAC mapping to its local cache, and forwards the response to the originally requesting host.
This is equivalent to a simple database insert/lookup problem, and in traditional concurrent programming models, errors can easily be introduced if care is not taken to lock the local cache while performing modifications.

In contrast, because ADE handles concurrency by forcing all component interactions to be atomic transactions, no locks are required and the system can be proven to be correct and deadlock free.

In this example, we are going to consider three actors: The ARP Lookup Process, the ARP Cache Process and the ARP Discovery Process. These actors have the following roles:
  • The ARP Lookup Process receives ARP lookup requests from the network hosts, asks the ARP Cache Process if the mapping is known, asks the ARP Discovery Process for the mapping if not known, and responds to the requesting host.
  • The ARP Cache Process looks up mappings in the local cache, and adds mappings when asked.
  • The ARP Discovery Process sends ARP lookup requests to the network, and when a mapping is discovered, it asks the ARP Cache Process to remember them.
In this model, there would be one instance of the ARP Lookup Process per ARP lookup packet received on the network, one instance of the ARP Cache Process, and one instance ARP Discovery Process per cache miss.

In a busy network, there could be dozens of ARP requests being processed, and many concurrent requests that may require changes to be made to the ARP cache. Even if ARP discovery is adding a new mappings at the same time that a new request is looking up a mapping, there is no possibility that the cache can become corrupt, because all access to the cache is automatically serialized by virtue of the entities and their messages that are exchanged.

This is the key aspect of ADE that simplifies the creation of distributed systems. Because concurrency is automatic, when problems are decomposed into collections of interacting actors, any parallelism inherent in the problem will be automatically utilized.

No comments: