2009-04-29

Cloud Storage Standardization, Part 1 - Why a Standard?

As cloud storage matures and use increases, there is a strong need for standardized interfaces for performing basic storage operations. Just as the SCSI standard enabled interoperability and facilitated innovation within the directly-attached storage market, the presence of a standard interface for cloud storage will provide many advantages. These include:
  • Improving quality by allowing standardized conformance testing
  • Creating a market for test, validation, profiling, debugging and analysis tools
  • Allowing the creation of standardized documentation
  • Encouraging the publication of articles and books discussing the standard
  • Reducing development work required to use cloud storage and to support multiple providers
  • Enabling the creation of standardized access libraries
  • Reducing customer lock-in and enabling multi-vendor selection
At Bycast, we provide a RESTful HTTP API for object storage access, and next month I will be presenting it at the SNIA cloud storage summit. As part of my preparations, I've been reviewing the HTTP storage APIs offered by other industry players, such as Amazon, Microsoft, and Nirvanix. These APIs overlap significantly, and I believe there is a good chance, at least from a technical standpoint, of creating a common lightweight HTTP storage access protocol.

2009-04-07

Just what is "Cloud Computing"?

As cloud computing increases in prominence, the argument over exactly what cloud computing actually is has grown in proportion. Never before has a buzzword been so truly nebulous.

So, with the definition in such dispute, this is the perfect time to throw out yet another definition.

With that said, my definition of cloud computing is: *drum roll*

"Distributed location-independent scale-free cooperative agents"

"What?", one might say... "That's nothing like the others I've seen." And they would be right. That's exactly the point. This is my technical definition of cloud computing, and let's pick it apart, piece by piece:

Distributed

Distributed is part of almost every cloud definition, but even the word "distributed" has different meanings to different people. For me, distributed signifies the presence of a substantial separation between entities. This definition implies several things:
  1. That there are well-defined entities that are distinct from each other
  2. That there is a method of communication between entities that is separate from the entities themselves
  3. That there is some degree of isolation between entities
This is a broad definition, but one that I would argue most technical people would agree with. It means that a client-server system is a distributed system, as is a telephone exchange or an interacting web server and client. However, a web server alone would not be considered distributed, nor would the Linux kernel.

This definition of distributed is about logical entities and their interactions, so it does not matter if the entities are all co-resident on a single computing system, or scattered across multiple computing systems. That, in my book, is more related to the property of location independence.

Location-Independence

What separates your typical computer operating system from what is typically known in the academic world as a distributed operating system is the property of location independence. This is what allows the physical location of a given entity (be it a program or state) to not matter to its correct operation. More specifically, location independent entities have the ability to transparently move between locations without affecting system operation.

Location independence is a key enabling technology. For example, on the Internet, DNS and IP routing provide location independence between clients and servers. This allows the servers to be moved around without the clients having to know their physical location, or that there even was a change.
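To make the idea concrete, here is a minimal Python sketch of location independence through name indirection. It is only an illustration: the Registry and Client classes, the host name, and the addresses are all hypothetical stand-ins (the registry plays the role that DNS plays on the Internet), not part of any real system.

```python
class Registry:
    """Hypothetical name service: maps a stable name to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, name, address):
        # Registering an existing name again models moving the server.
        self._locations[name] = address

    def resolve(self, name):
        return self._locations[name]


class Client:
    """Knows servers only by name, never by physical location."""

    def __init__(self, registry):
        self._registry = registry

    def request(self, name):
        # The location is looked up at request time, so a move is transparent.
        address = self._registry.resolve(name)
        return f"request to {name} served from {address}"


registry = Registry()
registry.register("storage.example.com", "192.0.2.10")
client = Client(registry)

before = client.request("storage.example.com")

# The server moves; the client code and the name it uses do not change.
registry.register("storage.example.com", "198.51.100.7")
after = client.request("storage.example.com")

print(before)
print(after)
```

The client never stores an address, which is what lets the entity behind the name move without affecting the client's correct operation.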

When systems are built out of location-independent entities, they can be run and stored on different physical configurations. This lays the foundation for designing scale-free systems.

Scale-Free

In most system implementations, the chosen algorithms and architectures only work efficiently at a given scale. As the size and scope of a system are increased or decreased, bottlenecks, inefficiencies, and waste result in diminishing returns, and put limits on how big and how small a system can be. Systems designed for portable embedded devices must be designed very differently than systems designed for the world's largest supercomputers, and rightly so.

However, a new class of computing systems has emerged that has the property of being scale-free. This means that regardless of the scale of their deployment, they are able to continue to perform their task efficiently. From an algorithms standpoint, this means that the computation required scales linearly as a function of the load on the system.
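As a rough illustration of the linear-scaling claim, the following Python sketch compares a toy cost model in which work grows linearly with load against a hypothetical counterexample in which every node must also coordinate with every other node. The function names, cost constants, and the all-pairs design are assumptions chosen for illustration only.

```python
def total_work_linear(nodes, requests_per_node, cost_per_request=1):
    """Toy scale-free model: total work is proportional to total load."""
    return nodes * requests_per_node * cost_per_request


def total_work_all_pairs(nodes, requests_per_node, cost_per_request=1, cost_per_pair=1):
    """Hypothetical counterexample: each pair of nodes adds coordination work."""
    request_work = nodes * requests_per_node * cost_per_request
    pairs = nodes * (nodes - 1) // 2
    return request_work + pairs * cost_per_pair


def per_node_work(total, nodes):
    return total / nodes


# Grow the deployment while keeping the load per node fixed.
for nodes in (10, 1000):
    linear = per_node_work(total_work_linear(nodes, 100), nodes)
    pairwise = per_node_work(total_work_all_pairs(nodes, 100), nodes)
    print(nodes, linear, pairwise)
```

In the linear model the work per node stays constant as nodes are added, while in the all-pairs model it keeps growing with the size of the deployment, which is the kind of bottleneck that puts an upper limit on scale.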

Thus, when you have scale-free location-independent systems, you can scale just by adding more servers, networks and storage devices. This is what all of the largest (at least in terms of computational load) web companies, such as Facebook, Flickr, Twitter and friends, aspire to.

Cooperative Agents

Web services count. SOA counts. In fact, most message-passing distributed systems are based around the concept of cooperating agents working together to solve a problem.

If the agents aren't cooperating, or they aren't agents, then they're not a cloud to me.

What isn't in my Definition of Cloud

It is also interesting to look at what is not in my definition of cloud. The first thing that one may notice when comparing it to most other definitions is the complete lack of business terms. That's because Cloud Computing is a technology, not a business model. There's a reason why previous attempts at cloud-based business models were called "Software as a Service", and "Application Service Providers". That's because these terms describe business models, as opposed to technological models.

Much of my annoyance with the "Cloud" as a buzzword would go away if marketing and businesses just said that they have a "Cloud business model", which provides Internet-based delivery of their application or service and leverages cloud computing techniques.

In fact, one could arguably develop and deliver such a business model without using any cloud technology. A good example of this is the credit card processing networks, which are mainframe-based, largely centralized, and very specifically designed around a given scale, with an implementation that requires very specific location bindings for entities. Yet they provide what is arguably, from a business standpoint, a cloud service.