2009-09-14

Introducing CDMI

Today, the Storage Networking Industry Association (SNIA) publicly released the first draft of the Cloud Data Management Interface (CDMI). The draft standard can be downloaded from the following address:

http://www.snia.org/tech_activities/publicreview

I'm very pleased to have been a significant contributor to this standard since the working group was formed earlier this year. Over the last nine months, we've come a long way towards defining a working standard for cloud storage management, and Bycast is proud to have contributed many best-of-breed capabilities first pioneered in Bycast's StorageGRID HTTP API. That API is used by hundreds of customers worldwide to store and access many dozens of petabytes of data in both public and private cloud environments.

Why CDMI?

CDMI provides a standardized method by which data objects and metadata can be stored, accessed and managed within a cloud environment. It is intended to give applications and end-user systems a consistent method of access, and to give providers of cloud storage a consistent interface to implement.

Currently, almost all cloud storage providers and vendors use significantly different APIs. This forces cloud application and gateway software vendors to code and test against each API separately, and to architect their applications around the lowest common denominator. CDMI significantly reduces the complexity of development, testing and integration for application vendors, and is specifically designed to be easy to adopt for both cloud providers and application vendors. CDMI can also run alongside existing cloud protocols. For example, a customer could run a CDMI gateway in an EC2 instance to gain access to their existing Amazon S3 bucket without Amazon having to do any work: a great example of the power of cloud!

Much like SCSI, Fibre Channel and TCP/IP, such industry-wide standards provide many advantages. These range from simple but essential efficiencies, such as standardized interface documentation and conformance and performance testing tools, to the creation of a market for value-added tools such as protocol analyzers, along with broader developer awareness, libraries and code examples.

Industry standards also jump-start the network effect, where more applications encourage providers to support the standard, and more providers supporting the standard encourage application vendors to support the standard. Finally, and most excitingly, CDMI increases inter-cloud interoperability, and is a fundamental enabler for advanced emerging cloud models such as federation, peering and delegation, and the emergence of specialized clouds for content delivery, processing and preservation.

A Whirlwind Tour of CDMI

CDMI stores objects (data and metadata) in named containers that group the objects together. Every stored object is accessed by a web address that contains either a path (e.g., http://cloud.example.com/myfiles/cdmi.txt) or an object identifier (e.g., http://cloud.example.com/objectid/AABwbQAQvmAJSJWUHU3awAAA==).
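To make the two addressing styles concrete, here is a minimal Python sketch that builds both kinds of address. It assumes the hypothetical host cloud.example.com from the examples above and an "objectid/" prefix for identifier-based access; the helper names path_url and objectid_url are illustrative only, not part of CDMI.

```python
from urllib.parse import quote, urljoin

# Hypothetical cloud endpoint, taken from the examples in the text.
BASE = "http://cloud.example.com/"

def path_url(container: str, name: str) -> str:
    """Address an object by the container it lives in plus its name."""
    # quote() escapes unsafe characters but keeps "/" so nested containers work.
    return urljoin(BASE, f"{quote(container)}/{quote(name)}")

def objectid_url(object_id: str) -> str:
    """Address an object by its identifier under the assumed objectid/ prefix."""
    # Keep "=" padding readable; escape "/" and "+" that base64-style IDs may contain.
    return urljoin(BASE, "objectid/" + quote(object_id, safe="="))

print(path_url("myfiles", "cdmi.txt"))
print(objectid_url("AABwbQAQvmAJSJWUHU3awAAA=="))
```

The same object can be reached either way. A path reads naturally to people, while an identifier-based address stays valid even if the object is later renamed or moved to another container.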

CDMI provides a series of RESTful HTTP operations that can be used to access and manipulate a cloud storage system. PUT is used to create and update objects, GET is used to retrieve objects, HEAD is used to retrieve metadata about objects, and DELETE is used to remove objects.
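The sketch below maps each of the four operations onto an HTTP request using only Python's standard library. It builds the requests without sending them. The header name X-CDMI-Specification-Version, its value "1.0", and the media type application/cdmi-object are assumptions borrowed from the later CDMI 1.0 specification; this first draft may name them differently, so check the draft itself before relying on them.

```python
import json
from typing import Optional
from urllib.request import Request

# ASSUMPTION: header and media type follow the later CDMI 1.0 spec, not this draft.
CDMI_HEADERS = {
    "X-CDMI-Specification-Version": "1.0",
    "Accept": "application/cdmi-object",
}

def cdmi_request(method: str, url: str, body: Optional[dict] = None) -> Request:
    """Build (but do not send) a CDMI request for a data object."""
    headers = dict(CDMI_HEADERS)
    data = None
    if body is not None:
        # Object bodies are carried as JSON.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/cdmi-object"
    return Request(url, data=data, headers=headers, method=method)

url = "http://cloud.example.com/myfiles/cdmi.txt"
create = cdmi_request("PUT", url, {"value": "Hello, CDMI!"})  # create or update
read = cdmi_request("GET", url)                               # retrieve
meta = cdmi_request("HEAD", url)                              # metadata only
remove = cdmi_request("DELETE", url)                          # remove

for req in (create, read, meta, remove):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would send it to a live server. The example stops short of that so it runs without a CDMI endpoint.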

Data stored in CDMI can be referenced between clouds (where one cloud points to another), copied and moved between clouds, and serialized into an export format that facilitates cloud-to-cloud transfers and bulk customer data transfers. All data-metadata relationships are preserved, and standard metadata is defined to allow a client to specify how the cloud storage system should manage the data. Examples of this "Data System Metadata" include the acceptable levels of latency and the degree of protection through replication.
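As an illustration of Data System Metadata, this sketch builds a JSON body that a client might send when creating a container. The body asks for a latency target and a replica count. The names cdmi_latency (milliseconds) and cdmi_data_redundancy (number of complete copies) come from the later CDMI 1.0 specification and are assumptions as far as this draft is concerned. The helper container_body is hypothetical.

```python
import json

def container_body(latency_ms: int, copies: int) -> str:
    """JSON body for a container whose data system metadata requests
    a maximum latency and a number of replicas (hypothetical helper)."""
    # ASSUMPTION: metadata names follow the later CDMI 1.0 spec.
    # CDMI carries metadata values as strings, so numbers are converted.
    metadata = {
        "cdmi_latency": str(latency_ms),      # desired max time to first byte, ms
        "cdmi_data_redundancy": str(copies),  # desired number of complete copies
    }
    return json.dumps({"metadata": metadata}, indent=2)

print(container_body(100, 2))
```

Because the metadata is attached to the container, every object stored in it inherits the management policy. The client states what it needs, and the cloud decides how to meet it.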

In addition to basic objects and containers (similar to file and folder from a file system), CDMI also supports the concept of capabilities, which allow a client to discover what a cloud storage system is capable of doing. CDMI also supports accounts, which provide control and statistics over account security, usage and billing. Finally, CDMI supports queue data storage objects, which enable many exciting new possibilities for cloud storage.
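Capability discovery lets a client adapt to whatever a given cloud supports rather than assuming a fixed feature set. The sketch below checks a capabilities document for a named feature. The document layout, the "true"/"false" string encoding, and the capability names cdmi_queues and cdmi_export_webdav are assumptions modelled on the later CDMI 1.0 specification. The sample response is invented for illustration.

```python
import json

def supports(capabilities_json: str, name: str) -> bool:
    """Return True if a capabilities document advertises `name` as enabled."""
    caps = json.loads(capabilities_json).get("capabilities", {})
    # ASSUMPTION: boolean capabilities are encoded as the string "true".
    return caps.get(name) == "true"

# Hypothetical response body from a capabilities request.
sample = json.dumps({
    "capabilities": {
        "cdmi_queues": "true",
        "cdmi_export_webdav": "false",
    }
})

print(supports(sample, "cdmi_queues"))  # True
```

A client would typically check for a capability such as queue support before trying to create a queue object, and fall back to another approach when the capability is absent.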

In fact, queues are important and significant enough that I'll be writing more about them and what they enable in a subsequent blog entry.

The Next Steps

With CDMI now "out in the wild", this is the point where the standards effort starts to get really interesting. Up to this point, it has been a relatively small group that has been working on the standard, and we've had to make some controversial decisions (such as eliminating locking and versioning from the first release). There's still a lot of work to be done, and as CDMI gets more visibility, we look forward to increased involvement from other players in the industry. Together, we can make this standard even better, and help shape the future of cloud storage.

So, if you are interested in cloud storage and cloud storage APIs, I would strongly encourage you to take the time to read the CDMI draft documentation, and contribute your thoughts and suggestions.

We're proud of what we've achieved, and together, we can make it even better.

4 comments:

Kalpak Shah said...

Excellent! Such a standard will make writing multi-cloud applications much easier. But I think it will be difficult for cloud vendors to conform to these standards, at least in the near term. Their hardware and software infrastructure may not allow them to adapt all of their current interfaces to this standard.

Also Amazon, with such a large portion of the market share may not care about supporting standards. If Amazon shifts to any standard, I am sure others will follow.

David Slik said...

Fortunately, CDMI has been designed to co-exist easily with existing cloud vendors' APIs. You can store data via S3's API, for example, retrieve it via CDMI, and use CDMI to manage existing objects within a bucket.

As a CDMI interface to Amazon S3 could run in an EC2 instance, there is very little friction for a user to start using CDMI, and nothing that Amazon must do.

This extends to the more advanced data system capabilities. If you use CDMI to specify a degree of replication, then when you store an object in your S3 bucket using Amazon's cloud, the CDMI manager can detect this, look at the CDMI data system metadata of the parent container, and replicate the object to a second cloud, all transparently to the application.

This is why we chose to emphasize the "Control" aspect of the name of the protocol. It's not just a data path. (And in many ways, it's not the most efficient data path, at least not yet.)

Steve said...

Very promising work and we look forward to participating!

-Steve
CEO, Diomede Storage
http://www.diomedestorage.com/

David Slik said...

Excellent. We're glad to have you on board.

If you have any questions about how to join the SNIA technical working group, please don't hesitate to contact me via e-mail (dslik at bycast.com).