2009-12-22

A CDMI Tutorial - Basic Input/Output

With the 0.9 draft of the SNIA Cloud Data Management Interface (CDMI) specification now released, this is a good time to highlight the aspects of the standard that make it well suited for cloud storage, and to review the changes from the previous 0.8 draft.

This post is the first in a series on CDMI. Subsequent posts will cover the following areas:
Basic Input/Output

Although CDMI supports data storage and retrieval, it does not assume or require that clients use CDMI itself for these operations. CDMI can be used in conjunction with existing cloud protocols (such as Amazon's S3 API) and file system protocols (such as NFS, CIFS and WebDAV). The goal of CDMI is not to replace or supplant these protocols; it is to provide a standardized access and management method that is independent of them. So while storage and retrieval are important parts of the standard, they are only one part of it. It is even possible to implement a fully compliant CDMI system that does not support the ability to store or retrieve data.

By not restricting how data is stored or accessed, while still providing a consistent and uniform way to access and manage stored data, CDMI enables a client to discover, access and manage stored content regardless of how it was originally stored, and regardless of the underlying storage implementation.

To get an idea of the basic object model by example, let's review basic storage and retrieval operations using CDMI:

Storing an Object

The most basic operations in CDMI are object storage and retrieval. Assuming that we have an authenticated session established to a CDMI cloud running at cloud.example.com and we want to store some text, here is the HTTP transaction that would be performed, as described in section 8.2 of the specification:
PUT /hello.txt HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
"mimetype" : "text/plain",
"value" : "Hello Cloud"
}

HTTP/1.1 201 Created
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
"objectURI" : "/hello.txt",
"objectID" : "AABwbQAQ810Mpei21pxfzA==",
"parentURI" : "/",
"accountURI" : "/cdmi_accounts/default_account/",
"capabilitiesURI" : "/cdmi_capabilities/dataobject/",
"completionStatus" : "Complete",
"mimetype" : "text/plain",
"metadata" : {
"cdmi_size" : "11"
}
}
Let's look at this transaction in more detail:

It's standard HTTP, and we specify that we will be submitting a request body in the form of a cdmi.dataobject, and requesting a response body in the form of a cdmi.dataobject. We submit in our request body JSON that includes a mimetype and value field, and receive in response a result (HTTP 201 Created) and a JSON structure that includes information about the newly created cloud object.
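To make the shape of this request concrete, here is a minimal Python sketch that assembles the same PUT. The function name build_cdmi_put is hypothetical, and the header values are simply copied from the transaction above rather than taken from any CDMI client library; nothing is sent over the network.

```python
import json

# Media type copied from the example transaction above.
CDMI_DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"

def build_cdmi_put(path, mimetype, value, host="cloud.example.com"):
    """Return (request_line, headers, body) for a CDMI data object create.

    Hypothetical helper: it only assembles the pieces of the request.
    """
    body = json.dumps({"mimetype": mimetype, "value": value})
    headers = {
        "Host": host,
        "Accept": CDMI_DATAOBJECT,        # response body format we want back
        "Content-Type": CDMI_DATAOBJECT,  # format of the request body we send
        "X-CDMI-Specification-Version": "1.0",
    }
    return f"PUT {path} HTTP/1.1", headers, body

request_line, headers, body = build_cdmi_put("/hello.txt", "text/plain", "Hello Cloud")
print(request_line)
print(body)
```

The headers and body could then be handed to any HTTP client, for example the standard library's http.client.HTTPConnection.request("PUT", path, body=body, headers=headers).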

The response body JSON contains the following fields:

Field Name: Field Description
objectURI: The URI for the newly created object. This can be used to access the object via CDMI, and reflects the organization of the stored data. For example, if a storage system provides CDMI access to a NAS share, the objectURI reflects the file path.
objectID: Every object within a CDMI system has a globally unique identifier that can be used to access the object. Object IDs remain constant for the life of the object, even if the object is modified, renamed or moved to a different cloud provided by a different vendor.
parentURI: The URI for the parent of the created object. Objects inherit metadata from their parent.
accountURI: The URI for the account that the object was created under. Accounts determine the billing relationship, reporting visibility and basic access permissions.
capabilitiesURI: Every object within a CDMI system has "capabilities", which describe what operations the system is capable of performing on that URI. This allows clients to discover the capabilities of a CDMI storage provider. This URI allows a client to find out what the capabilities of a given object are.
completionStatus: This field indicates whether an object has been fully created. When performing operations that take long periods of time, such as serialization/deserialization, copies and storing large objects, this field may indicate that the object is still in progress, and thus is not ready to be accessed.
mimetype: This field indicates the type of the value of an object, as specified at the time of creation.
metadata: This field includes user-specified metadata (not used in this example) and system-generated metadata (the object size, cdmi_size, is shown in this example) related to the object. Metadata is a key part of the CDMI standard, and will be covered in more detail in later tutorials.
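As a rough illustration of how a client might consume these fields, the Python sketch below parses the response body from the create transaction above and checks whether the object is ready to be accessed. The helper name is_ready is hypothetical, and treating any completionStatus other than "Complete" as "not ready" is an assumption made for this sketch.

```python
import json

# Response body copied from the create transaction above.
response_body = """
{
  "objectURI" : "/hello.txt",
  "objectID" : "AABwbQAQ810Mpei21pxfzA==",
  "parentURI" : "/",
  "accountURI" : "/cdmi_accounts/default_account/",
  "capabilitiesURI" : "/cdmi_capabilities/dataobject/",
  "completionStatus" : "Complete",
  "mimetype" : "text/plain",
  "metadata" : { "cdmi_size" : "11" }
}
"""

def is_ready(obj):
    """Assumption: only a completionStatus of "Complete" means the object can be read."""
    return obj.get("completionStatus") == "Complete"

obj = json.loads(response_body)
size = int(obj["metadata"]["cdmi_size"])  # the size arrives as a JSON string
print(obj["objectURI"], obj["objectID"], is_ready(obj), size)
```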

While this is the primary method to create objects in CDMI, there is an even simpler way:
PUT /hello2.txt HTTP/1.1
Host: cloud.example.com
Content-Type: text/plain
Content-Length: 11

Hello Cloud

HTTP/1.1 201 Created
This approach, described in section 8.3, uses completely standard HTTP to create a new CDMI object. The mimetype is specified through the Content-Type header, with the tradeoff that no CDMI-specific data is returned in response. The ability to support operations through 100% standard HTTP makes CDMI very easy to use from JavaScript and web environments, as does the use of JSON.
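For comparison, here is a minimal Python sketch of the plain-HTTP variant. The function name build_plain_put is hypothetical, and encoding the text as UTF-8 is an assumption of the sketch; the point is only that Content-Length is derived from the encoded payload and that no CDMI-specific headers are involved.

```python
def build_plain_put(path, content_type, text, host="cloud.example.com"):
    """Return (request_line, headers, payload) for a non-CDMI, plain HTTP PUT.

    Hypothetical helper; assumes the text is sent as UTF-8 bytes.
    """
    payload = text.encode("utf-8")
    headers = {
        "Host": host,
        "Content-Type": content_type,         # becomes the object's mimetype
        "Content-Length": str(len(payload)),  # length in bytes, not characters
    }
    return f"PUT {path} HTTP/1.1", headers, payload

request_line, headers, payload = build_plain_put("/hello2.txt", "text/plain", "Hello Cloud")
print(request_line)
print(headers["Content-Length"])
```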

Listing Stored Objects

As every data object has a parent, CDMI also allows the objects owned by a parent to be listed. This is performed by the following HTTP transaction, as described in section 9.4 of the specification:
GET / HTTP/1.1
Host: cloud.example.com
Content-Type: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

{
"objectURI" : "/",
"objectID" : "AABwbQAQI2MtLCwAVfYSFA==",
"parentURI" : "/",
"accountURI" : "/cdmi_accounts/default_account/",
"capabilitiesURI" : "/cdmi_capabilities/container/",
"percentageComplete" : "Complete",
"metadata" : {

},
"childrenrange" : "1-2",
"children" : [
"hello.txt",
"hello2.txt"
]
}
Because a client does not know whether a given URI is a data object or a container, it asks for a generic object. In this case, the URI "/" is a container, so the cloud storage system returns a response body of type cdmi.container. All fields common to data objects and containers have consistent meanings, while the two additional fields, "childrenrange" and "children", provide information about the objects held in the container.
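One way a client could act on this is to branch on the Content-Type of the response. The Python sketch below is a hedged illustration: the function name describe is hypothetical, and the two media types are copied from the transactions in this post.

```python
import json

# Media types copied from the example transactions in this post.
CONTAINER = "application/vnd.org.snia.cdmi.container+json"
DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"

def describe(content_type, body):
    """Return ("container", children) or ("dataobject", value) for a response."""
    obj = json.loads(body)
    if content_type == CONTAINER:
        return "container", obj.get("children", [])
    if content_type == DATAOBJECT:
        return "dataobject", obj.get("value")
    raise ValueError(f"unexpected response type: {content_type}")

listing = '{"objectURI": "/", "childrenrange": "1-2", "children": ["hello.txt", "hello2.txt"]}'
kind, payload = describe(CONTAINER, listing)
print(kind, payload)
```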

As with the data object example above, there is a simpler way to get a list of the children of an object:
GET /?children HTTP/1.1
Host: cloud.example.com

HTTP/1.1 200 OK
Content-Type: text/json

{
"children" : [
"hello.txt",
"hello2.txt"
]
}
This approach, as described in section 9.5, illustrates the ability to request specific fields to be returned in the JSON response body, and is easily integrated into AJAX-style JavaScript for web-based applications.
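A small Python sketch of this pattern follows. Both helper names are hypothetical, and joining each child name onto the container URI to form the child's URI is an assumption of the sketch (it matches the objectURI and parentURI values shown earlier in this post).

```python
import json
import posixpath

def children_query(container_uri):
    """Return the request target that asks for only the "children" field."""
    return f"{container_uri}?children"

def child_uris(container_uri, response_body):
    """Turn the returned child names into URIs (assumed: parent URI + name)."""
    names = json.loads(response_body)["children"]
    return [posixpath.join(container_uri, name) for name in names]

# Response body copied from the transaction above.
body = '{"children": ["hello.txt", "hello2.txt"]}'
target = children_query("/")
uris = child_uris("/", body)
print(target)
print(uris)
```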

Retrieving a Stored Object

Retrieving a stored object is as straightforward as listing the children of a container:
GET /hello.txt HTTP/1.1
Host: cloud.example.com
Content-Type: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
"objectURI" : "/hello.txt",
"objectID" : "AABwbQAQ810Mpei21pxfzA==",
"parentURI" : "/",
"accountURI" : "/cdmi_accounts/default_account/",
"capabilitiesURI" : "/cdmi_capabilities/dataobject/",
"completionStatus" : "Complete",
"mimetype" : "text/plain",
"metadata" : {
"cdmi_size" : "11"
},
"valuerange" : "0-11",
"value" : "Hello Cloud"
}
As with the JSON body returned when the data object was created, the CDMI storage system returns the fields associated with the object. Unlike the create response, it also returns the value range and the value of the object. This is described in more detail in section 8.4.
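As a simple sanity check a client might perform on such a response, the Python sketch below extracts the value and compares its length with the cdmi_size metadata. The function name read_value is hypothetical, and the sketch assumes that cdmi_size counts the bytes of the value when encoded as UTF-8.

```python
import json

def read_value(response_body):
    """Return (value, size_matches) for a retrieved data object.

    Assumption: cdmi_size is the byte length of the UTF-8 encoded value.
    """
    obj = json.loads(response_body)
    value = obj["value"]
    declared = int(obj["metadata"]["cdmi_size"])
    return value, len(value.encode("utf-8")) == declared

# Trimmed copy of the response body shown above.
body = '{"mimetype": "text/plain", "metadata": {"cdmi_size": "11"}, "value": "Hello Cloud"}'
value, size_matches = read_value(body)
print(value, size_matches)
```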

If just the value is desired, it can be requested either as a JSON content-type or as a standard HTTP (non-CDMI) transaction:
GET /hello2.txt HTTP/1.1
Host: cloud.example.com

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 11

Hello Cloud
This allows a CDMI cloud to act as a standard web server, and enables the intriguing possibility that standard web servers could use CDMI as an upload, management and publishing protocol.

Summary

This provides a quick overview of the basics of writing, discovering and reading data in a CDMI cloud. In the next tutorial, we will review CDMI's data management functions, which round out the core of the proposed protocol.

2009-12-09

A Reference Architecture for Cloud Storage

Over the last twelve months, the SNIA Cloud Storage Technical Working Group has been busily defining an industry-wide standard for cloud data storage and management. This standard, the Cloud Data Management Interface (CDMI), is currently available for public review, with an updated draft scheduled for release later in December.

The below diagram illustrates how the CDMI standard fits into emerging cloud ecosystems:

One of the most important aspects to understand about the CDMI standard is that it is not intended to replace existing cloud data access standards. This allows CDMI to be used with existing and new clouds that store data via non-CDMI protocols such as Amazon S3's API and traditional file system protocols such as NFS, CIFS and WebDAV.

CDMI builds on top of these existing data access paths to bring rich management semantics for data in the cloud, and facilitates emerging cloud use-cases including cloud peering, federation and differentiated services. In addition, CDMI provides standard mechanisms to enable the integration of clouds with other external systems (or clouds) for notification, workflow, audit, billing and authorization purposes.

A Reference Architecture for a CDMI-Native Cloud Storage

In order to illustrate how CDMI can enable the creation of next-generation cloud architectures, I've put together an example reference architecture for a CDMI-based cloud storage system, illustrating the different components and primary data flows that would be found in most cloud implementations that fully support CDMI.

The diagram below shows a logical representation of the major components of a cloud storage system built around the CDMI standard:

Looking at the diagram, we have the following major logical components:

Filesystem Clients and Protocol Gateways

CIFS, NFS, FTP, WebDAV, iSCSI, Fibre Channel, FCoE and other standard network file and block protocol clients can be attached to cloud storage via protocol gateways. These protocol gateways translate non-cloud storage protocols into CDMI cloud transactions, and are notified about changes to management metadata in the cloud so that they can export portions of the cloud via these non-cloud protocols.

Existing Cloud Clients and API Protocol Gateways

Existing cloud clients programmed to use cloud protocols such as Amazon's S3 HTTP API can communicate with a CDMI storage cloud through an API protocol gateway. An API protocol gateway translates these non-CDMI cloud storage protocols into CDMI cloud transactions. When a given non-CDMI cloud protocol supports features not directly supported by CDMI, the API gateway can either map these operations into CDMI operations, or use vendor extensions to CDMI to implement the required functionality.
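To give a feel for what such a translation might look like, here is a deliberately simplified Python sketch. Everything in it is hypothetical: the function name, the idea that a bucket maps to a top-level CDMI container, and the restriction to text values are assumptions made for illustration, not behaviour defined by CDMI or by any existing gateway.

```python
import json

# Media type copied from the CDMI tutorial examples.
CDMI_DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"

def translate_put_object(bucket, key, content_type, text):
    """Map a simplified, S3-style "put object" call onto a CDMI PUT.

    Assumptions (hypothetical): the bucket becomes a top-level container,
    the key becomes the object name, and the value is plain text.
    """
    path = f"/{bucket}/{key}"
    headers = {
        "Accept": CDMI_DATAOBJECT,
        "Content-Type": CDMI_DATAOBJECT,
        "X-CDMI-Specification-Version": "1.0",
    }
    body = json.dumps({"mimetype": content_type, "value": text})
    return "PUT", path, headers, body

method, path, headers, body = translate_put_object(
    "photos", "notes.txt", "text/plain", "Hello Cloud"
)
print(method, path)
print(body)
```

A real gateway would also need to handle authentication, binary values and any protocol features that have no direct CDMI equivalent, which is where the vendor extensions mentioned above would come in.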

CDMI Clients

Clients that implement the CDMI protocol natively can connect directly to the CDMI storage cloud without having to go through any protocol translation layers. These clients can access the full set of CDMI functionality, and can access and manage content stored by all clients, even if the content was not stored via CDMI.

Low Latency Object Stores

In almost all implementations, CDMI transactions will be routed to one or more low-latency object stores that are able to quickly satisfy storage and retrieval requests. Multiple low-latency object stores will often be connected together, peered, or federated, to allow for data dispersion, replication and other data management functions. Once the data is safely stored, the cloud is ready to interpret the data system metadata that determines the optimal placement characteristics desired by the cloud user.

High Latency Object Stores

Some vendors may choose to implement higher latency object stores, such as power-managed disk or tape storage tiers. Stored objects will typically be migrated to these stores from low-latency object stores based on the CDMI data system metadata.

Object Catalogue

In order to keep track of the locations where cloud objects are stored, and the namespace in which the objects are stored, many CDMI implementations will include an object catalogue. This catalogue keeps track of objects across all object stores (and external peered and federated clouds), and provides CDMI functions such as object notification, query and billing. Notification provides updates to internal cloud components and to external systems to enable additional cloud functions such as indexing, e-discovery, classification, object processing, format conversion, workflow and advanced analytics.

Data System Manager

In a CDMI system, the data system manager is responsible for interpreting the data system metadata specified for objects and containers through the CDMI interface, and performing operations to attempt to satisfy the requests for data dispersion, redundancy, geographic placement, performance, etc. Each time an object is created or modified, the data system manager will be notified, and if the constraints specified are not met, it can further replicate or migrate content across object stores or clouds. The SDSC iRODS (Integrated Rules-Oriented Data System) is an example of an existing Data System Manager that can play this role within a cloud or to federate clouds.
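The core decision such a component makes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name plan_replication, the store names, and the idea of expressing the redundancy constraint as a simple copy count are all hypothetical and are not taken from the CDMI specification or from iRODS.

```python
def plan_replication(required_copies, current_stores, candidate_stores):
    """Return the stores to copy an object to so it reaches required_copies.

    Hypothetical model: a redundancy constraint is just a number of copies,
    and any candidate store not already holding the object is acceptable.
    """
    missing = required_copies - len(current_stores)
    if missing <= 0:
        return []  # constraint already satisfied, nothing to do
    available = [s for s in candidate_stores if s not in current_stores]
    return available[:missing]

# An object with one copy that is required to have three.
targets = plan_replication(
    required_copies=3,
    current_stores=["store-a"],
    candidate_stores=["store-a", "store-b", "store-c", "store-d"],
)
print(targets)
```

A production data system manager would weigh many more factors (geography, latency, cost), but the notify, evaluate, then replicate-or-migrate loop described above has this basic shape.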

Putting it all Together

When you put all of these components together, you get a cloud that fully implements the main components of the CDMI specification, and provides rich cloud functionality. On top of this foundation, a cloud vendor can innovate and create additional value-added services, such as integration with computing clouds, cloud virus scanning, QoS monitoring, and much, much more.

This degree of flexibility demonstrates much of the value that CDMI has to offer the industry, both to cloud users and vendors, and it will be exciting to see these sorts of interoperable architectures emerge over time as CDMI becomes adopted and the cloud marketplace matures.