2010-02-24

The World's Shortest CDMI Implementation

The CDMI standard is built around the concept of "capabilities", which describe the functionality a given cloud storage system provides to clients. As a result, a system can implement only a small subset of the standard and still be compliant.

So, to demonstrate this, I present to you the world's shortest* CDMI implementation, written in Ruby:
# World's smallest CDMI implementation, or, how to say NO in CDMI.
require 'socket'
require 'openssl'

listener = TCPServer.new('', 2000)

# Set up TLS
ssl_context = OpenSSL::SSL::SSLContext.new()
ssl_context.cert = OpenSSL::X509::Certificate.new(File.read("cdmi_server.cert"))
ssl_context.key = OpenSSL::PKey::RSA.new(File.read("cdmi_server.key"))
ssl_listener = OpenSSL::SSL::SSLServer.new(listener, ssl_context)

while (connection = ssl_listener.accept)
  # Read the request headers up to the blank line that ends them;
  # stop early if the client disconnects (gets returns nil)
  request = ""
  while (line = connection.gets)
    request << line
    break if line == "\n" || line == "\r\n"
  end

  print "-- CLIENT REQUEST -------------------------------------------------------------\n"
  print request
  print "-------------------------------------------------------------------------------\n"

  if (request.index("GET ") == 0)
    # The URI is the second whitespace-separated token of the request line
    uri = request.split(" ")[1]

    if (uri == "/")
      connection.puts("HTTP/1.1 200 OK\nContent-Type: application/vnd.org.snia.cdmi.container+json\nX-CDMI-Specification-Version: 1.0\n\n{\"objectURI\" : \"/\", \"objectID\" : \"AABwbQAQgTpfe4qRBsyCCw==\", \"parentURI\" : \"/\", \"capabilitiesURI\" : \"/cdmi_capabilities/\", \"completionStatus\" : \"Complete\", \"metadata\" : {}, \"childrenrange\" : \"0-0\", \"children\" : [\"cdmi_capabilities/\"]}")
    elsif (uri == "/cdmi_capabilities" || uri == "/cdmi_capabilities/")
      connection.puts("HTTP/1.1 200 OK\nContent-Type: application/vnd.org.snia.cdmi.capabilities+json\nX-CDMI-Specification-Version: 1.0\n\n{\"objectURI\" : \"/cdmi_capabilities/\", \"objectID\" : \"AABwbQAQnP7GJT2muKDelQ==\", \"parentURI\" : \"/\", \"capabilities\" : {\"cdmi_security_https_transport\" : \"true\", \"cdmi_read_metadata\" : \"true\", \"cdmi_list_children\" : \"true\"}, \"childrenrange\" : \"\", \"children\" : []}")
    else
      # Anything else does not exist on this server
      connection.puts("HTTP/1.1 404 Not Found\n\n")
    end
  else
    # Only GET is supported
    connection.puts("HTTP/1.1 501 Not Implemented\n\n")
  end

  connection.close
end
* It's a little longer than it could be, because it is written for readability. Removal of comments, code tightening, etc., is left as an exercise for the reader.
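
On the client side, the capability-driven approach might look something like the sketch below. It assumes the JSON body that the server above returns for /cdmi_capabilities/ (trimmed here to the relevant fields). The helper name `supports?` is made up for illustration, and "cdmi_modify_metadata" is used only as an example of a capability name that this server does not advertise.

```ruby
require 'json'

# Capabilities body, trimmed from the server response above
body = '{"objectURI" : "/cdmi_capabilities/", "capabilities" : {"cdmi_security_https_transport" : "true", "cdmi_read_metadata" : "true", "cdmi_list_children" : "true"}}'

# Hypothetical helper: true only if the capability is listed with the value "true"
def supports?(capabilities_doc, name)
  capabilities_doc.fetch("capabilities", {})[name] == "true"
end

doc = JSON.parse(body)
can_list = supports?(doc, "cdmi_list_children")
can_modify = supports?(doc, "cdmi_modify_metadata")

puts "list children: #{can_list}"
puts "modify metadata: #{can_modify}"
```

A client written this way can decide up front which operations to attempt, rather than discovering unsupported ones through error responses.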

Now, since this uses TLS, you'll need a TLS-compatible client. Here's a test client for this purpose:
# Test client for a minimal CDMI implementation
require 'socket'
require 'openssl'

socket = TCPSocket.new('localhost', 2000)

ssl_context = OpenSSL::SSL::SSLContext.new()
ssl_socket = OpenSSL::SSL::SSLSocket.new(socket, ssl_context)
ssl_socket.sync_close = true
ssl_socket.connect

ssl_socket.puts("GET #{ARGV[0]} HTTP/1.0")
ssl_socket.puts("accept: application/vnd.org.snia.cdmi.object+json")
ssl_socket.puts("X-CDMI-Specification-Version: 1.0")
ssl_socket.puts("")

print "-- SERVER RESPONSE ------------------------------------------------------------\n"
while line = ssl_socket.gets
  print line
end
print "-------------------------------------------------------------------------------\n"
To run these, you need an X.509 certificate and key. You can generate these using OpenSSL. You can follow the instructions below for Apache, then instead of step 5, copy the certificate to "cdmi_server.cert", and copy the key to "cdmi_server.key".

http://www.akadia.com/services/ssh_test_certificate.html
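
If you would rather stay in Ruby, a self-signed certificate can probably also be produced with the same openssl library the server uses. This is a rough sketch for local testing only: the subject name "localhost", the 2048-bit key size and the one-year validity are assumptions chosen for illustration, and only the two file names come from the server code above.

```ruby
require 'openssl'

# Generate an RSA key pair (2048 bits is an assumed size for testing)
key = OpenSSL::PKey::RSA.new(2048)

# Build a self-signed certificate: subject and issuer are the same name
cert = OpenSSL::X509::Certificate.new
cert.version = 2                      # X.509 v3
cert.serial = 1
cert.subject = OpenSSL::X509::Name.parse("/CN=localhost")
cert.issuer = cert.subject
cert.public_key = key.public_key
cert.not_before = Time.now
cert.not_after = Time.now + 365 * 24 * 60 * 60
cert.sign(key, OpenSSL::Digest.new('SHA256'))

# Write the files under the names the server expects
File.write("cdmi_server.cert", cert.to_pem)
File.write("cdmi_server.key", key.to_pem)
```

A certificate created this way is not trusted by any client by default, which is fine for the test client above because it does not verify the server certificate.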

2010-02-23

A CDMI Tutorial - Data Management, Part 1

As we covered in the first part of this tutorial, the SNIA Cloud Data Management Interface provides a RESTful mechanism for the basic storage and retrieval of data. However, the core of the standard is focused on data management: how objects are stored, delivered, placed, protected and more.

This post is the second in a series on CDMI. Subsequent posts will cover the following areas:
Data Management and Metadata

CDMI storage systems see the world as a tree of objects and containers. Objects store data, and containers contain child objects and containers. Regardless of whether the data is stored via HTTP or via traditional protocols such as NFS or CIFS, all data represented through the CDMI protocol is fundamentally seen in terms of objects and containers.

Management of stored data is enabled through metadata. Every container and object can have metadata associated with it. Metadata in CDMI is organized into three general categories: user metadata, storage system metadata, and data system metadata.

User Metadata

User metadata is set directly by CDMI clients, or indirectly through the extended metadata interfaces of other access protocols. For example, in NFS, extended attributes can be mapped to CDMI user metadata items. User metadata items are arbitrary, and are not interpreted by the storage system.

Storage System Metadata

Storage system metadata is generated by the CDMI storage system, and provides read-only access to information about the stored data that is managed by the storage system. The creation time of an object is a good example of a storage system metadata item.

Data System Metadata

Data system metadata is provided by a CDMI client, or specified through an out-of-band management interface, and determines how the stored data should be managed. For example, data system metadata can specify the degree of replication, or an encryption level desired to protect data while stored on disk.

It is through the specification of data system metadata that CDMI enables the management of how data should be stored.

Example Object Metadata

As an example, let's assume that we have a CDMI-enabled storage system that also provides an NFS share, and we've stored some documents on it.


We can then connect to the system via CDMI, and access the CDMI metadata of the "Documents" container:
GET /Documents/ HTTP/1.1
Host: cloud.example.com
Content-Type: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

{
  "objectURI" : "/Documents/",
  "objectID" : "AABwbQAQ8ypO85j/ml8TZQ==",
  "parentURI" : "/",
  "accountURI" : "/cdmi_accounts/default_account/",
  "capabilitiesURI" : "/cdmi_capabilities/container/",
  "completionStatus" : "Complete",
  "metadata" : {
    "user.DOSATTRIB": "0x20",
    "cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
    "cdmi_atime" : "2010-01-02T16:12:53.521983Z",
    "cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
    "cdmi_acount" : "52",
    "cdmi_mcount" : "12",
    "ACL" : {
      "acetype" : "0x00",
      "identifier" : "jdoe",
      "aceflags" : "0x03",
      "acemask" : "0x000F005F",
      "acetime" : "2009-12-29T12:43:32.479832Z"
    },
    "cdmi_data_redundancy": "2",
    "cdmi_immediate_redundancy": "2",
    "cdmi_infrastructure_redundancy": "2",
    "cdmi_geographic_placement": [
      "US"
    ],
    "cdmi_encryption": "AES_256_CBC",
    "cdmi_data_redundancy_billed": "2",
    "cdmi_immediate_redundancy_billed": "2",
    "cdmi_infrastructure_redundancy_billed": "1",
    "cdmi_geographic_placement_billed": [
      "US"
    ],
    "cdmi_encryption_billed": "AES_256_CTR"
  },
  "childrenrange" : "0-2",
  "children" : [
    "Financials/",
    "CDMI_Spec.pdf",
    "hello.txt"
  ]
}
There are quite a few metadata items here, and it looks quite complex, but it's not as bad as it first appears. To see what these metadata items mean, and how they are used to manage stored data, let's review them individually:

User Metadata Items

The directory (container in CDMI speak) has a single user metadata item:
"user.DOSATTRIB": "0x20",
This user metadata item is an extended attribute that indicates the archive bit is set in the DOS mode of the directory. In this case, the user metadata item was created when the directory was created by Samba, and the storage server presents extended attributes on the filesystem as user metadata items.
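
To make the "0x20" value a little more concrete, here is a small sketch that decodes it against the standard DOS attribute bits. It assumes the metadata value is the plain hex string shown above; the helper name `dos_attribute_names` is made up for illustration and is not part of CDMI. Since user metadata is opaque to the storage system, this kind of interpretation would be done by the client.

```ruby
# Standard DOS/Windows file attribute bits
DOS_ATTRIBUTES = {
  0x01 => "read-only",
  0x02 => "hidden",
  0x04 => "system",
  0x10 => "directory",
  0x20 => "archive"
}

# Hypothetical helper: names of the attribute bits set in a hex string
def dos_attribute_names(hex_value)
  bits = Integer(hex_value, 16)
  DOS_ATTRIBUTES.select { |bit, _name| bits & bit != 0 }.values
end

names = dos_attribute_names("0x20")
puts names.inspect
```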

Storage System Metadata Items

The directory has five storage system metadata items, and an ACL, which is an example of a more complex storage system metadata item:
"cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
"cdmi_atime" : "2010-01-02T16:12:53.521983Z",
"cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
"cdmi_acount" : "52",
"cdmi_mcount" : "12",
"ACL" : {
  "acetype" : "0x00",
  "identifier" : "jdoe",
  "aceflags" : "0x03",
  "acemask" : "0x000F005F",
  "acetime" : "2009-12-29T12:43:32.479832Z"
},
In this example, the first three metadata items contain the creation, last access and last modification times, respectively, and the remaining two show the number of accesses and the number of modifications since creation. The ACL metadata specifies the access control restrictions for the folder, and is based on NFSv4 ACLs.
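
Note that these values arrive as JSON strings, so a client has to convert them before doing anything useful with them. A minimal sketch, assuming the timestamps are ISO 8601 strings and the counters are decimal strings exactly as shown above (the variable names are illustrative only):

```ruby
require 'time'

# Storage system metadata values copied from the example above
metadata = {
  "cdmi_ctime"  => "2009-12-29T12:43:32.479832Z",
  "cdmi_atime"  => "2010-01-02T16:12:53.521983Z",
  "cdmi_acount" => "52",
  "cdmi_mcount" => "12"
}

# Convert the strings into Time and Integer values
created       = Time.iso8601(metadata["cdmi_ctime"])
last_access   = Time.iso8601(metadata["cdmi_atime"])
accesses      = Integer(metadata["cdmi_acount"])
modifications = Integer(metadata["cdmi_mcount"])

# Subtracting two Time values gives seconds as a Float
days_until_last_access = (last_access - created) / 86_400.0

puts format("last accessed %.2f days after creation", days_until_last_access)
puts "#{accesses} accesses, #{modifications} modifications"
```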

Data System Metadata Items

Finally, we have the data system metadata items. These items are specified by a CDMI client, or an out-of-band management application, and specify how the data in the "Documents" folder should be managed:
"cdmi_data_redundancy": "2",
"cdmi_immediate_redundancy": "2",
"cdmi_infrastructure_redundancy": "2",
"cdmi_geographic_placement": [
  "US"
],
"cdmi_encryption": "AES_256_CBC",
The first data system metadata item, "cdmi_data_redundancy", indicates how many independent copies of the stored data should be kept. In this case, it has been set to two, which means that two copies of the data should be stored. Likewise, "cdmi_immediate_redundancy" indicates that two copies should be provided synchronously, "cdmi_infrastructure_redundancy" indicates that the two copies should be located in separate failure domains, "cdmi_geographic_placement" indicates that the copies should remain within the United States, and "cdmi_encryption" indicates that AES with a 256-bit key in CBC (cipher block chaining) mode should be used to protect the data.

It is important to note that data system metadata items express the desired management behaviour, which is separate from the actual management behaviour. To indicate to a client which management behaviours are actually being provided, CDMI provides a matching series of data system metadata items with the suffix "_billed":
"cdmi_data_redundancy_billed": "2",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
  "US"
],
"cdmi_encryption_billed": "AES_256_CTR",
In this case, the billed metadata shows that the storage system is able to meet the requested data redundancy, immediate redundancy and geographic placement, but is not able to provide the requested infrastructure redundancy or the requested encryption method. This allows a client to discover whether the requested data system services are being provided.
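
A client could make this comparison mechanically. The sketch below is one possible way to do it, not something defined by CDMI: it pairs each requested item with its "_billed" counterpart and reports the ones that differ. The values are copied from the two fragments above, and the helper name `unmet_requests` is hypothetical.

```ruby
# Requested and billed data system metadata, copied from the fragments above
metadata = {
  "cdmi_data_redundancy" => "2",
  "cdmi_immediate_redundancy" => "2",
  "cdmi_infrastructure_redundancy" => "2",
  "cdmi_geographic_placement" => ["US"],
  "cdmi_encryption" => "AES_256_CBC",
  "cdmi_data_redundancy_billed" => "2",
  "cdmi_immediate_redundancy_billed" => "2",
  "cdmi_infrastructure_redundancy_billed" => "1",
  "cdmi_geographic_placement_billed" => ["US"],
  "cdmi_encryption_billed" => "AES_256_CTR"
}

# Hypothetical helper: requested items whose billed value differs
def unmet_requests(metadata)
  metadata.each_with_object({}) do |(key, requested), unmet|
    next if key.end_with?("_billed")       # skip the billed items themselves
    billed_key = "#{key}_billed"
    next unless metadata.key?(billed_key)  # no billed counterpart reported
    billed = metadata[billed_key]
    unmet[key] = { requested: requested, billed: billed } unless billed == requested
  end
end

unmet = unmet_requests(metadata)
unmet.each do |key, values|
  puts "#{key}: requested #{values[:requested].inspect}, billed #{values[:billed].inspect}"
end
```

A plain equality check is the simplest possible rule. A real client might want something more forgiving, for example treating a billed redundancy higher than the requested one as acceptable.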

Data Management Summary

So, putting this all together, data system metadata is specified directly or inherited from parent containers. The data system metadata specifies the desired data system services for stored content. The system tries to provide the requested data system services, and indicates which services are actually being provided in the corresponding billed data system metadata items.

All of the data system metadata items specified for the "Documents" container will also be applied to the child "Financials" container, unless overridden by a data system metadata value specified there. For example, if the "Financials" container has the following data system metadata:
"cdmi_data_redundancy": "3",
Only the redundancy is changed; all other items are inherited from the parent. Thus, the billed metadata values would be:
"cdmi_data_redundancy_billed": "3",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
  "US"
],
"cdmi_encryption_billed": "AES_256_CTR",
Currently, all data system metadata items are inherited, and the only way to override an inherited value is to specify a new data system metadata item.
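
The inheritance rule amounts to a simple merge in which values on the child win. As a rough sketch (this only models the rule described above and says nothing about how a real CDMI system implements it; `effective_metadata` is a made-up helper name):

```ruby
# Data system metadata requested on the "Documents" container
documents = {
  "cdmi_data_redundancy" => "2",
  "cdmi_immediate_redundancy" => "2",
  "cdmi_infrastructure_redundancy" => "2",
  "cdmi_geographic_placement" => ["US"],
  "cdmi_encryption" => "AES_256_CBC"
}

# The "Financials" container overrides a single item
financials = { "cdmi_data_redundancy" => "3" }

# Hypothetical helper: the child's values take precedence over inherited ones
def effective_metadata(parent, child)
  parent.merge(child)
end

effective = effective_metadata(documents, financials)
effective.each { |key, value| puts "#{key}: #{value.inspect}" }
```

For deeper trees, the same merge would be applied level by level from the root down to the container in question.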

This concludes a quick overview of the basics of data management and metadata in CDMI. In the next part of the tutorial, we will discuss the various data system metadata services defined in CDMI.

2010-02-12

CDMI 1.0 Draft Available

All the hard work from the week-long meeting of the SNIA Cloud Storage Technical Work Group two weeks ago in San Jose has paid off, and we're proud to announce a final 1.0 draft of the Cloud Data Management Interface for cloud storage.

The specification can be downloaded from the Draft Technical Work for Public Review section of the SNIA web site, and is labelled CDMI 1.0g.

This release completes the remaining sections of the specification that were incomplete in the 0.9 draft, and clarifies a number of areas where the spec was unclear. Areas of improvement include:
  • Addition of Retention and Hold for compliance
  • Clarification of terminology from "Account" to "Domain"
  • Addition of Hash Data System Metadata
  • ACL syntax in JSON
  • Ability to perform cross-domain actions
  • CDMI Logging Queues
  • Elimination of JSON ordering constraints
  • Specification of Encryption Data System Metadata
  • Additional examples
We're very proud of how the specification has firmed up over the last few months, and it is very impressive that we've been able to go from forming a working group to a 1.0 draft in less than a year.

2010-02-02

Opaque Clouds and Transparent Clouds

There are two usage models for cloud use that I am seeing emerge: Opaque Clouds and Transparent Clouds.

Opaque Clouds are clouds where users store pre-encrypted data to the cloud, such that the cloud operator has no visibility into the users' data. In this model, the encryption keys are owned and managed by the end user, and the cloud operator is not able to provide any value-added services that require access to the plaintext of the user's data.

Transparent Clouds are clouds where users submit data to the cloud (which may be encrypted during transmission and/or when stored), but the cloud operator is capable of having access to the users' data. In this model, the cloud operator either manages the encryption, or has access to the key repositories where the users' keys are stored. While transparent clouds can still be secure, there are additional security risks, as the cloud operator fundamentally must have access to the users' plaintext.

Both of these models have merits and use cases where they make sense. For example, if a first transparent cloud is using a second cloud to provide a second geographic location for data storage, the first cloud may store data into the second cloud in opaque fashion. A second scenario may be where a cloud user stores data to a cloud for the purposes of data sharing with another trusted user, but the cloud itself is not trusted. In this case, the keys would be shared between the two users, but the cloud would be unable to see the data stored.

Ultimately, I believe that opaque cloud storage will co-exist with transparent clouds, as both can operate concurrently on the same infrastructure (translucent clouds?). As different use cases determine the security sensitivity of the data, and many of the cloud-provided value-added services (search, indexing, discovery, data mining, format conversion, etc.) are quite compelling, it's going to be an interesting set of trade-offs between cloud security and cloud value.
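
To illustrate the opaque model, here is a minimal sketch of client-side encryption in Ruby. The choice of AES-256-GCM and the helper names `encrypt_for_cloud` and `decrypt_from_cloud` are assumptions made for this example; nothing above prescribes a particular cipher, and the actual upload to the cloud is left out. The point is only that the key is generated and kept by the user, so the cloud would store nothing but the ciphertext, IV and authentication tag.

```ruby
require 'openssl'

# Hypothetical helper: encrypt locally, before anything leaves the client
def encrypt_for_cloud(plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-gcm')
  cipher.encrypt
  key = cipher.random_key   # stays with the user, never sent to the cloud
  iv = cipher.random_iv
  cipher.auth_data = ""
  ciphertext = cipher.update(plaintext) + cipher.final
  { key: key, iv: iv, tag: cipher.auth_tag, ciphertext: ciphertext }
end

# Hypothetical helper: decrypt data fetched back from the cloud
def decrypt_from_cloud(key, iv, tag, ciphertext)
  decipher = OpenSSL::Cipher.new('aes-256-gcm')
  decipher.decrypt
  decipher.key = key
  decipher.iv = iv
  decipher.auth_tag = tag
  decipher.auth_data = ""
  decipher.update(ciphertext) + decipher.final
end

sealed = encrypt_for_cloud("quarterly financials")
restored = decrypt_from_cloud(sealed[:key], sealed[:iv], sealed[:tag], sealed[:ciphertext])
puts restored
```

In the data-sharing scenario described earlier, the second trusted user would need the key through some channel that does not involve the cloud.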