A CDMI Tutorial - Basic Input/Output

With the 0.9 draft of the SNIA Cloud Data Management Interface (CDMI) specification now released, this is a good time to highlight the aspects of the standard that make it well suited for cloud storage, and to review the changes from the previous 0.8 draft.

This post is the first in a series on CDMI and covers basic input/output. Subsequent posts will cover other areas of the standard.
Basic Input/Output

Although CDMI supports storage and retrieval, it does not assume or require that clients use CDMI to store and retrieve data. CDMI can be used in conjunction with existing cloud protocols (such as Amazon's S3 API) and file system protocols (such as NFS, CIFS and WebDAV). The goal of CDMI is not to replace these protocols. Instead, it provides a standardized access and management method that is independent of them. So while storage and retrieval are important parts of the standard, they are only one part of it. It is even possible to implement a fully compliant CDMI system that does not support storing or retrieving data at all.

By not restricting how data is stored or accessed, while still providing a consistent and uniform way to access and manage stored data, CDMI enables a client to discover, access and manage stored content regardless of how it was originally stored, and regardless of the underlying storage implementation.

To illustrate the basic object model, let's review basic storage and retrieval operations using CDMI:

Storing an Object

The most basic operations in CDMI are object storage and retrieval. Assuming that we have an authenticated session established to a CDMI cloud running at cloud.example.com and we want to store some text, here is the HTTP transaction that would be performed, as described in section 8.2 of the specification:
PUT /hello.txt HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.dataobject+json
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
    "mimetype" : "text/plain",
    "value" : "Hello Cloud"
}

HTTP/1.1 201 Created
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
    "objectURI" : "/hello.txt",
    "objectID" : "AABwbQAQ810Mpei21pxfzA==",
    "parentURI" : "/",
    "accountURI" : "/cdmi_accounts/default_account/",
    "capabilitiesURI" : "/cdmi_capabilities/dataobject/",
    "completionStatus" : "Complete",
    "mimetype" : "text/plain",
    "metadata" : {
        "cdmi_size" : "11"
    }
}

Let's look at this transaction in more detail:

This is standard HTTP. The Content-Type header specifies that the request body is a cdmi.dataobject, and the Accept header requests a response body in the same form. The request body is JSON containing mimetype and value fields. The response is an HTTP 201 Created status together with a JSON structure describing the newly created cloud object.
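For readers who want to experiment, the create request above can be assembled with Python's standard library. This is a minimal sketch, not a CDMI client. It builds the request without sending it and omits the authentication that the example assumes is already in place. The helper name `build_cdmi_put` is hypothetical.

```python
import json
import urllib.request

# Media type used by the example transaction above.
CDMI_DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"


def build_cdmi_put(base_url: str, path: str, mimetype: str, value: str) -> urllib.request.Request:
    """Build (but do not send) a CDMI request that creates a data object."""
    body = json.dumps({"mimetype": mimetype, "value": value}).encode("utf-8")
    headers = {
        "Accept": CDMI_DATAOBJECT,        # ask for a cdmi.dataobject response body
        "Content-Type": CDMI_DATAOBJECT,  # the request body is a cdmi.dataobject
        "X-CDMI-Specification-Version": "1.0",
    }
    return urllib.request.Request(base_url + path, data=body, headers=headers, method="PUT")


req = build_cdmi_put("http://cloud.example.com", "/hello.txt", "text/plain", "Hello Cloud")
print(req.get_method(), req.full_url)  # PUT http://cloud.example.com/hello.txt
```

Passing the request to `urllib.request.urlopen` would send it. A real server would also require whatever authentication it is configured for.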

The response body JSON contains the following fields:

objectURI: The URI for the newly created object. This URI can be used to access the object via CDMI, and reflects the organization of the stored data. For example, if a storage system provides CDMI access to a NAS share, the objectURI reflects the file path.
objectID: Every object within a CDMI system has a globally unique identifier that can be used to access the object. Object IDs remain constant for the life of the object, even if the object is modified, renamed, or moved to a different cloud provided by a different vendor.
parentURI: The URI for the parent of the created object. Objects inherit metadata from their parent.
accountURI: The URI for the account that the object was created under. Accounts determine the billing relationship, reporting visibility and basic access permissions.
capabilitiesURI: Every object within a CDMI system has "capabilities", which describe the operations the system can perform on that object. This URI lets a client find out the capabilities of a given object, and thus discover what a CDMI storage provider supports.
completionStatus: This field indicates whether an object has been fully created. For operations that take a long time, such as serialization/deserialization, copies and storing large objects, this field may indicate that the object is still in progress, and thus is not yet ready to be accessed.
mimetype: This field indicates the type of the value of an object, as specified at the time of creation.
metadata: This field includes user-specified metadata (not used in this example) and system-generated metadata related to the object (in this example, the object size, cdmi_size). Metadata is a key part of the CDMI standard, and will be covered in more detail in later tutorials.
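The following sketch shows how a client might consume these fields. It parses the response body from the example and checks `completionStatus` before treating the object as ready. The function name `describe_object` is illustrative, and the sketch assumes the response body has already been received as text.

```python
import json

# Response body returned by the PUT /hello.txt example.
RESPONSE_BODY = """
{
    "objectURI" : "/hello.txt",
    "objectID" : "AABwbQAQ810Mpei21pxfzA==",
    "parentURI" : "/",
    "accountURI" : "/cdmi_accounts/default_account/",
    "capabilitiesURI" : "/cdmi_capabilities/dataobject/",
    "completionStatus" : "Complete",
    "mimetype" : "text/plain",
    "metadata" : {
        "cdmi_size" : "11"
    }
}
"""


def describe_object(body: str) -> dict:
    """Pick out the fields a client typically needs from a CDMI response."""
    obj = json.loads(body)
    metadata = obj.get("metadata", {})
    return {
        "uri": obj["objectURI"],
        "id": obj["objectID"],  # stable for the life of the object
        # Only treat the object as usable once the system reports it complete.
        "ready": obj.get("completionStatus") == "Complete",
        # cdmi_size is system metadata; the example returns it as a string.
        "size": int(metadata["cdmi_size"]) if "cdmi_size" in metadata else None,
    }


print(describe_object(RESPONSE_BODY))
```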

While this is the primary method to create objects in CDMI, there is an even simpler way:
PUT /hello2.txt HTTP/1.1
Host: cloud.example.com
Content-Type: text/plain
Content-Length: 11

Hello Cloud

HTTP/1.1 201 Created

This approach, described in section 8.3, uses completely standard HTTP to create a new CDMI object. The mimetype is specified through the Content-Type header, with the tradeoff that no CDMI-specific data is returned in response. The ability to support operations through 100% standard HTTP makes CDMI very easy to use from JavaScript and web environments, as does the use of JSON.
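The contrast is easier to see by building the plain-HTTP version of the request. In this sketch, the body is the raw value and the mimetype travels only in the Content-Type header. The helper name is hypothetical. When a request with a body is actually sent, `urllib` fills in Content-Length itself.

```python
import urllib.request


def build_plain_put(base_url: str, path: str, content_type: str, data: bytes) -> urllib.request.Request:
    """Build a non-CDMI PUT: the body is the raw value, Content-Type is its mimetype."""
    return urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": content_type},  # no CDMI-specific headers at all
        method="PUT",
    )


req = build_plain_put("http://cloud.example.com", "/hello2.txt", "text/plain", b"Hello Cloud")
print(req.get_method(), req.full_url, len(req.data))  # PUT http://cloud.example.com/hello2.txt 11
```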

Listing Stored Objects

As every data object has a parent, CDMI also allows the objects owned by a parent to be listed. This is performed by the following HTTP transaction, as described in section 9.4 of the specification:
GET / HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

{
    "objectURI" : "/",
    "objectID" : "AABwbQAQI2MtLCwAVfYSFA==",
    "parentURI" : "/",
    "accountURI" : "/cdmi_accounts/default_account/",
    "capabilitiesURI" : "/cdmi_capabilities/container/",
    "completionStatus" : "Complete",
    "metadata" : {
    },
    "childrenrange" : "1-2",
    "children" : [
        "hello.txt",
        "hello2.txt"
    ]
}

Because a client does not know in advance whether a given URI is a data object or a container, it asks for a generic object. In this case, the URI "/" is a container, so the cloud storage system returns a response body of type cdmi.container. Fields shared between data objects and containers have consistent meanings. The two additional fields, "childrenrange" and "children", provide information about the objects held by the container.
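The response type is only known once the reply arrives, so a client might branch on the returned Content-Type. The following is a minimal sketch of that approach. The function name and the abbreviated sample bodies are hypothetical.

```python
import json

CDMI_CONTAINER = "application/vnd.org.snia.cdmi.container+json"
CDMI_DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"


def describe_response(content_type: str, body: str) -> str:
    """Interpret the reply to a request for a generic CDMI object."""
    obj = json.loads(body)
    # Shared fields such as objectURI mean the same thing for both types.
    if content_type == CDMI_CONTAINER:
        children = obj.get("children", [])
        return f"container {obj['objectURI']} with {len(children)} children"
    if content_type == CDMI_DATAOBJECT:
        return f"data object {obj['objectURI']} ({obj.get('mimetype', 'unknown type')})"
    raise ValueError(f"unexpected content type: {content_type}")


# Abbreviated, hypothetical response bodies.
root = '{"objectURI": "/", "children": ["hello.txt", "hello2.txt"]}'
hello = '{"objectURI": "/hello.txt", "mimetype": "text/plain"}'
print(describe_response(CDMI_CONTAINER, root))    # container / with 2 children
print(describe_response(CDMI_DATAOBJECT, hello))  # data object /hello.txt (text/plain)
```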

As with the data object example above, there is a simpler way to get a list of the children of a container:
GET /?children HTTP/1.1
Host: cloud.example.com

HTTP/1.1 200 OK
Content-Type: text/json

{
    "children" : [
        "hello.txt",
        "hello2.txt"
    ]
}

This approach, described in section 9.5, shows that specific fields can be requested in the JSON response body. It is also easily integrated into AJAX-style JavaScript for web-based applications.
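With the field-limited form, a client only has to build the query URL and pull one key out of the JSON. The sketch below assumes the `?children` form shown above. The helper names are illustrative.

```python
import json
import urllib.request


def children_request(base_url: str, container_path: str) -> urllib.request.Request:
    """Build a GET that asks only for the container's 'children' field."""
    return urllib.request.Request(base_url + container_path + "?children", method="GET")


def parse_children(body: str) -> list:
    """Extract child names from a field-limited JSON response."""
    return json.loads(body).get("children", [])


req = children_request("http://cloud.example.com", "/")
print(req.get_method(), req.full_url)  # GET http://cloud.example.com/?children
print(parse_children('{"children": ["hello.txt", "hello2.txt"]}'))  # ['hello.txt', 'hello2.txt']
```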

Retrieving a Stored Object

Retrieving a stored object is as straightforward as listing the children of a container:
GET /hello.txt HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.dataobject+json
X-CDMI-Specification-Version: 1.0

{
    "objectURI" : "/hello.txt",
    "objectID" : "AABwbQAQ810Mpei21pxfzA==",
    "parentURI" : "/",
    "accountURI" : "/cdmi_accounts/default_account/",
    "capabilitiesURI" : "/cdmi_capabilities/dataobject/",
    "completionStatus" : "Complete",
    "mimetype" : "text/plain",
    "metadata" : {
        "cdmi_size" : "11"
    },
    "valuerange" : "0-11",
    "value" : "Hello Cloud"
}

As in the response to the original create, the CDMI storage system returns the fields associated with the object. Unlike the create response, however, it also returns the value range and the value of the object. This is described in more detail in section 8.4.

If only the value is desired, it can be requested either in a JSON response body or through a standard (non-CDMI) HTTP transaction:
GET /hello2.txt HTTP/1.1
Host: cloud.example.com

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 11

Hello Cloud

This allows a CDMI cloud to act as a standard web server. It also raises the intriguing possibility that standard web servers could use CDMI as an upload, management and publishing protocol.


This provides a quick overview of the basics of writing, discovering and reading data in a CDMI cloud. In the next tutorial, we will review CDMI's data management functions, which round out the core of the proposed protocol.


A Reference Architecture for Cloud Storage

Over the last twelve months, the SNIA Cloud Storage Technical Working Group has been busily defining an industry-wide standard for cloud data storage and management. This standard, the Cloud Data Management Interface (CDMI), is currently available for public review, with an updated draft scheduled for release later in December.

The below diagram illustrates how the CDMI standard fits into emerging cloud ecosystems:

One of the most important aspects to understand about the CDMI standard is that it is not intended to replace existing cloud data access standards. Instead, CDMI can be used with existing and new clouds that store data via non-CDMI protocols. These include cloud protocols such as Amazon S3's API and traditional file system protocols such as NFS, CIFS and WebDAV.

CDMI builds on top of these existing data access paths to bring rich management semantics for data in the cloud, and facilitates emerging cloud use-cases including cloud peering, federation and differentiated services. In addition, CDMI provides standard mechanisms to enable the integration of clouds with other external systems (or clouds) for notification, workflow, audit, billing and authorization purposes.

A Reference Architecture for CDMI-Native Cloud Storage

To illustrate how CDMI can enable the creation of next-generation cloud architectures, I've put together an example reference architecture for a CDMI-based cloud storage system. It shows the different components and primary data flows that would be found in most cloud implementations that fully support CDMI.

The diagram below shows a logical representation of the major components of a cloud storage system built around the CDMI standard:

Looking at the diagram, we have the following major logical components:

Filesystem Clients and Protocol Gateways

CIFS, NFS, FTP, WebDAV, iSCSI, Fibre Channel, FCoE and other standard network file and block protocol clients can be attached to cloud storage via protocol gateways. These protocol gateways translate non-cloud storage protocols into CDMI cloud transactions. They are also notified about changes to management metadata in the cloud, so that they can export portions of the cloud via these non-cloud protocols.

Existing Cloud Clients and API Protocol Gateways

Existing cloud clients programmed to use cloud protocols such as Amazon's S3 HTTP API can communicate with a CDMI storage cloud through an API protocol gateway. An API protocol gateway translates these non-CDMI cloud storage protocols into CDMI cloud transactions. When a given non-CDMI cloud protocol supports features not directly supported by CDMI, the API gateway can either map these operations into CDMI operations, or use vendor extensions to CDMI to implement the required functionality.

CDMI Clients

Clients that implement the CDMI protocol natively can connect directly to the CDMI storage cloud without having to go through any protocol translation layers. These clients can access the full set of CDMI functionality, and can access and manage content stored by all clients, even if the content was not stored via CDMI.

Low Latency Object Stores

In almost all implementations, CDMI transactions will be routed to one or more low-latency object stores that can quickly satisfy storage and retrieval requests. Multiple low-latency object stores will often be connected together, peered or federated to allow for data dispersion, replication and other data management functions. Once the data is safely stored, the cloud can interpret the data system metadata that specifies the placement characteristics desired by the cloud user.

High Latency Object Stores

Some vendors may choose to implement higher latency object stores, such as power-managed disk or tape storage tiers. Stored objects will typically be migrated to these stores from low-latency object stores based on the CDMI data system metadata.

Object Catalogue

In order to keep track of the locations where cloud objects are stored, and the namespace in which the objects are stored, many CDMI implementations will include an object catalogue. This catalogue keeps track of objects across all object stores (and external peered and federated clouds), and provides CDMI functions such as object notification, query and billing. Notification provides updates to internal cloud components and to external systems to enable additional cloud functions such as indexing, e-discovery, classification, object processing, format conversion, workflow and advanced analytics.
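The following toy, in-memory sketch illustrates three of the catalogue's roles: tracking where each object lives, answering simple namespace queries, and notifying subscribers. All class, method and store names are hypothetical. A production catalogue would be persistent and distributed.

```python
from collections import defaultdict
from typing import Callable

Listener = Callable[[str, str], None]  # (event name, object ID) -> None


class ObjectCatalogue:
    """Toy, in-memory catalogue of object paths and storage locations."""

    def __init__(self) -> None:
        self.paths: dict[str, str] = {}     # object ID -> namespace path
        self.locations = defaultdict(set)   # object ID -> set of stores holding a copy
        self.listeners: list[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a system (e.g. an indexer or billing system) for events."""
        self.listeners.append(listener)

    def record_copy(self, object_id: str, path: str, store: str) -> None:
        """Note that a copy of object_id now lives in store, then notify listeners."""
        self.paths[object_id] = path
        self.locations[object_id].add(store)
        for listener in self.listeners:
            listener("stored", object_id)

    def query(self, prefix: str) -> list[str]:
        """Return the IDs of objects whose path starts with prefix."""
        return sorted(oid for oid, p in self.paths.items() if p.startswith(prefix))


events = []
catalogue = ObjectCatalogue()
catalogue.subscribe(lambda event, oid: events.append((event, oid)))
catalogue.record_copy("AABwbQAQ810Mpei21pxfzA==", "/hello.txt", "low-latency-1")
catalogue.record_copy("AABwbQAQ810Mpei21pxfzA==", "/hello.txt", "tape-tier")
print(sorted(catalogue.locations["AABwbQAQ810Mpei21pxfzA=="]), len(events))
```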

Data System Manager

In a CDMI system, the data system manager is responsible for interpreting the data system metadata specified for objects and containers through the CDMI interface. It then performs operations that attempt to satisfy the requirements for data dispersion, redundancy, geographic placement, performance, etc. Each time an object is created or modified, the data system manager is notified. If the specified constraints are not met, it can further replicate or migrate content across object stores or clouds. The SDSC iRODS (Integrated Rules-Oriented Data System) is an example of an existing Data System Manager that can play this role, either within a cloud or to federate clouds.
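The data system manager's reconcile-on-notification behaviour can be sketched as a small function. It compares a desired number of copies against where an object currently lives. The policy (fill from a preferred list of stores) and all names are hypothetical, including the `desired_copies` setting. They stand in for whatever data system metadata an implementation actually interprets.

```python
def plan_replication(desired_copies: int, current_stores: set[str],
                     preferred_stores: list[str]) -> list[str]:
    """Return the stores to copy an object to so that it meets desired_copies.

    Hypothetical policy: take stores in preference order, skipping any that
    already hold a copy. May return fewer stores than needed if too few exist.
    """
    missing = desired_copies - len(current_stores)
    if missing <= 0:
        return []  # constraint already satisfied, nothing to do
    candidates = [s for s in preferred_stores if s not in current_stores]
    return candidates[:missing]


def on_object_changed(metadata: dict, current_stores: set[str],
                      preferred_stores: list[str]) -> list[str]:
    """Called each time an object is created or modified."""
    desired = int(metadata.get("desired_copies", 1))
    return plan_replication(desired, current_stores, preferred_stores)


stores = ["site-a", "site-b", "site-c"]
print(on_object_changed({"desired_copies": "3"}, {"site-a"}, stores))  # ['site-b', 'site-c']
print(on_object_changed({"desired_copies": "1"}, {"site-a"}, stores))  # []
```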

Putting it all Together

When you put all of these components together, you get a cloud that fully implements the main components of the CDMI specification, and provides rich cloud functionality. On top of this foundation, a cloud vendor can innovate and create additional value added services, such as integration with computing clouds, cloud virus scanning, QoS monitoring, and much, much more.

This degree of flexibility demonstrates much of the value that CDMI has to offer the industry, both to cloud users and vendors, and it will be exciting to see these sorts of interoperable architectures emerge over time as CDMI becomes adopted and the cloud marketplace matures.


How low can disk go?

The capital acquisition cost of disk-based archiving solutions (in cost per terabyte) has dramatically fallen over the last five years. Unfortunately, the rate of reduction in cost is slowing as the cost approaches the raw cost of the disks included with the storage system.

The four major factors that have driven the reduction of the cost of disk-based archiving are as follows:
  • Increasing disk capacity (density) for a hard drive of a given price
  • Transition from the use of enterprise disks to the use of consumer grade SATA disks
  • Transition from storage interconnection via Fibre Channel to switched SAS and Ethernet
  • Transition from custom storage controllers to commodity servers and software
When analysing the costs of a disk archive, much of these savings come from reducing the cost of the non-disk components. In large disk archival systems, the system price divided by the number of disks is now approaching the off-the-shelf price of a consumer disk (around $100 per TB). This is important, because it indicates that most of the cost gains (from factors 2 through 4) are unsustainable in the long term. The closer the cost of the system approaches the cost of the raw storage, the smaller the further cost reductions that can be achieved.

As a consequence, the rapid decrease in the cost of disk-based archiving is not a result of an intrinsic reduction in the cost of disk storage, but rather of a reduction in the cost of the overall system. Now that this reduction has largely occurred, the rate of cost reduction flattens out and more closely tracks the reduction in cost resulting from increasing disk density.
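A small worked example makes this argument concrete. Treat system cost per terabyte as the disk price plus each disk's share of the non-disk components, divided by capacity. The dollar figures below are hypothetical and chosen only to show the shape of the curve.

```python
def cost_per_tb(disk_price: float, disk_tb: float, non_disk_per_disk: float) -> float:
    """System cost per TB: each disk plus its share of the non-disk components."""
    return (disk_price + non_disk_per_disk) / disk_tb


# Hypothetical $100, 1 TB consumer disk in two systems.
heavy = cost_per_tb(100, 1, 400)  # non-disk overhead dominates: $500/TB
lean = cost_per_tb(100, 1, 20)    # overhead already small: $120/TB

# Cutting the non-disk overhead by 75% in each case:
print(cost_per_tb(100, 1, 100) / heavy)  # 0.4   (a 60% drop in system cost)
print(cost_per_tb(100, 1, 5) / lean)     # 0.875 (only a 12.5% drop)

# With the overhead gone, only denser disks at the same price lower the cost.
print(cost_per_tb(100, 2, 0))            # 50.0
```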

This is a critical point to understand when comparing the costs of disk archiving to tape archiving, since many cost projections have made the assumption that this rate of reduction of disk cost will continue into the future.


The Demise of Tape is Overrated

Christopher Poelker, in a ComputerWorld blog post titled "Is Tape Really Dead", makes a series of assertions about the superiority of disk over tape. To give credit where it is due, he is talking about tape's role in backup, and he does conclude that tape still has a role to play. Unfortunately, many of the statements he makes along the way are simply inaccurate.

Chris states:

Everyone is aware of the limitations of tape solutions.

Tape                      Disk
Sequential access         Random access
Relatively slow           Fast
Shipped offsite           Electronically vaulted
Once a day process        Periodic or continuous
High operational touch    Automated
Inexpensive media         More expensive

Let's look at each of these in turn:

Sequential Access vs. Random Access
What Chris is getting at here is seek latency. Both tape and disk provide full random access, but disk is much faster at it than tape. However, hard disk capacities have increased while access bandwidth has remained largely constant. As a result, from a software architecture standpoint, disks look more and more like tape. This is what is leading to the collapse of RAID 5 as a means to protect data. In my opinion, it is also what will ultimately lead to the death of disk. But more on this in a subsequent blog post.

Relatively slow vs. Fast
Tape systems, when properly used, can provide extremely high levels of throughput performance, into the 10 Gigabit/sec ranges.

Individual tape drives can already stream sequentially accessed data faster than most hard disks (120 MBytes/sec), and LTO5 will increase this lead further. When randomly accessing data spread across a tape or disk, the disk will outperform tape due to lower seek latencies. And, of course, if seek latencies are important to you, you should be looking at flash.
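As a rough check on the 10 Gigabit/sec figure, take 120 MBytes/sec as a per-drive streaming rate. This is an assumption, though it matches the native rate of LTO-4 drives. At that rate, a library needs only about eleven drives streaming in parallel:

```python
import math

DRIVE_MBYTES_PER_S = 120  # assumed per-drive streaming rate (LTO-4 native)
TARGET_GBITS_PER_S = 10   # aggregate throughput quoted above

# 10 Gbit/s = 10,000 Mbit/s = 1,250 MBytes/s (decimal units, 8 bits per byte).
target_mbytes_per_s = TARGET_GBITS_PER_S * 1000 / 8
drives_needed = math.ceil(target_mbytes_per_s / DRIVE_MBYTES_PER_S)
print(target_mbytes_per_s, drives_needed)  # 1250.0 11
```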

Shipped offsite vs. Electronically vaulted
Disk drives are far more vulnerable to damage than tapes, and simply cannot be shipped around in the same way. "Electronic vaulting" often means expensive WAN data transfers and higher costs for power and equipment.

Once a day process vs. Periodic or continuous
This would be true if we were talking about a pure tape solution, but tape-based systems have been deployed alongside disk in a storage hierarchy for decades.

High operational touch vs. Automated
Baby-sitting ten thousand disks isn't low operational touch either. Disks fail continuously, and a wrong swap can destroy an entire RAID set. Many large archives run with tens of thousands to hundreds of thousands of tapes with very little operator intervention, and modern automated libraries are highly reliable and fault tolerant.

No dedupe vs. dedupe
This is false, as dedupe is yet another data compression technique, and applies equally well to tape as it does to disk. Again, the use of dedupe on tape in backup and archiving systems goes back decades.

Inexpensive media vs. More Expensive
We're thrown a bone here in the cost department. What isn't considered, however, is that the disk subsystems themselves must be replaced far more frequently than tape libraries, in addition to the consumables (disks and tapes). With tape libraries, the drives can be swapped out and the tapes migrated to newer, higher-capacity media without having to replace the entire library.

This also does not take into account the far higher opex costs of power and heat required for disk-based solutions.

Despite the near continuous siren call of the "Tape is Dead" crowd, tape provides significant value, often higher value for the dollar than disk, and has a long life before it. And, in many ways, it is spinning disk that should be more worried about its life in the coming decade.


The Roles of CDMI

The proposed SNIA Cloud Data Management Interface standard (CDMI) is intended to address a wide variety of different use cases, as described in the draft SNIA Cloud Storage Use Cases document.

Specifically, CDMI has several distinct and overlapping design goals:

1. To act as a cloud management protocol for other non-cloud data access protocols, without providing data access.

An example use case for this mode is the management of a SAN or NAS fabric. This allows the provisioning and specification of data system metadata for opaque LUNs, which can be provisioned dynamically and programmatically, for example in conjunction with OCCI in a cloud computing environment. In this case, there is no data access via CDMI, only management and accounting access.

2. To act as a cloud management protocol and as a secondary cloud data access protocol to existing cloud and unified storage systems.

An example use case for this mode is providing consistent management access to existing unified storage systems that provide block, file and object protocols. For example, an Amazon EC2 instance could be run that exposes an S3 bucket through CDMI, manages Elastic Block Store LUNs, and implements some of the data system metadata functionality.

3. To act as a primary cloud management and cloud data access protocol for next-generation cloud storage systems.

An example use case for this mode is enabling a superset of cloud data access, manipulation and management functions. It also enables advanced scenarios such as cloud federation, peering and delegation, and distributed application systems built around cloud storage. For example, a cloud could provide CDMI access to objects for cloud applications while also managing file-system views into the object space from remote edge NAS gateways. At the same time, it could federate existing enterprise storage with public and private clouds.


Queues in a Cloud

Queues are extremely powerful approaches to data storage and data exchange that are commonly used in application programming, especially when components of a program are executed in parallel. By supporting queues, cloud storage systems provide safe and efficient mechanisms by which programs can persist their state, communicate between components, and interact with other programs and systems.

What is a Queue

A queue is an object with zero or more data "records", such that only the oldest record is accessible at any given time. Once the oldest record is removed, the second-oldest record becomes accessible, and so forth until the queue is empty. This provides what is known as "first-in, first-out" (FIFO) access to data.

In a standard object, when you update data, you update the only record in the object. But in a queue, this creates a new record. When you read the value of an object, you get the only record in the object, but in a queue, this returns the value of the oldest record. And when you delete the value of an object, you delete the object, but in a queue, you delete only the oldest record. Thus, unlike basic objects, updating and deleting records from a queue are no longer idempotent operations.
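The update, read and delete semantics above can be modelled with a double-ended queue. This in-memory class is a hypothetical stand-in for a cloud queue object, not a CDMI API. It shows why repeating an update or a delete changes the result.

```python
from collections import deque


class QueueObject:
    """Hypothetical in-memory model of a queue object's FIFO semantics."""

    def __init__(self) -> None:
        self._records = deque()

    def update(self, value) -> None:
        """Updating a queue appends a new record instead of replacing the value."""
        self._records.append(value)

    def read(self):
        """Reading returns the oldest record without removing it (None if empty)."""
        return self._records[0] if self._records else None

    def delete(self) -> None:
        """Deleting removes only the oldest record."""
        if self._records:
            self._records.popleft()

    def __len__(self) -> int:
        return len(self._records)


q = QueueObject()
q.update("page-1")
q.update("page-1")       # repeating an update adds a second record
print(len(q))            # 2
q.delete()
print(len(q), q.read())  # 1 page-1
q.delete()               # repeating a delete removes another record
print(len(q), q.read())  # 0 None
```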

How are Queues Typically Used

Queues are typically used in two different situations:

Internal Work List - Inside a program, the program will store a queue of data that needs to be processed in a given order, such that as resources become available, the items from the queue can be accessed in order. In this use, the queue is being used to store state.

Inter-Process Communication - Between two programs, a queue will contain a list of items that need to be communicated from one program to another. As the sender program encounters data that needs to be sent to the receiver program, it enqueues the items into the queue, and as the receiver program is able to process data from the sender program, it dequeues items from the queue. In this use, the queue is being used to exchange state.

As a quick aside, TCP/IP is an example of a queue used for information exchange between two systems. When you write data to a TCP/IP connection, you are enqueueing the data for delivery, and when the destination application reads from the TCP/IP connection, it is dequeuing the data from the network abstraction.

These uses are best illustrated through an example. Let's say that we are designing a book scanning system that runs in the cloud. We have a series of image scanners that digitize pages of books, and we have an OCR program that converts the images into text that can be indexed for search.

A simple implementation would be to scan a page, OCR it, then move on to the next page, but that doesn't meet all of the criteria of a cloud solution, as we can't scale it. A good solution would be able to handle multiple scanners, and have multiple instances of the OCR process running in parallel. And that calls for queues.

Handling Multiple Writers

A queue can be used to aggregate data values from multiple writers into a single ordered set of values. This behaviour is perfect for logging, job aggregation and any other situation where data originating from multiple entities needs to be consumed by a single entity.

For example, in our book scanning example, a typical scanning workflow may have tens to hundreds of scanners running concurrently, where each scanned page needs to be run through an OCR (Optical Character Recognition) process in order to index the contents of each scanned page.

A cloud application built around queues is easily scaled by running multiple parallel instances of the scanning application, with each of the instances using a common cloud for storage. If all of these instances store scanned pages into the same queue, the interface to the OCR process (the queue) is the same regardless of the number of writers.

The logic for the scanning process would look something like this:

1. Scan Page
2. Write image as object into cloud
3. Enqueue image object ID into cloud
4. Repeat

The logic for the OCR process would look something like this:

1. Read from cloud queue to get the object ID of the next image to process
2. Read the image from cloud using the object ID
3. Perform OCR processing
4. Add OCR text to Index
5. Delete item from cloud queue
6. Repeat
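The two loops above can be sketched in a single process, with a dictionary standing in for cloud object storage and a deque for the shared queue. Everything here is hypothetical scaffolding, and the scanner and OCR functions are placeholders. A real OCR worker would wait for new items rather than exit when the queue is empty. The point is the ordering. The queue entry is deleted only after its page has been indexed, so with a persistent queue, a crash mid-processing would leave the item in place for a retry.

```python
import uuid
from collections import deque

object_store = {}     # stand-in for cloud object storage: object ID -> image bytes
page_queue = deque()  # stand-in for the shared cloud queue of object IDs
search_index = {}     # object ID -> recognized text


def scan_page(page_number: int) -> bytes:
    return f"image of page {page_number}".encode()  # placeholder scanner


def run_ocr(image: bytes) -> str:
    return image.decode().replace("image of ", "")  # placeholder OCR engine


def scanner(pages) -> None:
    for n in pages:                       # 4. repeat
        image = scan_page(n)              # 1. scan page
        object_id = str(uuid.uuid4())
        object_store[object_id] = image   # 2. write image as object into cloud
        page_queue.append(object_id)      # 3. enqueue image object ID


def ocr_worker() -> None:
    while page_queue:                     # 6. repeat until the queue is empty
        object_id = page_queue[0]         # 1. read (not remove) the oldest entry
        image = object_store[object_id]   # 2. read the image by object ID
        text = run_ocr(image)             # 3. perform OCR processing
        search_index[object_id] = text    # 4. add OCR text to index
        page_queue.popleft()              # 5. delete item only after indexing


# Two scanners feed the same queue; one OCR worker drains it.
scanner(range(1, 4))
scanner(range(10, 12))
ocr_worker()
print(sorted(search_index.values()))
```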

Now, we can arbitrarily scale the number of scanning processes. But what if our OCR takes too long to handle the combined workload of all of these scanning processes?

Handling Multiple Readers

If our OCR processing takes far longer than the time required to scan a page, we need to be able to increase the performance of the OCR processing. And when we add multiple parallel page scanners, things get even worse. We could try to make a faster OCR engine, but it's far easier to be able to scale out the OCR processing by running multiple instances of the OCR processing in parallel.

In this case, we need to be able to have multiple readers of the queue. And, we need to ensure the following characteristics of our solution:

1. No two queue readers will get the same item
2. No items will be lost, even in the event of a queue reader failure

If you simply run two queue readers in parallel, both of these problems can occur. If the two readers run in lock-step, they will both read the same item. If they then both delete, the second delete removes the next item in the queue, which is lost without ever being processed.

Thus, we need to introduce another concept to maintain these characteristics: the ability to atomically transfer an item from one queue to another. With this capability, our OCR process can be modified so that, even with multiple readers, no data is lost or processed twice:

1. Transfer item from cloud queue to worker queue
2. Read from worker queue to get the object ID of the next image to process
3. Read the image from cloud using the object ID
4. Perform OCR processing
5. Add OCR text to Index
6. Delete item from worker queue
7. Repeat
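The transfer-based reader can be sketched with threads, using a lock to model the cloud's atomic transfer between queues. The names are hypothetical, and the sketch makes one important simplification. Here each worker queue lives in memory, whereas in the design above it is a persistent cloud queue. That persistence is what lets an item survive a worker crash.

```python
import threading
from collections import deque

# Hypothetical in-memory stand-ins for the shared cloud queue.
shared_queue = deque(f"image-{n}" for n in range(100))
lock = threading.Lock()  # models the cloud's atomic transfer guarantee
processed = []           # results from all workers (list.append is thread-safe in CPython)


def transfer(source: deque, destination: deque) -> bool:
    """Atomically move the oldest item from source to destination."""
    with lock:
        if not source:
            return False
        destination.append(source.popleft())
        return True


def worker() -> None:
    worker_queue = deque()                       # this worker's own queue
    while transfer(shared_queue, worker_queue):  # 1. transfer item to worker queue
        item = worker_queue[0]                   # 2. read from worker queue
        processed.append(f"ocr({item})")         # 3-5. process and index (placeholder)
        worker_queue.popleft()                   # 6. delete item from worker queue


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(processed), len(set(processed)))  # 100 100: nothing lost, nothing duplicated
```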

Using this approach, you can arbitrarily scale the number of reader processes without modification. And these workers can enqueue their results into a common queue, allowing the results to be recombined for further processing.

By leveraging queues, cloud storage allows the creation of complex reliable workflows that can be scaled arbitrarily and dynamically. It also facilitates the creation of loosely coupled reliable systems of interacting programs that work together to solve a given problem in a flexible and scalable manner.


The Worst Case Scenario

What do you do when the unthinkable happens, and you lose your entire primary site of operations, including your local cloud storage? It could be a flood, a hurricane, or something as commonplace as a building fire. But it has happened, and now all of the infrastructure and storage media associated with your archive are gone.

But life continues on, and so does your business. Assuming your archive is a critical part of your workflow and corporate assets, how can you protect against these major disruptions?

To the right is a simplified flowchart to help determine when you should protect against site-loss scenarios, and what options are available to prevent loss of archived data and ensure continued access.

Single-Site Archives

If all of the equipment and storage associated with your archive is located at a single site, and that site is lost, unless you have some form of off-site storage, all data stored in the archive will be lost.

While a multi-site archive is the best solution to avoid data loss in this scenario, for cost-sensitive data where rapid restoration of data access and operations is not required, a multi-site archive may be more expensive than is warranted. In this case, two common options are to create two tape copies and have them vaulted off-site, or to store a copy of the data into a public cloud provider, such as Amazon, Iron Mountain, or Diomede Storage.

Vaulting to Tape

Storing to tape, and vaulting the tapes off-site is the least expensive option for protecting archived data. In this scenario, a node would be added to the single-site archive that creates two identical tapes containing recently archived data. These tapes are then sent off-site.

The ability for an archive to be restored from the "bare metal", from only the data objects being archived, is a very important feature of an archival system. This ensures that even if the control databases are lost, the archived data can still be accessed, and the archive can be rebuilt.

When planning a tape vaulting approach, the frequency that these tapes are created determines how much data is at risk of loss in the event of the loss of the site. For example, if tapes are generated every week, and take two business days worst case to be taken off-site, then the business can have an exposure window of up to twelve calendar days.

In the event of the catastrophic loss of the primary site, these tapes would have to be recalled from the vaulting provider, which can take some time, and hardware would have to be re-acquired to rebuild the archive. Don't underestimate the amount of time required to re-order hardware. Often the original equipment is no longer available, so a new archive will need to be specified and ordered, and it can take weeks for the servers to be assembled and shipped.

Once the tapes have arrived and the hardware has been set up, the archive is rebuilt from the data stored on the tapes, and once the last tape is processed, the archive is ready to be used. This is known as a "bare-metal restore".

Of course, depending on the size of the archive, this could take a very long time. A 1 PB media archive would take 115 days to restore when running at a restore load of 800 Mbits/s. Likewise, a 10 billion object e-mail archive would take 115 days to restore when running at a restore load of 1000 objects per second. Rebuild times must be taken into account when planning for archive restoration. Often the cost of downtime associated with such a restore is high enough that cloud or multi-site options are considered instead.
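Both restore times follow from the same arithmetic, assuming decimal units (1 PB = 10^15 bytes) and a constant restore rate. Each works out to 10 million seconds, or about 115.7 days.

```python
SECONDS_PER_DAY = 86_400


def days_at_bandwidth(archive_bytes: float, mbits_per_s: float) -> float:
    """Days to restore archive_bytes at a sustained rate in Mbits/s."""
    return archive_bytes * 8 / (mbits_per_s * 1e6) / SECONDS_PER_DAY


def days_at_object_rate(objects: float, objects_per_s: float) -> float:
    """Days to restore a number of objects at a sustained objects/s rate."""
    return objects / objects_per_s / SECONDS_PER_DAY


print(f"{days_at_bandwidth(1e15, 800):.1f}")     # 1 PB at 800 Mbits/s: 115.7
print(f"{days_at_object_rate(10e9, 1000):.1f}")  # 10 billion objects at 1000/s: 115.7
```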

Storing to the Cloud (Hybrid Cloud)

Another option for single-site archives is to store a copy of the archived data to a cloud storage provider. This eliminates the headaches associated with tape management, but introduces the requirement for network connectivity to the provider. In this scenario, for each archived object to be protected off-site, the object is also stored to a cloud provider, which retains the data in the event that a restore is needed.

Unlike with tape vaulting, data is stored to the cloud immediately, limited only by WAN bandwidth. However, this limitation can be substantial: when bandwidth is insufficient, data will be at risk until the backlog clears. If data is being stored to the archive at 100 Mbits/s, an OC-3 class Internet connection would be required, which can be far more expensive than sending twenty tapes off-site each week.
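To see why twenty tapes a week is a plausible comparison, the sketch below sizes one week of ingest at 100 Mbits/s. It rests on three assumptions:

1. LTO-4 cartridges at 800 GB native capacity. This cartridge type is an illustrative choice, not something stated in this post.
2. No compression.
3. The two identical tape sets per vaulting cycle described earlier.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600
INGEST_BPS = 100_000_000            # 100 Mbit/s sustained archive ingest
OC3_LINE_RATE_BPS = 155_520_000     # OC-3 line rate
TAPE_CAPACITY_BYTES = 800 * 10**9   # assumed LTO-4 native (uncompressed) capacity
TAPE_SETS = 2                       # two identical copies per vaulting cycle


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


weekly_bytes = INGEST_BPS * SECONDS_PER_WEEK // 8
tapes_per_set = ceil_div(weekly_bytes, TAPE_CAPACITY_BYTES)
tapes_per_week = tapes_per_set * TAPE_SETS

print(f"Weekly ingest: {weekly_bytes / 10**12:.2f} TB")            # prints 7.56 TB
print(f"Tapes per week: {tapes_per_week}")                          # prints 20
print(f"OC-3 utilization: {INGEST_BPS / OC3_LINE_RATE_BPS:.0%}")    # prints 64%
```

Running at roughly 64% of the OC-3 line rate leaves only limited headroom for clearing a backlog after a network outage.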

In the event of the catastrophic loss of the primary site, hardware would have to be re-acquired to rebuild the archive, and network connectivity to the cloud would have to be re-established. When both of these are operational, the archive would be reconnected to the cloud. This would restore access to the archived data, albeit limited by the network bandwidth. Over time, the on-site archived data can then be restored back over the WAN.

The primary disadvantages of this approach are cost and the time required to obtain hardware and network access when restoring the on-site component of the archive. Fears about unauthorized disclosure of data and loss of control over data are also common, though they can be mitigated with the appropriate use of encryption.

And often, for less than the price charged by most public cloud providers, one can afford to create a multi-site archive, either across multiple premises owned by the business or into a second facility hosted by a third party.

Why not just use the Cloud?

Some public cloud providers encourage an architecture that has a minimal on-site presence and stores all data off-site in the cloud. For some scenarios this approach works very well, as it minimizes capital costs and the time required to restore hardware and access in the event of a disaster. However, one must have sufficient WAN bandwidth for the expected store and retrieve loads, as opposed to the store-only traffic generated when the cloud is used purely as a storage target. In addition, a network connectivity failure can disrupt access to most or all of the archive.

This is contrasted with the hybrid cloud model, where the private cloud on-site allows continued access to the data even during WAN failures, and the public cloud is used as a low-cost data storage target.

Multi-Site Archives

When business continuity is important, or archival data must be accessible across multiple sites, the archive can be extended to span multiple sites. This involves several considerations, including:
  1. What data is created in one site and accessed in another?
  2. What data should be replicated to other sites for protection?
  3. What data should be replicated to other sites for performance?
  4. In the event of a site loss scenario, what will be the additional load placed on other sites?
Such multi-site archives are very flexible, and allow seamless continuation of operations even in the event of major and catastrophic failures that affect one or more sites. While this is obviously the best solution from a business continuity standpoint, it is also the most expensive. You must duplicate your entire archive infrastructure in multiple sites and provide sufficient WAN bandwidth for cross-site data replication.

Of course, one can also deploy systems like Bycast's StorageGRID to provide a mixture of the approaches described above. Policies determine which archived content is stored locally, vaulted to tape, stored in a public cloud, and replicated across multiple sites. This flexibility allows the value of the data to be mapped to the cost of the storage, and leverages a common infrastructure for all levels of protection required.
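Policy-driven placement of this kind can be sketched as an ordered rule list evaluated against each archived object. The example is purely illustrative and does not reflect StorageGRID's actual policy language. The content classes, retention thresholds and placement names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchivedObject:
    name: str
    content_class: str      # hypothetical classes, e.g. "email", "media"
    retention_years: int


@dataclass(frozen=True)
class PlacementRule:
    content_class: str
    min_retention_years: int
    placements: tuple       # where copies of matching objects should live


# Hypothetical rules, evaluated in order; the first match wins.
RULES = (
    PlacementRule("email", 7, ("local", "tape-vault", "remote-site")),
    PlacementRule("media", 0, ("local", "public-cloud")),
)
DEFAULT_PLACEMENTS = ("local",)


def placements_for(obj: ArchivedObject) -> tuple:
    """Return the protection targets for an object under the first matching rule."""
    for rule in RULES:
        if (obj.content_class == rule.content_class
                and obj.retention_years >= rule.min_retention_years):
            return rule.placements
    return DEFAULT_PLACEMENTS


print(placements_for(ArchivedObject("q3-report.eml", "email", 10)))
# prints ('local', 'tape-vault', 'remote-site')
```

Ordered, first-match rules make it easy to reason about which protection level an object receives. One natural extension is to re-evaluate the rules as objects age.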


Introducing CDMI

Today, the Storage Networking Industry Association (SNIA) publicly released the first draft of the Cloud Data Management Interface (CDMI). The draft standard can be downloaded at the below address:


I'm very pleased to have been a significant contributor to this standard since the inception of the working group earlier this year. Over the last nine months, we've been able to come a long way towards defining a working standard for cloud storage management, and Bycast is proud to have contributed many best-of-breed capabilities first pioneered in Bycast's StorageGRID HTTP API, used by hundreds of customers worldwide to store and access many dozens of petabytes of data in cloud environments, both public and private.


CDMI provides a standardized method by which data objects and metadata can be stored, accessed and managed within a cloud environment. It is intended to provide a consistent method for access by applications and end-users' systems, and provide a consistent interface for providers of cloud storage.

Currently, almost all cloud storage providers and vendors use significantly different APIs. This forces cloud application and gateway software vendors to code and test against each API, and to architect their applications around the lowest common denominator. CDMI significantly reduces the complexity of development, test and integration for the application vendor, and is specifically designed to be easy to adopt for both cloud providers and application vendors. CDMI can also run alongside existing cloud protocols. For example, a customer could run a CDMI gateway in an EC2 instance to gain access to their existing Amazon S3 bucket without Amazon having to do any work, a great example of the power of cloud!

Much like SCSI, Fibre Channel and TCP/IP, such industry-wide standards provide many advantages. At the simple but essential end are efficiencies such as standardized interface documentation and conformance and performance testing tools. Beyond these, standards create a market for value-added tools such as protocol analyzers, and they bring greater developer awareness, libraries and code examples.

Industry standards also jump-start the network effect, where more applications encourage providers to support the standard, and more providers supporting the standard encourage application vendors to support the standard. Finally, and most excitingly, CDMI increases inter-cloud interoperability, and is a fundamental enabler for advanced emerging cloud models such as federation, peering and delegation, and the emergence of specialized clouds for content delivery, processing and preservation.

A Whirlwind Tour of CDMI

CDMI stores objects (data and metadata) in named containers that group the objects together. All stored objects are accessed by web addresses that contain either a path (e.g., http://cloud.example.com/myfiles/cdmi.txt) or an object identifier (e.g., http://cloud.example.com/objectid/AABwbQAQvmAJSJWUHU3awAAA==).
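The two addressing forms can be built with the Python standard library. Two choices in the sketch are assumptions rather than rules quoted from the specification:

- The /objectid/ prefix is taken from the example URL above.
- Path segments are percent-encoded, and in object IDs "=" and "+" are left intact while "/" is escaped.

```python
from urllib.parse import quote, urljoin

BASE_URL = "http://cloud.example.com/"


def path_url(base: str, path: str) -> str:
    """Address an object by its container path, percent-encoding each segment."""
    return urljoin(base, quote(path.lstrip("/"), safe="/"))


def objectid_url(base: str, object_id: str) -> str:
    """Address an object by identifier; a "/" inside an ID is escaped so it stays one segment."""
    return urljoin(base, "objectid/" + quote(object_id, safe="=+"))


print(path_url(BASE_URL, "/myfiles/cdmi.txt"))
# prints http://cloud.example.com/myfiles/cdmi.txt
print(objectid_url(BASE_URL, "AABwbQAQvmAJSJWUHU3awAAA=="))
# prints http://cloud.example.com/objectid/AABwbQAQvmAJSJWUHU3awAAA==
```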

CDMI provides a series of RESTful HTTP operations that can be used to access and manipulate a cloud storage system. PUT is used to create and update objects, GET is used to retrieve objects, HEAD is used to retrieve metadata about objects, and DELETE is used to remove objects.
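The mapping of HTTP verbs to CDMI operations can be illustrated by building, but not sending, requests with urllib.request. The sketch makes two assumptions:

- It uses the CDMI data object media type application/vnd.org.snia.cdmi.dataobject+json.
- The JSON body fields (mimetype, value) are modelled on later CDMI releases and may differ in this draft.

```python
import json
import urllib.request
from typing import Optional

CDMI_DATAOBJECT = "application/vnd.org.snia.cdmi.dataobject+json"


def cdmi_request(method: str, url: str, body: Optional[dict] = None) -> urllib.request.Request:
    """Build, but do not send, a CDMI data object request."""
    headers = {"Accept": CDMI_DATAOBJECT}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = CDMI_DATAOBJECT
    return urllib.request.Request(url, data=data, headers=headers, method=method)


url = "http://cloud.example.com/hello.txt"
create = cdmi_request("PUT", url, {"mimetype": "text/plain", "value": "Hello CDMI World!"})
read = cdmi_request("GET", url)            # object content and metadata
metadata_only = cdmi_request("HEAD", url)  # metadata only
remove = cdmi_request("DELETE", url)

for req in (create, read, metadata_only, remove):
    print(req.get_method(), req.full_url)
```

Passing any of these requests to urllib.request.urlopen would issue the call against a live endpoint. Authentication is omitted here.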

Data stored in CDMI can be referenced between clouds (where one cloud points to another), copied and moved between clouds, and can be serialized into an export format that can be used to facilitate cloud-to-cloud transfers and customer bulk data transfers. All data-metadata relationships are preserved, and standard metadata is defined to allow a client to specify how the cloud storage system should manage the data. Examples of this "Data System Metadata" include the acceptable levels of latency and the degree of protection through replication.
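As a sketch of how a client might request a service level, the helper below merges user metadata with data system metadata entries. Several details are assumptions for this draft:

- The names cdmi_data_redundancy and cdmi_latency are borrowed from later CDMI 1.x releases.
- Metadata values are encoded as strings.
- Latency is expressed in milliseconds.

```python
import json
from typing import Optional


def with_data_system_metadata(user_metadata: dict,
                              redundancy: Optional[int] = None,
                              latency_ms: Optional[int] = None) -> dict:
    """Return a copy of user_metadata with the requested data system metadata added."""
    metadata = dict(user_metadata)  # never mutate the caller's dictionary
    if redundancy is not None:
        metadata["cdmi_data_redundancy"] = str(redundancy)  # desired number of copies
    if latency_ms is not None:
        metadata["cdmi_latency"] = str(latency_ms)          # desired maximum latency
    return metadata


body = {
    "mimetype": "text/plain",
    "metadata": with_data_system_metadata({"project": "alpha"}, redundancy=3, latency_ms=500),
    "value": "Quarterly results",
}
print(json.dumps(body, indent=2))
```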

In addition to basic objects and containers (similar to file and folder from a file system), CDMI also supports the concept of capabilities, which allow a client to discover what a cloud storage system is capable of doing. CDMI also supports accounts, which provide control and statistics over account security, usage and billing. Finally, CDMI supports queue data storage objects, which enable many exciting new possibilities for cloud storage.

In fact, queues are important and significant enough that I'll be writing more about them and what they enable in a subsequent blog entry.

The Next Steps

With CDMI now "out in the wild", this is the point where the standards effort starts to get really interesting. Up to this point, it has been a relatively small group that has been working on the standard, and we've had to make some controversial decisions (such as eliminating locking and versioning from the first release). There's still a lot of work to be done, and as CDMI gets more visibility, we look forward to increased involvement from other players in the industry. Together, we can make this standard even better, and help shape the future of cloud storage.

So, if you are interested in cloud storage and cloud storage APIs, I would strongly encourage you to take the time to read the CDMI draft documentation, and contribute your thoughts and suggestions.

We're proud of what we've achieved, and together, we can make it even better.


Cloud Computing and Cloud Storage Standards

All of our work at the Cloud Storage technical working group in the Storage Networking Industry Association (SNIA) has been coming together, and we are nearing a public release of the Cloud Data Management Interface (CDMI).

We've also been working with the Open Grid Forum (OGF) to enable CDMI to be used in conjunction with the OCCI standard to manage data storage in cloud computing environments. There are also some exciting possibilities for combining CDMI with the VMware vCloud specification, which was recently donated to the DMTF as a potential industry standard.

You can read our joint whitepaper on CDMI and OCCI at the below URI:


If any readers have any questions about CDMI and how it facilitates cloud computing, please don't hesitate to comment!


Cloud Computing needs Cloud Storage

I recently spoke on a panel at Cloud User 2009, held in San Diego, on the subject of "Navigating the Cloud Vendor Community to Achieve Speed of Migration and Ease of Use". We were originally scheduled for an hour of discussion. Because a subsequent presenter did not show up and there was great interest from the audience, we went half an hour over schedule.

My presentation focused on the need for all IT infrastructure to be capable of supporting cloud computing. Organizations often focus on one or two aspects of cloud computing (typically virtualization and elastic computing), while not always taking into account the impact, requirements and dependencies on other areas of IT.

I started out by reviewing a mapping of the U.S. Government's Service Component Reference Model into the cloud model. This diagram provides a good example of all the different areas that IT and business organizations need to consider when embarking on a cloud project. It is also quite effective for discussing interdependencies between different technologies and IT areas of expertise.

Cloud Storage as a co-requisite for Cloud Computing

After introducing the concept of cloud as a holistic IT practice, I spent some time focusing on the specific dependencies that cloud computing has on storage, and how cloud computing drives the need for cloud storage.

                         Cloud Computing                     Emerging Storage
Self-contained Packages  Virtual Appliances                  VM Image Mgmt & Object Stores
Location Independence    Hybrid Clouds                       Distributed & Multi-site Storage
Loosely Coupled          Dynamic                             Simpler & Less Fragile
Scale-Free               Elastic Scaling, Billing on Usage   Tiering & Dynamic
Shared                   Co-Hosting                          Multi-Tenancy & Storage Security

The above table maps many of the business advantages and approaches for cloud computing to new requirements for storage that emerge as a result. Let's review these in detail:

Self-contained Packages

As the environment and resources for a given application or computing problem are packaged up and managed at the VM and application level, new requirements emerge. These images must be managed, and data must be associated with VM sessions and applications so that they can be migrated, snapshotted and managed together.

Thus, when you move to cloud computing, you need a new way to package up the data along with the applications, and ensure that they are self-contained.

Location Independence

These packages and data must then be able to migrate, both within the enterprise (such as in DR and business continuity applications) and between organizations when utilizing hybrid and public clouds. Once the stored data is packaged, this becomes easier. However, data movement often must be more granular, as some data may need to remain within the organization, or may need to be spread across multiple clouds.

Thus, when you move to cloud computing, you need to make your data accessible from multiple locations, and ensure that it is consistent, complete and correct.

Loosely Coupled

One of the benefits of cloud computing is loose coupling between systems. This allows simple reconfiguration, and enables applications and infrastructure to be mixed to quickly create new applications and update existing ones. The resulting collections of services are dynamically provisioned, often without human involvement. To accomplish this, you need to ensure that all of the parts fit together, and can be controlled so that systems can be assembled dynamically and programmatically.

Thus, when you move to cloud computing, you need simpler and less fragile interfaces that allow storage to be dynamically connected to applications, as needed, when needed.

Scale-Free

In order to quickly scale up and down cloud computing environments, one needs to be able to deploy applications and storage in a scale-free manner. Being able to dynamically create a thousand-node compute cluster is not of much use if there is not also storage infrastructure that can scale to support this cluster.

Thus, when you move to cloud computing, you need to ensure that your storage is also capable of scaling elastically, and is capable of tiering data so that it is available at the right cost and performance. Often, the first problems one runs into when deploying an elastic computing infrastructure are mismatches between computing and storage, and the cost of keeping all the data resulting from the cloud computing activities.

This is important: Cloud computing results in an explosion of data, and this data has to be tiered in order to stay economical.

Shared

While less of an issue with private clouds, multi-tenancy and support for multiple users on shared infrastructure is critical for leveraging many of the economic advantages resulting from resource pooling. However, this sharing brings requirements for partitioning and security to prevent unauthorized disclosure of data. When storage is shared, there must be strong assurances that information will continue to be protected.

Thus, when you move to cloud computing, especially in public and hybrid clouds, the placement and security of stored information must be carefully assessed to ensure that additional risks are not introduced.

More Industry Alignment on Object Storage

In a discussion on one of EMC's blog entries, The Future Doesn't have a File System, Paul Carpentier, of Centera fame, reiterated the need for an industry-wide, lightweight web-based standard for object-based storage access.

His initial thinking was as follows:

1. Unique identifiers; 128 bit, hex representation proposed
2. Object = immutable [Content + Metadata]; content is free format, metadata is free format, XML recommended
3. Simple access protocol; HTTP proposed; non-normative client libraries optional
4. READ and READ METADATA operation; (READ gets metadata and content)
5. WRITE and DELETE operation
6. Small set of standardized XML policy metadata constructs re service level, compliance, life cycle; TBD
7. Persisted Distributed Hash Table to allow variable identifier mapping; 128 bit to 128 bit; HTTP accessed

What is interesting is the degree to which this proposal is aligned with the work being done by the Storage Networking Industry Association in its Cloud Storage Technical Working Group. This working group is creating a new standard called the Cloud Data Management Interface. It is intended to provide a standardized method for access and management of cloud data using a lightweight RESTful access method.

While the draft standard is not quite released to the public, let's take a quick peek at how it compares to Mr. Carpentier's thoughts:

1. Unique identifiers; 128 bit, hex representation proposed

SNIA is proposing to use XAM XUIDs for identifiers. This allows vendors to innovate and define how their identifiers are composed, while still ensuring global uniqueness and the ability for any object ID to be managed by any vendor's system.

While a basic 128-bit identifier, such as a UUID, is simpler, it does not provide strong guarantees that it will be unique across cloud vendors, and this is critical for emerging cloud models such as cloud migration, federation, peering and interchange.

2. Object = immutable [Content + Metadata]; content is free format, metadata is free format, XML recommended

While some vendors (such as Bycast) will implement the proposed standard by using immutable objects, the standard includes the optional ability to modify both object content and metadata for existing objects, without changing the object identifier.

Metadata will include both user-generated items and system-generated items, and will be represented using XML or JSON.

3. Simple access protocol; HTTP proposed; non-normative client libraries optional

SNIA is using RESTful principles and the HTTP protocol as a foundation for the standard, and simplicity is a key design goal. Almost every part of the standard is optional, and the client can discover what parts of the standard are supported by any given implementation.

Client libraries that provide simplified language mappings are anticipated, but the goal is to enable full use with standard HTTP libraries.

4. READ and READ METADATA operation; (READ gets metadata and content)

The HTTP GET and HEAD operations map to these functions.

5. WRITE and DELETE operation

The HTTP PUT and DELETE operations map to these functions.

If a cloud does not support mutable objects, then the cloud storage provider can indicate this to a client via the capabilities discovery interface, and any attempts to modify an existing object would fail.
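A client can consult the advertised capabilities before attempting an overwrite, rather than relying on the failure. Two parts of this sketch are assumptions:

- The response body is hypothetical.
- The capability name cdmi_modify_value is borrowed from later CDMI 1.x releases.

```python
import json

# Hypothetical capabilities response for a cloud that stores immutable objects.
SAMPLE_CAPABILITIES = """
{
  "capabilities": {
    "cdmi_read_value": "true",
    "cdmi_modify_value": "false"
  }
}
"""


def supports(capabilities_json: str, name: str) -> bool:
    """True only if the capability is present and advertised as "true"."""
    caps = json.loads(capabilities_json).get("capabilities", {})
    return str(caps.get(name, "false")).lower() == "true"


def plan_write(capabilities_json: str, object_exists: bool) -> str:
    """Decide how to store new content given what the cloud says it can do."""
    if not object_exists:
        return "create"
    if supports(capabilities_json, "cdmi_modify_value"):
        return "overwrite"
    return "store-as-new-object"  # immutable cloud: write a new object instead


print(plan_write(SAMPLE_CAPABILITIES, object_exists=True))  # prints store-as-new-object
```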

6. Small set of standardized XML policy metadata constructs re service level, compliance, life cycle; TBD

SNIA is actively working on standardizing a set of "Data System Metadata", which allows a client to specify what level of service that it desires from a cloud. Examples include maximum latency, degree of replication, etc.

7. Persisted Distributed Hash Table to allow variable identifier mapping; 128 bit to 128 bit; HTTP accessed

This is outside of what the standard is proposing, but by using the included queue data object functionality, vendors can add functionality such as lookups and transformations. This allows extension by vendors in a standardized way, and allows them to take advantage of much of the common infrastructure provided by the standard.

In summary, I would encourage everyone who is interested in cloud storage or in the industry to take a look at the work that the SNIA is doing, and to get involved!


Dictionary of a Cloud Standard

Any cloud storage standard requires the consideration of many interrelated functional areas. Below is a comprehensive list of all the areas that must be considered in the standardization process, and any standard must choose the scope and degree to which these items are addressed.

As all of these subjects are inter-related, they are listed below in alphabetical order:

Accounts – A grouping of stored objects for the purposes of administrative control and billing. Each object is owned by one or more accounts, and is billed to those accounts. Accounts may have sub-accounts, where content owned by a sub-account is rolled up to a higher-level account.

Accounts (Provisioning) – The interface by which administrative clients can create new accounts, modify account characteristics or remove accounts.

Audit – Detailed records of accesses and state changes to the storage system used for troubleshooting, forensic analysis and security analysis. Audit information should be accessible based on partition, account, client, audit types and other filters, ideally through a cloud API.

Client Authentication – The method by which the credentials presented by a client are verified and mapped to local identities used to determine permissions.

Client Authentication Delegation – The method of verifying and mapping credentials, where the processing is delegated to an external system or alternate cloud (e.g., Active Directory or LDAP).

Client Interface Protocols – The method by which clients are able to access data stored in the cloud. Interface protocols are typically specific to how the client interacts with the data. For example, a file-system client would expect an interface protocol such as NFS or CIFS, whereas a database client would expect a SQL interface. Clients that interact with objects directly may use protocols such as XAM or RESTful HTTP object access.

Cloud Partitioning – The ability to take a single cloud and create multiple partitions that act as completely independent clouds while still using the same common cloud infrastructure.

Cloud Peering – The ability for one cloud to transparently reference and store objects in another cloud such that a client can transparently access content directly from either cloud.

Cloud Federation – The ability for one cloud to transparently reference and store objects in another cloud such that a client can transparently access content from the primary cloud.

Event Feeds – A client-accessible queue of events that occur against a subset of content (as defined by a query) that can be used for state synchronization between clouds, billing and other inter-system communication.

Introspection – The ability for a client to discover what services a given cloud is capable of performing, and what subset of these capabilities the client is allowed to use.

Metadata (User) – Arbitrary client-named key/value pairs and tag lists specified by a client that are stored along with an object.

Metadata (Data System) – Standardized named key/value pairs and tag lists specified by a client that indicate to the cloud how the object or objects should be managed. Examples include ACLs, encryption levels, degrees of replication or dispersion, and QoS goals.

Metadata (System) – Standardized named key/value pairs and tag lists generated by the cloud that indicate to the client properties of the object. Examples include last access time, actual degree of replication (as opposed to requested degree of replication), and lock status.

Namespace (Local) – Each client or set of clients may see a different subset of the global namespace, or a client-specific namespace. Objects within the cloud may reside in one or more namespaces. By sharing objects across namespaces, different faceted views into the cloud can be created, and use cases such as sharing can be enabled.

Namespace (Global) – An administrator or suitably configured client may see all objects in the cloud within one namespace.

Object (Composite) – The ability for an object to contain other objects such that the collection of objects can be managed and stored together as a single object.

Object Identifiers – Each object stored within the cloud needs a globally unique identifier that stays with the object across its entire lifecycle. Ideally, these identifiers are unique across clouds, and are preserved across clouds, which enables federation, transparent migration and peering.

Object Locking – Clients may wish to be able to lock an object to prevent modification or access by other clients.

Object Reference Counting – Clients may wish to gain a form of lock on an object that ensures that the object will remain in the cloud. Only when all of these references are released can the object be considered for removal.

Object Versioning – Clients may wish to have historical versions of objects be retained when an object is modified.

Object Permissions – Each object stored has a list of which entity is permitted to perform which action. This is typically called an Access Control List (ACL). These access controls specify which basic and administrative operations can be performed.

Object Referencing – The ability to specify that a given entity within a namespace is an object in an alternate location within the namespace, in another namespace, or in another cloud, while allowing transparent client access (see cloud peering and federation).

Object Serialization – The ability for a client to take one or more objects and transform them into a single bitstream that can be used for inter-system interchange.

Object Snapshots – Clients may wish to be able to create snapshots of an object or set of objects, such that the state of the objects at that point in time can be accessed via an alternate location within a namespace.

Query – The ability to submit a series of criteria (for object content and/or metadata) and have returned (possibly as another static object) the list of objects that match the specified criteria.

Query (Persistent) – The ability to create queries that run continuously and dynamically update their results over time as the state of the cloud changes.

Usage Statistics (Client) – The ability to obtain information about how many operations have been performed by a given client over a given timeframe is required for accounting, billing and reporting purposes.

Usage Statistics (Object) – The ability to obtain information about how many operations have been performed against a given object or set of objects, and the operations that have been required to manage a given object or set of objects over a given timeframe for accounting, billing and reporting purposes.

These are all aspects that we have considered here at Bycast, and that we are contributing as part of our involvement with the SNIA Cloud Storage Technical Working Group.
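Two of the dictionary entries above, account roll-up and object reference counting, map naturally onto small data structures. The sketch below is illustrative only. The class and method names are hypothetical, and it ignores objects owned by multiple accounts. Holds are tracked as a set per object, so repeated holds from the same holder count once.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    bytes_stored: int = 0                             # usage owned directly
    sub_accounts: list = field(default_factory=list)  # child Account objects

    def total_bytes(self) -> int:
        """Usage of this account plus all of its sub-accounts, rolled up recursively."""
        return self.bytes_stored + sum(sub.total_bytes() for sub in self.sub_accounts)


class RetentionHolds:
    """Reference-style holds: an object is removable only when no holder remains."""

    def __init__(self) -> None:
        self._holders: dict = {}  # object_id -> set of holder names

    def acquire(self, object_id: str, holder: str) -> None:
        self._holders.setdefault(object_id, set()).add(holder)

    def release(self, object_id: str, holder: str) -> None:
        holders = self._holders.get(object_id, set())
        holders.discard(holder)
        if not holders:
            self._holders.pop(object_id, None)

    def removable(self, object_id: str) -> bool:
        return object_id not in self._holders


corp = Account("corp", 100, [Account("sales", 50, [Account("emea", 25)])])
print(corp.total_bytes())  # prints 175

holds = RetentionHolds()
holds.acquire("obj-1", "legal")
holds.acquire("obj-1", "backup")
holds.release("obj-1", "legal")
print(holds.removable("obj-1"))  # prints False
```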


Standardizing Cloud Storage

As the cloud storage concept matures and an increasing number of service and technology providers emerge in the market, there is a growing recognition of the need to standardize protocols for data access and storage management functions in a cloud storage environment.

The advantages of an open standard for cloud storage include:
  • Allowing a cloud storage client to interoperate with multiple providers
  • Enabling data portability between cloud providers
  • Facilitating common documentation, sample code and educational material
  • Allowing common test infrastructure and conformance testing
  • Reducing development work for cloud clients and providers
  • Reducing the complexity of standardized access libraries
  • Encouraging the creation of debugging tools for diagnostics, profiling and interaction analysis
With cloud computing and cloud storage being such a hot topic at this time, there are multiple initiatives underway attempting to standardize various components and interfaces within the cloud storage stack. And one of the leading initiatives is the cloud storage technical working group within the Storage Networking Industry Association.

A Cloud Storage Reference Model

In the last six months, we've been working on several deliverables, the main work product being a standard reference model for the management of and access to cloud storage resources.

In summary, here are the highlights of the working group's vision of cloud storage:


Management functions and data access are provided via a light-weight HTTP RESTful API. Object, block and database storage APIs co-exist with this core API, facilitating emerging cloud use cases and allowing continued innovation as new applications are moved into the cloud.

The HTTP API also facilitates discovery and introspection of provided API capabilities. Providers can support as little or as much of the API as they wish, and clients can discover which capabilities are provided. This approach allows cloud vendors to provide additional capabilities (such as Nirvanix's media transcoding capabilities) while still remaining compatible with, and leveraging, the common functions of the API.

Containers and Data

Any data item stored in the cloud, from simple data streams to XAM objects, iSCSI LUNs, database tables and other data objects, can be accessed directly in a form that facilitates peering and transfer from cloud to cloud. Data items can be grouped together into named "containers", and containers can be nested, so transferring aggregations of data items is as easy as transferring a container.

Likewise, management operations can be performed on containers, reducing the management complexity when compared to managing individual objects, and allowing changes to be performed atomically on sets of objects. Management properties (metadata) of data items can either be explicitly specified for a given data item, or can be inherited from the parent container.
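Inheritance of management metadata from parent containers can be sketched as a merge from the root down. Values set explicitly on a child override those inherited from above. The class and metadata names here are hypothetical.

```python
from typing import Optional


class StoredItem:
    """A container or data item with explicit metadata and an optional parent container."""

    def __init__(self, name: str, parent: Optional["StoredItem"] = None,
                 metadata: Optional[dict] = None) -> None:
        self.name = name
        self.parent = parent
        self.metadata = dict(metadata or {})

    def effective_metadata(self) -> dict:
        """Inherited metadata overlaid with this item's explicit values."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


root = StoredItem("archive", metadata={"replicas": "2", "latency_ms": "1000"})
legal = StoredItem("legal", parent=root, metadata={"replicas": "3"})
memo = StoredItem("memo.txt", parent=legal)
print(memo.effective_metadata())  # prints {'replicas': '3', 'latency_ms': '1000'}
```

Because effective values are computed on read, a change to a container's metadata applies immediately to every item beneath it that does not override that value.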

A Vision for the Future

This simple set of principles allows for a powerful, extensible API that spans all classes of storage in the cloud. Bycast is proud to be participating as a primary contributor to this initiative, and I would encourage anyone with interest in this area to take the time to read the currently released documentation and to get involved at the SNIA cloud Google group.

SNIA is also holding a summer technical symposium in San Jose in July, with one entire track dedicated to cloud storage. If you are in the area, don't hesitate to get in touch with the SNIA to find out what it takes to get involved, and join us in this exciting project.

In the meantime, I'd encourage reading the current draft documentation that can be downloaded from the SNIA, as we're proud of what we've accomplished so far, and are excited about where we're going.


Where does Encryption Fit in the Cloud?

Any analysis of the use of encryption in the cloud always needs to start with a discussion of the threats that the use of encryption technology is designed to reduce. In addition, often encryption alone is not the most important or tricky part of protecting against these threats. A commonly misunderstood aspect of encryption is that it somehow eliminates these risks — encryption just concentrates and separates these risks from the data itself, by moving security to the encryption keys. These keys must then be securely managed and protected against the original threats that encryption was deployed to reduce.

The appropriate use of encryption in the storage stack depends on the business requirements, risks and storage technologies used. For example, while entire drive or tape encryption may be useful in shipping a disk or tape from one branch office to another, it is not appropriate for multi-user document sharing on a NAS, or in any other scenario where the access granularity is at a different level from that of the physical media.

The Top Three Threats

For the sake of this discussion, let us assume that for cloud storage, the most significant three threats are as follows. While this list is not complete, it does include the most common threats considered in the cloud storage space:
  • Unauthorized disclosure due to cloud customer operations
  • Unauthorized disclosure due to cloud provider operations
  • Unauthorized disclosure due to transport eavesdropping
1. The Insider

Despite the focus on exotic and headline-grabbing threats to computer security, the most common form of unauthorized data disclosure is from employees or other authorized individuals within the companies or organizations that generate and use the data. Often these employees even have legitimate access to the data, which is then used in an unauthorized manner, or weak access controls are bypassed to gain access to the data.

In this threat model, encryption within the storage system does not and cannot protect the data. The only approach to protect the data is to store the data in an encrypted format before it is made available to internal end-users. Examples of this include encrypted password-protected PDFs and various digital rights management schemes for media.

Such systems often require positive verification with a network server before access is permitted, and thus come with the trade-off of being complex, costly, and ultimately easily bypassed by taking screenshots or re-recording the protected content. One need only look at the failure of digital rights management systems to stem the piracy of digital media to understand that a determined attacker with legitimate access to the protected content is an almost intractable problem.

It is worth emphasizing again: encryption will not protect your corporate data against an insider, and from a security risk standpoint, this is the most probable means of loss.

2. The Provider

Let us assume that a cloud storage system has been selected, and corporate data is being sent outside of the organization's security perimeter. (These same risks are also present with an internal, or private, cloud, as a result of the insider risk described above.) Once data has been stored in the cloud, there are many opportunities for a cloud provider to inadvertently or deliberately cause unauthorized disclosure of a customer's data. These range from poorly configured firewalls, unauthorized or compromised devices on internal networks, and disgruntled employees, to the bankruptcy of the provider, where its assets, along with its customers' data, are sold off to the highest bidder.

With this threat model, encrypting the customer's data is a good technological countermeasure that can ensure that, while the data sits at rest within the cloud provider's storage equipment, it cannot be accessed except by the customer.

An important wrinkle to be aware of is that there are two different ways in which encryption can be architected for cloud storage: blind or transparent.

In blind cloud storage, data is encrypted on the cloud customer's premises, and the cloud provider has no visibility into the data. The provider can claim no knowledge of the data being stored and has no way to access it, since the keys are held only by the customer. While this can be a significant advantage to the cloud provider for liability reasons, it also prevents the provider from building any value-added services that require access to the customer's data, and it forces all data access to pass through customer equipment, where the data is decrypted.

In transparent cloud storage, data can still be encrypted, but the cloud provider must also have access to either the customer's keys or a second set of keys that allow the provider to access the customer's data. This can be used to provide value-added services such as full-content search, retention management, format conversion, and other capabilities that require the ability to read the customer's data.
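To make the distinction concrete, here is a minimal Python sketch. It uses a one-time pad purely as a self-contained stand-in for a real cipher (in practice you would use something like AES-GCM from a vetted library), and the `CloudProvider` class, its methods and the object names are hypothetical. The only difference between the blind and transparent cases is whether the customer hands the provider a key.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR two equal-length byte strings (the core of a one-time pad)."""
    if len(data) != len(key):
        raise ValueError("one-time pad key must be as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


def customer_encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Encrypt on the customer's premises and return (key, ciphertext).

    Toy one-time pad: a fresh random key per message, never reused.
    """
    key = secrets.token_bytes(len(plaintext))
    return key, xor_bytes(plaintext, key)


@dataclass
class CloudProvider:
    """Hypothetical provider that only ever receives ciphertext."""
    ciphertexts: Dict[str, bytes] = field(default_factory=dict)
    # Keys the customer chose to share: populated only for transparent storage.
    shared_keys: Dict[str, bytes] = field(default_factory=dict)

    def put(self, name: str, ciphertext: bytes, shared_key: Optional[bytes] = None) -> None:
        self.ciphertexts[name] = ciphertext
        if shared_key is not None:
            self.shared_keys[name] = shared_key

    def full_text_search(self, name: str, term: bytes) -> bool:
        """A value-added service that needs the plaintext to work."""
        key = self.shared_keys.get(name)
        if key is None:
            raise PermissionError("blind storage: no key available for " + name)
        return term in xor_bytes(self.ciphertexts[name], key)


provider = CloudProvider()
blind_key, blind_ct = customer_encrypt(b"Q3 revenue forecast")
provider.put("blind.txt", blind_ct)  # blind: the key never leaves the customer
shared_key, shared_ct = customer_encrypt(b"Q3 revenue forecast")
provider.put("transparent.txt", shared_ct, shared_key=shared_key)  # transparent
print(provider.full_text_search("transparent.txt", b"revenue"))  # True
```

Here the customer simply shares its own key for the transparent object; the other option mentioned above, a separate set of provider keys, would typically be implemented by wrapping the data key for both parties.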

From a strict security standpoint, blind cloud storage is more secure. However, with the judicious management of encryption keys, many of the threats mentioned above can be avoided, even if the provider's systems can access the plaintext.

Ultimately, if an organization gives anyone its plaintext, or the ability to access its plaintext, it needs to be sure that the recipient has sufficient operational safeguards to protect against common threats to data security and disclosure.

3. The Network

Finally, disclosure during transport is an example of where encryption is virtually mandatory. Any time data is transported across an untrusted or uncontrolled network, such as the Internet, it must be encrypted. Fortunately, widely deployed standards such as TLS are commonly used to perform this function. Cloud storage services that use plain HTTP should only be used if the data being sent is of no concern if disclosed, if the network is completely secured, or if the data is already encrypted (blind cloud storage).
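The rule of thumb above can be written down as a small client-side check. This Python sketch, with a hypothetical `transport_allowed` helper, builds a TLS context with the standard library's `ssl.create_default_context()` (which verifies the server certificate and host name) and only permits plain HTTP under the three conditions just listed.

```python
import ssl
from urllib.parse import urlparse


def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context that verifies certificates and host names."""
    context = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def transport_allowed(url: str, sensitive: bool = True,
                      already_encrypted: bool = False,
                      network_trusted: bool = False) -> bool:
    """Always allow HTTPS; allow plain HTTP only for non-sensitive data,
    data already encrypted by the client (blind storage), or a fully
    secured network."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "https":
        return True
    if scheme == "http":
        return (not sensitive) or already_encrypted or network_trusted
    raise ValueError("unsupported scheme: " + repr(scheme))
```

The context would then be handed to an HTTPS client, for example `http.client.HTTPSConnection(host, context=make_tls_context())`.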

This touches on only some of the issues related to the use of encryption in cloud storage. As can be imagined, key management brings significant complexities that make it challenging to build implementations that balance usability and security.


Object Security, Continued

Reader yossib left a comment on the previous blog entry, Object Storage, Part 5 - Security, that warranted a more detailed response and discussion:

I enjoyed reading your article, your focus on the issue of user authentication and access control is important as it surely does not get the attention it deserves.

Do you see the security model and user access management for object storage evolving from current ACLs, Active Directory/LDAP or taking a different direction?

How do you see the concepts of users and groups evolving?


Authentication of identity is critical because it is the foundation of access control, and, as you alluded to, it deserves far more attention than it gets. Fortunately, the rise of a plethora of services on the Internet is forcing the issue of federated identity management, and while these systems are not yet mature, there is a strong trend towards common mechanisms by which a user or computer program can have a universal identity that crosses system boundaries.

Examples of emerging standards include OpenID, and Sun's IDM.

On Active Directory

Active Directory, while hugely successful and very valuable in a corporate setting, simply was not designed to accommodate the scale that is needed, nor the timeframes over which identities need to persist. As digital data and archives become core to our civilization, we need ways to ensure that the security of digital data can survive for hundreds of years, and issues that were often disregarded as "edge cases" must move front and centre.

Examples include:
  • What happens when someone dies?
  • What happens when someone gets subpoenaed?
  • What about the expiration of statutory rights?
  • What if the law determining the length of statutory rights is changed?
These and many more issues make the protection of digital assets a double-edged sword. If we enshrine given restrictions in code, can we change them? And if we can change them, how can we prevent that ability from defeating the original point of the protections?

And this ignores many of the challenges emerging from the loss of centralized control over systems. In emerging federated cloud environments, objects may pass from system to system, both trusted and untrusted, and security must be preserved throughout. Many of the challenges encountered in trying to build DRM systems are directly applicable to trusted repositories and archives, and the research tells us that this is a really hard problem.

For example, it is still an open debate whether it is actually possible for one user to grant a second user access without also enabling that second user to grant access to further users. And revoking access can be even thornier.

Ultimately, we need to move away from the centrally enforced security models to a more distributed security model where objects can float around in systems that do not need to be trusted, and access is granted based on trust relationships. (An example of this is that you may grant an online search and indexing company the privileges to read your data, based on your trust that they will not disclose your data).


While ACLs have developed a reputation for being far too complex to manage, I believe that when tempered with methodologies such as Role Based Access Control, they can be made far simpler for the end user and application developer than they are right now.
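As a rough illustration of how roles can simplify ACLs, in this Python sketch each ACL entry names a role (a bundle of privileges) rather than enumerating individual privileges. The role names, privilege strings and users are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

# Hypothetical roles: each is a named bundle of privileges.
ROLES: Dict[str, FrozenSet[str]] = {
    "reader": frozenset({"read"}),
    "editor": frozenset({"read", "write"}),
    "records-manager": frozenset({"read", "write", "delete", "set-retention"}),
}


@dataclass(frozen=True)
class AclEntry:
    principal: str  # user the entry applies to
    role: str       # key into ROLES


def permitted(acl: List[AclEntry], principal: str, privilege: str) -> bool:
    """True if any entry for this principal names a role containing the privilege."""
    return any(entry.principal == principal and privilege in ROLES[entry.role]
               for entry in acl)


acl = [AclEntry("alice", "editor"), AclEntry("bob", "reader")]
```

Changing what an "editor" may do is then a single edit to the role definition, rather than an edit to every ACL that mentions editors.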

However, ACLs are fundamentally just advisory guidelines for a "trusted" system that interprets them to restrict access. ACLs need to evolve to the point where there are "grants" for each privilege that enable the holder to perform that action. So if I wanted to share an object with you, I would give you a "grant" that gives you the ability to read a given object or set of objects. This grant could be revoked, and I could engineer it in such a way that you couldn't delegate the grant without revealing your own credentials.
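One way to sketch such a grant is as a signed capability that names the grantee, the object, the privilege and an expiry time. In this Python sketch (all names are hypothetical), the owner signs grants with HMAC-SHA256 from the standard library. Because each grant is bound to the grantee's identity, it is only honoured for a requester authenticated as that grantee, so passing it on is useless without also handing over one's credentials. Note the simplification: with a symmetric HMAC, whoever verifies must hold the owner's secret; a real design would more likely use public-key signatures.

```python
import hashlib
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Grant:
    grant_id: str
    grantee: str       # identity that must be authenticated to use the grant
    object_name: str
    privilege: str
    expires_at: float  # seconds since the epoch
    signature: bytes


def _message(grant_id: str, grantee: str, object_name: str,
             privilege: str, expires_at: float) -> bytes:
    # The unit separator keeps field boundaries unambiguous.
    return "\x1f".join([grant_id, grantee, object_name, privilege,
                        repr(expires_at)]).encode()


class Owner:
    def __init__(self) -> None:
        self._secret = secrets.token_bytes(32)
        self.revoked: Set[str] = set()

    def _sign(self, msg: bytes) -> bytes:
        return hmac.new(self._secret, msg, hashlib.sha256).digest()

    def issue(self, grantee: str, object_name: str, privilege: str,
              lifetime_s: float = 3600.0) -> Grant:
        grant_id = secrets.token_hex(8)
        expires_at = time.time() + lifetime_s
        sig = self._sign(_message(grant_id, grantee, object_name, privilege, expires_at))
        return Grant(grant_id, grantee, object_name, privilege, expires_at, sig)

    def revoke(self, grant_id: str) -> None:
        self.revoked.add(grant_id)

    def check(self, grant: Grant, requester: str, object_name: str,
              privilege: str, now: Optional[float] = None) -> bool:
        """requester is assumed to have been authenticated already."""
        now = time.time() if now is None else now
        expected = self._sign(_message(grant.grant_id, grant.grantee,
                                       grant.object_name, grant.privilege,
                                       grant.expires_at))
        return (hmac.compare_digest(expected, grant.signature)
                and grant.grantee == requester
                and grant.object_name == object_name
                and grant.privilege == privilege
                and now < grant.expires_at
                and grant.grant_id not in self.revoked)
```

The revocation set here lives in one place. That assumption breaks down once grants and objects are replicated across systems that are not kept in sync, which is why the expiry time matters.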

Ultimately, this involves a much more complex multi-actor interaction, and my gut feeling is that we can't do this with static objects. This, of course, would mean that revocation of grants could never really be absolute (unless grants expire, but then who enforces that?), since you can't always ensure that all replicas of a given object are kept in sync.

Finally, if these systems grow too complex, they won't work. There is much to be said for simplicity, especially in global-scale systems.

Users and Groups

This is always an interesting discussion. Groups provide such a valuable level of abstraction, but introduce so much complexity. I tend to lean towards abandoning the concept of groups as first-class entities. If we just have users, we can create a user that is trusted to act as a delegate on behalf of other users. As long as one user can be granted the authority to delegate privileges to other users, we get the same functionality, and distributed group membership can be recast as a trust relationship between the owner and the delegator.
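A minimal Python sketch of that idea: there are only users, and a group such as a hypothetical "finance" user is simply a user that has been granted access and trusted to delegate it. The `max_hops` parameter is an assumption added to reflect the open question of whether delegated access may itself be re-delegated; by default only one hop is followed.

```python
from typing import Dict, Iterable, Optional, Set


def has_access(direct_grantees: Iterable[str],
               delegations: Dict[str, Set[str]],
               requester: str,
               max_hops: Optional[int] = 1) -> bool:
    """Breadth-first walk over delegation (trust) relationships.

    direct_grantees: users granted access on the object itself.
    delegations:     delegator -> users it has delegated its access to.
    max_hops:        how many delegation steps to follow (None = unlimited).
    """
    frontier = set(direct_grantees)
    seen = set(frontier)
    hops = 0
    while frontier:
        if requester in frontier:
            return True
        if max_hops is not None and hops >= max_hops:
            return False
        frontier = {d for user in frontier for d in delegations.get(user, ())} - seen
        seen |= frontier
        hops += 1
    return False


# "finance" plays the role of a group: it holds access and delegates to its members.
delegations = {"finance": {"alice", "bob"}, "alice": {"carol"}}
```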

My feeling is that this is the only model that will scale.

Much to Consider

This is just the tip of the proverbial iceberg, and there are so many additional complexities and challenges associated with security. I'd love to continue this discussion, so if you have any questions, comments or ideas, please don't hesitate to comment.

Also, as I mentioned in my last tweet, there are many other security-related topics that I plan to discuss in a follow-up blog post, covering user identity federation, trust domains, "blind storage", peering, object destruction and more.


Object Storage, Part 5 - Security

In the days when storage was directly connected to computers and there was only one user per computer, security was simple: physically secure the computer and its attached storage. In the Internet age, however, storage is not only networked but potentially accessible to every user in the enterprise or on the Internet. As storage migrates from silos hidden behind computing servers to being a first-class peer on computer networks, security rapidly moves front and centre as a key requirement, not just to protect (deny), but also to facilitate sharing and collaboration across applications and users (allow).

As part five of the object storage series of posts, this entry covers security in an object storage system. This extends far beyond simple access controls, to issues such as the security implications of the search functionality discussed in the last entry, Object Storage - Query.

Who's that Looking at my Data?

Fundamentally, security is about controlling the flow of information. As with any storage system, information flows out of the system and into the system (read and write, in the case of a block device). But unlike a block device, which can only restrict read or write operations at the device or block level, an object storage system has a much richer set of information flows that need to be regulated.

Take query, for example. Information leakage through a query result set or index would be a significant problem, and access controls on each object must extend to the query results. In some environments, even timing matters, as variations in the time required to return a query may allow a user to determine if an object with a given metadata value exists or not, even if they do not have privileges to see that object.
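As a minimal Python sketch (the `StoredObject` type and `query` function are hypothetical), access control can be applied inside the query itself, so that an object the requester cannot read is indistinguishable from one that does not match. Checking readability first also means the work done for an unreadable object never depends on its metadata value, which narrows, but does not by itself close, the timing channel described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class StoredObject:
    object_id: str
    metadata: Dict[str, str]
    readers: Set[str] = field(default_factory=set)


def query(objects: List[StoredObject], requester: str,
          key: str, value: str) -> List[str]:
    """Return IDs of objects whose metadata[key] == value AND that the
    requester may read. Unreadable objects are skipped before their
    metadata is examined."""
    return [obj.object_id for obj in objects
            if requester in obj.readers and obj.metadata.get(key) == value]


store = [
    StoredObject("obj-1", {"com.bycast.metadata.creator": "XYZCorp\\dslik"}, {"alice"}),
    StoredObject("obj-2", {"com.bycast.metadata.creator": "XYZCorp\\dslik"}, {"bob"}),
]
```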

As with all forms of access control, one first needs to authenticate the entity that is requesting access. Once the system has determined who it is talking to, it can proceed to the question of what that entity is allowed to do. Then, and only then, can it perform the operation.

Expressed in English, this takes the following form:

Entity "X" is requesting to perform operation "Y" against object "Z".

Some examples include:

Entity "XYZCorp\Archiver" is requesting to perform operation "Modify" against object "8D73F687BA26C1A03F9D8E796A497338/com.bycast.metadata.lastmodifiedtime"

Entity "XYZCorp\Admin" is requesting to perform operation "Delete" against object "Financial Storage Archive Container"
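The "entity, operation, object" form maps naturally onto a small data structure. In this Python sketch, the `AccessRequest` type, the `authorize` function and the ACL contents are hypothetical; the entity is assumed to have been authenticated already, and anything not explicitly allowed is denied.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class AccessRequest:
    entity: str     # already-authenticated identity
    operation: str  # e.g. "Modify", "Delete"
    target: str     # object, metadata item within an object, or container


LAST_MODIFIED = "8D73F687BA26C1A03F9D8E796A497338/com.bycast.metadata.lastmodifiedtime"

# Explicit ACLs: target -> entity -> operations that entity may perform.
ACLS: Dict[str, Dict[str, Set[str]]] = {
    LAST_MODIFIED: {"XYZCorp\\Archiver": {"Read", "Modify"}},
    "Financial Storage Archive Container": {"XYZCorp\\Admin": {"Read", "Delete"}},
}


def authorize(request: AccessRequest) -> bool:
    """Deny by default; allow only operations listed for this entity and target."""
    allowed = ACLS.get(request.target, {}).get(request.entity, set())
    return request.operation in allowed
```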

As we saw with implicit and explicit policies for managing storage placement, retention and other behaviours, the same model extends to security. A list of who is allowed to perform which operations on a given object can be explicitly specified for that object (this is called an ACL, or Access Control List, in the file world), implicitly specified for a group of objects (inherited ACLs), or implicitly specified for a given user.

As one can imagine, these security models can become quite complex, especially when all three are combined. For example, suppose objects are stored in a container that specifies that only one user can access them, but the application explicitly specifies that another user can access a given object. What is the correct behaviour? Should the explicit specification override the implicit one, or should the implicit specification override the explicit one? Both are valid, depending on the use case.

For example, in the Windows world, the "Backup Operator" must be able to access all objects, regardless of their ACLs. This is an example of an implicit security policy overriding explicit security policies. Conversely, if an application wishes to explicitly share some objects with a second application, while all other objects remain governed by an implicit security policy, we have an example of an explicit security policy overriding an implicit one.
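The two cases above differ only in which policy wins a conflict. One simple way to sketch that in Python is to let each policy return allow, deny or "not applicable", and to make the precedence an explicit, per-policy choice. The `Decision` enum and the `implicit_overrides` flag are assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NOT_APPLICABLE = "not applicable"


def combine(explicit: Decision, implicit: Decision, implicit_overrides: bool) -> bool:
    """Resolve an explicit (per-object ACL) decision against an implicit
    (policy) decision. The higher-precedence decision wins unless it does
    not apply; if neither applies, deny by default."""
    ordered = (implicit, explicit) if implicit_overrides else (explicit, implicit)
    for decision in ordered:
        if decision is not Decision.NOT_APPLICABLE:
            return decision is Decision.ALLOW
    return False


# Backup Operator: an overriding implicit policy grants access despite the ACL.
backup_ok = combine(Decision.DENY, Decision.ALLOW, implicit_overrides=True)
# Sharing: an explicit ACL entry grants access that the container policy would deny.
share_ok = combine(Decision.ALLOW, Decision.DENY, implicit_overrides=False)
```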

Policy Contexts

As most object storage systems are built around a flat namespace, implicit security policies apply to objects that match a set of metadata criteria. For example, a policy may say: For all objects where "com.bycast.metadata.creator" equals "XYZCorp\dslik" ...

When logical containers are supported, they can be implemented as a special metadata item. For example, com.bycast.metadata.container having the value "/corporate/financial" would express a subcontainer "financial" within a container named "corporate". As implicit policies can specify which containers they apply to, this allows security policies to be restricted to a given container or set of containers.
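Here is a minimal Python sketch of an implicit policy that selects objects by metadata criteria and, optionally, by container, treating the container as the com.bycast.metadata.container metadata item described above. The `ImplicitPolicy` class, the prefix-matching rule, and the example "XYZCorp\Auditor" grant are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

CONTAINER_KEY = "com.bycast.metadata.container"


@dataclass
class ImplicitPolicy:
    criteria: Dict[str, str]        # metadata item -> required value
    container: Optional[str] = None  # "/corporate" also covers "/corporate/financial"
    grants: Dict[str, Set[str]] = field(default_factory=dict)  # entity -> operations

    def applies_to(self, metadata: Dict[str, str]) -> bool:
        if any(metadata.get(k) != v for k, v in self.criteria.items()):
            return False
        if self.container is None:
            return True
        path = metadata.get(CONTAINER_KEY, "")
        base = self.container.rstrip("/")
        # Match the container itself or any subcontainer beneath it.
        return path == base or path.startswith(base + "/")


def implicitly_allowed(policies: List[ImplicitPolicy], metadata: Dict[str, str],
                       entity: str, operation: str) -> bool:
    return any(p.applies_to(metadata) and operation in p.grants.get(entity, set())
               for p in policies)


policy = ImplicitPolicy(
    criteria={"com.bycast.metadata.creator": "XYZCorp\\dslik"},
    container="/corporate",
    grants={"XYZCorp\\Auditor": {"Read"}},
)
```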

Thus, we end up with three different contexts:
  1. Explicit security policies, included with an object, that indicate who can do what.
  2. Implicit security policies, specifying which objects they apply to, that indicate who can do what.
  3. Implicit security policies, specifying which users they apply to, that indicate what those users can do.
And this leads us to the "What":

Oh, The Things We Can Do

How do the operations that can be performed on object storage compare to those of file and block storage? In short, there is a lot more you can do. Below is a partial list of the actions that can be performed against a stored object:
  • Create an object
  • Destroy an object
  • Discover an object's existence
  • List an object's contents
  • Add a new metadata item to an object
  • Remove an existing metadata item from an object
  • Read the value of an object metadata item
  • Write a value to an object metadata item
  • Add a new data stream to an object
  • Remove an existing data stream from an object
  • Read the value of an object data stream
  • Write a value to an object data stream
  • Query for the existence of a named metadata item
  • Query for the existence of a named data stream
  • Query for the contents of a named metadata item
  • Query for the contents of a named data stream
These actions can become even more complex when you consider that some object storage systems allow read-only metadata items and streams, immutable metadata items and streams, or increment-only metadata items.
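These special behaviours amount to extra checks on every write, on top of the ordinary question of whether an entity may write an item at all. A minimal Python sketch, with hypothetical mode names, where a retention period in days is increment-only (it can be extended but never shortened) and a read-only item is one that only the system itself may update:

```python
from enum import Enum
from typing import Any


class Mode(Enum):
    READ_WRITE = "read-write"
    READ_ONLY = "read-only"            # clients may not write; the system maintains it
    IMMUTABLE = "immutable"            # fixed at creation; nobody may change it
    INCREMENT_ONLY = "increment-only"  # may only grow, e.g. a retention period


class MetadataItem:
    def __init__(self, value: Any, mode: Mode) -> None:
        self.value = value
        self.mode = mode

    def write(self, new_value: Any, by_system: bool = False) -> None:
        if self.mode is Mode.IMMUTABLE:
            raise PermissionError("immutable item cannot be changed")
        if self.mode is Mode.READ_ONLY and not by_system:
            raise PermissionError("read-only item can only be updated by the system")
        if self.mode is Mode.INCREMENT_ONLY and new_value < self.value:
            raise PermissionError("increment-only item cannot decrease")
        self.value = new_value


retention_days = MetadataItem(365, Mode.INCREMENT_ONLY)
retention_days.write(730)  # extending retention is allowed
```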

Unlike file systems, which have evolved a relatively standardized set of operations that are specified in ACLs, the security models for object storage operations are not yet well understood or standardized. Open questions include how privileges are overlaid onto the contents of objects, and how special behaviours, such as increment-only for a retention metadata item, are handled.

This is one of the areas where the SNIA XAM working group has done excellent work, and I would encourage anyone interested in the details of how security models map onto object storage to read the XAM Architectural Specification.


Cloud Storage Standardization, Part 1 - Why a Standard?

As cloud storage matures and use increases, there is a strong need for standardized interfaces for performing basic storage operations. Just as the SCSI standard enabled interoperability and facilitated innovation within the directly-attached storage market, the presence of a standard interface for cloud storage will provide many advantages. These include:
  • Improving quality by allowing standardized conformance testing
  • Creating a market for test, validation, profiling, debugging and analysis tools
  • Allowing the creation of standardized documentation
  • Encouraging the publication of articles and books discussing the standard
  • Reducing development work required to use cloud storage and to support multiple providers
  • Enabling the creation of standardized access libraries
  • Reducing customer lock-in and enabling multi-vendor selection
At Bycast, we provide a RESTful HTTP API for object storage access, and next month I will be presenting it at the SNIA cloud storage summit. As part of my preparations, I've been reviewing many of the HTTP storage APIs used by other industry players, such as Amazon, Microsoft, and Nirvanix. There is significant overlap between all of these APIs, and I believe that there is a good chance, at least from a technical standpoint, of creating a common lightweight HTTP storage access protocol.


Just what is "Cloud Computing"?

As cloud computing increases in prominence, the argument over exactly what cloud computing is has grown in proportion. Never before has a buzzword been so truly nebulous.

So, with the definition in such dispute, this is the perfect time to throw out yet another definition.

With that said, my definition of cloud computing is: *drum roll*

"Distributed location-independent scale-free cooperative agents"

"What?", one might say... "That's nothing like the others I've seen." And they would be right. That's exactly the point. This is my technical definition of cloud computing, so let's pick it apart, piece by piece:

Distributed
Distributed is part of almost every cloud definition, but even the word "distributed" has different meanings to different people. For me, distributed signifies the presence of a substantial separation between entities. This definition implies several things:
  1. That there are well-defined entities that are distinct from each other
  2. That there is a method of communication between entities that is separate from the entities themselves
  3. That there is some degree of isolation between entities
This is a broad definition, but one that I would argue most technical people would agree with. Under it, a client-server system is a distributed system, as is a telephone exchange or an interacting web server and client. However, a web server alone would not be considered distributed, nor would the Linux kernel.

This definition of distributed is about logical entities and their interactions, so it does not matter whether the entities are all co-resident on a single computing system or scattered across multiple computing systems. That, in my books, is more related to the property of location independence.

Location-Independent
What separates a typical computer operating system from what the academic world calls a distributed operating system is the property of location independence. This is what allows the physical location of a given entity (be it a program or state) to not matter to its correct operation. More specifically, location-independent entities can transparently move between locations without affecting system operation.

Location independence is a key enabling technology. For example, on the Internet, DNS and IP routing provide location independence between clients and servers. This allows servers to be moved around without clients having to know their physical location, or even that there was a change.
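The indirection that DNS provides can be sketched in a few lines of Python. Here a toy `Directory` stands in for DNS, the host name "storage.example.com" and the addresses are made up (the addresses come from the 192.0.2.0/24 documentation range), and servers are simulated as functions. The client resolves the logical name on every call, so the server can move without the client changing.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class Directory:
    """Toy name service mapping stable logical names to current locations."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def register(self, name: str, location: str) -> None:
        self._locations[name] = location

    def resolve(self, name: str) -> str:
        return self._locations[name]


def call(directory: Directory, hosts: Dict[str, Handler], name: str, request: str) -> str:
    # Resolve on every call: the client never depends on a fixed location.
    return hosts[directory.resolve(name)](request)


hosts: Dict[str, Handler] = {
    "192.0.2.10": lambda req: "served by 192.0.2.10: " + req,
    "192.0.2.20": lambda req: "served by 192.0.2.20: " + req,
}
directory = Directory()
directory.register("storage.example.com", "192.0.2.10")
before = call(directory, hosts, "storage.example.com", "GET /hello.txt")
directory.register("storage.example.com", "192.0.2.20")  # the server "moves"
after = call(directory, hosts, "storage.example.com", "GET /hello.txt")
```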

When systems are built out of location-independent entities, they can be run and stored on different physical configurations. This lays the foundation for designing scale-free systems.

Scale-Free

In most system implementations, the chosen algorithms and architectures only work efficiently at a given scale. As the size and scope of a system is increased or decreased, bottlenecks, inefficiencies, and waste result in diminishing returns, and put limits on how big and how small a system can be. Systems designed for portable embedded devices must be designed very differently from systems designed for the world's largest supercomputers, and rightly so.

However, a new class of computing systems has emerged that has the property of being scale-free. This means that regardless of the scale of their deployment, they are able to continue to perform their task efficiently. From an algorithmic standpoint, this means that the computation required scales linearly as a function of the load on the system.

Thus, with scale-free, location-independent systems, you can scale just by adding more servers, networks and storage devices. This is what all of the largest web companies (at least in terms of computational load), such as Facebook, Flickr, Twitter and friends, aspire to.
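One widely used ingredient in such systems (though by no means the definition of scale-free) is consistent hashing, which lets a cluster grow by adding a server while moving only the keys that the new server takes over, rather than reshuffling everything. A minimal Python sketch; the server names and the number of virtual nodes per server are arbitrary choices:

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def _hash(key: str) -> int:
    """Stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 64) -> None:
        self._vnodes = vnodes  # points per node, to even out the load
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """The first node clockwise from the key's position owns the key."""
        if not self._ring:
            raise LookupError("no nodes in the ring")
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]


ring = HashRing(["server-a", "server-b", "server-c"])
keys = [f"object-{n}" for n in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("server-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
```

With four equally weighted servers, roughly a quarter of the keys would be expected to move, all of them to the new server; the exact count depends on the hash values.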

Cooperative Agents

Web services count. SOA counts. In fact, most message-passing distributed systems are based around the concept of cooperating agents working together to solve a problem.
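A minimal Python sketch of cooperating agents: several worker threads, which share no state and communicate only through message queues, cooperate to count the words in a set of documents. Threads keep the example self-contained; per the definition of distributed above, the same structure applies if the agents ran on separate machines and the queues were network channels. The word-counting task and function names are arbitrary stand-ins.

```python
import queue
import threading
from typing import List


def worker(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """An agent: owns no shared state and communicates only via messages."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown message
            break
        job_id, text = msg
        outbox.put((job_id, len(text.split())))


def count_words(documents: List[str], n_agents: int = 3) -> int:
    inbox: queue.Queue = queue.Queue()
    outbox: queue.Queue = queue.Queue()
    agents = [threading.Thread(target=worker, args=(inbox, outbox))
              for _ in range(n_agents)]
    for agent in agents:
        agent.start()
    for job in enumerate(documents):
        inbox.put(job)
    for _ in agents:
        inbox.put(None)  # one shutdown message per agent
    for agent in agents:
        agent.join()
    results = [outbox.get() for _ in documents]
    return sum(count for _, count in results)
```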

If the agents aren't cooperating, or they aren't agents, then they're not a cloud to me.

What isn't in my Definition of Cloud

It is also interesting to look at what is not in my definition of cloud. The first thing that one may notice when comparing it to most other definitions is the complete lack of business terms. That's because Cloud Computing is a technology, not a business model. There's a reason why previous attempts at cloud-based business models were called "Software as a Service", and "Application Service Providers". That's because these terms describe business models, as opposed to technological models.

Much of my annoyance with "Cloud" as a buzzword would go away if marketers and businesses simply said that they have a "Cloud business model", one that provides Internet-based delivery of their application or service and leverages cloud computing techniques.

In fact, one could arguably develop and deliver such a business model without using any cloud technology. A good example is the credit card processing networks, which are mainframe-based, largely centralized, and very specifically designed around a given scale, with implementations that require very specific location bindings for entities. Yet, from a business standpoint, they arguably provide a cloud service.


The Fight over Cloud

As far as buzzwords go, "Cloud" is a pretty good one. I've complained about this one in the past, and cloud is a very annoying term to me precisely because it lacks a firm definition, is vapid, and means different things to different people.

Now that the much debated "Cloud Manifesto" has been leaked to the web, there's a little more to chew on. The war over the definition of what cloud means has begun.

Below is my analysis of what annoys me about this document, ignoring for now the politics associated with its creation and distribution.

The author(s?) of this "manifesto" say that the document "does not intend to define a final taxonomy of cloud computing", yet that is exactly what they have ended up doing. And given that their definition reads more like an advertisement for outsourcing VMs, it is at odds with my personal view of cloud as a general architecture for building distributed systems.

To me, none of their listed criteria, either independently or in combination, make something "cloud", nor does being "cloud" imply the existence of any of these proposed criteria.

So, with that said, let's take a more detailed look at these "key characteristics of the cloud":

Scalability On Demand

While this is a value that a cloud can offer, there are lots of non-cloud systems that provide exactly this, and you can have a cloud that does not provide scalability on demand.

For example, IBM has been offering mainframe systems with extra processors that you can pay to use "on demand". I wouldn't classify a zSeries as a cloud.

Streamlining the Data Centre

As "streamlining" is an ambiguous word, we'll assume that the authors mean outsourcing or cost reductions. However, not all uses of cloud will result in the reduction of cost (capital or infrastructure) or moving work outside the data centre.

Improving Business Processes

Technological systems are more often than not orthogonal to business process improvement. One can deploy cloud systems and end up with a worse business process, and one can improve business processes without deploying cloud technology.

Minimizing Startup Costs

Fractional allocation, which reduces the minimum quantum that must be purchased to do useful work, comes closest to an acceptable criterion for cloud, but is a VM server a cloud, then? Also, one can deploy a cloud system that does not support fractional allocation.

All Together Now?

Depending on your definition of what cloud is, you could create a cloud system that provides fixed capacity processing, increases data centre costs and brings additional work into the data centre, makes no changes to business processes, and requires large startup costs.

Conversely, you could create a system that has variable on-demand capacity, reduces data centre costs, outsources data centre work, improves business processes, and minimizes startup costs, all without it being a cloud.

Of course, the linchpin of this entire argument is just what a cloud is, and this is why this manifesto matters. It is the first major attempt to put a stick in the sand and say that Cloud is X, Y and Z.

And that is why it has generated such a storm of controversy. At stake is who gets the first-mover advantage in the struggle to define exactly what a cloud is.