2009-06-30

Standardizing Cloud Storage

As the cloud storage concept matures and an increasing number of service and technology providers emerge in the market, there is a growing recognition of the need to standardize protocols for data access and storage management functions in a cloud storage environment.

The advantages of an open standard for cloud storage include:
  • Allowing a cloud storage client to interoperate with multiple providers
  • Enabling data portability between cloud providers
  • Facilitating common documentation, sample code and educational material
  • Allowing common test infrastructure and conformance testing
  • Reducing development work for cloud clients and providers
  • Reducing the complexity of standardized access libraries
  • Encouraging the creation of debugging tools for diagnostics, profiling and interaction analysis

With cloud computing and cloud storage being such hot topics at this time, multiple initiatives are underway to standardize various components and interfaces within the cloud storage stack. One of the leading initiatives is the cloud storage technical working group within the Storage Networking Industry Association (SNIA).

A Cloud Storage Reference Model

In the last six months, we've been working on several deliverables, the main work product being a standard reference model for the management of, and access to, cloud storage resources.

In summary, here are the highlights of the working group's vision of cloud storage:

An HTTP API

Management functions and data access are provided via a light-weight HTTP RESTful API. Object, block and database storage APIs co-exist with this core API, facilitating emerging cloud use cases and allowing continued innovation as new applications are moved into the cloud.

The HTTP API also facilitates discovery and introspection of provided API capabilities, allowing providers to support as little or as much of the API as they wish, and allowing clients to discover which capabilities are provided. This approach allows cloud vendors to provide additional capabilities (such as Nirvanix's media transcoding capabilities) while remaining compatible with, and leveraging, the common functions of the API.
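
To make the discovery idea concrete, here is a minimal sketch of how a client might check what a provider supports before relying on an optional feature. The capability document, its field names and the capability names are all invented for illustration; they are not taken from the working group's draft.

```python
import json

# Hypothetical capability document, as a provider might return it from a
# discovery request. Every field and capability name here is an assumption.
CAPABILITIES_JSON = """
{
  "provider": "example-cloud",
  "capabilities": {
    "containers": true,
    "metadata": true,
    "media_transcoding": false
  }
}
"""


def parse_capabilities(document):
    """Return the set of capability names the provider advertises as enabled."""
    advertised = json.loads(document).get("capabilities", {})
    return {name for name, enabled in advertised.items() if enabled}


def missing_capabilities(supported, required):
    """Return, in sorted order, the required capabilities the provider lacks."""
    return sorted(set(required) - set(supported))


supported = parse_capabilities(CAPABILITIES_JSON)
missing = missing_capabilities(supported, ["containers", "media_transcoding"])
print(missing)  # ['media_transcoding']
```

A client written this way can use the common functions with any provider, and fall back gracefully when a vendor-specific extension is absent.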

Containers and Data

Any data item stored in the cloud, from simple data streams to XAM objects, iSCSI LUNs, database tables and other data objects, can be accessed directly in a form that facilitates peering and transfer from cloud to cloud. Because data items can be grouped together into named "containers", and containers can be nested, transferring aggregations of data items is as easy as transferring a container.

Likewise, management operations can be performed on containers, reducing the management complexity when compared to managing individual objects, and allowing changes to be performed atomically on sets of objects. Management properties (metadata) of data items can either be explicitly specified for a given data item, or can be inherited from the parent container.
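
The inheritance rule can be sketched in a few lines. This is only an illustration of the idea, not the reference model's data structures: the class names and the metadata keys ("retention", "replicas") are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    """A data item; Container below extends it with children."""
    name: str
    parent: Optional["Container"] = field(default=None, repr=False)
    metadata: Dict[str, str] = field(default_factory=dict)

    def effective_metadata(self):
        """Merge inherited metadata with this node's explicit settings.

        Values set explicitly on the node override those inherited from
        its parent container, recursively up to the root.
        """
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


@dataclass(eq=False)
class Container(Node):
    """A named container that can hold data items and other containers."""
    children: Dict[str, Node] = field(default_factory=dict, repr=False)

    def add(self, child):
        child.parent = self
        self.children[child.name] = child
        return child


root = Container("archive", metadata={"retention": "7y", "replicas": "2"})
scans = root.add(Container("scans"))
image = scans.add(Node("scan-001.tif", metadata={"replicas": "3"}))

print(image.effective_metadata())  # {'retention': '7y', 'replicas': '3'}
```

Changing a property on "archive" changes the effective value for everything beneath it that has not set its own, which is what makes container-level management cheaper than per-object management.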

A Vision for the Future

This simple set of principles allows for a powerful, extensible API that spans all classes of storage in the cloud. Bycast is proud to be participating as a primary contributor to this initiative, and I would encourage anyone with interest in this area to take the time to read the currently released documentation and to get involved at the SNIA cloud Google group.

SNIA is also holding a summer technical symposium in San Jose in July, with one entire track dedicated to cloud storage. If you are in the area, don't hesitate to get in touch with the SNIA to find out what it takes to get involved, and join us in this exciting project.

In the meantime, I'd encourage reading the current draft documentation, which can be downloaded from the SNIA. We're proud of what we've accomplished so far, and are excited about where we're going.

2009-06-29

Where Does Encryption Fit in the Cloud?

Any analysis of the use of encryption in the cloud needs to start with a discussion of the threats that encryption is meant to reduce. Encryption alone is often not the most important or the trickiest part of protecting against these threats. A common misunderstanding is that encryption somehow eliminates these risks. In fact, encryption just concentrates the risks and separates them from the data itself, by moving security to the encryption keys. These keys must then be securely managed and protected against the original threats that encryption was deployed to reduce.

The appropriate use of encryption in the storage stack depends on the business requirements, risks and storage technologies used. For example, while entire drive or tape encryption may be useful in shipping a disk or tape from one branch office to another, it is not appropriate for multi-user document sharing on a NAS, or in any other scenario where the access granularity is at a different level from that of the physical media.

The Top Three Threats

For the sake of this discussion, let us assume that the three most significant threats for cloud storage are as follows. While this list is not complete, it does include the threats most commonly considered in the cloud storage space:
  • Unauthorized disclosure due to cloud customer operations
  • Unauthorized disclosure due to cloud provider operations
  • Unauthorized disclosure due to transport eavesdropping

1. The Insider

Despite the focus on exotic and headline-grabbing threats to computer security, the most common form of unauthorized data disclosure is from employees or other authorized individuals within the companies or organizations that generate and use the data. Often these employees even have legitimate access to the data, which is then used in an unauthorized manner, or weak access controls are bypassed to gain access to the data.

In this threat model, encryption within the storage system does not and cannot protect the data. The only approach to protect the data is to store the data in an encrypted format before it is made available to internal end-users. Examples of this include encrypted password-protected PDFs and various digital rights management schemes for media.

Such systems often require positive verification back to a network server before access is permitted, and thus have the trade-off of being complex, costly, and ultimately easily bypassed by taking screen shots or re-recording the protected content. One need only look at the failure of digital rights management systems to stem the piracy of digital media to understand that a determined attacker with legitimate access to the protected content is an almost intractable problem.

Once again, it is worth emphasizing: encryption will not protect your corporate data against an insider, and from a security risk standpoint, this is the most probable means of loss.

2. The Provider

Let us assume that a cloud storage system has been selected, and corporate data is being sent outside of the organization's security perimeter (these same risks are present with an internal, or private, cloud as a result of the above risk category). Once data has been stored in the cloud, there are many opportunities for a cloud provider to inadvertently or deliberately cause unauthorized disclosure of a customer's data. These range from poorly configured firewalls, unauthorized or compromised devices on internal networks, and disgruntled employees, to the bankruptcy of the provider, where its assets are sold off, along with its customers' data, to the highest bidder.

With this threat model, the encryption of the customer's data is a good technological countermeasure that can ensure that while sitting at rest within the cloud provider storage equipment, the data cannot be accessed except by the customer.

Now, an important wrinkle to be aware of is that there are two different ways in which encryption can be architected for cloud storage: Blind or Transparent.

In blind cloud storage, data is encrypted at the cloud customer's premises, and the cloud provider has no visibility into the data. They can claim no knowledge of the data being stored, and have no way to access it, since the keys are held only by the customer. While this can be a significant advantage to the cloud provider for liability reasons, it also prevents them from building any value-added services into their cloud that require access to the customer's data, and forces all data accesses to go through customer equipment before the data can be accessed.

In transparent cloud storage, data can still be encrypted, but the cloud provider must also have access to either the customer's keys or a second set of keys that allow the provider to access the customer's data. This can be used to provide value-added services such as full-content search, retention management, format conversion, and other capabilities that require the ability to read the customer's data.

From a strict security standpoint, blind cloud storage is more secure. However, with the judicious management of encryption keys, many of the threats mentioned above can be avoided, even if the provider's systems can access the plaintext.
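
The difference between the two models comes down to who holds keys. The sketch below models only that; it performs no actual encryption, and the class and constant names are invented for illustration.

```python
from dataclasses import dataclass

CUSTOMER = "customer"
PROVIDER = "provider"


@dataclass(frozen=True)
class EncryptionModel:
    """Describes a storage encryption model by the parties that hold keys."""
    name: str
    key_holders: frozenset

    def can_read_plaintext(self, party):
        # Only a party holding a key can recover the plaintext.
        return party in self.key_holders

    def supports_provider_services(self):
        # Search, format conversion and similar value-added services
        # require the provider to read the data.
        return self.can_read_plaintext(PROVIDER)


BLIND = EncryptionModel("blind", frozenset({CUSTOMER}))
TRANSPARENT = EncryptionModel("transparent", frozenset({CUSTOMER, PROVIDER}))

for model in (BLIND, TRANSPARENT):
    print(model.name,
          "provider can read:", model.can_read_plaintext(PROVIDER),
          "value-added services:", model.supports_provider_services())
```

The same property that makes provider-side services possible in the transparent model also means that the provider's key handling becomes part of the customer's risk.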

Ultimately, if an organization is giving anyone its plaintext, or the ability to access its plaintext, it needs to be sure that the recipient has sufficient operational safeguards to protect against common threats to data security and disclosure.

3. The Network

Finally, disclosure during transport is an example of where encryption is virtually mandatory. Any time data is transported across an untrusted or uncontrolled network, such as the Internet, it must be encrypted. Fortunately, there are widely deployed standards, such as TLS, that are commonly used to perform this function. Cloud storage services that use raw HTTP should only be used if disclosure of the data being sent is not a concern, if the network is completely secured, or if the data is already encrypted (blind cloud storage).
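
The rule above can be written down as a small policy check that a client might run before sending data. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and an "https" URL is taken to mean TLS with proper certificate verification.

```python
from urllib.parse import urlsplit


def transport_is_acceptable(url, *, data_sensitive, network_trusted,
                            already_encrypted):
    """Decide whether data may be sent to the given endpoint.

    HTTPS is always acceptable. Raw HTTP is acceptable only if the data
    is not sensitive, the network is fully trusted, or the payload was
    already encrypted before it left the customer (blind cloud storage).
    """
    scheme = urlsplit(url).scheme.lower()
    if scheme == "https":
        return True
    if scheme != "http":
        raise ValueError("unsupported URL scheme: %r" % scheme)
    return (not data_sensitive) or network_trusted or already_encrypted


# Sensitive plaintext over raw HTTP on the Internet: rejected.
print(transport_is_acceptable("http://storage.example.com/bucket",
                              data_sensitive=True,
                              network_trusted=False,
                              already_encrypted=False))  # False

# The same request with client-side encrypted data: allowed.
print(transport_is_acceptable("http://storage.example.com/bucket",
                              data_sensitive=True,
                              network_trusted=False,
                              already_encrypted=True))  # True
```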

This touches on some of the issues related to the use of encryption in cloud storage, and as can be imagined, there are significant complexities related to key management that make implementations that balance usability and security quite challenging.