Queues are a powerful approach to data storage and data exchange, commonly used in application programming, especially when components of a program are executed in parallel. By supporting queues, cloud storage systems provide safe and efficient mechanisms by which programs can persist their state, communicate between components, and interact with other programs and systems.
What is a Queue
A Queue is an object with zero or more data "records", such that only the oldest item is accessed at any given time. Once the oldest item is removed, the second-oldest item is accessed, and so forth until the queue is empty. This provides what is known as "first-in, first-out" access to data.
In a standard object, when you update data, you update the only record in the object. But in a queue, this creates a new record. When you read the value of an object, you get the only record in the object, but in a queue, this returns the value of the oldest record. And when you delete the value of an object, you delete the object, but in a queue, you delete only the oldest record. Thus, unlike basic objects, updating and deleting records from a queue are no longer idempotent operations.
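These semantics can be sketched in a few lines of Python, using collections.deque as an in-memory stand-in for a cloud queue:

```python
from collections import deque

# An ordinary object holds exactly one record; an update overwrites it.
obj = {"value": "first"}
obj["value"] = "second"   # update replaces the only record

# A queue holds zero or more records; an "update" appends a new record.
queue = deque()
queue.append("first")
queue.append("second")

# Reads always return the oldest record...
oldest = queue[0]         # "first"

# ...and deletes remove only the oldest record, so a second delete
# removes a *different* record: the operation is not idempotent.
queue.popleft()           # removes "first"
queue.popleft()           # removes "second", a different item
```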
How are Queues Typically Used
Queues are typically used in two different situations:
Internal Work List - Inside a program, the program will store a queue of data that needs to be processed in a given order, such that as resources become available, the items from the queue can be accessed in order. In this use, the queue is being used to store state.
Inter-Process Communication - Between two programs, a queue will contain a list of items that need to be communicated from one program to another. As the sender program encounters data that needs to be sent to the receiver program, it enqueues the items into the queue, and as the receiver program is able to process data from the sender program, it dequeues items from the queue. In this use, the queue is being used to exchange state.
As a quick aside, TCP/IP is an example of a queue used for information exchange between two systems. When you write data to a TCP/IP connection, you are enqueueing the data for delivery, and when the destination application reads from the TCP/IP connection, it is dequeuing the data from the network abstraction.
These uses are best illustrated through an example. Let's say that we are designing a book scanning system that runs in the cloud. We have a series of image scanners that digitize pages of books, and we have an OCR program that converts the images into text that can be indexed for search.
A simple implementation would be to scan a page, OCR it, then move on to the next page, but that doesn't meet all of the criteria of a cloud solution, as we can't scale it. A good solution would be able to handle multiple scanners, and have multiple instances of the OCR process running in parallel. And that calls for queues.
Handling Multiple Writers
A queue can be used to aggregate data values from multiple writers into a single ordered set of values. This behaviour is perfect for logging, job aggregation and any other situation where data originating from multiple entities needs to be consumed by a single entity.
For example, in our book scanning example, a typical scanning workflow may have tens to hundreds of scanners running concurrently, where each scanned page needs to be run through an OCR (Optical Character Recognition) process in order to index the contents of each scanned page.
A cloud application built around queues is easily scaled by running multiple parallel instances of the scanning application, with each of the instances using a common cloud for storage. If all of these instances store scanned pages into the same queue, the interface to the OCR process (the queue) is the same regardless of the number of writers.
The logic for the scanning process would look something like this:
1. Scan Page
2. Write image as object into cloud
3. Enqueue image object ID into cloud
4. Repeat
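The scanning loop above can be sketched in Python. Here scanner and cloud are hypothetical clients, and the method names (put_object, enqueue) are illustrative rather than any particular vendor's API:

```python
import uuid

def run_scanner(scanner, cloud, page_queue="pages"):
    """Scan pages, store each image as a cloud object, and enqueue its ID."""
    while scanner.has_pages():
        image = scanner.scan_page()             # 1. scan a page
        object_id = str(uuid.uuid4())
        cloud.put_object(object_id, image)      # 2. write image into cloud
        cloud.enqueue(page_queue, object_id)    # 3. enqueue the object ID
        # 4. repeat until the scanner runs out of pages
```

Because every scanner instance writes to the same named queue, adding more scanners requires no change to this code or to the OCR side.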
The logic for the OCR process would look something like this:
1. Read from cloud queue to get the object ID of the next image to process
2. Read the image from cloud using the object ID
3. Perform OCR processing
4. Add OCR text to Index
5. Delete item from cloud queue
6. Repeat
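The single-reader OCR loop can be sketched the same way, again against a hypothetical cloud client. Note that the delete happens only after the text has been indexed, so a crash mid-processing leaves the item safely in the queue:

```python
def run_ocr(cloud, index, ocr_engine, page_queue="pages"):
    """Dequeue image IDs, OCR each image, and index the resulting text."""
    while True:
        object_id = cloud.peek(page_queue)      # 1. read oldest queue item
        if object_id is None:
            break                               # queue is drained
        image = cloud.get_object(object_id)     # 2. fetch the image
        text = ocr_engine.recognize(image)      # 3. perform OCR
        index.add(object_id, text)              # 4. add text to the index
        cloud.delete_front(page_queue)          # 5. delete only after indexing
        # 6. repeat
```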
Now, we can arbitrarily scale the number of scanning processes. But what if our OCR takes too long to handle the combined workload of all of these scanning processes?
Handling Multiple Readers
If our OCR processing takes far longer than the time required to scan a page, we need to be able to increase the performance of the OCR processing. And when we add multiple parallel page scanners, things get even worse. We could try to make a faster OCR engine, but it's far easier to be able to scale out the OCR processing by running multiple instances of the OCR processing in parallel.
In this case, we need to be able to have multiple readers of the queue. And, we need to ensure the following characteristics of our solution:
1. No two queue readers will get the same item
2. No items will be lost, even in the event of a queue reader failure
If you just run two queue readers in parallel, both of these guarantees can be violated. If the two readers read in lock-step, they will both get the same item and process it twice. And if both then issue a delete, the second delete removes the next item in the queue, an item that was never processed, and that item is lost.
Thus, we need to introduce another concept in order to maintain these characteristics — the ability to atomically transfer an item from one queue to another. With this capability, our OCR process can be modified to ensure that even with multiple readers, no data is lost or processed twice:
1. Transfer item from cloud queue to worker queue
2. Read from worker queue to get the object ID of the next image to process
3. Read the image from cloud using the object ID
4. Perform OCR processing
5. Add OCR text to Index
6. Delete item from worker queue
7. Repeat
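The atomic-transfer pattern can be sketched in Python. Here an in-process QueueService protected by a lock stands in for the cloud's atomic transfer operation; in a real cloud system, the atomicity would be provided by the storage service itself:

```python
import threading
from collections import deque

class QueueService:
    """In-process stand-in for a cloud queue service with atomic transfer."""
    def __init__(self):
        self._queues = {}
        self._lock = threading.Lock()

    def _q(self, name):
        return self._queues.setdefault(name, deque())

    def enqueue(self, name, item):
        with self._lock:
            self._q(name).append(item)

    def transfer(self, src, dst):
        """Atomically move the oldest item from src to dst (None if empty)."""
        with self._lock:
            q = self._q(src)
            if not q:
                return None
            item = q.popleft()
            self._q(dst).append(item)
            return item

    def delete(self, name, item):
        with self._lock:
            self._q(name).remove(item)

def ocr_worker(qs, worker_id, results):
    """One OCR reader: transfer, process, then delete from its worker queue."""
    worker_queue = f"worker-{worker_id}"
    while True:
        object_id = qs.transfer("pages", worker_queue)  # 1. atomic transfer
        if object_id is None:
            break
        results.append(object_id)                       # 2-5. fetch, OCR, index
        qs.delete(worker_queue, object_id)              # 6. remove when done
```

Because the transfer is atomic, no two workers ever see the same item, and an item remains in some queue (and is therefore recoverable) until its worker finishes and deletes it.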
Using this approach, you can arbitrarily scale the number of reader processes without modification. And these workers can enqueue their results into a common queue, allowing the results to be recombined for further processing.
By leveraging queues, cloud storage allows the creation of complex reliable workflows that can be scaled arbitrarily and dynamically. It also facilitates the creation of loosely coupled reliable systems of interacting programs that work together to solve a given problem in a flexible and scalable manner.
2009-10-19
2009-09-30
The Worst Case Scenario
Single-Site Archives
If all of the equipment and storage associated with your archive is located at a single site and that site is lost, then unless you have some form of off-site storage, all data stored in the archive will be lost.
While a multi-site archive is the best solution to avoid data loss in this scenario, for cost-sensitive data where rapid restoration of data access and operations is not required, a multi-site archive may be more expensive than is warranted. In this case, two common options are to create two tape copies and have them vaulted off-site, or to store a copy of the data into a public cloud provider, such as Amazon, Iron Mountain, or Diomede Storage.
Vaulting to Tape
Storing to tape, and vaulting the tapes off-site is the least expensive option for protecting archived data. In this scenario, a node would be added to the single-site archive that creates two identical tapes containing recently archived data. These tapes are then sent off-site.
The ability for an archive to be restored from the "bare metal", from only the data objects being archived, is a very important feature of an archival system. This ensures that even if the control databases are lost, the archived data can still be accessed, and the archive can be rebuilt.
When planning a tape vaulting approach, the frequency at which these tapes are created determines how much data is at risk of loss in the event of the loss of the site. For example, if tapes are generated every week, and take two business days worst case to be taken off-site, then the business can have an exposure window of up to twelve calendar days.
In the event of the catastrophic loss of the primary site, these tapes would have to be recalled from the vaulting provider, which can take some time, and hardware would have to be re-acquired to rebuild the archive. Don't underestimate the amount of time required to re-order hardware. Often the original equipment is no longer available, so a new archive will need to be specified and ordered, and can take weeks for the servers to be assembled and shipped.
Once the tapes have arrived and the hardware has been set up, the archive is rebuilt from the data stored on the tapes, and once the last tape is processed, the archive is ready to be used. This is known as a "bare-metal restore".
Of course, depending on the size of the archive, this could take a very long time. A 1 PB media archive would take 115 days to restore when running at a restore load of 800 Mbits/s, and a 10 billion object e-mail archive would take 115 days to restore when running at a restore load of 1000 objects per second. Rebuild times must be taken into account when planning for archive restoration, and often the cost of downtime associated with such a restore is high enough that cloud or multi-site options are considered instead.
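These restore-time figures are easy to verify with back-of-the-envelope arithmetic, here worked through in Python using decimal (SI) units:

```python
SECONDS_PER_DAY = 86_400

# 1 PB media archive restored at 800 Mbit/s (decimal/SI units throughout)
archive_bits = 1e15 * 8              # 1 PB = 10^15 bytes = 8 * 10^15 bits
restore_rate = 800e6                 # 800 Mbit/s
media_days = archive_bits / restore_rate / SECONDS_PER_DAY

# 10 billion object e-mail archive restored at 1000 objects/second
object_days = 10e9 / 1000 / SECONDS_PER_DAY

# Both work out to roughly 115-116 days of continuous restoration.
```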
Storing to the Cloud (Hybrid Cloud)
Another option for single-site archives is to store a copy of the archived data to a cloud storage provider. This eliminates the headaches associated with tape management, but introduces the requirement for network connectivity to the provider. In this scenario, for each archived object to be protected off-site, the object is also stored to a cloud provider, which retains the data in the event that a restore is needed.
Unlike with tape archiving, data is stored immediately to the cloud, limited only by WAN bandwidth. However, this limitation can be substantial, and when bandwidth is insufficient, data will be at risk until the backlog clears. If data is being stored to the archive at 100 Mbits/s, an OC-3 class Internet connection would be required, which can be far more expensive than sending twenty tapes out each week.
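The bandwidth and tape figures can be checked the same way. This sketch assumes LTO-4 media at 800 GB native capacity (a contemporary tape format); the capacity figure is an assumption for illustration, not something stated above:

```python
SECONDS_PER_WEEK = 7 * 86_400

ingest_rate = 100e6                               # 100 Mbit/s sustained rate
weekly_bytes = ingest_rate * SECONDS_PER_WEEK / 8  # ~7.56 TB archived per week

tape_capacity = 800e9                             # assumed: LTO-4, 800 GB native
tapes_per_copy = -(-weekly_bytes // tape_capacity)  # ceiling division
weekly_tapes = 2 * tapes_per_copy                 # two identical off-site copies
```

A sustained 100 Mbit/s ingest also leaves little headroom on a ~155 Mbit/s OC-3 link, which is why the backlog risk mentioned above matters.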
In the event of the catastrophic loss of the primary site, hardware would be re-acquired to rebuild the archive, and network connectivity would need to be acquired to allow connectivity to the cloud. When both of these are operational, the archive would be reconnected to the cloud. This would restore access to the archived data, albeit limited by the network bandwidth. Over time, the on-site archived data can then be restored back over the WAN.
The primary disadvantage of this approach is the time required to get the hardware and network access for restoring the on-site component of the archive, with the second disadvantage being cost. Fears about unauthorized disclosure of data and loss of control over data are also common, though they can be mitigated with the appropriate use of encryption.
And often, for less than the price charged by most public cloud providers, one can afford to create a multi-site archive, either across multiple premises owned by the business, or into a second premises hosted by a third party.
Why not just use the Cloud?
Some public cloud providers encourage an architecture that has a minimal on-site presence, and stores all data off-site in the cloud. For some scenarios, this approach works very well, as it minimizes capital costs and minimizes the time required to restore hardware and access in the event of a disaster. However, one must have sufficient WAN bandwidth for the expected store and retrieve loads (as opposed to just the store-only traffic seen when using the cloud as a storage target), and in the event of a network connectivity failure, access to most or all of the archive can be disrupted.
This is contrasted with the hybrid cloud model, where the private cloud on-site allows continued access to the data even during WAN failures, and the public cloud is used as a low-cost data storage target.
Multi-Site Archives
When continuance of business operations is important, or archival data must be accessible across multiple sites, the archive can be extended to span multiple sites. This involves several considerations, including:
- What data is created in one site and accessed in another?
- What data should be replicated to other sites for protection?
- What data should be replicated to other sites for performance?
- In the event of a site loss scenario, what will be the additional load placed on other sites?
Of course, one can also deploy systems like Bycast's StorageGRID to provide a mixture of the above described approaches, using policies to determine which archived content is stored locally, vaulted to tape, stored in a public cloud, and replicated across multiple sites. This flexibility allows the value to the data to be mapped to the cost of the storage, and leverages a common infrastructure for all levels of protection required.
2009-09-14
Introducing CDMI
Today, the Storage Networking Industry Association (SNIA) publicly released the first draft of the Cloud Data Management Interface (CDMI). The draft standard can be downloaded at the following address:
http://www.snia.org/tech_activities/publicreview
I'm very pleased to have been a significant contributor to this standard since the inception of the working group earlier this year. Over the last nine months, we've been able to come a long way towards defining a working standard for cloud storage management, and Bycast is proud to have contributed many best-of-breed capabilities first pioneered in Bycast's StorageGRID HTTP API, used by hundreds of customers worldwide to store and access many dozens of petabytes of data in cloud environments, both public and private.
Why CDMI?
CDMI provides a standardized method by which data objects and metadata can be stored, accessed and managed within a cloud environment. It is intended to provide a consistent method for access by applications and end-users' systems, and provide a consistent interface for providers of cloud storage.
Currently, almost all of the cloud storage providers and vendors use significantly different APIs, which forces cloud application and gateway software vendors to code and test against each of these APIs, and to architect their applications around the lowest common denominator. CDMI significantly reduces the complexity of development, test and integration for the application vendor, and is specifically designed to be easy to adopt for both cloud providers and application vendors. CDMI can run alongside existing cloud protocols, and, as an example, a customer could run a CDMI gateway in an EC2 instance to gain access to their existing Amazon S3 bucket without Amazon having to do any work — a great example of the power of cloud!
Much like SCSI, Fibre Channel and TCP/IP, such industry-wide standards provide many advantages. These range from simple but essential efficiencies, such as standardized interface documentation and conformance and performance testing tools, to the creation of a market for value-added tools such as protocol analyzers, and to greater developer awareness, libraries and code examples.
Industry standards also jump-start the network effect, where more applications encourage providers to support the standard, and more providers supporting the standard encourage application vendors to support the standard. Finally, and most excitingly, CDMI increases inter-cloud interoperability, and is a fundamental enabler for advanced emerging cloud models such as federation, peering and delegation, and the emergence of specialized clouds for content delivery, processing and preservation.
A Whirlwind Tour of CDMI
CDMI stores objects (data and metadata) in named containers that group the objects together. All stored objects are accessed by web addresses that either contain a path (e.g., http://cloud.example.com/myfiles/cdmi.txt) or an object identifier (e.g., http://cloud.example.com/objectid/AABwbQAQvmAJSJWUHU3awAAA==).
CDMI provides a series of RESTful HTTP operations that can be used to access and manipulate a cloud storage system. PUT is used to create and update objects, GET is used to retrieve objects, HEAD is used to retrieve metadata about objects, and DELETE is used to remove objects.
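As an illustration, here is roughly what a CDMI object create might look like when assembled in Python. The path reuses the example address above; the header and field names follow the draft specification, but treat the exact values as illustrative of the draft rather than definitive:

```python
import json

# Illustrative CDMI object create: a PUT by path. The version header
# value reflects the draft under review and may change as it evolves.
cdmi_headers = {
    "Content-Type": "application/cdmi-object",
    "X-CDMI-Specification-Version": "1.0",
}

# CDMI request bodies are JSON; the object's data travels in "value".
cdmi_body = json.dumps({
    "mimetype": "text/plain",
    "value": "Hello, cloud!",
})

request_line = "PUT /myfiles/cdmi.txt HTTP/1.1"
```

A GET to the same path would return a JSON body of the same shape, and a HEAD would return only the metadata.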
Data stored in CDMI can be referenced between clouds (where one cloud points to another), copied and moved between clouds, and can be serialized into an export format that can be used to facilitate cloud-to-cloud transfers and customer bulk data transfers. All data-metadata relationships are preserved, and standard metadata is defined to allow a client to specify how the cloud storage system should manage the data. Examples of this "Data System Metadata" include the acceptable levels of latency and the degree of protection through replication.
In addition to basic objects and containers (similar to file and folder from a file system), CDMI also supports the concept of capabilities, which allow a client to discover what a cloud storage system is capable of doing. CDMI also supports accounts, which provide control and statistics over account security, usage and billing. Finally, CDMI supports queue data storage objects, which enable many exciting new possibilities for cloud storage.
In fact, queues are important and significant enough that I'll be writing more about them and what they enable in a subsequent blog entry.
The Next Steps
With CDMI now "out in the wild", this is the point where the standards effort starts to get really interesting. Up to this point, it has been a relatively small group that has been working on the standard, and we've had to make some controversial decisions (such as eliminating locking and versioning from the first release). There's still a lot of work to be done, and as CDMI gets more visibility, we look forward to increased involvement from other players in the industry. Together, we can make this standard even better, and help shape the future of cloud storage.
So, if you are interested in cloud storage and cloud storage APIs, I would strongly encourage you to take the time to read the CDMI draft documentation, and contribute your thoughts and suggestions.
We're proud of what we've achieved, and together, we can make it even better.
2009-09-10
Cloud Computing and Cloud Storage Standards
All of our work at the Cloud Storage technical working group in the Storage Networking Industry Association (SNIA) has been coming together, and we are nearing a public release of the Cloud Data Management Interface (CDMI).
We've also been working with the Open Grid Forum (OGF) on making it possible for CDMI to be used in conjunction with the OCCI standard to manage data storage in cloud computing environments, and there are some exciting possibilities for combining CDMI with the VMware vCloud standard that was recently donated as a potential industry standard to the DMTF.
You can read our joint whitepaper on CDMI and OCCI at the following URI:
http://ogf.org/Resources/documents/CloudStorageForCloudComputing.pdf
If any readers have any questions about CDMI and how it facilitates cloud computing, please don't hesitate to comment!