2009-08-04

Dictionary of a Cloud Standard

Any cloud storage standard must take many interrelated functional areas into account. Below is a comprehensive list of the areas that must be considered during standardization. Each standard must then decide which of these items to address, and to what degree.

Because all of these subjects are interrelated, they are listed below in alphabetical order:

Accounts – A grouping of stored objects for administrative control and billing. Each object is owned by one or more accounts and is billed to those accounts. Accounts may have sub-accounts, in which case content owned by a sub-account is rolled up to a higher-level account.
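The post does not prescribe a data model for account hierarchies. The following is a minimal Python sketch that shows how usage owned by sub-accounts rolls up to a parent account. The `Account` class and its `stored_bytes` field are hypothetical, and the sketch assumes each object's bytes are billed to a single account.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account node; usage in sub-accounts rolls up to the parent."""
    name: str
    stored_bytes: int = 0  # bytes billed directly to this account
    sub_accounts: list[Account] = field(default_factory=list)

    def add_sub_account(self, child: Account) -> Account:
        self.sub_accounts.append(child)
        return child

    def rolled_up_bytes(self) -> int:
        """Billable bytes for this account plus every sub-account beneath it."""
        return self.stored_bytes + sum(c.rolled_up_bytes() for c in self.sub_accounts)


# Usage: a tenant with two departments, one of which has its own sub-account.
tenant = Account("acme", stored_bytes=100)
eng = tenant.add_sub_account(Account("acme/eng", stored_bytes=250))
eng.add_sub_account(Account("acme/eng/ci", stored_bytes=50))
tenant.add_sub_account(Account("acme/sales", stored_bytes=25))
print(tenant.rolled_up_bytes())  # 425
```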

Accounts (Provisioning) – The interface by which administrative clients can create new accounts, modify account characteristics or remove accounts.

Audit – Detailed records of accesses to, and state changes within, the storage system, used for troubleshooting, forensic analysis and security analysis. Audit information should be accessible by partition, account, client, audit type and other filters, ideally through a cloud API.

Client Authentication – The method by which the credentials presented by a client are verified and mapped to local identities used to determine permissions.

Client Authentication Delegation – The method of verifying and mapping credentials in which the processing is delegated to an external system (e.g., Active Directory or LDAP) or to another cloud.

Client Interface Protocols – The method by which clients access data stored in the cloud. Interface protocols are typically specific to how the client interacts with the data. For example, a file-system client would expect an interface protocol such as NFS or CIFS, whereas a database client would expect a SQL interface. Clients that interact with objects directly may use protocols such as XAM or RESTful HTTP object access.

Cloud Partitioning – The ability to take a single cloud and create multiple partitions that act as completely independent clouds while still using the same common cloud infrastructure.

Cloud Peering – The ability for one cloud to transparently reference and store objects in another cloud, such that a client can access that content directly from either cloud.

Cloud Federation – The ability for one cloud to transparently reference and store objects in another cloud, such that a client can access that content from the primary cloud.

Event Feeds – A client-accessible queue of events that occur against a subset of content (as defined by a query) that can be used for state synchronization between clouds, billing and other inter-system communication.
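As an illustration only, here is a minimal Python sketch of an event feed. A query predicate decides which events enter the feed, and a sequence-number cursor lets a consumer, such as a peer cloud or a billing system, resume from where it left off. `Event`, `EventFeed`, `publish` and `read_since` are hypothetical names, not part of any standard.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    seq: int                 # monotonically increasing sequence number
    object_id: str
    action: str              # e.g. "create", "update", "delete"
    metadata: dict[str, Any]


class EventFeed:
    """Hypothetical feed that keeps only events matching a query predicate."""

    def __init__(self, query: Callable[[Event], bool]) -> None:
        self._query = query
        self._events: list[Event] = []

    def publish(self, event: Event) -> None:
        if self._query(event):
            self._events.append(event)

    def read_since(self, cursor: int) -> list[Event]:
        """Return events newer than `cursor` so a consumer can resume synchronization."""
        return [e for e in self._events if e.seq > cursor]


# Usage: a feed restricted to objects tagged with project "alpha".
feed = EventFeed(lambda e: e.metadata.get("project") == "alpha")
feed.publish(Event(1, "obj-1", "create", {"project": "alpha"}))
feed.publish(Event(2, "obj-2", "create", {"project": "beta"}))
feed.publish(Event(3, "obj-1", "update", {"project": "alpha"}))
print([e.seq for e in feed.read_since(1)])  # [3]
```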

Introspection – The ability for a client to discover what services a given cloud is capable of performing, and what subset of these capabilities the client is allowed to use.
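Introspection answers two questions: what the cloud can do, and what this particular client may do. A minimal Python sketch follows, assuming a hypothetical vocabulary of capability names, since the post does not define one. The allowed set is the intersection of the cloud's capabilities and the client's grants.

```python
from __future__ import annotations

# Hypothetical capability names; the post does not define a vocabulary.
CLOUD_CAPABILITIES = frozenset({"versioning", "snapshots", "query", "event_feeds", "locking"})

# Capabilities an administrator has granted to each (hypothetical) client.
CLIENT_GRANTS = {
    "backup-agent": frozenset({"versioning", "snapshots"}),
    "analytics": frozenset({"query", "event_feeds", "replication"}),
}


def introspect(client: str) -> dict[str, frozenset[str]]:
    """Report what the cloud supports and which subset this client may use."""
    granted = CLIENT_GRANTS.get(client, frozenset())
    # A grant for a capability the cloud lacks ("replication" here) is not usable.
    return {"supported": CLOUD_CAPABILITIES, "allowed": CLOUD_CAPABILITIES & granted}


print(sorted(introspect("analytics")["allowed"]))  # ['event_feeds', 'query']
```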

Metadata (User) – Arbitrary client-named key/value pairs and tag lists specified by a client that are stored along with an object.

Metadata (Data System) – Standardized named key/value pairs and tag lists specified by a client that indicate to the cloud how the object or objects should be managed. Examples include ACLs, encryption levels, degrees of replication or dispersion, and QoS goals.

Metadata (System) – Standardized named key/value pairs and tag lists generated by the cloud that indicate to the client properties of the object. Examples include last access time, actual degree of replication (as opposed to requested degree of replication), and lock status.
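The three kinds of metadata differ in who writes them and who interprets them. The Python sketch below keeps them in separate dictionaries on a hypothetical `StoredObject`. All key names (`replication_requested`, `replication_actual`, and so on) are illustrative assumptions, not standardized names. Comparing a data-system value with its system counterpart shows whether the cloud has met the client's request yet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class StoredObject:
    object_id: str
    data: bytes
    # User metadata: arbitrary client-named pairs; the cloud stores but does not interpret them.
    user_metadata: dict[str, str] = field(default_factory=dict)
    # Data system metadata: client-specified, standardized keys telling the cloud how to manage the object.
    data_system_metadata: dict[str, Any] = field(default_factory=dict)
    # System metadata: cloud-generated, standardized keys reporting the object's actual state.
    system_metadata: dict[str, Any] = field(default_factory=dict)


def replication_shortfall(o: StoredObject) -> int:
    """How many replicas the cloud still owes, comparing requested vs. actual."""
    requested = int(o.data_system_metadata.get("replication_requested", 1))
    actual = int(o.system_metadata.get("replication_actual", 0))
    return max(requested - actual, 0)


obj = StoredObject(
    object_id="obj-42",
    data=b"example payload",
    user_metadata={"author": "alice", "project": "alpha"},
    data_system_metadata={"replication_requested": 3, "encryption": "AES-256"},
    system_metadata={"replication_actual": 2, "locked": False},
)
print(replication_shortfall(obj))  # 1
```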

Namespace (Local) – Each client or set of clients may see a different subset of the global namespace, or a client-specific namespace. Objects within the cloud may reside in one or more namespaces. By sharing objects across namespaces, different faceted views into the cloud can be created, and use cases such as sharing can be enabled.

Namespace (Global) – An administrator or suitably configured client may see all objects in the cloud within one namespace.
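One way to picture local and global namespaces is as a single global mapping from object identifiers to objects, plus per-client mappings from client-visible paths to those identifiers. The Python sketch below is a hypothetical illustration, not a prescribed design. Because the same identifier appears in two local namespaces, both clients see the same object under different names, which is how sharing falls out of the model.

```python
from __future__ import annotations

# Global namespace: every object in the cloud, keyed by its unique identifier.
global_namespace: dict[str, bytes] = {
    "id-001": b"quarterly report",
    "id-002": b"team photo",
    "id-003": b"build log",
}

# Local namespaces: each maps client-visible paths to global identifiers.
# "id-001" appears in both, so alice and bob share that object.
local_namespaces: dict[str, dict[str, str]] = {
    "alice": {"/reports/q3.txt": "id-001", "/photos/team.jpg": "id-002"},
    "bob": {"/shared/from-alice/q3.txt": "id-001", "/ci/build.log": "id-003"},
}


def resolve(client: str, path: str) -> bytes:
    """Look up a path in a client's local namespace, then fetch the object globally."""
    object_id = local_namespaces[client][path]
    return global_namespace[object_id]


print(resolve("bob", "/shared/from-alice/q3.txt"))  # b'quarterly report'
```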

Object (Composite) – The ability for an object to contain other objects such that the collection of objects can be managed and stored together as a single object.

Object Identifiers – Each object stored within the cloud needs a globally unique identifier that stays with the object across its entire lifecycle. Ideally, these identifiers are unique across clouds, and are preserved across clouds, which enables federation, transparent migration and peering.

Object Locking – Clients may wish to be able to lock an object to prevent modification or access by other clients.

Object Reference Counting – Clients may wish to gain a form of lock on an object that ensures the object will remain in the cloud. Only when all of these references have been released can the object be considered for removal.
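A minimal Python sketch of the reference-counting rule, assuming a hypothetical `ReferenceCountedObject` that tracks holds per client. The object only becomes eligible for removal once every hold has been released. Tracking holds per client, rather than keeping a single counter, also lets the cloud reject a release from a client that never acquired a reference.

```python
from __future__ import annotations


class ReferenceCountedObject:
    """Hypothetical holder: the object is removable only when no references remain."""

    def __init__(self, object_id: str) -> None:
        self.object_id = object_id
        self._holders: dict[str, int] = {}  # client -> number of references held

    def acquire(self, client: str) -> None:
        self._holders[client] = self._holders.get(client, 0) + 1

    def release(self, client: str) -> None:
        count = self._holders.get(client, 0)
        if count == 0:
            raise ValueError(f"{client} holds no reference to {self.object_id}")
        if count == 1:
            del self._holders[client]
        else:
            self._holders[client] = count - 1

    def removable(self) -> bool:
        return not self._holders


obj = ReferenceCountedObject("obj-7")
obj.acquire("alice")
obj.acquire("bob")
obj.release("alice")
print(obj.removable())  # False: bob still holds a reference
obj.release("bob")
print(obj.removable())  # True
```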

Object Versioning – Clients may wish to have historical versions of objects be retained when an object is modified.
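A minimal Python sketch of versioning, in which a write appends a new version instead of overwriting the old one. `VersionedObject`, `write` and `read` are hypothetical names. Real systems would also need retention limits on how many versions are kept, which this sketch omits.

```python
from __future__ import annotations


class VersionedObject:
    """Hypothetical object that retains every historical version on modification."""

    def __init__(self, data: bytes) -> None:
        self._versions: list[bytes] = [data]  # version 0 is the original

    def write(self, data: bytes) -> int:
        """Append a new version instead of overwriting; return its version number."""
        self._versions.append(data)
        return len(self._versions) - 1

    def read(self, version: int | None = None) -> bytes:
        """Read the latest version, or a specific historical version."""
        return self._versions[-1] if version is None else self._versions[version]


doc = VersionedObject(b"draft")
print(doc.write(b"final"))  # 1
print(doc.read())           # b'final'
print(doc.read(0))          # b'draft'
```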

Object Permissions – Each stored object has a list specifying which entities are permitted to perform which actions. This is typically called an Access Control List (ACL). These access controls specify which basic and administrative operations can be performed.
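A minimal Python sketch of an ACL check, assuming a hypothetical set of actions in which `SET_ACL` stands in for an administrative operation. The check denies by default, so an entity absent from the list may do nothing.

```python
from __future__ import annotations

from enum import Enum, auto


class Action(Enum):
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    SET_ACL = auto()  # administrative operation: change who may do what


# Hypothetical ACL for one object: each entity maps to the actions it may perform.
acl: dict[str, set[Action]] = {
    "alice": {Action.READ, Action.WRITE, Action.DELETE, Action.SET_ACL},
    "bob": {Action.READ},
}


def is_permitted(acl: dict[str, set[Action]], entity: str, action: Action) -> bool:
    """Deny by default: entities absent from the ACL may do nothing."""
    return action in acl.get(entity, set())


print(is_permitted(acl, "bob", Action.READ))   # True
print(is_permitted(acl, "bob", Action.WRITE))  # False
```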

Object Referencing – The ability to specify that a given entity within a namespace is an object in an alternate location within the namespace, in another namespace, or in another cloud, while allowing transparent client access (see cloud peering and federation).

Object Serialization – The ability for a client to take one or more objects and transform them into a single bitstream that can be used for inter-system interchange.
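The post does not name a serialization format. As one possible choice, the Python sketch below uses the standard-library `tarfile` module to pack several named objects into a single bitstream and unpack them again. The `serialize` and `deserialize` helpers are hypothetical, and a real standard would also need to carry each object's metadata, which this sketch leaves out.

```python
from __future__ import annotations

import io
import tarfile


def serialize(objects: dict[str, bytes]) -> bytes:
    """Pack named objects into a single tar bitstream for interchange."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data in objects.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def deserialize(stream: bytes) -> dict[str, bytes]:
    """Recover the named objects from a bitstream produced by serialize()."""
    with tarfile.open(fileobj=io.BytesIO(stream), mode="r") as archive:
        return {
            member.name: archive.extractfile(member).read()
            for member in archive.getmembers()
            if member.isfile()
        }


stream = serialize({"a.txt": b"hello", "b.bin": b"\x00\x01"})
restored = deserialize(stream)
print(sorted(restored))  # ['a.txt', 'b.bin']
```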

Object Snapshots – Clients may wish to be able to create snapshots of an object or set of objects, such that the state of the objects at that point in time can be accessed via an alternate location within a namespace.

Query – The ability to submit a set of criteria (on object content and/or metadata) and have the cloud return the list of objects that match those criteria, possibly as another static object.
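A minimal Python sketch of a metadata query over a hypothetical in-memory store, matching objects whose metadata equals every supplied key/value criterion. It covers only exact-match metadata criteria. Queries on object content, ranges, or persistent queries would need more than this.

```python
from __future__ import annotations

from typing import Any

# Hypothetical store: object ID -> metadata.
objects: dict[str, dict[str, Any]] = {
    "obj-1": {"project": "alpha", "size": 1200},
    "obj-2": {"project": "beta", "size": 300},
    "obj-3": {"project": "alpha", "size": 50},
}


def query(store: dict[str, dict[str, Any]], **criteria: Any) -> list[str]:
    """Return IDs of objects whose metadata matches every key=value criterion."""
    return sorted(
        object_id
        for object_id, metadata in store.items()
        if all(metadata.get(k) == v for k, v in criteria.items())
    )


print(query(objects, project="alpha"))  # ['obj-1', 'obj-3']
```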

Query (Persistent) – The ability to create queries that run continuously and dynamically update their results over time as the state of the cloud changes.

Usage Statistics (Client) – The ability to obtain information about how many operations a given client has performed over a given timeframe, as required for accounting, billing and reporting purposes.

Usage Statistics (Object) – The ability to obtain information about how many operations have been performed against a given object or set of objects, and the operations that have been required to manage a given object or set of objects over a given timeframe for accounting, billing and reporting purposes.
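Both kinds of usage statistics can be derived from the same operation log by grouping on a different field. The Python sketch below assumes a hypothetical `Operation` record and counts operations per client and per object within a half-open time window. It counts only client-initiated operations. The internal operations needed to manage an object, such as replication, would have to be logged as well to produce the full object statistics described above.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Operation:
    timestamp: int  # seconds since epoch
    client: str
    object_id: str
    kind: str       # e.g. "GET", "PUT", "DELETE"


def ops_by_client(log: list[Operation], start: int, end: int) -> Counter[str]:
    """Count operations per client in the half-open window [start, end)."""
    return Counter(op.client for op in log if start <= op.timestamp < end)


def ops_by_object(log: list[Operation], start: int, end: int) -> Counter[str]:
    """Count operations per object in the half-open window [start, end)."""
    return Counter(op.object_id for op in log if start <= op.timestamp < end)


log = [
    Operation(100, "alice", "obj-1", "PUT"),
    Operation(150, "bob", "obj-1", "GET"),
    Operation(160, "alice", "obj-2", "GET"),
    Operation(400, "alice", "obj-1", "GET"),
]
print(dict(ops_by_client(log, 0, 300)))  # {'alice': 2, 'bob': 1}
```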

These are all aspects that we have considered here at Bycast, and that we are contributing as part of our involvement with the SNIA Cloud Storage Technical Working Group.

2 comments:

Finnbarr P. Murphy said...

What about the location of data? Some countries require data to be stored within their own borders.

David Slik said...

With respect to specifying location restrictions and requirements, this is another example of "Data System Metadata", where the client specifies how the cloud should manage data.

Combined with Cloud Peering, a cloud can use this metadata either to spread data geographically within its own infrastructure or to spread the data across other clouds.