2010-02-23

A CDMI Tutorial - Data Management, Part 1

As we covered in the first part of this tutorial, the SNIA Cloud Data Management Interface (CDMI) provides a RESTful mechanism for the basic storage and retrieval of data. However, the core of the standard is focused on data management: how objects are stored, delivered, placed, protected and more.

This post is the second in a series on CDMI. Subsequent posts will cover the following areas:
Data Management and Metadata

CDMI storage systems see the world as a tree of objects and containers. Objects store data, and containers contain child objects and containers. Regardless of whether the data is stored via HTTP or via traditional protocols such as NFS or CIFS, all data represented through the CDMI protocol is fundamentally seen in terms of objects and containers.

Management of stored data is enabled through metadata. Every container and object can have metadata associated with it. Metadata in CDMI is organized into three general categories: user metadata, storage system metadata, and data system metadata.

User Metadata

User metadata is set directly by CDMI clients, or indirectly through the extended metadata interfaces of other access protocols. For example, in NFS, extended attributes can be mapped to CDMI user metadata items. User metadata items are arbitrary, and are not interpreted by the storage system.

Storage System Metadata

Storage system metadata is generated by the CDMI storage system, and provides read-only access to information about the stored data that is managed by the storage system. The creation time of an object is a good example of a storage system metadata item.

Data System Metadata

Data system metadata is provided by a CDMI client, or specified through an out-of-band management interface, and determines how the stored data should be managed. For example, data system metadata can specify the degree of replication, or an encryption level desired to protect data while stored on disk.

It is through the specification of data system metadata that CDMI enables the management of how data should be stored.
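To make the three categories concrete, here is a minimal Python sketch that sorts metadata item names into them. The name sets are an assumption: they contain only the items that appear in the example later in this post, and a real CDMI system may expose more. The function name `categorize` is hypothetical.

```python
# Item names taken from the example response in this post (not an exhaustive list).
STORAGE_SYSTEM_ITEMS = {
    "cdmi_ctime", "cdmi_atime", "cdmi_mtime", "cdmi_acount", "cdmi_mcount", "ACL",
}
DATA_SYSTEM_ITEMS = {
    "cdmi_data_redundancy", "cdmi_immediate_redundancy",
    "cdmi_infrastructure_redundancy", "cdmi_geographic_placement",
    "cdmi_encryption",
}
BILLED_SUFFIX = "_billed"


def categorize(name):
    """Return the category of a metadata item name (hypothetical helper)."""
    if name in STORAGE_SYSTEM_ITEMS:
        return "storage system"
    # The "_billed" items mirror data system items, so strip the suffix first.
    base = name[:-len(BILLED_SUFFIX)] if name.endswith(BILLED_SUFFIX) else name
    if base in DATA_SYSTEM_ITEMS:
        return "data system"
    # Anything else is treated as arbitrary, uninterpreted user metadata.
    return "user"


for item in ("user.DOSATTRIB", "cdmi_ctime", "cdmi_encryption_billed"):
    print(item, "->", categorize(item))
```

Treating every unrecognized name as user metadata is a simplification that matches the description above: user items are arbitrary and not interpreted by the storage system.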

Example Object Metadata

As an example, let's assume that we have a CDMI-enabled storage system that also provides an NFS share, and we've stored some documents onto it.


We can then connect to the system via CDMI, and access the CDMI metadata of the "Documents" container:
GET /Documents/ HTTP/1.1
Host: cloud.example.com
Content-Type: application/vnd.org.snia.cdmi.object+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

{
"objectURI" : "/Documents/",
"objectID" : "AABwbQAQ8ypO85j/ml8TZQ==",
"parentURI" : "/",
"accountURI" : "/cdmi_accounts/default_account/",
"capabilitiesURI" : "/cdmi_capabilities/container/",
"percentageComplete" : "Complete",
"metadata" : {
"user.DOSATTRIB": "0x20",
"cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
"cdmi_atime" : "2010-01-02T16:12:53.521983Z",
"cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
"cdmi_acount" : "52",
"cdmi_mcount" : "12",
"ACL" : {
"acetype" : "0x00",
"identifier" : "jdoe",
"aceflags" : "0x03",
"acemask" : "0x000F005F",
"acetime" : "2009-12-29T12:43:32.479832Z"
},
"cdmi_data_redundancy": "2",
"cdmi_immediate_redundancy": "2",
"cdmi_infrastructure_redundancy": "2",
"cdmi_geographic_placement": [
"US"
],
"cdmi_encryption": "AES_256_CBC",
"cdmi_data_redundancy_billed": "2",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
"US"
],
"cdmi_encryption_billed": "AES_256_CTR"
},
"childrenrange" : "1-3",
"children" : [
"Financials/",
"CDMI_Spec.pdf",
"hello.txt"
]
}
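For readers who want to try this from a script, the exchange above could be issued along the following lines. This is only a sketch: it reuses the request headers exactly as shown in the example, the function names `fetch_container` and `split_children` are hypothetical, and it relies on the convention that child names ending in "/" are containers.

```python
import http.client
import json

# Request headers copied from the example request above.
CDMI_HEADERS = {
    "Content-Type": "application/vnd.org.snia.cdmi.object+json",
    "X-CDMI-Specification-Version": "1.0",
}


def fetch_container(host, path):
    """Issue the GET shown above and return the decoded JSON body."""
    conn = http.client.HTTPConnection(host)
    try:
        conn.request("GET", path, headers=CDMI_HEADERS)
        response = conn.getresponse()
        return json.loads(response.read())
    finally:
        conn.close()


def split_children(container):
    """Separate child containers (names ending in '/') from child objects."""
    children = container.get("children", [])
    containers = [name for name in children if name.endswith("/")]
    objects = [name for name in children if not name.endswith("/")]
    return containers, objects


# Works on any decoded response; here, only the children from the example.
sample = {"children": ["Financials/", "CDMI_Spec.pdf", "hello.txt"]}
print(split_children(sample))
```

Against a live system you would call `fetch_container("cloud.example.com", "/Documents/")` and pass the result to `split_children`.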
There are quite a few metadata items in this response, and it looks quite complex, but it's not as bad as it first appears. To see what these metadata items mean, and how they are used to manage stored data, let's review them individually:

User Metadata Items

The directory (container in CDMI speak) has a single user metadata item:
"user.DOSATTRIB": "0x20",
This user metadata item is an extended attribute that indicates the archive bit is set in the DOS mode of the directory. In this case, the item was created when the directory was created by Samba, and the storage server presents extended attributes on the filesystem as user metadata items.
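Although the storage system does not interpret user metadata, a client is free to. As a small illustration, the sketch below decodes the hex value using the standard DOS file attribute bits. It assumes the value is stored as a plain hex string, as in the example, and `decode_dos_attrib` is a hypothetical helper name.

```python
# Standard DOS file attribute bits.
DOS_ATTRIBUTE_BITS = {
    0x01: "READONLY",
    0x02: "HIDDEN",
    0x04: "SYSTEM",
    0x10: "DIRECTORY",
    0x20: "ARCHIVE",
}


def decode_dos_attrib(value):
    """Return the names of the attribute bits set in a hex string like '0x20'."""
    bits = int(value, 16)
    return [name for bit, name in sorted(DOS_ATTRIBUTE_BITS.items()) if bits & bit]


print(decode_dos_attrib("0x20"))
```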

Storage System Metadata Items

The directory has five storage system metadata items, and an ACL, which is an example of a more complex storage system metadata item:
"cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
"cdmi_atime" : "2010-01-02T16:12:53.521983Z",
"cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
"cdmi_acount" : "52",
"cdmi_mcount" : "12",
"ACL" : {
"acetype" : "0x00",
"identifier" : "jdoe",
"aceflags" : "0x03",
"acemask" : "0x000F005F",
"acetime" : "2009-12-29T12:43:32.479832Z"
},
In this example, the first three metadata items contain the creation, last access and last modification times, respectively, and the remaining two show the number of accesses and the number of modifications since creation. The ACL metadata specifies the access control restrictions for the folder, and is based on NFSv4 ACLs.
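Since these values all arrive as strings, a client will usually want to convert them before using them. Here is a minimal sketch, assuming the timestamps always carry microseconds and a trailing "Z" as they do in the example; `parse_cdmi_time` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Matches the timestamp layout used in the example, e.g. 2009-12-29T12:43:32.479832Z
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_cdmi_time(value):
    """Parse an example-style timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIME_FORMAT).replace(tzinfo=timezone.utc)


# Values copied from the storage system metadata shown above.
metadata = {
    "cdmi_ctime": "2009-12-29T12:43:32.479832Z",
    "cdmi_mtime": "2010-01-02T16:12:53.521983Z",
    "cdmi_acount": "52",
    "cdmi_mcount": "12",
}

created = parse_cdmi_time(metadata["cdmi_ctime"])
modified = parse_cdmi_time(metadata["cdmi_mtime"])
age = modified - created                 # time between creation and last change
accesses = int(metadata["cdmi_acount"])  # counters are strings in the response
modifications = int(metadata["cdmi_mcount"])

print(age, accesses, modifications)
```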

Data System Metadata Items

Finally, we have the data system metadata items. These items are specified by a CDMI client, or an out-of-band management application, and specify how the data in the "Documents" folder should be managed:
"cdmi_data_redundancy": "2",
"cdmi_immediate_redundancy": "2",
"cdmi_infrastructure_redundancy": "2",
"cdmi_geographic_placement": [
"US"
],
"cdmi_encryption": "AES_256_CBC",
The first data system metadata item, "cdmi_data_redundancy", indicates how many independent copies of the stored data should be kept. In this case, it has been set to two, which means that two copies of the data should be stored. Likewise, "cdmi_immediate_redundancy" indicates that two copies should be provided synchronously, "cdmi_infrastructure_redundancy" indicates that the two copies should be located in separate failure domains, "cdmi_geographic_placement" indicates that the copies should remain within the United States, and "cdmi_encryption" indicates that AES with a 256-bit key in CBC mode should be used to protect the data.

It is important to note that data system metadata items express the desired management behaviour. This is separate from the actual management behaviour. In order to indicate to a client what management behaviours are actually being provided, CDMI provides a matching series of data system metadata items, ending with the suffix "_billed":
"cdmi_data_redundancy_billed": "2",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
"US"
],
"cdmi_encryption_billed": "AES_256_CTR",
In this case, the billed metadata shows that the storage system is able to meet the requested data redundancy and immediate redundancy, but is not able to provide the requested infrastructure redundancy or the requested encryption method. This allows a client to discover whether the requested data system services are being provided.
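A client could automate this check by comparing each requested item with its "_billed" counterpart. The sketch below uses the values from the two snippets above. Note the assumption baked into it: any difference is reported as unmet, so a system that delivers more redundancy than requested would also be flagged. `unmet_services` is a hypothetical helper name.

```python
BILLED_SUFFIX = "_billed"


def unmet_services(metadata):
    """Map each requested item to (requested, billed) where the two differ."""
    unmet = {}
    for name, billed in metadata.items():
        if not name.endswith(BILLED_SUFFIX):
            continue
        requested_name = name[:-len(BILLED_SUFFIX)]
        if requested_name in metadata and metadata[requested_name] != billed:
            unmet[requested_name] = (metadata[requested_name], billed)
    return unmet


# Requested and billed values copied from the snippets above.
metadata = {
    "cdmi_data_redundancy": "2",
    "cdmi_immediate_redundancy": "2",
    "cdmi_infrastructure_redundancy": "2",
    "cdmi_geographic_placement": ["US"],
    "cdmi_encryption": "AES_256_CBC",
    "cdmi_data_redundancy_billed": "2",
    "cdmi_immediate_redundancy_billed": "2",
    "cdmi_infrastructure_redundancy_billed": "1",
    "cdmi_geographic_placement_billed": ["US"],
    "cdmi_encryption_billed": "AES_256_CTR",
}

for name, (requested, billed) in unmet_services(metadata).items():
    print(f"{name}: requested {requested}, billed {billed}")
```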

Data Management Summary

So, putting this all together, data system metadata is specified or inherited from parent containers. The data system metadata specifies the desired data system services for stored content. The system tries to accomplish the requested data system services, and indicates what services are actually being provided in the corresponding billed data system metadata items.

All of the data system metadata items specified for the "Documents" container will also be applied to the child "Financials" container, unless overridden by a data system metadata value specified there. For example, if the "Financials" container has the following data system metadata:
"cdmi_data_redundancy": "3",
Only the redundancy is changed; all other items are inherited from the parent. Thus, the billed metadata values would be:
"cdmi_data_redundancy_billed": "3",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
"US"
],
"cdmi_encryption_billed": "AES_256_CTR",
Currently, all data system metadata items are inherited, and there is no way to override an inherited value except by specifying a new data system metadata item.
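The inheritance rule can be pictured as a simple dictionary merge, where a child's own values win over the parent's. This is only a sketch of the behaviour described above, not of how any particular storage system implements it; `effective_metadata` is a hypothetical helper name.

```python
def effective_metadata(parent, child):
    """Return the parent's data system metadata with the child's overrides applied."""
    merged = dict(parent)   # start with everything inherited from the parent
    merged.update(child)    # items specified on the child replace inherited ones
    return merged


# Requested values on "Documents", as shown earlier in this post.
documents = {
    "cdmi_data_redundancy": "2",
    "cdmi_immediate_redundancy": "2",
    "cdmi_infrastructure_redundancy": "2",
    "cdmi_geographic_placement": ["US"],
}
# "Financials" only specifies its own redundancy.
financials = {"cdmi_data_redundancy": "3"}

print(effective_metadata(documents, financials))
```

Because the merge builds a new dictionary, the parent's own metadata is left untouched.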

This concludes a quick overview of the basics of data management and metadata in CDMI. In the next part of the tutorial, we will discuss the various data system metadata services defined in CDMI.

11 comments:

ganges said...

are there any products which are implemented on CDMI standards?

ganges said...

What benefits which i will get using CDMI standards in cloud?

David Slik said...

While there are no commercial products shipping today that provide a CDMI-compliant interface, quite a few vendors, including several top tier storage vendors, are in the process of developing CDMI-based storage products.

David Slik said...

The primary benefits resulting from CDMI are:

* Reduced development cost
* Ecosystem of tools, documentation and libraries
* Interoperability between clients and providers
* Ability to create federations between clouds

You can read a more detailed discussion in my blog entry below:

http://tinyurl.com/2g5lajp

ganges said...

Thank you for sharing the information.

ganges said...

I came across one open source tool i.e. StorageIM from olocity which is developed based on CDMI technology. It monitors the storage systems and also does the reporting. Please let me know can't we implement such a tool using java for data movement between clouds.

Juniarto Halayudha said...

where is the metadata stored? when is it created?

David Slik said...

Metadata is stored in a CDMI storage system, and the metadata is created by a CDMI client.

As an example, if you had a music client, it would take the artist name, length and other metadata, and store it in the CDMI storage system.

Juniarto Halayudha said...

in the CDMI standard, there are metadata for container, acl, storage system, etc. this metadata is not from client, i think.
how does the acl work?
you said metadata is stored in CDMI storage system. where is it actually? in a no-sql json format database like mongodb?

Ilja Livenson said...

Hi, David

thanks for a very nice blog entry. I'm currently working on a CDMI implementation as well (https://github.com/livenson/vcdm, quite raw still) and would love to see continuation of your entries on CDMI.

David Slik said...

Thank you. I'm hoping to be able to continue/update this series of blog posts soon.

With respect to storing metadata, that is up to the implementer of a CDMI server, and they could use tools like mongodb.