As we covered in the first part of this tutorial, the SNIA Cloud Data Management Interface (CDMI) provides a RESTful mechanism for the basic storage and retrieval of data. However, the core of the standard is focused on data management: how objects are stored, delivered, placed, protected and more.
This post is the second in a series on CDMI. Subsequent posts will cover the following areas:
Data Management and Metadata
CDMI storage systems see the world as a tree of objects and containers. Objects store data, and containers contain child objects and containers. Regardless of whether the data is stored via HTTP or via traditional protocols such as NFS or CIFS, all data represented through the CDMI protocol is fundamentally seen in terms of objects and containers.
Management of stored data is enabled through metadata. Every container and object can have metadata associated with it. Metadata in CDMI is organized into three general categories: user metadata, storage system metadata, and data system metadata.
User Metadata
User metadata is set directly by CDMI clients, or indirectly through the extended metadata interfaces of other access protocols. For example, in NFS, extended attributes can be mapped to CDMI user metadata items. User metadata items are arbitrary, and are not interpreted by the storage system.
Storage System Metadata
Storage system metadata is generated by the CDMI storage system, and provides read-only access to information about the stored data that is managed by the storage system. The creation time of an object is a good example of a storage system metadata item.
Data System Metadata
Data system metadata is provided by a CDMI client, or specified through an out-of-band management interface, and determines how the stored data should be managed. For example, data system metadata can specify the degree of replication, or an encryption level desired to protect data while stored on disk.
It is through the specification of data system metadata that CDMI enables the management of how data should be stored.
Example Object Metadata
As an example, let's assume that we have a CDMI-enabled storage system that also provides an NFS share, and we've stored some documents onto it.
We can then connect to the system via CDMI, and access the CDMI metadata of the "Documents" container:
GET /Documents/ HTTP/1.1
Host: cloud.example.com
Accept: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

HTTP/1.1 200 OK
Content-Type: application/vnd.org.snia.cdmi.container+json
X-CDMI-Specification-Version: 1.0

{
    "objectURI" : "/Documents/",
    "objectID" : "AABwbQAQ8ypO85j/ml8TZQ==",
    "parentURI" : "/",
    "accountURI" : "/cdmi_accounts/default_account/",
    "capabilitiesURI" : "/cdmi_capabilities/container/",
    "percentageComplete" : "Complete",
    "metadata" : {
        "user.DOSATTRIB": "0x20",
        "cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
        "cdmi_atime" : "2010-01-02T16:12:53.521983Z",
        "cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
        "cdmi_acount" : "52",
        "cdmi_mcount" : "12",
        "ACL" : {
            "acetype" : "0x00",
            "identifier" : "jdoe",
            "aceflags" : "0x03",
            "acemask" : "0x000F005F",
            "acetime" : "2009-12-29T12:43:32.479832Z"
        },
        "cdmi_data_redundancy": "2",
        "cdmi_immediate_redundancy": "2",
        "cdmi_infrastructure_redundancy": "2",
        "cdmi_geographic_placement": [
            "US"
        ],
        "cdmi_encryption": "AES_256_CTR",
        "cdmi_data_redundancy_billed": "2",
        "cdmi_immediate_redundancy_billed": "2",
        "cdmi_infrastructure_redundancy_billed": "1",
        "cdmi_geographic_placement_billed": [
            "US"
        ],
        "cdmi_encryption_billed": "AES_256_CTR"
    },
    "childrenrange" : "1-3",
    "children" : [
        "Financials/",
        "CDMI_Spec.pdf",
        "hello.txt"
    ]
}
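For readers who would rather script this exchange, here is a minimal Python sketch that fetches a container and sorts its metadata into the three categories described earlier. It rests on a few assumptions: the server is reachable over HTTPS, the request carries an Accept header for the container type, and the two name sets contain only the items that appear in the response above (they are not a complete list from the CDMI specification). The function names are made up for illustration.

```python
import http.client
import json

# Item names taken from the example response in this post; not exhaustive.
STORAGE_SYSTEM_ITEMS = {
    "cdmi_ctime", "cdmi_atime", "cdmi_mtime",
    "cdmi_acount", "cdmi_mcount", "ACL",
}
DATA_SYSTEM_ITEMS = {
    "cdmi_data_redundancy", "cdmi_immediate_redundancy",
    "cdmi_infrastructure_redundancy", "cdmi_geographic_placement",
    "cdmi_encryption",
}


def fetch_container(host, path):
    """Hypothetical helper: GET a container and return the parsed JSON body."""
    conn = http.client.HTTPSConnection(host)  # HTTPS is an assumption
    try:
        conn.request("GET", path, headers={
            "Accept": "application/vnd.org.snia.cdmi.container+json",
            "X-CDMI-Specification-Version": "1.0",
        })
        return json.loads(conn.getresponse().read().decode("utf-8"))
    finally:
        conn.close()


def group_metadata(metadata):
    """Split a metadata dict into user, storage system and data system items."""
    groups = {"user": {}, "storage_system": {}, "data_system": {}}
    for name, value in metadata.items():
        # "_billed" items report on data system services, so group them there.
        base = name[:-len("_billed")] if name.endswith("_billed") else name
        if base in STORAGE_SYSTEM_ITEMS:
            groups["storage_system"][name] = value
        elif base in DATA_SYSTEM_ITEMS:
            groups["data_system"][name] = value
        else:
            # Anything the sketch does not recognize is treated as user metadata.
            groups["user"][name] = value
    return groups
```

Against a live system, this could be called as `group_metadata(fetch_container("cloud.example.com", "/Documents/")["metadata"])`. Treating every unrecognized name as user metadata is a simplification that keeps the sketch short.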
There are quite a few metadata items here, and the response looks complex, but it's not as bad as it first appears. To see what these metadata items mean, and how they are used to manage stored data, let's review them individually:
User Metadata Items
The directory (container in CDMI speak) has a single user metadata item:
"user.DOSATTRIB": "0x20",
This user metadata item is an extended attribute indicating that the archive bit is set in the directory's DOS attributes. In this case, the item was created when Samba created the directory, and the storage server presents extended attributes on the filesystem as user metadata items.
Storage System Metadata Items
The directory has five storage system metadata items, and an ACL, which is an example of a more complex storage system metadata item:
"cdmi_ctime" : "2009-12-29T12:43:32.479832Z",
"cdmi_atime" : "2010-01-02T16:12:53.521983Z",
"cdmi_mtime" : "2010-01-02T16:12:53.521983Z",
"cdmi_acount" : "52",
"cdmi_mcount" : "12",
"ACL" : {
    "acetype" : "0x00",
    "identifier" : "jdoe",
    "aceflags" : "0x03",
    "acemask" : "0x000F005F",
    "acetime" : "2009-12-29T12:43:32.479832Z"
},
In this example, the first three metadata items contain the creation, last access and last modification times, respectively, and the remaining two show the number of accesses and the number of modifications since creation. The ACL metadata specifies the access control restrictions for the folder, and is based on NFSv4 ACLs.
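Note that all of these values arrive as JSON strings. As a small sketch, a client could convert them to native types like this; the format string is inferred from the timestamps shown above, and the function names and result keys are invented for illustration.

```python
from datetime import datetime, timezone

# Inferred from the example values, e.g. "2009-12-29T12:43:32.479832Z".
CDMI_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_time(value):
    """Parse a timestamp from the example and mark it as UTC (the trailing Z)."""
    return datetime.strptime(value, CDMI_TIME_FORMAT).replace(tzinfo=timezone.utc)


def summarize_storage_metadata(metadata):
    """Convert the five simple storage system items to typed values."""
    return {
        "created": parse_time(metadata["cdmi_ctime"]),
        "accessed": parse_time(metadata["cdmi_atime"]),
        "modified": parse_time(metadata["cdmi_mtime"]),
        "access_count": int(metadata["cdmi_acount"]),
        "modification_count": int(metadata["cdmi_mcount"]),
    }
```

The ACL is left out on purpose, since interpreting its flags and masks requires the NFSv4 ACL definitions.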
Data System Metadata Items
Finally, we have the data system metadata items. These items are specified by a CDMI client, or an out-of-band management application, and specify how the data in the "Documents" folder should be managed:
"cdmi_data_redundancy": "2",
"cdmi_immediate_redundancy": "2",
"cdmi_infrastructure_redundancy": "2",
"cdmi_geographic_placement": [
    "US"
],
"cdmi_encryption": "AES_256_CTR",
The first data system metadata item, "cdmi_data_redundancy", indicates how many independent copies of the stored data should be kept. In this case, it has been set to two, which means that two copies of the data should be stored. Likewise, "cdmi_immediate_redundancy" indicates that two copies should be provided synchronously, "cdmi_infrastructure_redundancy" indicates that the two copies should be located in separate failure domains, "cdmi_geographic_placement" indicates that the copies should remain within the United States, and "cdmi_encryption" indicates that AES with a 256-bit key in counter mode should be used to protect the data.
It is important to note that data system metadata items express the desired management behaviour. This is separate from the actual management behaviour. In order to indicate to a client what management behaviours are actually being provided, CDMI provides a matching series of data system metadata items, ending with the suffix "_billed":
"cdmi_data_redundancy_billed": "2",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
    "US"
],
"cdmi_encryption_billed": "AES_256_CTR",
In this case, the billed metadata shows that the storage system is meeting the requested data redundancy, immediate redundancy, geographic placement and encryption, but is not able to provide the requested infrastructure redundancy (a value of 1 is billed where 2 was requested). This allows a client to discover whether the requested data system services are being provided.
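A client can make this comparison mechanically. The following sketch, using the values from the full "Documents" response shown earlier, pairs each requested item with its "_billed" counterpart and reports the ones that differ. The function name is made up for illustration, and a plain inequality test is an assumption: it would also flag a system that provides more than was requested.

```python
BILLED_SUFFIX = "_billed"


def unmet_services(metadata):
    """Map each requested item whose billed value differs to (requested, billed)."""
    unmet = {}
    for name, requested in metadata.items():
        if name.endswith(BILLED_SUFFIX):
            continue  # billed items are looked up from their requested partner
        billed_name = name + BILLED_SUFFIX
        if billed_name in metadata and metadata[billed_name] != requested:
            unmet[name] = (requested, metadata[billed_name])
    return unmet


# Requested and billed values copied from the "Documents" container above.
documents = {
    "cdmi_data_redundancy": "2",
    "cdmi_immediate_redundancy": "2",
    "cdmi_infrastructure_redundancy": "2",
    "cdmi_geographic_placement": ["US"],
    "cdmi_encryption": "AES_256_CTR",
    "cdmi_data_redundancy_billed": "2",
    "cdmi_immediate_redundancy_billed": "2",
    "cdmi_infrastructure_redundancy_billed": "1",
    "cdmi_geographic_placement_billed": ["US"],
    "cdmi_encryption_billed": "AES_256_CTR",
}

print(unmet_services(documents))
# {'cdmi_infrastructure_redundancy': ('2', '1')}
```

Items without a billed counterpart, such as user metadata, are simply skipped.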
Data Management Summary
So, putting this all together, data system metadata is specified directly or inherited from parent containers. The data system metadata specifies the desired data system services for stored content. The system tries to provide the requested data system services, and indicates which services are actually being provided in the corresponding billed data system metadata items.
All of the data system metadata items specified for the "Documents" container will also be applied to the child "Financials" container, unless overridden by a data system metadata value specified there. For example, if the "Financials" container has the following data system metadata:
"cdmi_data_redundancy": "3",
Only the redundancy is changed; all other items are inherited from the parent. Thus, the billed metadata values would be:
"cdmi_data_redundancy_billed": "3",
"cdmi_immediate_redundancy_billed": "2",
"cdmi_infrastructure_redundancy_billed": "1",
"cdmi_geographic_placement_billed": [
    "US"
],
"cdmi_encryption_billed": "AES_256_CTR",
Currently all data system metadata items are inherited, and there is no way to override an inherited value except by specifying a new data system metadata item.
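The inheritance rule can be pictured as a simple merge from the root of the tree down to the item of interest, where each level overrides whatever it specifies itself. The sketch below is only an illustration of that rule, with an invented function name; it is not how any particular CDMI system implements it.

```python
def effective_data_system_metadata(*levels):
    """Merge data system metadata from parent to child; deeper levels win."""
    effective = {}
    for level in levels:
        effective.update(level)
    return effective


# Requested values for "Documents", copied from the example above.
documents_requested = {
    "cdmi_data_redundancy": "2",
    "cdmi_immediate_redundancy": "2",
    "cdmi_infrastructure_redundancy": "2",
    "cdmi_geographic_placement": ["US"],
    "cdmi_encryption": "AES_256_CTR",
}
# "Financials" only overrides the redundancy.
financials_requested = {"cdmi_data_redundancy": "3"}

financials_effective = effective_data_system_metadata(
    documents_requested, financials_requested
)
print(financials_effective["cdmi_data_redundancy"])  # 3
print(financials_effective["cdmi_encryption"])       # AES_256_CTR
```

Passing the levels in order from the root container to the child keeps the sketch usable for deeper trees as well.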
This concludes a quick overview of the basics of data management and metadata in CDMI. In the next part of the tutorial, we will discuss the various data system metadata services defined in CDMI.