<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-6060498081821905291</id><updated>2011-07-30T15:43:21.283-07:00</updated><category term='IBM'/><category term='Tape'/><category term='Blog Responses'/><category term='HP'/><category term='Usability'/><category term='User Interface'/><category term='Metadata'/><category term='Research'/><category term='Microsoft'/><category term='Musings'/><category term='Search Engines'/><category term='Cloud Computing'/><category term='NetApp'/><category term='EMC'/><category term='Archiving'/><category term='SNIA'/><category term='Rules'/><category term='Security'/><category term='S3'/><category term='Trust'/><category term='Web 2.0'/><category term='VMs'/><category term='Google'/><category term='Graphing'/><category term='Distributed Computing'/><category term='Wikipedia'/><category term='Compliance'/><category term='Bycast'/><category term='Object Storage'/><category term='Hardware'/><category term='Time'/><category term='Storage'/><category term='XAM'/><category term='Query'/><category term='Cloud Storage'/><category term='Information Visualization'/><category term='Reliability'/><category term='ADE'/><category term='Design Patterns'/><category term='CDMI'/><category term='Scalability'/><title type='text'>Vanishing Into the Infrastructure</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default?max-results=100'/><link rel='alternate' type='text/html' 
href='http://intotheinfrastructure.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>55</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4731958822487711507</id><published>2010-06-01T17:35:00.000-07:00</published><updated>2010-06-01T17:47:09.724-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='NetApp'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>Not the End, Just the End of a Beginning</title><content type='html'>Today, I am proud to announce my new blog as part of the NetApp blogging community: &lt;a href="http://blogs.netapp.com/context/"&gt;Objects in Context&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;I am very pleased to join the NetApp team, and to have this opportunity to start writing a second chapter. January 2010 marked the passage of a decade since I first started working on object storage, and we've seen many milestones, from the first large-scale deployments to the development of industry standards. 
Over the next few years, I fully expect things to accelerate — After all, the Internet changes everything, and storage is just starting to catch up.&lt;br /&gt;&lt;br /&gt;Moving forward, I will be re-purposing this blog here to discuss topics such as information visualization, radio design, iPad development and other similar non-work-related subjects. I encourage all my readers to continue to follow my adventures in object storage over at the NetApp blog. We've got many exciting things in store, and I can assure you that in this case, one plus one is far greater than two.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4731958822487711507?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4731958822487711507/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4731958822487711507' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4731958822487711507'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4731958822487711507'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/06/not-end-just-end-of-beginning.html' title='Not the End, Just the End of a Beginning'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-1563045014325418980</id><published>2010-03-30T17:07:00.000-07:00</published><updated>2010-03-30T17:23:12.090-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>CDMI Functional Areas</title><content type='html'>With the 1.0 release of the Cloud Data Management Interface standard quickly wrapping up, here is my summary of the major functional areas covered in the &lt;a href="http://www.snia.org/tech_activities/publicreview#cloud"&gt;1.0g draft&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Access by Name - Sections 8 and 9&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Cloud storage clients can store, list and retrieve objects by name. This is the most common method for accessing objects, and allows objects to be placed into "containers" that group like objects together, much as directories do in a filesystem.&lt;br /&gt;&lt;br /&gt;For more details, see &lt;a href="http://intotheinfrastructure.blogspot.com/2009/12/cdmi-tutorial-basic-inputoutput.html"&gt;CDMI Tutorial - Basic Input/Output&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Access by ID - Sections 8 and 9&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Cloud storage clients can also store and retrieve objects by Object ID. Every named object also has an Object ID, but not all objects have a name. When unnamed objects are created, they can only be accessed by Object ID. 
Storing Object IDs in a database is more efficient than storing URIs, and this mode of object access is better suited to ID-based or query-based access.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data System Metadata - Section 16.4&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Data System Metadata is a set of special metadata items that are interpreted by the cloud in order to allow the cloud storage client to specify the level of service, protection, placement and other characteristics for stored objects.&lt;br /&gt;&lt;br /&gt;For more details, see &lt;a href="http://intotheinfrastructure.blogspot.com/2010/02/cdmi-tutorial-data-management-part-1.html"&gt;CDMI Tutorial - Data Management, Part 1&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Serialization - Section 15&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Serialization allows objects and containers to be transformed into a portable data object that can be deserialized back into the original objects and containers. This is useful for archiving and for system-to-system transport.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Domains - Section 10&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Domains specify administrative control within a CDMI cloud: they define how users are mapped to permissions, specify delegation of authentication and authorization, and provide usage summaries.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Queues - Section 11&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Queues are special data objects that can store multiple values with first-in, first-out access semantics. 
Queues represent a key technology for connecting applications in the cloud, as they enable reliable inter-process communication with underlying persistent data storage.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Query - Section 11.1.3&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Query is implemented as a queue interface that allows a client to express a query in a standardized manner, and allows a query engine to return the response as a CDMI data object with a specific result format.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Notification - Section 11.1.1&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Notification builds on top of queues, and allows a client to subscribe to a client-defined set of notifications about operations performed against the storage cloud.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Audit - Sections 11.1.2 and 17&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Audit builds on top of queues, and allows a client to subscribe to a client-defined set of log messages about system operations.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Exports - Section 13&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Exports allow a client to specify and control how access to named objects is provided through network file protocols. 
Exports also allow a container to be exported as a block device.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Snapshots - Section 14&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Snapshots are the ability for a client to specify that access to a set of named objects and containers should be preserved.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Retention - Section 18&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Retention and Hold are a set of functions that allow a client to specify that an object may not be modified or deleted, and for how long the restrictions must remain in place.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-1563045014325418980?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/1563045014325418980/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=1563045014325418980' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1563045014325418980'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1563045014325418980'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/03/cdmi-functional-areas.html' title='CDMI Functional Areas'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3681534436528115724</id><published>2010-02-24T14:51:00.001-08:00</published><updated>2010-02-24T15:02:44.687-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>The World's Shortest CDMI Implementation</title><content type='html'>The CDMI standard is built around the concept of "capabilities", which describe which functionality a given cloud storage system provides to clients. As a result, it permits a system to only implement a small subset of the standard while still being compliant.&lt;br /&gt;&lt;br /&gt;So, to demonstrate this, I present to you the world's shortest* CDMI implementation, written in Ruby:&lt;br /&gt;&lt;pre name="code" class="ruby"&gt;# World's smallest CDMI implementation, or, how to say NO in CDMI.&lt;br /&gt;require 'socket'&lt;br /&gt;require 'openssl'&lt;br /&gt;&lt;br /&gt;listener = TCPServer.new('', 2000)&lt;br /&gt;&lt;br /&gt;# Set up TLS&lt;br /&gt;ssl_context = OpenSSL::SSL::SSLContext.new()&lt;br /&gt;ssl_context.cert = OpenSSL::X509::Certificate.new(File.open("cdmi_server.cert"))&lt;br /&gt;ssl_context.key = OpenSSL::PKey::RSA.new(File.open("cdmi_server.key"))&lt;br /&gt;ssl_listener = OpenSSL::SSL::SSLServer.new(listener, ssl_context)&lt;br /&gt;&lt;br /&gt;while (connection = ssl_listener.accept)&lt;br /&gt;    request = ""&lt;br /&gt; while(request.index("\n\n") == nil)&lt;br /&gt;  request &lt;&lt; connection.gets&lt;br /&gt; end&lt;br /&gt;&lt;br /&gt; print "-- CLIENT REQUEST -------------------------------------------------------------\n"&lt;br /&gt; print 
request&lt;br /&gt; print "-------------------------------------------------------------------------------\n"&lt;br /&gt;&lt;br /&gt; # Only GET is supported; the URI is the second space-delimited token of the request line&lt;br /&gt; if(request.index("GET ") == 0)&lt;br /&gt;  uri = request.slice(request.index(" ") + 1, request.length)&lt;br /&gt;  uri = uri.slice(0, uri.index(" "))&lt;br /&gt;  &lt;br /&gt;  if(uri == "/")&lt;br /&gt;   # Root container: its only child is the capabilities object&lt;br /&gt;   connection.puts("HTTP/1.1 200 OK\nContent-Type: application/vnd.org.snia.cdmi.container+json\nX-CDMI-Specification-Version: 1.0\n\n{\"objectURI\" : \"/\", \"objectID\" : \"AABwbQAQgTpfe4qRBsyCCw==\", \"parentURI\" : \"/\", \"capabilitiesURI\" : \"/cdmi_capabilities/\", \"completionStatus\" : \"Complete\", \"metadata\" : {}, \"childrenrange\" : \"0-0\", \"children\" : [\"cdmi_capabilities/\"]}")&lt;br /&gt;  elsif(uri == "/cdmi_capabilities" || uri == "/cdmi_capabilities/" )&lt;br /&gt;   # Capabilities object: advertises the small subset of CDMI this server supports&lt;br /&gt;   connection.puts("HTTP/1.1 200 OK\nContent-Type: application/vnd.org.snia.cdmi.capabilities+json\nX-CDMI-Specification-Version: 1.0\n\n{\"objectURI\" : \"/cdmi_capabilities/\", \"objectID\" : \"AABwbQAQnP7GJT2muKDelQ==\", \"parentURI\" : \"/\", \"capabilities\" : {\"cdmi_security_https_transport\" : \"true\", \"cdmi_read_metadata\" : \"true\", \"cdmi_list_children\" : \"true\"}, \"childrenrange\" : \"\", \"children\" : []}")&lt;br /&gt;  else&lt;br /&gt;   # Error responses also need a full HTTP status line and a blank line ending the headers&lt;br /&gt;   connection.puts("HTTP/1.1 404 Not Found\n\n")&lt;br /&gt;  end&lt;br /&gt; else&lt;br /&gt;  connection.puts("HTTP/1.1 501 Not Implemented\n\n")&lt;br /&gt; end&lt;br /&gt;    &lt;br /&gt; connection.close&lt;br /&gt;end&lt;/pre&gt;* It's a little longer than it could be, because it is written for readability. Removal of comments, code tightening, etc., is left as an exercise for the reader.&lt;br /&gt;&lt;br /&gt;Now, since this uses TLS, you'll need a TLS-compatible client. 
Here's a test client for this purpose:&lt;br /&gt;&lt;pre name="code" class="ruby"&gt;# Test client for a minimal CDMI implementation&lt;br /&gt;# Pass the URI to request as the first command-line argument, e.g. /cdmi_capabilities/&lt;br /&gt;require 'socket'&lt;br /&gt;require 'openssl'&lt;br /&gt;&lt;br /&gt;socket = TCPSocket.new('localhost', 2000)&lt;br /&gt;&lt;br /&gt;# Wrap the TCP socket in TLS; closing the TLS socket also closes the TCP socket&lt;br /&gt;ssl_context = OpenSSL::SSL::SSLContext.new()&lt;br /&gt;ssl_socket = OpenSSL::SSL::SSLSocket.new(socket, ssl_context)&lt;br /&gt;ssl_socket.sync_close = true&lt;br /&gt;ssl_socket.connect&lt;br /&gt;&lt;br /&gt;# Send the request line and headers, followed by a blank line to end the request&lt;br /&gt;ssl_socket.puts("GET #{ARGV[0]} HTTP/1.0")&lt;br /&gt;ssl_socket.puts("Accept: application/vnd.org.snia.cdmi.object+json")&lt;br /&gt;ssl_socket.puts("X-CDMI-Specification-Version: 1.0")&lt;br /&gt;ssl_socket.puts("")&lt;br /&gt;&lt;br /&gt;print "-- SERVER RESPONSE ------------------------------------------------------------\n"&lt;br /&gt;while line = ssl_socket.gets&lt;br /&gt; print line&lt;br /&gt;end&lt;br /&gt;print "-------------------------------------------------------------------------------\n"&lt;/pre&gt;To run these, you need an X.509 certificate and key. You can generate these using OpenSSL. 
Follow the instructions below for Apache, but instead of step 5, copy the certificate to "cdmi_server.cert" and the key to "cdmi_server.key".&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.akadia.com/services/ssh_test_certificate.html"&gt;http://www.akadia.com/services/ssh_test_certificate.html&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-3681534436528115724?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3681534436528115724/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3681534436528115724' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3681534436528115724'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3681534436528115724'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/02/worlds-shortest-cdmi-implementation.html' title='The World&apos;s Shortest CDMI Implementation'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7448427396236074127</id><published>2010-02-23T12:42:00.000-08:00</published><updated>2010-02-23T15:32:23.696-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>A CDMI Tutorial - Data Management, Part 1</title><content type='html'>As we covered in the first part of this tutorial, the SNIA Cloud Data Management Interface provides a RESTful mechanism for the basic storage and retrieval of data. However, the core of the standard focuses on data management: how objects are stored, delivered, placed, protected and more.&lt;br /&gt;&lt;br /&gt;This post is the second in a series on CDMI. The series covers the following areas:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://intotheinfrastructure.blogspot.com/2009/12/cdmi-tutorial-basic-inputoutput.html"&gt;Basic Input/Output&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://intotheinfrastructure.blogspot.com/2010/02/cdmi-tutorial-data-management-part-1.html"&gt;Data Management, Part 1&lt;/a&gt; (This post)&lt;/li&gt;&lt;li&gt;Data Management, Part 2&lt;/li&gt;&lt;li&gt;Advanced Input/Output&lt;/li&gt;&lt;li&gt;Cloud-to-Cloud Interactions&lt;/li&gt;&lt;li&gt;Queues and Query&lt;/li&gt;&lt;li&gt;Authentication and Access Control&lt;/li&gt;&lt;li&gt;Billing and Accounting&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Data Management and Metadata&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;CDMI storage systems see the world as a tree of objects and containers. Objects store data, and containers contain child objects and containers. Regardless of whether the data is stored via HTTP or via traditional protocols such as NFS or CIFS, all data represented through the CDMI protocol is fundamentally seen in terms of objects and containers.&lt;br /&gt;&lt;br /&gt;Management of stored data is enabled through metadata. Every container and object can have metadata associated with it. 
Metadata in CDMI is organized into three general categories: user metadata, storage system metadata, and data system metadata.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;User Metadata&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;User metadata is set directly by CDMI clients, or indirectly through the extended metadata interfaces of other access protocols. For example, in NFS, extended attributes can be mapped to CDMI user metadata items. User metadata items are arbitrary, and are not interpreted by the storage system.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storage System Metadata&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Storage system metadata are generated by the CDMI storage system, and provide read-only access to information about the stored data that is managed by the storage system. The creation time of an object is a good example of a storage system metadata item.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data System Metadata&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Data system metadata are provided by a CDMI client, or specified through an out-of-band management interface, and determine how the stored data should be managed. 
For example, data system metadata can specify the degree of replication, or an encryption level desired to protect data while stored on disk.&lt;br /&gt;&lt;br /&gt;It is through the specification of data system metadata that CDMI enables the management of how data should be stored.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Example Object Metadata&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As an example, let's assume that we have a CDMI-enabled storage system that also provides an NFS share, and we've stored some documents onto it.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_FN1WQhtIvYA/S4Q--l1rcMI/AAAAAAAAADU/mEVbOZagEPY/s1600-h/2010-02-23+CDMI+Tutorial+Image.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 211px;" src="http://4.bp.blogspot.com/_FN1WQhtIvYA/S4Q--l1rcMI/AAAAAAAAADU/mEVbOZagEPY/s400/2010-02-23+CDMI+Tutorial+Image.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5441543494800470210" /&gt;&lt;/a&gt;&lt;br /&gt;We can then connect to the system via CDMI, and access the CDMI metadata of the "Documents" container:&lt;br /&gt;&lt;pre name="code" class="js"&gt;GET /Documents/ HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.object+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;HTTP/1.1 200 OK&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.container+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;  "objectURI" : "/Documents/",&lt;br /&gt;  "objectID" : "AABwbQAQ8ypO85j/ml8TZQ==",&lt;br /&gt;  "parentURI" : "/",&lt;br /&gt;  "accountURI" : "/cdmi_accounts/default_account/",&lt;br /&gt;  "capabilitiesURI" : "/cdmi_capabilities/container/",&lt;br /&gt;  "percentageComplete" : "Complete",&lt;br /&gt;  "metadata" : {&lt;br /&gt;    "user.DosAttrib": "0x20",&lt;br /&gt;    "cdmi_ctime" : 
"2009-12-29T12:43:32.479832Z",&lt;br /&gt;    "cdmi_atime" : "2010-01-02T16:12:53.521983Z",&lt;br /&gt;    "cdmi_mtime" : "2010-01-02T16:12:53.521983Z",&lt;br /&gt;    "cdmi_acount" : "52",&lt;br /&gt;    "cdmi_mcount" : "12",&lt;br /&gt;    "ACL" : {&lt;br /&gt;      "acetype" : "0x00",&lt;br /&gt;      "identifier" : "jdoe",&lt;br /&gt;      "aceflags" : "0x03",&lt;br /&gt;      "acemask" : "0x000F005F",&lt;br /&gt;      "acetime" : "2009-12-29T12:43:32.479832Z" &lt;br /&gt;    },&lt;br /&gt;    "cdmi_data_redundancy": "2",&lt;br /&gt;    "cdmi_immediate_redundancy": "2",&lt;br /&gt;    "cdmi_infrastructure_redundancy": "2",&lt;br /&gt;    "cdmi_geographic_placement": [&lt;br /&gt;        "US" &lt;br /&gt;    ],&lt;br /&gt;    "cdmi_encryption": "AES_256_CTR",&lt;br /&gt;    "cdmi_data_redundancy_billed": "2",&lt;br /&gt;    "cdmi_immediate_redundancy_billed": "2",&lt;br /&gt;    "cdmi_infrastructure_redundancy_billed": "1",&lt;br /&gt;    "cdmi_geographic_placement_billed": [&lt;br /&gt;        "US" &lt;br /&gt;    ],&lt;br /&gt;    "cdmi_encryption_billed": "AES_256_CTR" &lt;br /&gt;  },&lt;br /&gt;  "childrenrange" : "1-3",&lt;br /&gt;  "children" : [&lt;br /&gt;    "Financials/",&lt;br /&gt;    "CDMI_Spec.pdf",&lt;br /&gt;    "hello.txt" &lt;br /&gt;  ]&lt;br /&gt;}&lt;/pre&gt;There is quite a few metadata items here, and it looks quite complex, but it's not as bad as it first appears. To see what these metadata items mean, and how they are used to manage stored data, let's review them individually:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;User Metadata Items&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The directory (container in CDMI speak) has a single user metadata item:&lt;br /&gt;&lt;pre name="code" class="js"&gt;      "user.DOSATTRIB": "0x20",&lt;/pre&gt;This user metadata item is an extended attribute that indicates the archive bit is set in the DOS mode of a directory. 
In this case, the item was set when Samba created the directory, and the storage server presents extended attributes on the filesystem as user metadata items.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storage System Metadata Items&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The directory has five storage system metadata items, and an ACL, which is an example of a more complex storage system metadata item:&lt;br /&gt;&lt;pre name="code" class="js"&gt;        "cdmi_ctime" : "2009-12-29T12:43:32.479832Z",&lt;br /&gt;        "cdmi_atime" : "2010-01-02T16:12:53.521983Z",&lt;br /&gt;        "cdmi_mtime" : "2010-01-02T16:12:53.521983Z",&lt;br /&gt;        "cdmi_acount" : "52",&lt;br /&gt;        "cdmi_mcount" : "12",&lt;br /&gt;        "ACL" : {&lt;br /&gt;            "acetype" : "0x00",&lt;br /&gt;            "identifier" : "jdoe",&lt;br /&gt;            "aceflags" : "0x03",&lt;br /&gt;            "acemask" : "0x000F005F",&lt;br /&gt;            "acetime" : "2009-12-29T12:43:32.479832Z" &lt;br /&gt;        },&lt;/pre&gt;In this example, the first three metadata items contain the creation, last access and last modification times, respectively, and the remaining two show the number of accesses and the number of modifications since creation. The ACL metadata specifies the access control restrictions for the folder, and is based on NFSv4 ACLs.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data System Metadata Items&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Finally, we have the data system metadata items. 
These items are specified by a CDMI client, or an out-of-band management application, and specify how the data in the "Documents" folder should be managed:&lt;br /&gt;&lt;pre name="code" class="js"&gt;"cdmi_data_redundancy": "2",&lt;br /&gt;"cdmi_immediate_redundancy": "2",&lt;br /&gt;"cdmi_infrastructure_redundancy": "2",&lt;br /&gt;"cdmi_geographic_placement": [&lt;br /&gt;    "US"&lt;br /&gt;],&lt;br /&gt;"cdmi_encryption": "AES_256_CBC",&lt;/pre&gt;The first data system metadata item, "cdmi_data_redundancy", indicates how many independent copies of the stored data should be kept. In this case, it has been set to two, which means that two copies of the data should be stored. Likewise, "cdmi_immediate_redundancy" indicates that two copies should be provided synchronously, "cdmi_infrastructure_redundancy" indicates that the two copies should be located in separate failure domains, "cdmi_geographic_placement" indicates that the copies should remain within the United States, and "cdmi_encryption" indicates that AES with a 256-bit key in cipher block chaining (CBC) mode should be used to protect the data.&lt;br /&gt;&lt;br /&gt;It is important to note that data system metadata items express the &lt;b&gt;desired&lt;/b&gt; management behaviour. This is separate from the actual management behaviour. 
To indicate to a client what management behaviours are actually being provided, CDMI provides a matching series of data system metadata items, ending with the suffix "_billed":&lt;br /&gt;&lt;pre name="code" class="js"&gt;"cdmi_data_redundancy_billed": "2",&lt;br /&gt;"cdmi_immediate_redundancy_billed": "2",&lt;br /&gt;"cdmi_infrastructure_redundancy_billed": "1",&lt;br /&gt;"cdmi_geographic_placement_billed": [&lt;br /&gt;    "US"&lt;br /&gt;],&lt;br /&gt;"cdmi_encryption_billed": "AES_256_CTR",&lt;/pre&gt;In this case, the data system metadata is specifying that the storage system is able to meet the requested level of redundancy and the requested immediate redundancy, but is not able to provide the requested infrastructure redundancy or the requested encryption method. This allows a client to discover whether the requested data system services are being provided.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data Management Summary&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;So, putting this all together, data system metadata is specified or inherited from parent containers. The data system metadata specifies the desired data system services for stored content. The system tries to accomplish the requested data system services, and indicates what services are actually being provided in the corresponding billed data system metadata items.&lt;br /&gt;&lt;br /&gt;All of the data system metadata items specified for the "Documents" container will also be applied to the child "Financials" container, unless overridden by a data system metadata value specified there. For example, if the "Financials" container has the following data system metadata:&lt;br /&gt;&lt;pre name="code" class="js"&gt;"cdmi_data_redundancy": "3",&lt;/pre&gt;then only the redundancy is changed; all other items are inherited from the parent. 
Thus, the billed metadata values would be:&lt;br /&gt;&lt;pre name="code" class="js"&gt;"cdmi_data_redundancy_billed": "3",&lt;br /&gt;"cdmi_immediate_redundancy_billed": "2",&lt;br /&gt;"cdmi_infrastructure_redundancy_billed": "1",&lt;br /&gt;"cdmi_geographic_placement_billed": [&lt;br /&gt;    "US"&lt;br /&gt;],&lt;br /&gt;"cdmi_encryption_billed": "AES_256_CTR",&lt;/pre&gt;Currently all data system metadata items are inherited, and there is no way to override an inherited value except by specifying a new data system metadata item.&lt;br /&gt;&lt;br /&gt;This concludes a quick overview of the basics of data management and metadata in CDMI. In the next part of the tutorial, we will discuss the various data system metadata services defined in CDMI.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7448427396236074127?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7448427396236074127/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7448427396236074127' title='11 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7448427396236074127'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7448427396236074127'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/02/cdmi-tutorial-data-management-part-1.html' title='A CDMI Tutorial - Data Management, Part 1'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_FN1WQhtIvYA/S4Q--l1rcMI/AAAAAAAAADU/mEVbOZagEPY/s72-c/2010-02-23+CDMI+Tutorial+Image.jpg' height='72' width='72'/><thr:total>11</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-175639004955085314</id><published>2010-02-12T13:36:00.001-08:00</published><updated>2010-02-12T13:48:53.117-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>CDMI 1.0 Draft Available</title><content type='html'>All the hard work from the week-long meeting of the SNIA Cloud Storage Technical Work Group two weeks ago in San Jose has paid off, and we're proud to announce a final 1.0 draft of the Cloud Data Management Interface for cloud storage.&lt;br /&gt;&lt;br /&gt;The specification can be downloaded from the &lt;a href="http://www.snia.org/tech_activities/publicreview#cloud"&gt;Draft Technical Work for Public Review&lt;/a&gt; section of the SNIA web site, and is labelled CDMI 1.0g.&lt;br /&gt;&lt;br /&gt;This release completes the sections of the specification that were incomplete in the 0.9 draft, and clarifies a number of areas where the spec was unclear. 
Areas of improvement include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Addition of Retention and Hold for compliance&lt;/li&gt;&lt;li&gt;Clarification of terminology from "Account" to "Domain"&lt;/li&gt;&lt;li&gt;Addition of Hash Data System Metadata&lt;/li&gt;&lt;li&gt;ACL syntax in JSON&lt;/li&gt;&lt;li&gt;Ability to perform cross-domain actions&lt;/li&gt;&lt;li&gt;CDMI Logging Queues&lt;/li&gt;&lt;li&gt;Elimination of JSON ordering constraints&lt;/li&gt;&lt;li&gt;Specification of Encryption Data System Metadata&lt;/li&gt;&lt;li&gt;Additional examples&lt;/li&gt;&lt;/ul&gt;We're very proud of how the specification has  firmed up over the last few months, and it is very impressive that we've been able to go from forming a working group to a 1.0 draft in less than a year.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-175639004955085314?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/175639004955085314/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=175639004955085314' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/175639004955085314'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/175639004955085314'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/02/cdmi-10-draft-available.html' title='CDMI 1.0 Draft Available'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-8739703800013999051</id><published>2010-02-02T16:53:00.001-08:00</published><updated>2010-02-02T17:13:04.290-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><title type='text'>Opaque Clouds and Transparent Clouds</title><content type='html'>I am seeing two usage models emerging for cloud use: Opaque Clouds and Transparent Clouds.&lt;br /&gt;&lt;br /&gt;Opaque Clouds are clouds where users store pre-encrypted data in the cloud, such that the cloud operator has no visibility into the users' data. In this model, the encryption keys are owned and managed by the end user, and the cloud operator is not able to provide any value-added services that require access to the plaintext of the user's data.&lt;br /&gt;&lt;br /&gt;Transparent Clouds are clouds where users submit data to the cloud (which may be encrypted during transmission and/or when stored), but the cloud operator is able to access the users' data. In this model, the cloud operator either manages the encryption, or has access to the key repositories where the users' keys are stored. While transparent clouds can still be secure, there are additional security risks, as the cloud operator fundamentally must have access to the users' plaintext.&lt;br /&gt;&lt;br /&gt;Both of these models have merits and use cases where they make sense. For example, if a first transparent cloud is using a second cloud to provide a second geographic location for data storage, the first cloud may store data in the second cloud in an opaque fashion. 
A second scenario may be where a cloud user stores data to a cloud for the purposes of data sharing with another trusted user, but the cloud itself is not trusted. In this case, the keys would be shared between the two users, but the cloud would be unable to see the data stored.&lt;br /&gt;&lt;br /&gt;Ultimately, I believe that opaque cloud storage will co-exist with transparent clouds, as both can operate concurrently with the same infrastructure. (translucent clouds?) As different use cases determine the security sensitivity of the data, and many of the cloud provided value added services (search, indexing, discovery, data mining, format conversion, etc) are quite compelling, it's going to be an interesting set of trade-offs between cloud security and cloud value.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-8739703800013999051?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/8739703800013999051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=8739703800013999051' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8739703800013999051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8739703800013999051'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/02/opaque-clouds-and-transparent-clouds.html' title='Opaque Clouds and Transparent Clouds'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7160549045934440860</id><published>2010-01-25T10:28:00.000-08:00</published><updated>2010-01-25T10:35:12.428-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>Wrapping up CDMI</title><content type='html'>If you're in the San Jose area this week, SNIA is holding their Winter Symposium, and we've got a week full of items we're working on to bring the proposed Cloud Data Management Interface (CDMI) to the point where it can be a finalized standard.&lt;br /&gt;&lt;br /&gt;Here's our working agenda:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://groups.google.com/group/snia-cloud/web/cloud-storage-winter-symposium-2010"&gt;http://groups.google.com/group/snia-cloud/web/cloud-storage-winter-symposium-2010&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Some areas of focus include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;XML support - Should only JSON be supported, or should XML be included in the standard?&lt;/li&gt;&lt;li&gt;Audit Queues - Access audit data via CDMI&lt;/li&gt;&lt;li&gt;Transaction Signing - Should S3-style transaction signing be part of the spec?&lt;/li&gt;&lt;li&gt;Named Forks - How best to handle files in NFS/CIFS with multiple data forks.&lt;/li&gt;&lt;li&gt;Retention - Finalizing what we take from XAM for retention data system metadata.&lt;/li&gt;&lt;/ul&gt;If anyone has specific thoughts on these topics and anything else related to CDMI, please don't hesitate to reply to this blog entry, or send me an e-mail. 
(dslik at bycast.com)&lt;br /&gt;&lt;br /&gt;I'm also available on Twitter as &lt;a href="http://twitter.com/dslik"&gt;dslik&lt;/a&gt;, and I'll be watching for tweets about CDMI and tweets sent to me.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7160549045934440860?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7160549045934440860/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7160549045934440860' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7160549045934440860'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7160549045934440860'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2010/01/wrapping-up-cdmi.html' title='Wrapping up CDMI'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5057395430437063196</id><published>2009-12-22T10:42:00.000-08:00</published><updated>2010-02-23T15:32:26.579-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category 
scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>A CDMI Tutorial - Basic Input/Output</title><content type='html'>With the 0.9 draft of the &lt;a href="http://www.snia.org/tech_activities/publicreview"&gt;SNIA Cloud Data Management Interface specification&lt;/a&gt; now released, this is a good time to highlight the aspects of the standard that make it well suited for cloud storage, and to review the changes from the previous 0.8 draft version.&lt;br /&gt;&lt;br /&gt;This post is the first in a series on CDMI. The series covers the following areas:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://intotheinfrastructure.blogspot.com/2009/12/cdmi-tutorial-basic-inputoutput.html"&gt;Basic Input/Output&lt;/a&gt; (This post)&lt;/li&gt;&lt;li&gt;&lt;a href="http://intotheinfrastructure.blogspot.com/2010/02/cdmi-tutorial-data-management-part-1.html"&gt;Data Management, Part 1&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Data Management, Part 2&lt;/li&gt;&lt;li&gt;Advanced Input/Output&lt;/li&gt;&lt;li&gt;Cloud-to-Cloud Interactions&lt;/li&gt;&lt;li&gt;Queues and Query&lt;/li&gt;&lt;li&gt;Authentication and Access Control&lt;/li&gt;&lt;li&gt;Billing and Accounting&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;Basic Input/Output&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;While storage and retrieval through CDMI are supported, CDMI does not assume or require that clients use it to store and retrieve data. CDMI can be used in conjunction with existing cloud protocols (such as Amazon's S3 API) and file system protocols (such as NFS, CIFS and WebDAV). The goal of CDMI is not to replace or supplant these protocols; it is to provide a standardized access and management method that is independent of these protocols. So while storage and retrieval are important, they are only one part of the standard. 
It is even possible to implement a fully compliant CDMI system that does not support the ability to store or retrieve data.&lt;br /&gt;&lt;br /&gt;By not restricting how data is stored or accessed, while still providing a consistent and uniform way to access and manage stored data, CDMI enables a client to discover, access and manage stored content regardless of how it was originally stored, and regardless of the underlying storage implementation.&lt;br /&gt;&lt;br /&gt;To get an idea of the basic object model by example, let's review basic storage and retrieval operations using CDMI:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storing an Object&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The most basic operations in CDMI are object storage and retrieval. Assuming that we have an authenticated session established to a CDMI cloud running at cloud.example.com and we want to store some text, here is the HTTP transaction that would be performed, as described in section 8.2 of the specification:&lt;br /&gt;&lt;pre name="code" class="js"&gt;PUT /hello.txt HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;Accept: application/vnd.org.snia.cdmi.dataobject+json&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.dataobject+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;    "mimetype" : "text/plain",&lt;br /&gt;    "value" : "Hello Cloud"&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;HTTP/1.1 201 Created&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.dataobject+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;    "objectURI" : "/hello.txt",&lt;br /&gt;    "objectID" : "AABwbQAQ810Mpei21pxfzA==",&lt;br /&gt;    "parentURI" : "/",&lt;br /&gt;    "accountURI" : "/cdmi_accounts/default_account/",&lt;br /&gt;    "capabilitiesURI" : "/cdmi_capabilities/dataobject/",&lt;br /&gt;    "completionStatus" : "Complete",&lt;br /&gt;    "mimetype" : "text/plain",&lt;br /&gt;    "metadata" : {&lt;br /&gt;        "cdmi_size" : "11"&lt;br /&gt;    }&lt;br 
/&gt;}&lt;/pre&gt;Let's look at this transaction in more detail:&lt;br /&gt;&lt;br /&gt;It's standard HTTP, and we specify that we will be submitting a request body in the form of a cdmi.dataobject, and requesting a response body in the form of a cdmi.dataobject. We submit in our request body JSON that includes a mimetype and value field, and receive in response a result (HTTP 201 Created) and a JSON structure that includes information about the newly created cloud object.&lt;br /&gt;&lt;br /&gt;The response body JSON contains the following fields:&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td valign="top" width="150"&gt;Field Name&lt;/td&gt;&lt;td&gt;Field Description&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;objectURI&lt;/td&gt;&lt;td&gt;The URI for the newly created object. This can be used to access the object via CDMI, and reflects the organization of the stored data. For example, if a storage system provides CDMI access to a NAS share, the objectURI reflects the file path.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;objectID&lt;/td&gt;&lt;td&gt;Every object within a CDMI system has a globally unique identifier that can be used to access the object. Object IDs remain constant for the life of the object, even if modified, renamed or even moved to a different cloud provided by a different vendor.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;parentURI&lt;/td&gt;&lt;td&gt;The URI for the parent of the created object. Objects inherit metadata from their parent.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;accountURI&lt;/td&gt;&lt;td&gt;The URI for the account that the object was created under. 
Accounts determine the billing relationship, reporting visibility and basic access permissions.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;capabilitiesURI&lt;/td&gt;&lt;td&gt;Every object within a CDMI system has "capabilities", which describe the operations the system is capable of performing on that object. This URI allows a client to discover the capabilities of a given object, and through them the capabilities of the CDMI storage provider.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;completionStatus&lt;/td&gt;&lt;td&gt;This field indicates if an object has been fully created. When performing operations that take long periods of time, such as serialization/deserialization, copies and storing large objects, this field may indicate that the object is still in progress, and thus is not ready to be accessed.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;mimetype&lt;/td&gt;&lt;td&gt;This field indicates the type of the value of an object, as specified at the time of creation.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td valign="top"&gt;metadata&lt;/td&gt;&lt;td&gt;This field includes user-specified metadata (not used in this example) and system-generated metadata (the object size, in this example) related to the object. Metadata is a key part of the CDMI standard, and will be covered in more detail in later tutorials.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;While this is the primary method to create objects in CDMI, there is an even simpler way:&lt;br /&gt;&lt;pre name="code" class="js"&gt;PUT /hello2.txt HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;Content-Type: text/plain&lt;br /&gt;Content-Length: 11&lt;br /&gt;&lt;br /&gt;Hello Cloud&lt;br /&gt;&lt;br /&gt;HTTP/1.1 201 Created&lt;/pre&gt;This approach, described in section 8.3, uses completely standard HTTP to create a new CDMI object. 
The mimetype is specified through the Content-Type header, with the tradeoff that no CDMI-specific data is returned in the response. The ability to support operations through 100% standard HTTP makes CDMI very easy to use from JavaScript and web environments, as does the use of JSON.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Listing Stored Objects&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As every data object has a parent, CDMI also allows the objects owned by a parent to be listed. This is performed by the following HTTP transaction, as described in section 9.4 of the specification:&lt;br /&gt;&lt;pre name="code" class="js"&gt;GET / HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;Accept: application/vnd.org.snia.cdmi.object+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;HTTP/1.1 200 OK&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.container+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;    "objectURI" : "/",&lt;br /&gt;    "objectID" : "AABwbQAQI2MtLCwAVfYSFA==",&lt;br /&gt;    "parentURI" : "/",&lt;br /&gt;    "accountURI" : "/cdmi_accounts/default_account/",&lt;br /&gt;    "capabilitiesURI" : "/cdmi_capabilities/container/",&lt;br /&gt;    "completionStatus" : "Complete",&lt;br /&gt;    "metadata" : {&lt;br /&gt;        &lt;br /&gt;    },&lt;br /&gt;    "childrenrange" : "1-2",&lt;br /&gt;    "children" : [&lt;br /&gt;        "hello.txt",&lt;br /&gt;        "hello2.txt"&lt;br /&gt;    ]&lt;br /&gt;}&lt;/pre&gt;As a client does not know if a given URI is a data object or a container, it asks for a generic object. In this case, the URI "/" is a container, so the cloud storage system responds with a body of type cdmi.container. 
All common fields shared between data objects and containers have consistent meanings, but two additional fields, "childrenrange" and "children", provide information about the objects contained by the container.&lt;br /&gt;&lt;br /&gt;As with the data object example above, there is a simpler way to get a list of the children of a container:&lt;br /&gt;&lt;pre name="code" class="js"&gt;GET /?children HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;&lt;br /&gt;HTTP/1.1 200 OK&lt;br /&gt;Content-Type: text/json&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;    "children" : [&lt;br /&gt;        "hello.txt",&lt;br /&gt;        "hello2.txt"&lt;br /&gt;    ]&lt;br /&gt;}&lt;/pre&gt;This approach, as described in section 9.5, illustrates the ability to request specific fields to be returned in the JSON response body, and is easily integrated into AJAX-style JavaScript for web-based applications.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Retrieving a Stored Object&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Retrieving a stored object is as straightforward as listing the children of a container:&lt;br /&gt;&lt;pre name="code" class="js"&gt;GET /hello.txt HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;Accept: application/vnd.org.snia.cdmi.object+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;HTTP/1.1 200 OK&lt;br /&gt;Content-Type: application/vnd.org.snia.cdmi.dataobject+json&lt;br /&gt;X-CDMI-Specification-Version: 1.0&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;    "objectURI" : "/hello.txt",&lt;br /&gt;    "objectID" : "AABwbQAQ810Mpei21pxfzA==",&lt;br /&gt;    "parentURI" : "/",&lt;br /&gt;    "accountURI" : "/cdmi_accounts/default_account/",&lt;br /&gt;    "capabilitiesURI" : "/cdmi_capabilities/dataobject/",&lt;br /&gt;    "completionStatus" : "Complete",&lt;br /&gt;    "mimetype" : "text/plain",&lt;br /&gt;    "metadata" : {&lt;br /&gt;        "cdmi_size" : "11"&lt;br /&gt;    },&lt;br /&gt;    "valuerange" : "0-11",&lt;br /&gt;    "value" : "Hello Cloud"&lt;br 
/&gt;}&lt;/pre&gt;As with the response returned when the data object was created, the CDMI storage system returns the fields associated with the object. Unlike the create response, it also returns the value range and the value of the object. This is described in more detail in section 8.4.&lt;br /&gt;&lt;br /&gt;If just the value is desired, it can be requested either as JSON or through a standard (non-CDMI) HTTP transaction such as the following:&lt;br /&gt;&lt;pre name="code" class="js"&gt;GET /hello2.txt HTTP/1.1&lt;br /&gt;Host: cloud.example.com&lt;br /&gt;&lt;br /&gt;HTTP/1.1 200 OK&lt;br /&gt;Content-Type: text/plain&lt;br /&gt;Content-Length: 11&lt;br /&gt;&lt;br /&gt;Hello Cloud&lt;/pre&gt;This allows CDMI clouds to act as standard web servers, and enables the intriguing possibility that standard web servers could use CDMI as an upload, management and publishing protocol.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Summary&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This provides a quick overview of the basics of writing, discovering and reading data in a CDMI cloud. 
In the next tutorial, we will review CDMI's data management functions, which round out the core of the proposed protocol.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5057395430437063196?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5057395430437063196/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5057395430437063196' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5057395430437063196'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5057395430437063196'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/12/cdmi-tutorial-basic-inputoutput.html' title='A CDMI Tutorial - Basic Input/Output'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-47384922237589886</id><published>2009-12-09T14:45:00.000-08:00</published><updated>2009-12-09T21:51:30.131-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Design Patterns'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' 
term='CDMI'/><title type='text'>A Reference Architecture for Cloud Storage</title><content type='html'>Over the last twelve months, the &lt;a href="http://www.snia.org/cloud"&gt;SNIA Cloud Storage Technical Working Group&lt;/a&gt; has been busily defining an industry-wide standard for cloud data storage and management. This standard, the Cloud Data Management Interface (CDMI), is currently &lt;a href="http://www.snia.org/tech_activities/publicreview"&gt;available for public review&lt;/a&gt;, with an updated draft scheduled for release later in December.&lt;br /&gt;&lt;br /&gt;The diagram below illustrates how the CDMI standard fits into emerging cloud ecosystems:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_FN1WQhtIvYA/SyApNO-juwI/AAAAAAAAADE/_1gavsysorE/s1600-h/cdmi_cloud_interactions.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 260px;" src="http://2.bp.blogspot.com/_FN1WQhtIvYA/SyApNO-juwI/AAAAAAAAADE/_1gavsysorE/s400/cdmi_cloud_interactions.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5413372059434859266" /&gt;&lt;/a&gt;One of the most important things to understand about the CDMI standard is that it is not intended to replace existing cloud data access standards. This allows CDMI to be used with existing and new clouds that store data via non-CDMI protocols such as Amazon S3's API and traditional file system protocols such as NFS, CIFS and WebDAV.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;CDMI builds on top of these existing data access paths to bring rich management semantics to data in the cloud, and facilitates emerging cloud use cases including cloud peering, federation and differentiated services. 
In addition, CDMI provides standard mechanisms to enable the integration of clouds with other external systems (or clouds) for notification, workflow, audit, billing and authorization purposes.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Reference Architecture for CDMI-Native Cloud Storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In order to illustrate how CDMI can enable the creation of next-generation cloud architectures, I've put together an example reference architecture for a CDMI-based cloud storage system that illustrates the different components and primary data flows that would be found in most cloud implementations that fully support CDMI.&lt;br /&gt;&lt;br /&gt;The diagram below shows a logical representation of the major components of a cloud storage system built around the CDMI standard:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SyApAHK6__I/AAAAAAAAAC8/3LqjtEFnhK4/s1600-h/cdmi_reference_architecture.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 400px; height: 163px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SyApAHK6__I/AAAAAAAAAC8/3LqjtEFnhK4/s400/cdmi_reference_architecture.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5413371834000932850" /&gt;&lt;/a&gt;Looking at the diagram, we have the following major logical components:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Filesystem Clients and Protocol Gateways&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;CIFS, NFS, FTP, WebDAV, iSCSI, Fibre Channel, FCoE and other standard network file and block protocol clients can be attached to cloud storage via protocol gateways. 
These protocol gateways translate non-cloud storage protocols into CDMI cloud transactions, and are notified about changes to management metadata in the cloud so that they can export portions of the cloud via these non-cloud protocols.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Existing Cloud Clients and API Protocol Gateways&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Existing cloud clients programmed to use cloud protocols such as Amazon's S3 HTTP API can communicate with a CDMI storage cloud through an API protocol gateway. An API protocol gateway translates these non-CDMI cloud storage protocols into CDMI cloud transactions. When a given non-CDMI cloud protocol supports features not directly supported by CDMI, the API gateway can either map these operations into CDMI operations, or use vendor extensions to CDMI to implement the required functionality.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;CDMI Clients&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Clients that implement the CDMI protocol natively can connect directly to the CDMI storage cloud without having to go through any protocol translation layers. These clients can access the full set of CDMI functionality, and can access and manage content stored by all clients, even if the content was not stored via CDMI.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Low Latency Object Stores&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In almost all implementations, CDMI transactions will be routed to one or more low-latency object stores that are able to quickly satisfy storage and retrieval requests. Multiple low-latency object stores will often be connected together, peered, or federated, to allow for data dispersion, replication and other data management functions. 
Once an object is safely stored, the cloud can interpret the data system metadata that specifies the placement characteristics desired by the cloud user.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;High Latency Object Stores&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Some vendors may choose to implement higher latency object stores, such as power-managed disk or tape storage tiers. Stored objects will typically be migrated to these stores from low-latency object stores based on the CDMI data system metadata.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Catalogue&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In order to keep track of the locations where cloud objects are stored, and the namespace in which the objects are stored, many CDMI implementations will include an object catalogue. This catalogue keeps track of objects across all object stores (and external peered and federated clouds), and provides CDMI functions such as object notification, query and billing. Notification provides updates to internal cloud components and to external systems to enable additional cloud functions such as indexing, e-discovery, classification, object processing, format conversion, workflow and advanced analytics.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Data System Manager&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In a CDMI system, the data system manager is responsible for interpreting the data system metadata specified for objects and containers through the CDMI interface, and for performing operations that attempt to satisfy the requests for data dispersion, redundancy, geographic placement, performance, etc. Each time an object is created or modified, the data system manager will be notified, and if the specified constraints are not met, it can further replicate or migrate content across object stores or clouds. 
The SDSC &lt;a href="http://www.irods.org/"&gt;iRODS&lt;/a&gt; (Integrated Rule-Oriented Data System) is an example of an existing Data System Manager that can play this role within a cloud, or be used to federate clouds.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Putting it all Together&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When you put all of these components together, you get a cloud that fully implements the main components of the CDMI specification, and provides rich cloud functionality. On top of this foundation, a cloud vendor can innovate and create additional value-added services, such as integration with computing clouds, cloud virus scanning, QoS monitoring, and much, much more.&lt;br /&gt;&lt;br /&gt;This degree of flexibility demonstrates much of the value that CDMI has to offer the industry, both to cloud users and vendors, and it will be exciting to see these sorts of interoperable architectures emerge over time as CDMI becomes adopted and the cloud marketplace matures.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-47384922237589886?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/47384922237589886/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=47384922237589886' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/47384922237589886'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/47384922237589886'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/12/reference-architecture-for-cloud.html' title='A Reference Architecture for Cloud Storage'/><author><name>David 
Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_FN1WQhtIvYA/SyApNO-juwI/AAAAAAAAADE/_1gavsysorE/s72-c/cdmi_cloud_interactions.jpg' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5955235895994089651</id><published>2009-11-26T17:41:00.000-08:00</published><updated>2009-11-26T17:46:03.697-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Archiving'/><category scheme='http://www.blogger.com/atom/ns#' term='Tape'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Hardware'/><title type='text'>How low can disk go?</title><content type='html'>The capital acquisition cost of disk-based archiving solutions (in cost per terabyte) has dramatically fallen over the last five years. 
Unfortunately, the rate of reduction in cost is slowing as the cost approaches the raw cost of the disks included with the storage system.&lt;br /&gt;&lt;br /&gt;The four major factors that have driven the reduction of the cost of disk-based archiving are as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Increasing disk capacity (density) for a hard drive of a given price&lt;/li&gt;&lt;li&gt;Transition from the use of enterprise disks to the use of consumer-grade SATA disks&lt;/li&gt;&lt;li&gt;Transition from storage interconnection via Fibre Channel to switched SAS and Ethernet&lt;/li&gt;&lt;li&gt;Transition from custom storage controllers to commodity servers and software&lt;/li&gt;&lt;/ul&gt;When analysing the costs of a disk archive, most of these savings come from reducing the cost of the non-disk components. In large disk archival systems, the system price divided by the number of disks is now approaching the off-the-shelf price of a consumer disk (around $100 per TB). This is important, because it indicates that most of the cost gains (from factors 2 through 4) are unsustainable in the long term. The closer the cost of the system approaches the cost of the raw storage, the smaller the cost reductions that can be achieved.&lt;br /&gt;&lt;br /&gt;As a consequence, the rapid decrease in the cost of disk-based archiving is not a result of the intrinsic reduction in the cost of disk storage, but rather is a reduction in the cost of the overall system. 
And now that this reduction in cost has already largely occurred, the rate of cost reduction flattens out to more closely approximate the reduction in the cost resulting from increasing disk density.&lt;br /&gt;&lt;br /&gt;This is a critical point to understand when comparing the costs of disk archiving to tape archiving, since many cost projections have made the assumption that this rate of reduction of disk cost will continue into the future.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5955235895994089651?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5955235895994089651/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5955235895994089651' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5955235895994089651'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5955235895994089651'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/11/how-low-can-disk-go.html' title='How low can disk go?'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-8011043562502687896</id><published>2009-11-23T20:18:00.000-08:00</published><updated>2009-11-23T20:25:02.374-08:00</updated><category 
scheme='http://www.blogger.com/atom/ns#' term='Archiving'/><category scheme='http://www.blogger.com/atom/ns#' term='Tape'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><title type='text'>The Demise of Tape is Overrated</title><content type='html'>Christopher Poelker, in a blog post on ComputerWorld titled, &lt;a href="http://blogs.computerworld.com/15129/is_tape_really_dead"&gt;Is Tape Really Dead&lt;/a&gt;, makes a series of assertions about the superiority of disk over tape. Now to give credit where it is due, he is talking about tape's role in backup, and does conclude that tape still has a role to play. Unfortunately, many of his statements made along the way are simply inaccurate.&lt;br /&gt;&lt;br /&gt;Chris states:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Everyone is aware of the limitations of tape solutions.&lt;br /&gt;&lt;br /&gt;&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;b&gt;Tape&lt;/b&gt;&lt;/td&gt;&lt;td&gt;&lt;b&gt;Disk&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Sequential access&lt;/td&gt;&lt;td&gt;Random access&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Relatively slow&lt;/td&gt;&lt;td&gt;Fast&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Shipped offsite&lt;/td&gt;&lt;td&gt;Electronically vaulted&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Once a day process&lt;/td&gt;&lt;td&gt;Periodic or continuous&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;High operational touch&lt;/td&gt;&lt;td&gt;Automated&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;No-dedupe&lt;/td&gt;&lt;td&gt;Dedupe&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Inexpensive media&lt;/td&gt;&lt;td&gt;More expensive&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/blockquote&gt;&lt;br /&gt;Let's look at each of these in turn:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Sequential Access vs. Random Access&lt;/b&gt;&lt;br /&gt;What Chris is getting at here is seek latency. Both tape and disk provide full random access, only disk is faster at it than tape. 
However, as hard disk capacities have increased while access bandwidth remains largely constant, from a software architecture standpoint, disks look more and more like tape. This is what is leading to the collapse of RAID 5 as a means to protect data, and, in my opinion, is what will ultimately lead to the death of disk. But more on this in a subsequent blog post.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Relatively slow vs. Fast&lt;/b&gt;&lt;br /&gt;Tape systems, when properly used, can provide extremely high levels of throughput performance, into the 10 Gigabit/sec range.&lt;br /&gt;&lt;br /&gt;Individual tape drives can already stream sequentially accessed data faster than most hard disks (120 MBytes/sec), and LTO5 will increase this lead further. When randomly accessing data spread across a tape or disk, the disk will outperform tape due to lower seek latencies. And, of course, if seek latencies are important to you, you should be looking at flash.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Shipped offsite vs. Electronically vaulted&lt;/b&gt;&lt;br /&gt;Disk drives are far more vulnerable to damage than tapes, and simply can't be shipped around the same way. "Electronic vaulting" often equals expensive WAN data transfers and higher costs for power and equipment.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Once a day process vs. Periodic or continuous&lt;/b&gt;&lt;br /&gt;This would be true if we were talking about a pure tape solution, but tape-based systems have been deployed alongside disk in a storage hierarchy for decades.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;High operational touch vs. Automated&lt;/b&gt;&lt;br /&gt;Baby-sitting ten thousand disks isn't low operational touch either. Disks fail continuously, and the wrong swaps can destroy an entire RAID set. 
Many large archives run with tens of thousands to hundreds of thousands of tapes with very little operator intervention, and modern automated libraries are highly reliable and fault tolerant.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;No dedupe vs. dedupe&lt;/b&gt;&lt;br /&gt;This is false, as dedupe is yet another data compression technique, and applies equally well to tape and disk. Again, the use of dedupe on tape in backup and archiving systems goes back decades.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Inexpensive media vs. More Expensive&lt;/b&gt;&lt;br /&gt;We're thrown a bone here in the cost department, but what isn't considered is that, in addition to the consumables (disks and tapes), the disk subsystems themselves must be replaced far more frequently than tape libraries. With tape libraries, the drives can be swapped out and the tapes migrated to newer, higher capacity media without having to replace the entire library.&lt;br /&gt;&lt;br /&gt;This also does not take into account the far higher operating costs of power and heat required for disk-based solutions.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Conclusions&lt;/b&gt;&lt;br /&gt;Despite the near-continuous siren call of the "Tape is Dead" crowd, tape provides significant value, often higher value for the dollar than disk, and has a long life before it. 
And, in many ways, it is spinning disk that should be more worried about its life in the coming decade.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-8011043562502687896?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/8011043562502687896/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=8011043562502687896' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8011043562502687896'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8011043562502687896'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/11/demise-of-tape-is-overrated.html' title='The Demise of Tape is Overrated'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7257918690754654680</id><published>2009-10-22T10:27:00.000-07:00</published><updated>2009-10-22T10:28:42.019-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>The Roles of CDMI</title><content type='html'>The proposed SNIA Cloud 
Data Management Interface standard (CDMI) is intended to address a wide variety of different use cases, as described in the draft &lt;a href="http://www.snia.org/tech_activities/publicreview/CloudStorageUseCasesv0.5.pdf"&gt;SNIA Cloud Storage Use Cases&lt;/a&gt; document.&lt;br /&gt;&lt;br /&gt;Specifically, CDMI has several distinct and overlapping design goals:&lt;br /&gt;&lt;br /&gt;1. To act as a cloud management protocol for other non-cloud data access protocols, without providing data access.&lt;br /&gt;&lt;br /&gt;The use case for this mode is, for example, the management of a SAN or NAS fabric, allowing the provisioning and specification of data system metadata for opaque LUNs that can be dynamically provisioned programmatically, for instance in conjunction with OCCI in a cloud computing environment. In this case, there is no data access via CDMI, only management and accounting access.&lt;br /&gt;&lt;br /&gt;2. To act as a cloud management protocol and as a secondary cloud data access protocol to existing cloud and unified storage systems.&lt;br /&gt;&lt;br /&gt;The use case for this mode is, for example, to provide consistent management access to existing unified storage systems that provide block, file and object protocols. For example, an Amazon EC2 instance could be run that exposes an S3 bucket through CDMI, manages Elastic Block Store LUNs, and implements some of the data system metadata functionality.&lt;br /&gt;&lt;br /&gt;3. To act as a primary cloud management and cloud data access protocol for next-generation cloud storage systems.&lt;br /&gt;&lt;br /&gt;The use case for this mode is, for example, to enable a superset of cloud data access, manipulation and management functions, and to enable advanced scenarios such as distributed application systems built around cloud storage, cloud federation, peering and delegation. 
For example, a cloud could provide CDMI access to objects for cloud applications, while managing file-system views into the object space from remote edge NAS gateways and federating together existing enterprise storage and public and private clouds.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7257918690754654680?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7257918690754654680/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7257918690754654680' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7257918690754654680'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7257918690754654680'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/10/roles-of-cdmi.html' title='The Roles of CDMI'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-6848006058003818515</id><published>2009-10-19T19:42:00.000-07:00</published><updated>2009-10-22T10:35:01.231-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='Object 
Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>Queues in a Cloud</title><content type='html'>Queues are extremely powerful approaches to data storage and data exchange that are commonly used in application programming, especially when components of a program are executed in parallel. By supporting queues, cloud storage systems provide safe and efficient mechanisms by which programs can persist their state, communicate between components, and interact with other programs and systems.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What is a Queue&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A Queue is an object with zero or more data "records", such that only the oldest item is accessed at any given time. Once the oldest item is removed, the second-oldest item is accessed, and so forth until the queue is empty. This provides what is known as "first-in, first-out" access to data.&lt;br /&gt;&lt;br /&gt;In a standard object, when you update data, you update the only record in the object. But in a queue, this creates a new record. When you read the value of an object, you get the only record in the object, but in a queue, this returns the value of the oldest record. And when you delete the value of an object, you delete the object, but in a queue, you delete only the oldest record. Thus, unlike basic objects, updating and deleting records from a queue are no longer idempotent operations.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How are Queues Typically Used&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Queues are typically used in two different situations:&lt;br /&gt;&lt;br /&gt;Internal Work List - Inside a program, the program will store a queue of data that needs to be processed in a given order, such that as resources become available, the items from the queue can be accessed in order. 
In this use, the queue is being used to store state.&lt;br /&gt;&lt;br /&gt;Inter-Process Communication - Between two programs, a queue will contain a list of items that need to be communicated from one program to another. As the sender program encounters data that needs to be sent to the receiver program, it enqueues the items into the queue, and as the receiver program is able to process data from the sender program, it dequeues items from the queue. In this use, the queue is being used to exchange state.&lt;br /&gt;&lt;br /&gt;As a quick aside, TCP/IP is an example of a queue used for information exchange between two systems. When you write data to a TCP/IP connection, you are enqueueing the data for delivery, and when the destination application reads from the TCP/IP connection, it is dequeuing the data from the network abstraction.&lt;br /&gt;&lt;br /&gt;These uses are best illustrated through an example. Let's say that we are designing a book scanning system that runs in the cloud. We have a series of image scanners that digitize pages of books, and we have an OCR program that converts the images into text that can be indexed for search.&lt;br /&gt;&lt;br /&gt;A simple implementation would be to scan a page, OCR it, then move on to the next page, but that doesn't meet all of the criteria of a cloud solution, as we can't scale it. A good solution would be able to handle multiple scanners, and have multiple instances of the OCR process running in parallel. And that calls for queues.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Handling Multiple Writers&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A queue can be used to aggregate data values from multiple writers into a single ordered set of values. 
This behaviour is perfect for logging, job aggregation and any other situation where data originating from multiple entities needs to be consumed by a single entity.&lt;br /&gt;&lt;br /&gt;In our book scanning example, a typical scanning workflow may have tens to hundreds of scanners running concurrently, where each scanned page needs to be run through an OCR (Optical Character Recognition) process in order to index the contents of each scanned page.&lt;br /&gt;&lt;br /&gt;A cloud application built around queues is easily scaled by running multiple parallel instances of the scanning application, with each of the instances using a common cloud for storage. If all of these instances store scanned pages into the same queue, the interface to the OCR process (the queue) is the same regardless of the number of writers.&lt;br /&gt;&lt;br /&gt;The logic for the scanning process would look something like this:&lt;br /&gt;&lt;br /&gt;1. Scan Page&lt;br /&gt;2. Write image as object into cloud&lt;br /&gt;3. Enqueue image object ID into cloud queue&lt;br /&gt;4. Repeat&lt;br /&gt;&lt;br /&gt;The logic for the OCR process would look something like this:&lt;br /&gt;&lt;br /&gt;1. Read from cloud queue to get the object ID of the next image to process&lt;br /&gt;2. Read the image from cloud using the object ID&lt;br /&gt;3. Perform OCR processing&lt;br /&gt;4. Add OCR text to Index&lt;br /&gt;5. Delete item from cloud queue&lt;br /&gt;6. Repeat&lt;br /&gt;&lt;br /&gt;Now, we can arbitrarily scale the number of scanning processes. But what if our OCR takes too long to handle the combined workload of all of these scanning processes?&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Handling Multiple Readers&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If our OCR processing takes far longer than the time required to scan a page, we need to be able to increase the performance of the OCR processing. And when we add multiple parallel page scanners, things get even worse. 
We could try to make a faster OCR engine, but it's far easier to be able to scale out the OCR processing by running multiple instances of the OCR processing in parallel.&lt;br /&gt;&lt;br /&gt;In this case, we need to be able to have multiple readers of the queue. And we need to ensure the following characteristics of our solution:&lt;br /&gt;&lt;br /&gt;1. No two queue readers will get the same item&lt;br /&gt;2. No items will be lost, even in the event of a queue reader failure&lt;br /&gt;&lt;br /&gt;If you just run two queue readers in parallel, both of these situations can occur. If the two readers run in lock-step, they will get the same item. And if they both delete, the second deleted item will be lost.&lt;br /&gt;&lt;br /&gt;Thus, we need to introduce another concept in order to maintain these characteristics: the ability to atomically transfer an item from one queue to another. With this capability, our OCR process can be modified to ensure that even with multiple readers, no data is lost or processed twice:&lt;br /&gt;&lt;br /&gt;1. Transfer item from cloud queue to worker queue&lt;br /&gt;2. Read from worker queue to get the object ID of the next image to process&lt;br /&gt;3. Read the image from cloud using the object ID&lt;br /&gt;4. Perform OCR processing&lt;br /&gt;5. Add OCR text to Index&lt;br /&gt;6. Delete item from worker queue&lt;br /&gt;7. Repeat&lt;br /&gt;&lt;br /&gt;Using this approach, you can arbitrarily scale the number of reader processes without modification. And these workers can enqueue their results into a common queue, allowing the results to be recombined for further processing.&lt;br /&gt;&lt;br /&gt;By leveraging queues, cloud storage allows the creation of complex reliable workflows that can be scaled arbitrarily and dynamically. 
It also facilitates the creation of loosely coupled reliable systems of interacting programs that work together to solve a given problem in a flexible and scalable manner.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-6848006058003818515?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/6848006058003818515/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=6848006058003818515' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6848006058003818515'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6848006058003818515'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/10/queues-in-cloud.html' title='Queues in a Cloud'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4185140580089252220</id><published>2009-09-30T18:05:00.000-07:00</published><updated>2009-09-30T18:19:43.528-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Archiving'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Reliability'/><title type='text'>The Worst Case 
Scenario</title><content type='html'>&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;What do you do when the unthinkable happens, and you lose your entire primary site of operations, including your local cloud storage? It could be a flood, a hurricane, or something as commonplace as a building fire, but it's happened — and now all of your infrastructure and storage media associated with your archive is gone.&lt;br /&gt;&lt;br /&gt;But life continues on, and so does your business. Assuming your archive is a critical part of your workflow and corporate assets, how can you protect against these major disruptions?&lt;br /&gt;&lt;br /&gt;To the right is a simplified flowchart to help determine when you should protect against site-loss scenarios, and what options are available to prevent loss of archived data and ensure continued access.&lt;/td&gt;&lt;td valign="top"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SsQCoVAFS3I/AAAAAAAAAC0/I-QlG_sD3Q4/s1600-h/archive_flowchart.gif"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 216px; height: 400px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SsQCoVAFS3I/AAAAAAAAAC0/I-QlG_sD3Q4/s400/archive_flowchart.gif" border="0" alt="" id="BLOGGER_PHOTO_ID_5387433946097470322" /&gt;&lt;/a&gt;Click on the image to enlarge.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;br /&gt;&lt;b&gt;Single-Site Archives&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;If all of the equipment and storage associated with your archive is located at a single site, and that site is lost, unless you have some form of off-site storage, all data stored in the archive will be lost.&lt;br /&gt;&lt;br /&gt;While a multi-site archive is the best solution to avoid data loss in this scenario, for cost-sensitive data where rapid restoration of data access and operations is not required, a multi-site archive may be more expensive than is 
warranted. In this case, two common options are to create two tape copies and have them vaulted off-site, or to store a copy of the data with a public cloud provider, such as &lt;a href="http://aws.amazon.com/s3/"&gt;Amazon&lt;/a&gt;, &lt;a href="http://www.ironmountain.com/digital/vfs/"&gt;Iron Mountain&lt;/a&gt;, or &lt;a href="http://www.diomedestorage.com/"&gt;Diomede Storage&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Vaulting to Tape&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Storing to tape and vaulting the tapes off-site is the least expensive option for protecting archived data. In this scenario, a node would be added to the single-site archive that creates two identical tapes containing recently archived data. These tapes are then sent off-site.&lt;br /&gt;&lt;br /&gt;The ability for an archive to be restored from the "bare metal", from only the data objects being archived, is a very important feature of an archival system. This ensures that even if the control databases are lost, the archived data can still be accessed, and the archive can be rebuilt.&lt;br /&gt;&lt;br /&gt;When planning a tape vaulting approach, the frequency with which these tapes are created determines how much data is at risk of loss in the event of the loss of the site. For example, if tapes are generated every week, and take two business days worst case to be taken off-site, then the business can have an exposure window of up to twelve calendar days.&lt;br /&gt;&lt;br /&gt;In the event of the catastrophic loss of the primary site, these tapes would have to be recalled from the vaulting provider, which can take some time, and hardware would have to be re-acquired to rebuild the archive. Don't underestimate the amount of time required to re-order hardware. 
Often the original equipment is no longer available, so a new archive will need to be specified and ordered, and it can take weeks for the servers to be assembled and shipped.&lt;br /&gt;&lt;br /&gt;Once the tapes have arrived and the hardware has been set up, the archive is rebuilt from the data stored on the tapes, and once the last tape is processed, the archive is ready to be used. This is known as a "bare-metal restore".&lt;br /&gt;&lt;br /&gt;Of course, depending on the size of the archive, this could take a very long time. A 1 PB media archive would take 115 days to restore when running at a restore load of 800 Mbits/s, and a 10 billion object e-mail archive would take 115 days to restore when running at a restore load of 1000 objects per second. Rebuild times must be taken into account when planning for archive restoration, and often the cost of downtime associated with such a restore is high enough that cloud or multi-site options are considered instead.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storing to the Cloud (Hybrid Cloud)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Another option for single-site archives is to store a copy of the archived data to a cloud storage provider. This eliminates the headaches associated with tape management, but introduces the requirement for network connectivity to the provider. In this scenario, for each archived object to be protected off-site, the object is also stored to a cloud provider, which retains the data in the event that a restore is needed.&lt;br /&gt;&lt;br /&gt;Unlike with tape archiving, data is stored immediately to the cloud, limited only by WAN bandwidth. However, this limitation can be substantial, and when bandwidth is insufficient, data will be at risk until the backlog clears. 
If data is being stored to the archive at 100 Mbits/s, an OC-3 class Internet connection would be required, which can be far more expensive than sending twenty tapes out each week.&lt;br /&gt;&lt;br /&gt;In the event of the catastrophic loss of the primary site, hardware would be re-acquired to rebuild the archive, and network connectivity to the cloud would need to be re-established. When both of these are operational, the archive would be reconnected to the cloud. This would restore access to the archived data, albeit limited by the network bandwidth. Over time, the on-site archived data can then be restored back over the WAN.&lt;br /&gt;&lt;br /&gt;The primary disadvantage of this approach is the time required to obtain the hardware and network access needed to restore the on-site component of the archive; the second disadvantage is cost. Fears about unauthorized disclosure of data and loss of control over data are also common, though they can be mitigated with the appropriate use of encryption.&lt;br /&gt;&lt;br /&gt;And often, for less than the price charged by most public cloud providers, one can afford to create a multi-site archive, either across multiple premises owned by the business, or into a second premises hosted by a third party.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why not just use the Cloud?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Some public cloud providers encourage an architecture that has a minimal on-site presence, and stores all data off-site in the cloud. For some scenarios, this approach works very well, as it minimizes capital costs and minimizes the time required to restore hardware and access in the event of a disaster. 
However, one must have sufficient WAN bandwidth for the expected store and retrieve loads (as opposed to just store-only traffic loads when using the cloud as a storage target), and in the event of a network connectivity failure, access to most or all of the archive can be disrupted.&lt;br /&gt;&lt;br /&gt;This is contrasted with the hybrid cloud model, where the private cloud on-site allows continued access to the data even during WAN failures, and the public cloud is used as a low-cost data storage target.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Multi-Site Archives&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When continuance of business operations is important, or archival data must be accessible across multiple sites, the archive can be extended to span multiple sites. This involves several considerations, including:&lt;br /&gt;&lt;div&gt;&lt;ol&gt;&lt;li&gt;What data is created in one site and accessed in another?&lt;/li&gt;&lt;li&gt;What data should be replicated to other sites for protection?&lt;/li&gt;&lt;li&gt;What data should be replicated to other sites for performance?&lt;/li&gt;&lt;li&gt;In the event of a site loss scenario, what will be the additional load placed on other sites?&lt;/li&gt;&lt;/ol&gt;Such multi-site archives are very flexible, and allow seamless continuance of operations even in the event of major and catastrophic failures that affect one or more sites. While this is obviously the best solution from a business continuance standpoint, it is also the most expensive, as you must duplicate your entire archive infrastructure in multiple sites, and provide sufficient WAN bandwidth for cross-site data replication.&lt;br /&gt;&lt;br /&gt;Of course, one can also deploy systems like Bycast's StorageGRID to provide a mixture of the approaches described above, using policies to determine which archived content is stored locally, vaulted to tape, stored in a public cloud, and replicated across multiple sites. 
This flexibility allows the value of the data to be mapped to the cost of the storage, and leverages a common infrastructure for all levels of protection required.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4185140580089252220?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4185140580089252220/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4185140580089252220' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4185140580089252220'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4185140580089252220'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/09/worst-case-scenario.html' title='The Worst Case Scenario'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_FN1WQhtIvYA/SsQCoVAFS3I/AAAAAAAAAC0/I-QlG_sD3Q4/s72-c/archive_flowchart.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-1643895349359625746</id><published>2009-09-14T13:42:00.000-07:00</published><updated>2009-09-14T13:44:31.114-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>Introducing CDMI</title><content type='html'>Today, the Storage Networking Industry Association (SNIA) publicly released the first draft of the Cloud Data Management Interface (CDMI). The draft standard can be downloaded at the address below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.snia.org/tech_activities/publicreview"&gt;http://www.snia.org/tech_activities/publicreview&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;I'm very pleased to have been a significant contributor to this standard since the inception of the working group earlier this year. Over the last nine months, we've been able to come a long way towards defining a working standard for cloud storage management, and Bycast is proud to have contributed many best-of-breed capabilities first pioneered in Bycast's StorageGRID HTTP API, used by hundreds of customers worldwide to store and access many dozens of petabytes of data in cloud environments, both public and private.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Why CDMI?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;CDMI provides a standardized method by which data objects and metadata can be stored, accessed and managed within a cloud environment. It is intended to provide a consistent method for access by applications and end-users' systems, and provide a consistent interface for providers of cloud storage.&lt;br /&gt;&lt;br /&gt;Currently, almost all of the cloud storage providers and vendors use significantly different APIs, which forces cloud application and gateway software vendors to code and test against different APIs, and to architect their applications around the lowest common denominator. CDMI significantly reduces the complexity of development, test and integration for the application vendor, and is specifically designed to be easy to adopt for both cloud providers and application vendors. 
CDMI can run alongside existing cloud protocols, and, as an example, a customer could run a CDMI gateway in an EC2 instance to gain access to their existing Amazon S3 bucket without Amazon having to do any work, a great example of the power of cloud!&lt;br /&gt;&lt;br /&gt;As with SCSI, Fibre Channel and TCP/IP, such industry-wide standards provide many advantages. These range from simple but essential efficiencies, such as standardized interface documentation and conformance and performance testing tools, to the creation of a market for value-added tools such as protocol analyzers, and to greater developer awareness, libraries and code examples.&lt;br /&gt;&lt;br /&gt;Industry standards also jump-start the network effect, where more applications encourage providers to support the standard, and more providers supporting the standard encourage application vendors to support the standard. Finally, and most excitingly, CDMI increases inter-cloud interoperability, and is a fundamental enabler for advanced emerging cloud models such as federation, peering and delegation, and the emergence of specialized clouds for content delivery, processing and preservation. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Whirlwind Tour of CDMI&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;CDMI stores objects (data and metadata) in named containers that group the objects together. All stored objects are accessed by web addresses that contain either a path (e.g., http://cloud.example.com/myfiles/cdmi.txt) or an object identifier (e.g., http://cloud.example.com/objectid/AABwbQAQvmAJSJWUHU3awAAA==).&lt;br /&gt;&lt;br /&gt;CDMI provides a series of RESTful HTTP operations that can be used to access and manipulate a cloud storage system. 
PUT is used to create and update objects, GET is used to retrieve objects, HEAD is used to retrieve metadata about objects, and DELETE is used to remove objects.&lt;br /&gt;&lt;br /&gt;Data stored in CDMI can be referenced between clouds (where one cloud points to another), copied and moved between clouds, and can be serialized into an export format that can be used to facilitate cloud-to-cloud transfers and customer bulk data transfers. All data-metadata relationships are preserved, and standard metadata is defined to allow a client to specify how the cloud storage system should manage the data. Examples of this "Data System Metadata" include the acceptable levels of latency and the degree of protection through replication.&lt;br /&gt;&lt;br /&gt;In addition to basic objects and containers (similar to file and folder from a file system), CDMI also supports the concept of capabilities, which allow a client to discover what a cloud storage system is capable of doing. CDMI also supports accounts, which provide control and statistics over account security, usage and billing. Finally, CDMI supports queue data storage objects, which enable many exciting new possibilities for cloud storage.&lt;br /&gt;&lt;br /&gt;In fact, queues are important and significant enough that I'll be writing more about them and what they enable in a subsequent blog entry.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Next Steps&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;With CDMI now "out in the wild", this is the point where the standards effort starts to get really interesting. Up to this point, it has been a relatively small group that has been working on the standard, and we've had to make some controversial decisions (such as eliminating locking and versioning from the first release). There's still a lot of work to be done, and as CDMI gets more visibility, we look forward to increased involvement from other players in the industry. 
Together, we can make this standard even better, and help shape the future of cloud storage.&lt;br /&gt;&lt;br /&gt;So, if you are interested in cloud storage and cloud storage APIs, I would strongly encourage you to take the time to read the CDMI draft documentation, and contribute your thoughts and suggestions.&lt;br /&gt;&lt;br /&gt;We're proud of what we've achieved, and together, we can make it even better.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-1643895349359625746?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/1643895349359625746/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=1643895349359625746' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1643895349359625746'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1643895349359625746'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/09/announcing-cdmi.html' title='Introducing CDMI'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5831474028271131690</id><published>2009-09-10T09:22:00.000-07:00</published><updated>2009-09-10T09:34:04.505-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud 
Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='CDMI'/><title type='text'>Cloud Computing and Cloud Storage Standards</title><content type='html'>All of our work at the Cloud Storage technical working group in the &lt;a href="http://www.snia.org/"&gt;Storage Networking Industry Association&lt;/a&gt; (SNIA) has been coming together, and we are nearing a public release of the &lt;a href="http://cloud-standards.org/wiki/index.php?title=SNIA_Cloud_Data_Management_Interface_(CDMI)"&gt;Cloud Data Management Interface&lt;/a&gt; (CDMI).&lt;br /&gt;&lt;br /&gt;We've also been working with the &lt;a href="http://www.gridforum.org/"&gt;Open Grid Forum&lt;/a&gt; (OGF) to ensure that CDMI can be used in conjunction with the OCCI standard to manage data storage in cloud computing environments, and there are some exciting possibilities of combining CDMI with the &lt;a href="http://www.vmware.com/company/news/releases/vcloud-api-vmworld09.html"&gt;VMware vCloud standard&lt;/a&gt; that was recently donated as a potential industry standard to the DMTF.&lt;br /&gt;&lt;br /&gt;You can read our joint whitepaper on CDMI and OCCI at the URI below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://ogf.org/Resources/documents/CloudStorageForCloudComputing.pdf"&gt;http://ogf.org/Resources/documents/CloudStorageForCloudComputing.pdf&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If any readers have any questions about CDMI and how it facilitates cloud computing, please don't hesitate to comment!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5831474028271131690?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://intotheinfrastructure.blogspot.com/feeds/5831474028271131690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5831474028271131690' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5831474028271131690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5831474028271131690'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/09/cloud-computing-and-cloud-storage.html' title='Cloud Computing and Cloud Storage Standards'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-2993921813072875086</id><published>2009-08-28T09:09:00.000-07:00</published><updated>2009-08-28T09:50:49.466-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Cloud Computing needs Cloud Storage</title><content type='html'>I recently spoke on a panel at &lt;a href="http://www.marcusevans.com/html/eventdetail.asp?eventID=15838&amp;SectorID=17"&gt;Cloud User 2009&lt;/a&gt;, held in San Diego, on the subject of "Navigating the Cloud Vendor Community to Achieve Speed of Migration and Ease of Use". 
We were originally scheduled for an hour of discussion, but due to a subsequent presenter not showing up, and great interest from the audience, we went half an hour over schedule.&lt;br /&gt;&lt;br /&gt;My presentation was focused on the need for all IT infrastructure to be capable of supporting cloud computing. Often, organizations focus on one or two aspects of cloud computing (often virtualization and elastic computing), while not always taking into account the impact, requirements and dependencies on other areas of IT.&lt;br /&gt;&lt;br /&gt;I started out by reviewing a &lt;a href="http://sites.google.com/site/consolidatedreferencemodelv23/Home/service-component-reference-model"&gt;mapping of the U.S. Government's Service Component Reference Model into the Cloud model&lt;/a&gt;. This diagram provides a good example of all the different areas that IT and business organizations need to consider when embarking on a cloud project, and it is quite effective for discussing interdependencies between different technologies and IT areas of expertise.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cloud Storage as a co-requisite for Cloud Computing&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;After introducing the concept of cloud as a holistic IT practice, I spent some time focusing on the specific dependencies that cloud computing has on storage, and how cloud computing drives the need for cloud storage.&lt;br /&gt;&lt;br /&gt;&lt;table cellpadding="3"&gt;&lt;tr&gt;&lt;td&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Cloud Computing&lt;br&gt;Drivers&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Emerging Storage&lt;br&gt;Needs&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Self-contained&lt;br&gt;"Packages"&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Virtual Appliances,&lt;br&gt;vApps&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;VM Image Mgnt &amp;&lt;br&gt;Object 
Stores&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Location&lt;br&gt;Independence&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Hybrid Clouds,&lt;br&gt;vMotion&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Distributed &amp;&lt;br&gt;Multi-site Storage&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Loosely Coupled&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Dynamic&lt;br&gt;Provisioning&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Simpler &amp; Less Fragile&lt;br&gt;Interfaces&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Scale-Free&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Elastic Scaling,&lt;br&gt;Billing on Usage&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Tiering &amp; Dynamic&lt;br&gt;Placement&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;&lt;b&gt;Shared&lt;/b&gt;&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Co-Hosting&lt;/td&gt;&lt;td align="center" bgcolor="EFEFEF"&gt;Multi-Tenancy &amp;&lt;br&gt;Storage Security&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;&lt;br /&gt;The above table maps many of the business advantages and approaches for cloud computing to new requirements for storage that emerge as a result. 
Let's review these in detail:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Self-contained Packages&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As the environment and resources for a given application or computing problem are packaged up and managed at the VM and application level, new requirements are created around managing these images and associating data with VM sessions and applications to allow them to migrate together, be snapshotted together, managed together, etc.&lt;br /&gt;&lt;br /&gt;Thus, when you move to cloud computing, you need a new way to package up the data along with the applications, and ensure that they are self-contained.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Location Independence&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;These packages and data then must be able to migrate, both within the enterprise (such as in DR and Business Continuity applications), and between organizations, when utilizing hybrid and public clouds. Once the stored data is packaged, this becomes easier, but often data movement must be more granular, as some data may need to remain within the organization, or may need to be spread across multiple clouds.&lt;br /&gt;&lt;br /&gt;Thus, when you move to cloud computing, you need to make your data accessible from multiple locations, and ensure that it is consistent, complete and correct.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Loosely Coupled&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One of the benefits of cloud computing is loose coupling between systems. This allows simple reconfiguration, and enables the mixing of applications and infrastructure to quickly create new applications and update existing applications. The resulting collections of services are dynamically provisioned, and often do not involve people. 
In order to accomplish this, you need to ensure that all of the parts fit together, and can be controlled to dynamically assemble systems programmatically.&lt;br /&gt;&lt;br /&gt;Thus, when you move to cloud computing, you need to have simpler and less fragile interfaces to allow storage to be dynamically connected to compute, as needed, when needed.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Scale-Free&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In order to quickly scale up and down cloud computing environments, one needs to be able to deploy applications and storage in a scale-free manner. Being able to dynamically create a thousand-node compute cluster is not of much use if there is not also storage infrastructure that can scale to support this cluster.&lt;br /&gt;&lt;br /&gt;Thus, when you move to cloud computing, you need to ensure that your storage is also capable of scaling elastically, and is capable of tiering data so that it is available at the right cost and performance. Often, one of the first problems one runs into when deploying an elastic computing infrastructure is mismatches between computing and storage, and the cost of keeping all the data resulting from the cloud computing activities.&lt;br /&gt;&lt;br /&gt;This is important: Cloud computing results in an explosion of data, and this data has to be tiered in order to stay economical.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Shared&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;While less of an issue with private clouds, multi-tenancy and support for multiple users on shared infrastructure is critical for leveraging many of the economic advantages resulting from resource pooling. However, this sharing brings requirements for partitioning and security to prevent unauthorized disclosure of data. 
When storage is shared, there must be strong assurances that information will continue to be protected.&lt;br /&gt;&lt;br /&gt;Thus, when you move to cloud computing, especially in public and hybrid clouds, the placement and security of stored information must be carefully assessed to ensure that additional risks are not introduced.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-2993921813072875086?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/2993921813072875086/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=2993921813072875086' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2993921813072875086'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2993921813072875086'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/08/cloud-computing-needs-cloud-storage.html' title='Cloud Computing needs Cloud Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-8126243147352011820</id><published>2009-08-28T08:40:00.000-07:00</published><updated>2009-08-28T11:45:24.593-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category 
scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Blog Responses'/><title type='text'>More Industry Alignment on Object Storage</title><content type='html'>In a discussion on one of EMC's blog entries, &lt;a href="http://chucksblog.emc.com/chucks_blog/2009/08/the-future-doesnt-have-a-file-system.html"&gt;The Future Doesn't have a File System&lt;/a&gt;, Paul Carpentier, of Centera fame, reiterated the need for an industry-wide, lightweight web-based standard for object-based storage access.&lt;br /&gt;&lt;br /&gt;His initial thinking was as follows:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;1. Unique identifiers; 128 bit, hex representation proposed&lt;br /&gt;2. Object = immutable [Content + Metadata]; content is free format, metadata is free format, XML recommended&lt;br /&gt;3. Simple access protocol; HTTP proposed; non-normative client libraries optional&lt;br /&gt;4. READ and READ METADATA operation; (READ gets metadata and content)&lt;br /&gt;5. WRITE and DELETE operation&lt;br /&gt;6. Small set of standardized XML policy metadata constructs re service level, compliance, life cycle; TBD&lt;br /&gt;7. Persisted Distributed Hash Table to allow variable identifier mapping; 128 bit to 128 bit; HTTP accessed&lt;/blockquote&gt;&lt;br /&gt;What is interesting is the degree to which this proposal is aligned with the work being done by the Storage Networking Industry Association in its &lt;a href="http://www.snia.org/cloud/"&gt;Cloud Storage Technical Working Group&lt;/a&gt;. This working group is creating a new standard called the Cloud Data Management Interface, which is intended to provide a standardized method for access and management of cloud data using a light-weight RESTful access method.&lt;br /&gt;&lt;br /&gt;While the draft standard is not quite released to the public, let's take a quick peek at how it compares to Mr. 
Carpentier's thoughts:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;1. Unique identifiers; 128 bit, hex representation proposed&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;SNIA is proposing to use XAM XUIDs for identifiers, which allows vendors to innovate and define how their identifiers are composed, while still ensuring global uniqueness and the ability for any object ID to be managed by any vendor's system.&lt;br /&gt;&lt;br /&gt;While a basic 128-bit identifier, such as a UUID, is simpler, it does not provide strong guarantees that it will be unique across cloud vendors, and this is critical for emerging cloud models such as cloud migration, federation, peering and interchange. &lt;br /&gt;&lt;br /&gt;&lt;b&gt;2. Object = immutable [Content + Metadata]; content is free format, metadata is free format, XML recommended&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;While some vendors (such as Bycast) will implement the proposed standard by using immutable objects, the standard includes the optional ability to modify both object content and metadata for existing objects, without changing the object identifier.&lt;br /&gt;&lt;br /&gt;Metadata will include both user-generated items and system-generated items, and will be represented using XML or JSON.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;3. Simple access protocol; HTTP proposed; non-normative client libraries optional&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;SNIA is using RESTful principles and the HTTP protocol as a foundation for the standard, and simplicity is a key design goal. Almost every part of the standard is optional, and the client can discover what parts of the standard are supported by any given implementation.&lt;br /&gt;&lt;br /&gt;Client libraries that provide simplified language mappings are anticipated, but the goal is to enable full use of the standard with ordinary HTTP libraries.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;4. 
READ and READ METADATA operation; (READ gets metadata and content)&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The HTTP GET and HEAD operations map to these functions.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;5. WRITE and DELETE operation&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The HTTP PUT and DELETE operations map to these functions.&lt;br /&gt;&lt;br /&gt;If a cloud does not support mutable objects, then the cloud storage provider can indicate this to a client via the capabilities discovery interface, and any attempts to modify an existing object would fail.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;6. Small set of standardized XML policy metadata constructs re service level, compliance, life cycle; TBD&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;SNIA is actively working on standardizing a set of "Data System Metadata", which allows a client to specify what level of service it desires from a cloud. Examples include maximum latency, degree of replication, etc.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;7. Persisted Distributed Hash Table to allow variable identifier mapping; 128 bit to 128 bit; HTTP accessed&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is outside of what the standard is proposing, but by using the included queue data object functionality, vendors can add functionality such as lookups and transformations. 
This allows extension by vendors in a standardized way, and allows them to  take advantage of much of the common infrastructure provided by the standard.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;In summary, I would encourage everyone who is interested in cloud storage or in the industry to take a look at the work that the SNIA is doing, and to get involved!&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-8126243147352011820?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/8126243147352011820/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=8126243147352011820' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8126243147352011820'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8126243147352011820'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/08/more-industry-alignment-on-object.html' title='More Industry Alignment on Object Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5311454232560636028</id><published>2009-08-04T09:42:00.000-07:00</published><updated>2009-08-04T09:45:06.579-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category 
scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><title type='text'>Dictionary of a Cloud Standard</title><content type='html'>Any cloud storage standard requires the consideration of many interrelated functional areas. Below is a comprehensive list of the areas that must be considered in the standardization process; any standard must choose the scope and degree to which these items are addressed.&lt;br /&gt;&lt;br /&gt;As all of these subjects are interrelated, they are listed below in alphabetical order:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Accounts&lt;/b&gt; – A grouping of stored objects for the purposes of administrative control and billing. Each object is owned by one or more accounts, and is billed to those accounts. Accounts may have sub-accounts, where content owned by a sub-account is rolled up to a higher-level account.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Accounts (Provisioning)&lt;/b&gt; – The interface by which administrative clients can create new accounts, modify account characteristics or remove accounts.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Audit&lt;/b&gt; – Detailed records of accesses and state changes to the storage system used for troubleshooting, forensic analysis and security analysis. Audit information should be accessible based on partition, account, client, audit types and other filters, ideally through a cloud API.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Client Authentication&lt;/b&gt; – The method by which the credentials presented by a client are verified and mapped to local identities used to determine permissions.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Client Authentication Delegation&lt;/b&gt; – The method of verifying and mapping credentials, where the processing is delegated to an external system or alternate cloud (e.g., AD or LDAP).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Client Interface Protocols&lt;/b&gt; – The method by which clients are able to access data stored in the cloud. 
Interface protocols are typically specific to how the client interacts with the data. For example, a file-system client would expect an interface protocol such as NFS or CIFS, whereas a database client would expect a SQL interface. Clients that interact with objects directly may use protocols such as XAM or RESTful HTTP object access.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cloud Partitioning&lt;/b&gt; – The ability to take a single cloud and create multiple partitions that act as completely independent clouds while still using the same common cloud infrastructure.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cloud Peering&lt;/b&gt; – The ability for one cloud to transparently reference and store objects in another cloud such that a client can transparently access content directly from either cloud.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cloud Federation&lt;/b&gt; – The ability for one cloud to transparently reference and store objects in another cloud such that a client can transparently access content from the primary cloud.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Event Feeds&lt;/b&gt; – A client-accessible queue of events that occur against a subset of content (as defined by a query) that can be used for state synchronization between clouds, billing and other inter-system communication.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Introspection&lt;/b&gt; – The ability for a client to discover what services a given cloud is capable of performing, and what subset of these capabilities the client is allowed to use.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metadata (User)&lt;/b&gt; – Arbitrary client-named key/value pairs and tag lists specified by a client that are stored along with an object.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metadata (Data System)&lt;/b&gt; – Standardized named key/value pairs and tag lists specified by a client that indicate to the cloud how the object or objects should be managed. 
Examples include ACLs, encryption levels, degrees of replication or dispersion, and QoS goals.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metadata (System)&lt;/b&gt; – Standardized named key/value pairs and tag lists generated by the cloud that indicate to the client properties of the object. Examples include last access time, actual degree of replication (as opposed to requested degree of replication), and lock status.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Namespace (Local)&lt;/b&gt; – Each client or set of clients may see a different subset of the global namespace, or a client-specific namespace. Objects within the cloud may reside in one or more namespaces. By sharing objects across namespaces, different faceted views into the cloud can be created, and use cases such as sharing can be enabled.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Namespace (Global)&lt;/b&gt; – An administrator or suitably configured client may see all objects in the cloud within one namespace.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object (Composite)&lt;/b&gt; – The ability for an object to contain other objects such that the collection of objects can be managed and stored together as a single object.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Identifiers&lt;/b&gt; – Each object stored within the cloud needs a globally unique identifier that stays with the object across its entire lifecycle. Ideally, these identifiers are unique across clouds, and are preserved across clouds, which enables federation, transparent migration and peering.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Locking&lt;/b&gt; – Clients may wish to be able to lock an object to prevent modification or access by other clients.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Reference Counting&lt;/b&gt; – Clients may wish to gain a form of a lock on an object that ensures that the object will remain in the cloud. 
Only when all of these references are released can the object be considered for removal.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Versioning&lt;/b&gt; – Clients may wish to have historical versions of objects be retained when an object is modified.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Permissions&lt;/b&gt; – Each stored object has a list of which entities are permitted to perform which actions. This is typically called an Access Control List (ACL). These access controls specify which basic and administrative operations can be performed.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Referencing&lt;/b&gt; – The ability to specify that a given entity within a namespace is an object in an alternate location within the namespace, in another namespace, or in another cloud, while allowing transparent client access (see cloud peering and federation).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Serialization&lt;/b&gt; – The ability for a client to take one or more objects and transform them into a single bitstream that can be used for inter-system interchange.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object Snapshots&lt;/b&gt; – Clients may wish to be able to create snapshots of an object or set of objects, such that the state of the objects at that point in time can be accessed via an alternate location within a namespace.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Query&lt;/b&gt; – The ability to submit a series of criteria (for object content and/or metadata) and have returned (possibly as another static object) the list of objects that match the specified criteria.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Query (Persistent)&lt;/b&gt; – The ability to create queries that run continuously and dynamically update their results over time as the state of the cloud changes.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Usage Statistics (Client)&lt;/b&gt; – The ability to obtain information about how many operations have been performed by a given client over a given timeframe, for accounting, billing and reporting purposes.&lt;br 
/&gt;&lt;br /&gt;&lt;b&gt;Usage Statistics (Object)&lt;/b&gt; – The ability to obtain information about how many operations have been performed against a given object or set of objects, and the operations that have been required to manage a given object or set of objects over a given timeframe for accounting, billing and reporting purposes.&lt;br /&gt;&lt;br /&gt;These are all aspects that we have considered here at Bycast, and that we are contributing as part of our involvement with the SNIA Cloud Storage Technical Working Group.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5311454232560636028?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5311454232560636028/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5311454232560636028' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5311454232560636028'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5311454232560636028'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/08/dictionary-of-cloud-standard.html' title='Dictionary of a Cloud Standard'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5588252397840527432</id><published>2009-06-30T16:11:00.000-07:00</published><updated>2009-06-30T16:18:01.995-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Standardizing Cloud Storage</title><content type='html'>As the cloud storage concept matures and an increasing number of service and technology providers emerge in the market, there is a growing recognition of the need to standardize protocols for data access and storage management functions in a cloud storage environment.&lt;br /&gt;&lt;br /&gt;The advantages of an open standard for cloud storage include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Allowing a cloud storage client to interoperate with multiple providers&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Enabling data portability between cloud providers&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Facilitating common documentation, sample code and educational material&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Allowing common test infrastructure and conformance testing&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Reducing development work for cloud clients and providers&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Reducing the complexity of standardized access libraries&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Encouraging the creation of debugging tools for diagnostics, profiling and interaction analysis&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;With cloud computing and cloud storage being such a hot topic at this time, there are multiple 
initiatives underway attempting to standardize various components and interfaces within the cloud storage stack. One of the leading initiatives is the &lt;a href="http://groups.google.com/group/snia-cloud/"&gt;cloud storage technical working group&lt;/a&gt; within the Storage Networking Industry Association.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Cloud Storage Reference Model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In the last six months, we've been working on several deliverables, with the main work product being the creation of a standard reference model for the management of and access to cloud storage resources.&lt;br /&gt;&lt;br /&gt;In summary, here are the highlights of the working group's vision of cloud storage:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;An HTTP API&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Management functions and data access are provided via a light-weight HTTP RESTful API. Object, block and database storage APIs co-exist with this core API, facilitating emerging cloud use cases and allowing continued innovation as new applications are moved into the cloud.&lt;br /&gt;&lt;br /&gt;The HTTP API also facilitates discovery and introspection of provided API capabilities, allowing providers to support as little or as much of the API as they wish, and allowing clients to discover which capabilities are provided. This approach allows cloud vendors to provide additional capabilities (such as &lt;a href="http://services.nirvanix.com/ws/Video.asmx?op=Transcode"&gt;Nirvanix's media transcoding&lt;/a&gt; capabilities), while remaining compatible with and leveraging the common functions of the API.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Containers and Data&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Any data item stored in the cloud, from simple data streams to XAM objects, iSCSI LUNs, database tables and other data objects, can be accessed directly in a form that facilitates peering and transfer from cloud to cloud. 
As data items can be grouped together into named "containers", and containers can be nested, transferring aggregations of data items is as easy as transferring a container.&lt;br /&gt;&lt;br /&gt;Likewise, management operations can be performed on containers, reducing the management complexity when compared to managing individual objects, and allowing changes to be performed atomically on sets of objects. Management properties (metadata) of data items can either be explicitly specified for a given data item, or can be inherited from the parent container.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Vision for the Future&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This simple set of principles allows for a powerful, extensible API that spans all classes of storage in the cloud. Bycast is proud to be participating as a primary contributor to this initiative, and I would encourage anyone with interest in this area to take the time to read the currently released documentation and to get involved at the &lt;a href="http://groups.google.com/group/snia-cloud/"&gt;SNIA cloud Google group&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;SNIA is also holding a &lt;a href="http://www.snia.org/events/technicalsymposium/"&gt;summer technical symposium&lt;/a&gt; in July, held in San Jose. At this event, one entire track is dedicated to cloud storage. 
If you are in the area, don't hesitate to get in touch with the SNIA to find out what it takes to get involved, and join us in this exciting project.&lt;br /&gt;&lt;br /&gt;In the meantime, I'd encourage reading the &lt;a href="http://groups.google.com/group/snia-cloud/web/early-drafts-of-snia-documents"&gt;current draft documentation&lt;/a&gt; that can be downloaded from the SNIA, as we're proud of what we've accomplished so far, and are excited about where we're going.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5588252397840527432?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5588252397840527432/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5588252397840527432' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5588252397840527432'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5588252397840527432'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/06/standardizing-cloud-storage.html' title='Standardizing Cloud Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-6447864788860763344</id><published>2009-06-29T21:48:00.000-07:00</published><updated>2009-06-29T22:10:52.334-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><title type='text'>Where does Encryption Fit in the Cloud</title><content type='html'>Any analysis of the use of encryption in the cloud always needs to start with a discussion of the threats that the use of encryption technology is designed to reduce. In addition, often encryption alone is not the most important or tricky part of protecting against these threats. A commonly misunderstood aspect of encryption is that it somehow eliminates these risks — encryption just concentrates and separates these risks from the data itself, by moving security to the encryption keys. These keys must then be securely managed and protected against the original threats that encryption was deployed to reduce.&lt;br /&gt;&lt;br /&gt;The appropriate use of encryption in the storage stack depends on the business requirements, risks and storage technologies used. For example, while entire drive or tape encryption may be useful in shipping a disk or tape from one branch office to another, it is not appropriate for multi-user document sharing on a NAS, or in any other scenario where the access granularity is at a different level from that of the physical media.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;The Top Three Threats&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;For the sake of this discussion, let us assume that for cloud storage, the most significant three threats are as follows. 
While this list is not complete, it does include the most common threats considered in the cloud storage space:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Unauthorized disclosure due to cloud customer operations&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Unauthorized disclosure due to cloud provider operations&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Unauthorized disclosure due to transport eavesdropping&lt;/li&gt;&lt;/ul&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;1. The Insider&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Despite the focus on exotic and headline-grabbing threats to computer security, the most common form of unauthorized data disclosure is from employees or other authorized individuals within the companies or organizations that generate and use the data. Often these employees even have legitimate access to the data, which is then used in an unauthorized manner, or weak access controls are bypassed to gain access to the data.&lt;br /&gt;&lt;br /&gt;In this threat model, encryption within the storage system does not and cannot protect the data. The only approach to protect the data is to store the data in an encrypted format before it is made available to internal end-users. Examples of this include encrypted password-protected PDFs and various digital rights management schemes for media.&lt;br /&gt;&lt;br /&gt;Such systems often require positive verification back to a network server before access is permitted, and thus have the trade-off of being complex, costly, and ultimately easily bypassed by taking screen shots or re-recording the protected content. 
One need only look at the failure of digital rights management systems to stem the piracy of digital media to understand that a determined attacker with legitimate access to the content to be protected is an almost intractable problem.&lt;br /&gt;&lt;br /&gt;Once again, it is worth emphasizing: encryption will not protect your corporate data against an insider, and from a security risk standpoint, this is the most probable means of loss.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;2. The Provider&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Let us assume that a cloud storage system has been selected, and corporate data is being sent outside of the organization's security perimeter (these same risks are present with an internal, or private cloud, as a result of the above risk category). Once data has been stored in the cloud, there are many opportunities for a cloud provider to inadvertently or deliberately cause unauthorized disclosure of a customer's data. These can range from poorly configured firewalls, unauthorized or compromised devices on internal networks, and disgruntled employees to the bankruptcy of the provider, where its assets are sold off, along with customers' data, to the highest bidder.&lt;br /&gt;&lt;br /&gt;With this threat model, the encryption of the customer's data is a good technological countermeasure that can ensure that while sitting at rest within the cloud provider's storage equipment, the data cannot be accessed except by the customer.&lt;br /&gt;&lt;br /&gt;Now, an important wrinkle to be aware of is that cloud storage has two different ways in which encryption can be architected: Blind, or Transparent.&lt;br /&gt;&lt;br /&gt;In blind cloud storage, data is encrypted at the cloud customer's premises, and the cloud provider has no visibility into the data. They can claim no knowledge of the data being stored, and have no way to access it, since the keys are held only by the customer. 
While this can be a significant advantage to the cloud provider for liability reasons, it also prevents them from building any value-added services into their cloud that require access to the customer's data, and forces all data accesses to go through customer equipment before the data can be accessed.&lt;br /&gt;&lt;br /&gt;In transparent cloud storage, data can still be encrypted, but the cloud provider must also have access to either the customer's keys or a second set of keys that allow the provider to access the customer's data. This can be used to provide value-added services such as full-content search, retention management, format conversion, and other capabilities that require the ability to read the customer's data.&lt;br /&gt;&lt;br /&gt;From a strict security standpoint, blind cloud storage is more secure. However, with the judicious management of encryption keys, many of the threats mentioned above can be avoided, even if the provider's systems can access the plaintext.&lt;br /&gt;&lt;br /&gt;Ultimately, if an organization is giving anyone their plaintext, or the ability to access their plaintext, they need to be sure that the organization has sufficient operational safeguards to protect against common threats to data security and disclosure.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;3. The Network&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, disclosure during transport is an example of where encryption is virtually mandatory. Any time data is transported across an untrusted or uncontrolled network, such as the Internet, it must be encrypted, and fortunately, there are widely deployed standards, such as TLS, that are commonly used to perform this function. 
Cloud storage services that use raw HTTP should only be used if the data being sent is not of concern if disclosed, if the network is completely secured, or if the data is already encrypted (blind cloud storage).&lt;br /&gt;&lt;br /&gt;This touches on some of the issues related to the use of encryption in cloud storage, and as can be imagined, there are significant complexities related to key management that make implementations that balance usability and security quite challenging.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-6447864788860763344?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/6447864788860763344/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=6447864788860763344' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6447864788860763344'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6447864788860763344'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/06/where-does-encryption-fit-in-cloud.html' title='Where does Encryption Fit in the Cloud'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-2238602524883678712</id><published>2009-05-29T09:35:00.001-07:00</published><updated>2009-06-30T16:17:01.182-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Archiving'/><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Blog Responses'/><title type='text'>Object Security, Continued</title><content type='html'>Reader yossib left a comment to the previous blog entry, &lt;a href="http://intotheinfrastructure.blogspot.com/2009/05/object-storage-part-5-security.html"&gt;Cloud Storage - Part 5, Security&lt;/a&gt;, that warranted a more detailed response and discussion:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;I enjoyed reading your article, your focus on the issue of user authentication and access control is important as it surely does not get the attention it deserves.&lt;br /&gt;&lt;br /&gt;Do you see the security model and user access management for object storage evolving from current ACLs, Active Directory/LDAP or taking a different direction&lt;br /&gt;&lt;br /&gt;How do you see the concepts of users and groups evolving?&lt;/blockquote&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Authentication&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Authentication of identity is so critical, because it is the foundation of access control, and as you alluded, deserves far more attention than it gets. 
Fortunately, the rise of a plethora of services on the Internet is forcing the issue of federated identity management, and while systems are not yet mature, there is a strong trend towards common mechanisms by which a user or computer program can have a universal identity that can cross systems.&lt;br /&gt;&lt;br /&gt;Examples of emerging standards include OpenID, and Sun's &lt;a href="http://www.sun.com/software/identity/"&gt;IDM&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;On Active Directory&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Active Directory, while hugely successful and very valuable in a corporate setting, simply was not designed to accommodate the scale that is needed, nor the timeframes over which identities need to persist. As digital data and archives become core to our civilization, we need ways to ensure that the security of digital data can survive hundreds of years, and things that were often disregarded as "edge cases" must come to the front and centre.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Examples include:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;What happens when someone dies?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;What happens when someone gets subpoenaed?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;What about the expiration of statutory rights?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;What if the law determining the length of statutory rights is changed?&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;These and so many more issues make the protection of digital assets a double-edged sword: if we enshrine given restrictions in code, can we change them? And if we can change them, how can we prevent this from defeating the original point of the protections?&lt;br /&gt;&lt;br /&gt;And this ignores many of the challenges that are emerging from the loss of centralized control of systems. In emerging federated cloud worlds, objects may pass from system to system, both trusted and untrusted, and security must be preserved. 
Many of the challenges associated with building DRM systems are directly applicable to trusted repositories and archives, and the research tells us that this is a &lt;i&gt;really hard problem&lt;/i&gt;.&lt;br /&gt;&lt;br /&gt;For example, it is still an open debate if it is actually possible to have one user grant a second user access without this enabling that user to grant access to further users. And revoking access can be even more thorny.&lt;br /&gt;&lt;br /&gt;Ultimately, we need to move away from the centrally enforced security models to a more distributed security model where objects can float around in systems that do not need to be trusted, and access is granted based on trust relationships. (An example of this is that you may grant an online search and indexing company the privileges to read your data, based on your trust that they will not disclose your data).&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;ACLs&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;While ACLs have developed a reputation of being far too complex to be manageable, I believe that when tempered with methodologies such as Role Based Access Control, they can be made far simpler for the end user and application developer than they are right now.&lt;br /&gt;&lt;br /&gt;However, ACLs fundamentally are merely advisory guidelines for a "trusted" system that interprets them to restrict access. ACLs need to evolve to the point where you have "grants" for each privilege that enable you to perform that action. So if I wanted to share an object with you, I would give you a "grant" that gives you the ability to read a given object or set of objects. This grant could be revoked, and I could engineer it in such a way that you couldn't delegate the grant without revealing your own credentials.&lt;br /&gt;&lt;br /&gt;Ultimately, this involves a much more complex multi-actor interaction, and my gut feel is that we can't do this with static objects. 
This, of course, would mean that revocation of grants could never really be absolute, (unless they expire, but who enforces that, then?) since you can't always ensure that all replicas of a given object are always kept in sync.&lt;br /&gt;&lt;br /&gt;Finally, if these systems grow too complex, they won't work. There is much to be said for simplicity, especially in global-scale systems.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Users and Groups&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is always an interesting discussion — Groups provide such a valuable level of abstraction, but introduce so much complexity. I tend to lean towards abandoning the concept of groups as first class entities. If we just have users, we can create a user that is trusted to act as a delegate on behalf of other users. As long as one user can be granted the authority to delegate privileges to other users, we get the same functionality, and distributed group membership can be re-cast as a trust relationship between the owner and the delegator.&lt;br /&gt;&lt;br /&gt;My feeling is that this is the only model that will scale.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight:bold;"&gt;Much to Consider&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This is just the tip of the proverbial iceberg, and there are so many additional complexities and challenges associated with security. 
I'd love to continue this discussion, so if you have any questions, comments or ideas, please don't hesitate to comment.&lt;br /&gt;&lt;br /&gt;Also, as I mentioned in my last tweet, there are many other security-related items that I plan to discuss further in a follow-up blog post, covering user identity federation, trust domains, "blind storage", peering, object destruction and more.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-2238602524883678712?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/2238602524883678712/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=2238602524883678712' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2238602524883678712'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2238602524883678712'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/05/object-security-continued.html' title='Object Security, Continued'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3697441339674656465</id><published>2009-05-26T17:32:00.000-07:00</published><updated>2009-05-26T17:59:59.197-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Object Storage, Part 5 - Security</title><content type='html'>In the days when storage was directly connected to computers and there was only one user per computer, security was simple: just physically secure the computer and attached storage. But fast-forwarding to the Internet age, not only is storage networked, but it is potentially accessible to every user in the enterprise or on the Internet. As storage migrates from silos hidden behind computing servers to being a first-class peer on computer networks, security rapidly becomes front and centre as a key requirement, not just to protect (deny), but also to facilitate multi-application and user collaboration and sharing (allow).&lt;br /&gt;&lt;br /&gt;As part five of the object storage series of posts, this entry covers the issues related to security in an object storage system. These extend far beyond simple access controls, and include security issues related to the search functionality discussed in the last entry, &lt;a href="http://intotheinfrastructure.blogspot.com/2009/03/object-storage-query.html"&gt;Object Storage - Query&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Who's that Looking at my Data?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Fundamentally, security is about controlling the flow of information. As with any storage system, there is information flow out from the system, and information flow into the system (read and write, in the case of a block device). But unlike a block device, which can only restrict read or write operations at the device or block level, an object storage system has a much richer set of information flows that need to be regulated.&lt;br /&gt;&lt;br /&gt;Take query, for example. 
Information leakage through a query result set or index would be a significant problem, so access controls on each object must extend to the query results. In some environments, even timing matters, as variations in the time required to return a query may allow a user to determine whether an object with a given metadata value exists, even if they do not have privileges to see that object.&lt;br /&gt;&lt;br /&gt;As with all forms of access control, one first needs to authenticate the entity that is requesting access. Once the system has determined who it is talking with, it can proceed to the question of what they are allowed to do. Then and only then can it proceed to perform the operation.&lt;br /&gt;&lt;br /&gt;Expressed in English, this takes the following form:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Entity "X" is requesting to perform operation "Y" against object "Z".&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Some examples include:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Entity "XYZCorp\Archiver" is requesting to perform operation "Modify" against object "8D73F687BA26C1A03F9D8E796A497338/com.bycast.metadata.lastmodifiedtime"&lt;br /&gt;&lt;br /&gt;Entity "XYZCorp\Admin" is requesting to perform operation "Delete" against object "Financial Storage Archive Container"&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;As we saw with &lt;a href="http://intotheinfrastructure.blogspot.com/2009/03/object-storage-explicit-and-implicit.html"&gt;Implicit and Explicit Policies&lt;/a&gt; for managing storage placement, retention and more, this same model extends to security. 
A list of who is allowed to perform what operations on a given object can be explicitly specified for a given object (this is called an ACL, or Access Control List, in the file world), implicitly specified for a given group of objects (inherited ACLs), or implicitly specified for a given user.&lt;br /&gt;&lt;br /&gt;As one can imagine, these security models can become quite complex, especially when you start combining all three together. For example, if objects are stored into a container that specifies that only one user can access the objects, and the application explicitly specifies that another user can access an object, what is the correct behaviour? Should the explicit specification override the implicit specification, or should the implicit specification override the explicit one? Both are valid, depending on the use case.&lt;br /&gt;&lt;br /&gt;For example, in the Windows world, the "Backup Operator" must be capable of accessing all objects, regardless of their ACLs. This is an example of an implicit security policy overriding explicit security policies. In other cases, if an application wishes to explicitly share some objects with a second application, but by default, all objects should be managed by an implicit security policy, we have an example of an explicit security policy overriding an implicit security policy.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Policy Contexts&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;As most object storage systems are built around a flat namespace, implicit security policies apply against objects that match a set of metadata criteria. 
For example, the policy may say "For all objects where "com.bycast.metadata.creator" equals "XYZCorp\dslik" ...".&lt;br /&gt;&lt;br /&gt;When logical containers are supported, they can be implemented as a special metadata item, such as com.bycast.metadata.container, where a value of "/corporate/financial" would express a subcontainer "financial" in a container named "corporate". As implicit policies can include which container they apply to, this allows security policies to be restricted to a given container, or set of containers.&lt;br /&gt;&lt;br /&gt;Thus, we end up with three different contexts:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Explicit security policies, included with an object, that indicate who can do what.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Implicit security policies, specifying which objects they apply to, that indicate who can do what.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Implicit security policies, specifying which users they apply to, that indicate what those users can do.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;And this leads us to the "What":&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Oh, The Things We Can Do&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;How do the operations that can be performed on object storage compare to file and block storage approaches? 
Well, in summary, there's a lot more you can do. Below is a partial list of the actions that one can perform against a stored object:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Create an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Destroy an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Discover an object's existence&lt;br /&gt;&lt;/li&gt;&lt;li&gt;List an object's contents&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Add a new metadata item to an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Remove an existing metadata item from an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Read the value of an object metadata item&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Write a value to an object metadata item&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Add a new data stream to an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Remove an existing data stream from an object&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Read the value of an object data stream&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Write a value to an object data stream&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Query for the existence of a named metadata item&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Query for the existence of a named data stream&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Query for the contents of a named metadata item&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Query for the contents of a named data stream&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;These actions can become even more complex when you consider that some object storage systems allow read-only metadata items and streams, immutable metadata items and streams, or increment-only metadata items.&lt;br /&gt;&lt;br /&gt;Unlike file systems, which have evolved to a relatively standardized set of operations that are specified in ACLs, the security model for object storage operations is not yet well understood or standardized. 
Open questions include how privileges are overlaid onto the contents of objects, and how special behaviours, such as increment-only for a retention metadata item, are handled.&lt;br /&gt;&lt;br /&gt;This is one of the areas where the SNIA XAM working group has done excellent work, and I would encourage anyone interested in the details of how security models map onto object storage to read the &lt;a href="http://www.snia.org/tech_activities/standards/curr_standards/xam/"&gt;XAM Architectural Specification&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-3697441339674656465?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3697441339674656465/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3697441339674656465' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3697441339674656465'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3697441339674656465'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/05/object-storage-part-5-security.html' title='Object Storage, Part 5 - Security'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4666668668817405246</id><published>2009-04-29T11:20:00.000-07:00</published><updated>2009-05-26T16:31:15.969-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Cloud Storage Standardization, Part 1 - Why a Standard?</title><content type='html'>As cloud storage matures and use increases, there is a strong need for standardized interfaces for performing basic storage operations. Just as the SCSI standard enabled interoperability and facilitated innovation within the directly-attached storage market, the presence of a standard interface for cloud storage will provide many advantages. 
These include:&lt;ul&gt;&lt;li&gt;Improving quality by allowing standardized conformance testing&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Creating a market for test, validation, profiling, debugging and analysis tools&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Allowing the creation of standardized documentation&lt;/li&gt;&lt;li&gt;Encouraging the publication of articles and books discussing the standard&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Reducing development work required to use cloud storage and to support multiple providers&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Enabling the creation of standardized access libraries&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Reducing customer lock-in and enabling multi-vendor selection&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;At Bycast, we provide a RESTful HTTP API for object storage access, and next month, I will be presenting our HTTP API at the SNIA cloud storage summit. As part of my preparations, I've been reviewing many of the other HTTP storage APIs used by other industry players, such as Amazon, Microsoft, and Nirvanix. 
There is significant overlap and commonalities between all of these APIs, and I believe that there is a good chance, at least from a technical standpoint, to create a common lightweight HTTP storage access protocol.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4666668668817405246?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4666668668817405246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4666668668817405246' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4666668668817405246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4666668668817405246'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/04/cloud-storage-standardization-part-1.html' title='Cloud Storage Standardization, Part 1 - Why a Standard?'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-1319741581905819924</id><published>2009-04-07T23:23:00.000-07:00</published><updated>2009-04-07T23:42:07.126-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Just what is "Cloud 
Computing"?</title><content type='html'>As cloud computing increases in prominence, the argument over exactly what cloud computing actually is has grown in proportion. Never before has a buzzword been so truly nebulous.&lt;br /&gt;&lt;br /&gt;So, with the definition in such dispute, this is the perfect time to throw out yet another definition.&lt;br /&gt;&lt;br /&gt;With that said, my definition of cloud computing is: *drum roll*&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;"Distributed location-independent scale-free cooperative agents"&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;"What?", one might say... "That's nothing like the others I've seen." And they would be right. That's exactly the point. This is my &lt;b&gt;technical&lt;/b&gt; definition of cloud computing, so let's pick it apart, piece by piece:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Distributed&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Distributed is part of almost every cloud definition, but even the word "distributed" has different meanings to different people. For me, distributed signifies the presence of a substantial separation between entities. This definition implies several things:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;That there are well-defined entities that are distinct from each other&lt;br /&gt;&lt;/li&gt;&lt;li&gt;That there is a method of communication between entities that is separate from the entities themselves&lt;br /&gt;&lt;/li&gt;&lt;li&gt;That there is some degree of isolation between entities&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;This is a broad definition, but one that I would argue most technical people would agree with. This would mean that a client-server system is a distributed system, as is a telephone exchange or an interacting web server and client. 
However, a web server alone would not be considered distributed, nor would the Linux kernel.&lt;br /&gt;&lt;br /&gt;This definition of distributed is about logical entities and their interactions, so it does not matter if the entities are all co-resident on a single computing system, or scattered across multiple computing systems. That, in my book, is more related to the property of location independence.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Location-Independence&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;What separates your typical computer operating system from what is typically known in the academic world as a distributed operating system is the property of location independence. This is what allows the physical location of a given entity (be it a program or state) to not matter to its correct operation. More specifically, location-independent entities have the ability to transparently move between locations without affecting system operation.&lt;br /&gt;&lt;br /&gt;Location independence is a key enabling technology. For example, in the Internet, DNS and IP routing provide location independence between Internet clients and servers. This allows the servers to be moved around without the clients having to know their physical location, or that there even was a change.&lt;br /&gt;&lt;br /&gt;When systems are built out of location-independent entities, they can be run and stored on different physical configurations. This lays the foundation for designing scale-free systems.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Scale-Free&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In most system implementations, the chosen algorithms and architectures only work efficiently at a given scale. As the size and scope of a system is increased or decreased, bottlenecks, inefficiencies, and waste result in diminishing returns, and put limits on how big and how small a system can be. 
Systems designed for portable embedded devices must be designed very differently than systems designed for the world's largest supercomputers, and rightly so.&lt;br /&gt;&lt;br /&gt;However, a new class of computing systems has emerged that has the property of being scale-free. This means that regardless of the scale of their deployment, they are able to continue to perform their task efficiently. From an algorithms standpoint, this means that the computation required scales linearly as a function of the load on the system.&lt;br /&gt;&lt;br /&gt;Thus, when you have scale-free location-independent systems, you can scale just by adding more servers, networks and storage devices. This is what all of the largest (at least in terms of computational load) web companies, such as Facebook, Flickr, Twitter and friends, aspire to.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cooperative Agents&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Web services count. SOA counts. In fact, most message-passing distributed systems are based around the concept of cooperating agents working together to solve a problem.&lt;br /&gt;&lt;br /&gt;If the agents aren't cooperating, or they aren't agents, then they're not a cloud to me.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What isn't in my Definition of Cloud&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;It is also interesting to look at what is not in my definition of cloud. The first thing that one may notice when comparing it to most other definitions is the complete lack of business terms. That's because Cloud Computing is a technology, not a business model. There's a reason why previous attempts at cloud-based business models were called "Software as a Service", and "Application Service Providers". 
That's because these terms describe business models, as opposed to technological models.&lt;br /&gt;&lt;br /&gt;Much of my annoyance with the "Cloud" as a buzzword would go away if marketing and businesses just said that they have a "Cloud business model", which provides Internet-based delivery of their application or service and leverages cloud computing techniques.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In fact, one could arguably develop and deliver such a business model without using any cloud technology. A good example of this is the credit card processing networks, which are mainframe-based, largely centralized, and very specifically designed around a given scale, with an implementation that requires very specific location bindings for entities. Yet they provide what is arguably, from a business standpoint, a cloud service.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-1319741581905819924?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/1319741581905819924/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=1319741581905819924' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1319741581905819924'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1319741581905819924'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/04/just-what-is-cloud-computing.html' title='Just what is &quot;Cloud Computing&quot;?'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-2292444150781717478</id><published>2009-03-27T17:50:00.000-07:00</published><updated>2009-03-27T18:00:10.494-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>The Fight over Cloud</title><content type='html'>As far as buzzwords go, "Cloud" is a pretty good one. &lt;a href="http://intotheinfrastructure.blogspot.com/2008/09/raining-on-cloud.html"&gt;I've complained about this one in the past&lt;/a&gt;, and cloud is a very annoying term to me precisely because of its lack of a firm definition, its vapidness, and the fact that it means different things to different people.&lt;br /&gt;&lt;br /&gt;Now that the much debated "Cloud Manifesto" has been &lt;a href="http://gevaperry.typepad.com/main/2009/03/the-open-cloud-manifesto-much-ado-about-nothing.html"&gt;leaked to the web&lt;/a&gt;, there's a little more to chew on. The war over the definition of what cloud means has begun.&lt;br /&gt;&lt;br /&gt;Below is my analysis of what annoys me about this document, ignoring for now the politics associated with its creation and distribution.&lt;br /&gt;&lt;br /&gt;The author(s?) of this "manifesto" say that the document "does not intend to define a final taxonomy of cloud computing", yet that is exactly what they have ended up doing. 
And given that their definition reads more like an advertisement for outsourcing VMs, it is at odds with my personal view of cloud as a general architecture for building distributed systems.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;To me, none of their listed criteria, either independently or in combination, make something "cloud", nor does being "cloud" imply the existence of any of these proposed criteria.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;So, with that said, let's take a more detailed look at these "key characteristics of the cloud":&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Scalability On Demand&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;While this is a value that can be offered by a cloud, there are lots of non-cloud systems that provide exactly this, and you can have a cloud that does not provide scalability on demand.&lt;br /&gt;&lt;br /&gt;For example, IBM has been offering mainframe systems with extra processors that you can pay to use, "on demand". I wouldn't classify a &lt;a href="http://www-03.ibm.com/systems/z/hardware/z990/index.html"&gt;zSeries&lt;/a&gt; as a cloud.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Streamlining the Data Centre &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;As "streamlining" is an ambiguous word, we'll assume that the authors mean outsourcing or cost reductions. However, not all uses of cloud will result in the reduction of cost (capital or infrastructure) or moving work outside the data centre.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Improving Business Processes&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Technological systems are more often than not orthogonal to business process improvement. 
One can deploy cloud systems and end up with a worse business process, and one can improve business processes without deploying cloud technology.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Minimizing Startup Costs&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Fractional allocation, which reduces the minimum quantum that must be purchased to do useful work, is the closest to an acceptable criterion for cloud, but is a VM server a cloud, then? Also, one can deploy a cloud system that does not support fractional allocation.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;All Together Now?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Depending on your definition of what cloud is, you could create a cloud system that provides fixed capacity processing, increases data centre costs and brings additional work into the data centre, makes no changes to business processes, and requires large startup costs.&lt;br /&gt;&lt;br /&gt;Conversely, you could create a system that has variable on-demand capacity, reduces data centre costs, outsources data centre work, improves business processes, and minimizes startup costs, all without it being a cloud.&lt;br /&gt;&lt;br /&gt;Of course, the linchpin of this entire argument is just what a cloud is, and this is why this manifesto matters. It is the first major attempt to put a stick in the sand and say that Cloud is X, Y and Z.&lt;br /&gt;&lt;br /&gt;And that is why it has generated such a storm of controversy. 
At stake is who gets the first mover advantage in the struggle to define what exactly a cloud is.&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-2292444150781717478?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/2292444150781717478/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=2292444150781717478' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2292444150781717478'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2292444150781717478'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/03/fight-over-cloud.html' title='The Fight over Cloud'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-87543443359938170</id><published>2009-03-25T17:45:00.000-07:00</published><updated>2009-05-26T16:30:54.345-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='Query'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Object Storage, Part 4 - Query</title><content type='html'>Being able to 
store data is of limited use unless there are efficient mechanisms by which data can be located and retrieved. In fact, one can argue that file systems are just special purpose data allocation and query systems. Most users and applications locate and access files through a directory of one sort or another, be it a file system, relational database or a custom index within a proprietary file structure, highlighting the importance of query in data storage.&lt;br /&gt;&lt;br /&gt;As part four of the object storage series of posts, this entry covers the use of metadata to provide rich query capabilities, and how these capabilities enable implicit policies as discussed in the last entry, &lt;a href="http://intotheinfrastructure.blogspot.com/2009/03/object-storage-explicit-and-implicit.html"&gt;Object Storage - Explicit and Implicit Policies&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Where Has That File Gone?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Most of us have experienced the frustration of searching for a file and not being able to find it. But we have it easy compared to how data was stored before the widespread adoption of the file system.&lt;br /&gt;&lt;br /&gt;In block-based storage systems, data is accessed by an address that defines where the data starts, and a length that determines how much data needs to be read (or written). While this approach is simple and efficient, it is difficult to manage, as you need to keep an external catalogue of where each item is stored.&lt;br /&gt;&lt;br /&gt;File systems simplified the problem for the user and the developer by creating a standard directory that could be used to organize files into hierarchical structures and associate metadata, such as file names and creation dates, with each file. 
With the introduction of the file system, search systems were able to look at the directory and build an index of information such as file names, dates and other metadata.&lt;br /&gt;&lt;br /&gt;So, we progressed from walking the disk to walking the file system directories. We then skipped to reading an index, then reduced the index results by filtering out all entries except the ones that matched our query. With full-text indexing of the contents of the files, in addition to the file metadata, now integrated into the operating system, a user can even filter their results to just files that contain specific words and phrases.&lt;br /&gt;&lt;br /&gt;Of course, these impressive improvements in technology have been largely offset by an explosion of files: hundreds of thousands to millions of files are now quite common in home and small office settings, and in large enterprises, tens to hundreds of billions of files can reside on enterprise storage systems.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;But What About the Developer?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Despite these impressive achievements, for software developers, the facilities offered by a file system have not changed significantly since early file systems were created in the 1960s and 1970s. While search has improved life for users, developers are often forced to create their own application-specific index of files for searching purposes.&lt;br /&gt;&lt;br /&gt;A good example is to look at two popular applications offered by Apple on the Macintosh platform: iTunes and iPhoto. Both of these applications store each song and photo, respectively, as a file. But as a user, you never see these files; you see a custom user interface that is designed for tasks associated with managing and playing music, and managing and organizing photos.&lt;br /&gt;&lt;br /&gt;When you open these applications, they do not access every single photo or song. 
They access indexes, which allow queries to be performed quickly to get results to the user interface. Thus, if you want to hear those &lt;a href="http://en.wikipedia.org/wiki/Pachelbel's_Canon"&gt;hits of the 1600s&lt;/a&gt;, or view a slideshow of photos tagged "&lt;a href="http://www.tafoni.com/Welcome.html"&gt;Tafoni&lt;/a&gt;", iTunes and iPhoto are actually doing a query, much like the following SQL-style pseudocode:&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;SELECT * WHERE CENTURY == "1600"&lt;br /&gt;&lt;br /&gt;SELECT * WHERE TAG == "tafoni"&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;Specifically, the applications are querying metadata: the century in which a song was composed and the tags on a photo are both examples of metadata.&lt;br /&gt;&lt;br /&gt;Unfortunately, a file system only has limited fixed metadata, and doesn't understand or have a way to be extended to include application-specific metadata. But an object storage system does understand metadata, and thus offers developers powerful query features that can dramatically reduce the complexity of development while also increasing the value of metadata interoperability across applications.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What Can Object Query Do?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Simply put, an object storage system can do everything that a basic relational database can do, but without needing a schema. 
Every piece of metadata associated with objects can be queried, and the metadata is arbitrary, defined by applications and end users.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Storing an object with metadata is analogous to an INSERT&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Changing metadata in an object is analogous to an UPDATE&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Deleting metadata or an object is analogous to a DELETE&lt;br /&gt;&lt;/li&gt;&lt;li&gt;And an object storage query is analogous to a SELECT&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Because the metadata is an intrinsic part of each stored object, you never have to worry about transactional consistency, or inconsistencies between an index and the actual metadata of the object.&lt;br /&gt;&lt;br /&gt;And, one can visualize implicit policies as just policies performed against a query result. Specify the metadata constraints, perform a query, and apply the policy to the results. (In reality, it is a little more complex, but we'll get to notifications later in the series.)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;If iTunes Used Object Storage...&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;To illustrate how query in the storage system is of significant value to developers, let's imagine that Apple included an object storage system as part of the Mac OS, and had written iTunes to use object storage instead of using &lt;a href="http://www.sqlite.org/"&gt;SQLite&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;When you first start up iTunes, it remembers the last view you were looking at. A view is the results of a query, so it would issue that query to the storage system for the metadata of all song objects that match the query parameters. This would return the list of metadata that is used to display the list of songs. 
When a user double-clicks to play a song, iTunes would open the song data from the corresponding object, and start playing.&lt;br /&gt;&lt;br /&gt;There is no need to incorporate a database, no need to create a fixed schema (just add metadata and go!), no need to worry about consistency or corruption, and way less code to manage.&lt;br /&gt;&lt;br /&gt;Most importantly, because the query is done by the object storage system, your query performance scales as your storage infrastructure scales. So a query against a million objects on a home PC runs just as fast as a query against a billion objects within an enterprise. And as the object storage system is improved over time, all the applications get faster, for free.&lt;br /&gt;&lt;br /&gt;But easing the load on the software developer isn't the only value. Because the objects are stored in a common storage system, if iTunes desires, it can allow any application to query for the music it manages. So if another application developer wants to search for music, they can construct a query that will return results from iTunes' repository. Controlled access across applications opens up all sorts of opportunities to create systems built around loosely coupled agents accessing and manipulating a shared repository. For example, a format conversion tool could convert MP3 files into AAC files in the background, transparently, and fully interoperate with iTunes.&lt;br /&gt;&lt;br /&gt;Of course, if an application wanted to keep its objects private, that's just a permissions setting. 
Which is an excellent segue into the next entry: security in an object storage system.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-87543443359938170?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/87543443359938170/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=87543443359938170' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/87543443359938170'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/87543443359938170'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/03/object-storage-query.html' title='Object Storage, Part 4 - Query'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4989469047315261280</id><published>2009-03-12T11:03:00.000-07:00</published><updated>2009-03-12T11:07:48.092-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IBM'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>On Ten Year Trends - Parallel Goes Mainstream</title><content type='html'>Several bloggers 
have been discussing the current "ten year trends" that are transforming the computing environment, including &lt;a href="http://blogs.netapp.com/dave/2009/03/three-ten-year.html"&gt;Cloud Computing, Virtualization, Flash&lt;/a&gt;, and &lt;a href="http://blog.fosketts.net/2009/03/11/ten-year-trend-mobility/"&gt;Mobility&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;However, there is a deeper trend that, while well underway, is so significant that it is often overlooked. That is the forced transition to loosely coupled distributed computing models, caused by the knee in the curve of compute performance gained by making individual processors faster.&lt;br /&gt;&lt;br /&gt;Over the next five years, I believe we are going to see a full-fledged transition in hardware systems from tightly coupled single and multi-processor shared memory systems to loosely coupled many-processor systems that communicate via message passing. This trend is already evident in the high-performance computing space, where having hit the limits of individual processor performance, then having hit the limits of shared memory, virtually all new architectures have adopted this approach.&lt;br /&gt;&lt;br /&gt;One of the fundamental drivers of this transition is a new economic model that measures computing power per dollar, or computing power per watt. This model favours many smaller, less powerful processors over fewer high-power processors. 
After all, with Atom and ARM processors costing only a few dollars per core in volume, and performing around the same as a 1 GHz Pentium did just a few years ago, with a fraction of the power and heat dissipation, I predict we will soon see a transition to inexpensive servers where one or two rack units are crammed full of hundreds to thousands of this class of processor, all connected together by what looks like an Ethernet fabric.&lt;br /&gt;&lt;br /&gt;The timing is right: &lt;a href="http://research.microsoft.com/en-us/news/features/ccf-022409.aspx"&gt;Microsoft's talking about it&lt;/a&gt;, &lt;a href="http://www.research.ibm.com/journal/rd/521/team.html"&gt;IBM's already done it&lt;/a&gt;, and the mobile market has brought the component volumes up to the point where the economics are right.&lt;br /&gt;&lt;br /&gt;But five years does not make a ten-year trend: the larger change is the one to programming models, languages and mindsets. Very few people in the industry can work comfortably and productively in massively parallel environments, as we have seen in the difficulty in building scalable web-based systems like Twitter. Computer Science still teaches programming and computer architecture around the old model of the "one true core", with parallel and distributed programming taught as a speciality, often at the graduate studies level.&lt;br /&gt;&lt;br /&gt;It is going to take five or more years for this new massively parallel hardware and these ways of thinking to filter down to the next generation of programmers who can think parallel, create new tools, libraries and stacks, and then play tens to hundreds of thousands of nodes like an instrument.&lt;br /&gt;&lt;br /&gt;Imagine having a hundred thousand cores on your desktop. Imagine having ten million of them in your data centre. 
The problems are significant, and exciting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4989469047315261280?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4989469047315261280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4989469047315261280' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4989469047315261280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4989469047315261280'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/03/on-ten-year-trends-parallel-goes.html' title='On Ten Year Trends - Parallel Goes Mainstream'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-2005507243076478902</id><published>2009-03-09T17:41:00.000-07:00</published><updated>2009-05-26T16:30:44.393-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Rules'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Object Storage, Part 3 - Explicit and Implicit Policies</title><content type='html'>Once metadata becomes an 
intrinsic part of each stored object, application-specified metadata provides a rich vocabulary through which applications can communicate with the underlying storage system, and through which administrators can manage those objects.&lt;br /&gt;&lt;br /&gt;As part three of the object storage series of posts, this entry covers the ability to specify explicit and implicit policies that give both the application and the administrator control over how data is managed, and drive additional value in the storage subsystem. This entry builds on top of the last entry, &lt;a href="http://intotheinfrastructure.blogspot.com/2009/02/object-storage-metadata.html"&gt;Object Storage - Metadata&lt;/a&gt;, which introduced the importance of metadata and why it applies to storage and to object storage in particular.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;More Than Just a Bit Bucket&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When people first think about storage, they think about bits. And that's fundamentally what storage systems do. They take your bits, keep them, and give them back to you. But if that's all a storage system does, it's pretty dumb, as there is a lot more to what applications need than just storing bits.&lt;br /&gt;&lt;br /&gt;Storage is also more than the storage system and applications: the storage administrator is also an important player in enterprise storage, and is often charged with goals that may or may not agree with the desires of the application.&lt;br /&gt;&lt;br /&gt;So, if storing bits is "Dumb Storage", what is "Intelligent Storage"? 
Well, an easy example is the list below of many of the things that applications and administrators want to have their storage system do for them:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Index&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Protect&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Share&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Compress&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Replicate&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Distribute&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Archive&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Cache&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Tier&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Version&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;And this list is just the beginning.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Explicit Policies&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In order for these higher-level behaviours that are desired by administrators and applications to be fulfilled, they first need to be communicated to a storage system. And metadata fulfils this role perfectly.&lt;br /&gt;&lt;br /&gt;If an application wishes for a given piece of stored data to be protected such that only that application can access it, it needs only to attach metadata to the object that indicates this intent, and trust that the storage system will honour its request. This agreement between the application and the storage system is the contract of functionality.&lt;br /&gt;&lt;br /&gt;Want multiple copies? Add metadata. Want it shredded on delete? Add metadata. Want index keywords? Add metadata. Etc.&lt;br /&gt;&lt;br /&gt;Thus, through a vocabulary of well-defined metadata that it will honour, the capabilities of a storage system can be advertised to an application. And the sum of this metadata forms an explicit policy, specified by the application to the storage system, as an atomic part of the stored object.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Implicit Policies&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;But this isn't the only way that policies can work. While the application knows best from its perspective, it is only one small and limited part of an enterprise. 
There are larger forces at work: desires to ensure that data is not lost in a disaster, desires to reduce costs, desires to meet legal obligations, and desires to manage information over time and space.&lt;br /&gt;&lt;br /&gt;Enter the storage administrator.&lt;br /&gt;&lt;br /&gt;As with explicit policies, implicit policies are also built around metadata. But instead of having an intent explicitly stated as metadata directives, implicit policies map an intent to a collection of objects with common characteristics.&lt;br /&gt;&lt;br /&gt;Let's imagine that the storage administrator wishes to ensure that critical financial documents are protected against a site disaster, and are retained for a minimum of ten years. The administrator can create an implicit policy that says:&lt;br /&gt;&lt;br /&gt;For all objects with metadata indicating that they are financial documents, make two copies, one remote, and keep them for ten years.&lt;br /&gt;&lt;br /&gt;Once again, the metadata is key. The metadata might be a path in a file system, or a document type, or the division within an organization. Regardless, the administrator now has a tool to take subsets of stored data, limited only by their imagination and the available metadata, and make things happen.&lt;br /&gt;&lt;br /&gt;And unlike explicit policies, implicit policies can be changed without having to change the metadata of the stored objects. By combining both types of policies, an application can specify metadata (an explicit policy) that selects which implicit policy is to be used. And all of these approaches can be combined: An application may say that this object "Must be high performance", that the retention is "governed by implicit policy named Finance", and say nothing about replication.&lt;br /&gt;&lt;br /&gt;As can be imagined, this approach to storage management is very powerful, and allows the value of storage to be expressed in terms of business values to an organization. 
In our next entry, we will look at some concrete examples of explicit and implicit policies, and see how these are often implemented in object storage systems.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-2005507243076478902?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/2005507243076478902/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=2005507243076478902' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2005507243076478902'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2005507243076478902'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/03/object-storage-explicit-and-implicit.html' title='Object Storage, Part 3 - Explicit and Implicit Policies'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-721431069048231060</id><published>2009-02-24T17:12:00.000-08:00</published><updated>2009-05-26T16:30:13.809-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Metadata'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Object Storage, Part 2 - 
Metadata</title><content type='html'>A key aspect of object-based storage is the storage of metadata as an intrinsic part of each stored object. Allowing applications to define and include arbitrary metadata as part of a stored object provides the foundation for enabling many enhanced capabilities for both the application and the storage system.&lt;br /&gt;&lt;br /&gt;As part two of the object storage series of posts, this entry covers the importance of metadata to object storage and builds on top of last week's entry, &lt;a href="http://intotheinfrastructure.blogspot.com/2009/02/on-object-based-storage.html"&gt;On Object-Based Storage&lt;/a&gt;, which introduces Object-based Storage.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What is Metadata, Anyways?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Metadata is one of those unfortunate terms that the computer industry has abused to the point where it now has a wide range of meanings. From a strict definitions standpoint, metadata is "data about data", but in more general use, it refers to any descriptive data.&lt;br /&gt;&lt;br /&gt;For example, a file name is considered to be metadata about the file, as is file creation date, access lists of who is allowed to access the file, and a thumbnail view of how the file would look when printed.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SaSbBoyUZnI/AAAAAAAAACk/mH2kC9FNBLU/s1600-h/2009-02-24+Simple+Object+Example.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 325px; height: 155px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SaSbBoyUZnI/AAAAAAAAACk/mH2kC9FNBLU/s400/2009-02-24+Simple+Object+Example.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5306536713379931762" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 1 - Objects Including Data and Metadata&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Metadata is often blurred together with 
data, and can be considered to be data depending on the context. This can be illustrated in the below example of a compound object document. For example, an indication of which typeface to use for a paragraph of text is often referred to as metadata associated with the text, as is the location offset and scaling factor describing how an image should be rendered. While this information is indeed metadata to the paragraph and image, respectively, from the context of the document, this information is an intrinsic part of the document that must be present for it to be rendered as the user intended.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SaSbBokUwfI/AAAAAAAAACc/e9ayxrMIc2k/s1600-h/2009-02-24+Composite+Object+Example.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 390px; height: 260px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SaSbBokUwfI/AAAAAAAAACc/e9ayxrMIc2k/s400/2009-02-24+Composite+Object+Example.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5306536713321234930" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 2 - Composite Objects Including Metadata as Data&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;Thus, while in principle, data can stand alone without the associated metadata, in reality, the metadata often provides the context that makes the data usable. (After all, think about how hard it would be to find a file without file names or directories)&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metadata for Applications&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;For most applications, being able to attach metadata to stored data is a fundamental requirement for structured storage. 
By storing metadata alongside, or intermixed with, the data, applications are able to ensure that the data is sufficiently described for manipulation or display to the end user.&lt;br /&gt;&lt;br /&gt;Since there is no universal file format, each application vendor has had to choose between living with the limits of standard file formats and creating their own proprietary file format. While standardized file formats have emerged over the years, and often include the ability to be extended to include application-defined tags or properties, there is no "universal file format", and more often than not, application developers resort to creating their own format. And even with general formats such as XML, without additional descriptive information, such as a schema (more metadata), the files are not self-describing.&lt;br /&gt;&lt;br /&gt;While object storage does not create a universal file format, it does provide a consistent and standard way for applications to store metadata along with data, in a format that is independent from any specific application. Providing applications with a consistent way to package up all of the data and metadata into a single storage object, then commit it atomically to storage, provides many advantages over ad-hoc solutions.&lt;br /&gt;&lt;br /&gt;For example, consider someone who is writing a web-based blogging system. Each blog entry has the body text in HTML (the data), and a series of metadata items associated with the post, such as a title, creation date, posting date, posting status (draft, posted), and an author. A typical design would be to use a database to store the metadata, and store the HTML posts as files. In this implementation, even if the data is stored in the database (always a temptation, but rarely a good idea), databases are intrinsically loosely coupled data stores, and much complexity and fragility ensue.&lt;br /&gt;&lt;br /&gt;Contrast this with an application designed around an object store. 
For each post, a "Blog Post Object" is created, which includes the post data and the metadata. The object is committed to storage as an atomic element, where it persists. Each committed object is self-describing, no schema is needed, and the complexity facing the application writer is vastly reduced.&lt;br /&gt;&lt;br /&gt;Now one might question how an object store is different from a database, and that is a good question. In fact, one could consider a database a specialized form of an object store, and an object store to be a specialized form of a database. And ultimately, both of these perspectives are correct. The key aspect to keep in mind is where services are being provided: in a database, services are provided to the application by a middle layer that runs on top of a non-intelligent storage system, whereas with object-based storage, services are provided by the storage system itself. This is a key difference, which will be the basis of much of the remainder of this series of articles.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Metadata for Storage Systems&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Today's storage systems are like illiterate librarians managing a building full of pages of paper. People give them a page, and they put it in a location; people ask for the page at a given location, and they give it back. While this works, it's not very intelligent.&lt;br /&gt;&lt;br /&gt;But what if storage could be more? What if you could ask for a book? What if you could ask for all the books about a given topic? Or by a given author? Object storage is our literate librarian, who understands the metadata associated with stored objects.&lt;br /&gt;&lt;br /&gt;When the storage system understands what is being stored, this enables all sorts of capabilities and optimizations that would otherwise not be possible. Now the storage system can provide the ability to search content. 
Now the storage system can intelligently optimize storage and retrieval performance and latency. And most importantly, now there is a way for the application and the storage system to communicate with each other.&lt;br /&gt;&lt;br /&gt;There are many exciting capabilities that emerge from having richer communication between the application and the storage system, and these are worth describing in more depth. The subsequent entries in this series will discuss these emergent capabilities, including query, placement, protection, permissions, representation, policies, compression and versioning. These will be the subject of the entries to follow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-721431069048231060?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/721431069048231060/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=721431069048231060' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/721431069048231060'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/721431069048231060'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/02/object-storage-metadata.html' title='Object Storage, Part 2 - Metadata'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_FN1WQhtIvYA/SaSbBoyUZnI/AAAAAAAAACk/mH2kC9FNBLU/s72-c/2009-02-24+Simple+Object+Example.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5347883565393582535</id><published>2009-02-17T14:39:00.001-08:00</published><updated>2009-02-17T15:02:00.891-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='S3'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Blog Responses'/><title type='text'>Watch for Goats in the Cloud</title><content type='html'>&lt;table&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;George Crump, of &lt;a href="http://www.storage-switzerland.com/"&gt;Storage Switzerland&lt;/a&gt; posted an article titled, &lt;a href="http://www.storage-switzerland.com/Articles/Entries/2009/2/16_Cloud_Storage_Reality.html"&gt;Cloud Storage Reality&lt;/a&gt;, where he talked about the emerging class of "Cloud Storage" solutions. His conclusion: That Cloud Storage is a reality and ready for prime time.&lt;br /&gt;&lt;br /&gt;But is it? 
Or more specifically, is all that is called "Cloud Storage" ready for prime time?&lt;br /&gt;&lt;br /&gt;George's listing of the key advantages of cloud storage when compared with traditional enterprise storage systems, in dispersion, nodes, scale, granular, ease and self-upgrading, is dead-on.&lt;/td&gt;&lt;td width="200" valign="top" align="center"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://4.bp.blogspot.com/_FN1WQhtIvYA/SZs9Cv2IFmI/AAAAAAAAACM/dol4lF2_e3Q/s1600-h/goats_in_the_cloud.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 200px; height: 241px;" src="http://4.bp.blogspot.com/_FN1WQhtIvYA/SZs9Cv2IFmI/AAAAAAAAACM/dol4lF2_e3Q/s400/goats_in_the_cloud.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5303900103571412578" /&gt;&lt;/a&gt;"Say, What's that mountain goat doing clear up here in&lt;br&gt;this cloud bank?"&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td colspan="2"&gt;&lt;br /&gt;Similarly, we agree with his taxonomy of the three different deployment models, Service Only, Software Only and Pre-packaged Cloud.&lt;br /&gt;&lt;br /&gt;But there is a key distinction between different ways that clouds can be deployed that can make the difference between a high-risk failure and a low-risk success:&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storage Just in the Cloud?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;While cloud storage is a proven architecture, pure Internet-based storage remains risky. Before enterprises will be willing to trust their data and their business to a provider, they first look for industry maturity, stability and reliability. After all, the pure internet-based storage industry is still in the early stages of adoption, and one can argue that it already failed once, during the "Storage Utility Provider" craze at the beginning of the decade. 
Heck, even Enron was getting into that business.&lt;br /&gt;&lt;br /&gt;And enterprise uptime is only half the QoS battle: even if the remote storage service provider has 100% uptime, access to the provider is limited by the reliability of the Internet networks, and access is restricted by the bandwidth to the Internet.&lt;br /&gt;&lt;br /&gt;After all, despite all the talk of bandwidth being free, the cost of an OC-3 to the Internet still makes most CFOs reach for their chests.&lt;br /&gt;&lt;br /&gt;Then, if you really want to kill pure Internet-based storage, get the lawyers involved...&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What Really Works&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Cloud Storage is production ready and widely deployed, but only in configurations that extend into the customer's data centre. I would wager that virtually all enterprise-class cloud storage deployments include data being stored in the customer's data centre. You see this with profiles of Amazon's S3 customers, and we see this with our customers. 
This is to be expected, of course, since all private cloud deployments exist primarily within the customer's data centre.&lt;br /&gt;&lt;br /&gt;So, to summarize, where does internet-resident cloud storage work?&lt;br /&gt;&lt;br /&gt;Cloud Storage providing off-site protection copies for data that is also held on-site.&lt;br /&gt;&lt;br /&gt;Cloud Storage providing lower-cost storage for data where high levels of QoS are not required.&lt;br /&gt;&lt;br /&gt;Cloud Storage facilitating data sharing across sites.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5347883565393582535?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5347883565393582535/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5347883565393582535' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5347883565393582535'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5347883565393582535'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/02/watch-for-goats-in-clouds.html' title='Watch for Goats in the Cloud'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://4.bp.blogspot.com/_FN1WQhtIvYA/SZs9Cv2IFmI/AAAAAAAAACM/dol4lF2_e3Q/s72-c/goats_in_the_cloud.jpg' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-216161337562044223</id><published>2009-02-15T23:11:00.000-08:00</published><updated>2009-05-26T16:30:27.329-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><category scheme='http://www.blogger.com/atom/ns#' term='Object Storage'/><title type='text'>Object Storage, Part 1 - Introduction</title><content type='html'>Object-Based Storage is an alternate approach to specifying an interface between higher level application programs and storage devices for the purposes of storing digital data. While object-based storage is not yet commonly used, its many advantages are resulting in increasing adoption, and over the next decade it is expected to become widespread.&lt;br /&gt;&lt;br /&gt;As the first part of a series of blog posts talking about object-based storage, this post introduces object-based storage in the context of other widely used storage interface technologies, and briefly covers the advantages inherent in object-based storage.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Stream-Based Storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In early computing systems, data was stored as a series of bits or bytes that could be written or read over time. Examples included ticker tape for bit-streamed data, and punch cards for byte-streamed data. Within a computing system, some of the first persistent storage systems were based around writing data as sequences of magnetic signals on tape or on a rotating drum. 
These stored values could then be read back in the same order they were written.&lt;br /&gt;&lt;br /&gt;While this approach is still used in storage devices such as tape, for low-latency storage purposes, the main disadvantage of stream-based storage was the time required to access a given piece of information. In order to access a given piece of information, a program had to specify the location within the stream where the information was contained, which required keeping track of many bits of addressing information. Thus, in order to save bits, the locations in the stream were divided into equally sized "blocks" that could be used to refer to locations within the stream.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Block-Based Storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;From an interface standpoint, today's hard disks are still conceptually modelled as a long stream of bytes divided into equally sized blocks. All accesses to the storage devices are performed by reading or writing blocks over industry standard protocols such as SCSI and Fibre Channel, which typically specify that each block contains 512 bytes of user-accessible information.&lt;br /&gt;&lt;br /&gt;Under the covers, the hard disk controllers understand that the data isn't actually stored in one long sequential stream of bits, but is stored across multiple platters of spinning discs, and the physical location of a given block is specified by translating the block address into a distance and angle on the surface of the disc. For solid-state storage devices, block addresses are mapped to physical groups of semiconductor devices arranged in two- or three-dimensional structures.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;File-based Storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Given that application data, be it documents or images or sound data, are often larger than a block and may not fill up blocks completely, a higher-level logical structure maps these documents, typically called files, onto the blocks on disk. 
This software is typically called a file system, and provides a directory of files, metadata about the files, and information known as "extents", which points to the list of blocks that contain the contents of the file.&lt;br /&gt;&lt;br /&gt;When a file system is layered on top of a block storage device, the user is already being presented with a form of object storage. In this case, there are two types of objects, files and directories. However, this is where the similarities end.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Object-Based Storage&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;In an object-based storage interface, instead of writing a stream of data, or writing blocks of data, or writing a file, an application program writes an object. In interfaces such as &lt;a href="http://en.wikipedia.org/wiki/Xam"&gt;XAM&lt;/a&gt; (eXtensible Access Method) and &lt;a href="http://en.wikipedia.org/wiki/Object_storage_device"&gt;OSD&lt;/a&gt; (Object Storage Device), instead of manipulating files, applications manipulate objects, and can perform a much wider variety of operations on these objects.&lt;br /&gt;&lt;br /&gt;There are several aspects of objects that differentiate them from files:&lt;ul&gt;&lt;li&gt;Objects are compound. While files can also contain multiple different types of data, as is commonly found in a .zip or .tar archive file, objects intrinsically combine multiple different pieces of information into one package. Metadata and data are all stored with an internal directory that allows each sub-component of the object to be accessed.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Objects are self-describing. Instead of just having the metadata that is associated with a file, such as creation time and file name, objects have arbitrary metadata specified by an application that is specific to the application and data's nature. 
For example, a sound object can have sampling frequency, gain levels and other sound-specific metadata, while a document may have metadata describing the application that created it, and may even contain a low-resolution preview image of the document.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Objects are self-contained. Unlike a file, the metadata associated with an object is an intrinsic part of the object. When a file is stored on disk or even copied from one location to another, the metadata is not stored as part of the file contents, and may not be transferred along with the file data. With objects, the metadata is an intrinsic part of the object, and cannot be separated without creating a different object.&lt;/li&gt;&lt;/ul&gt;&lt;b&gt;How is This Different?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This isn't new, and these three concepts have been widely used for organizing the contents of files stored on block devices in most software systems. Every Microsoft Word document, Macintosh Resource Fork and Windows Executable is already an object based on this definition. So, why is object storage needed when these concepts are already widely implemented?&lt;br /&gt;&lt;br /&gt;The key to answering this question is looking at the entire storage stack, instead of looking at each part in isolation. In a file-based storage system, if an application wants to load a preview image from a file, the application must first open the file, know how to understand the file (which is typically application-specific), and read the data. 
At the next layer down, the file system must translate these read requests (which view the file as a stream of data) into block addresses using the file system extents metadata, and issue block requests to the storage device, which in turn needs to look up the location of the data in the blocks and return it back up to the application.&lt;br /&gt;&lt;br /&gt;With object-based storage, the application tells the storage system directly that it wants data from a specific object, and the request can be sent all the way down to the storage device, which, because it knows about the object structure itself, is able to better respond to the request, and process the request more efficiently.&lt;br /&gt;&lt;br /&gt;And because the storage device can understand the structure of objects, it is able to more intelligently and more efficiently handle these application requests.&lt;br /&gt;&lt;br /&gt;This may seem like a subtle distinction, but there are numerous advantages resulting from standardizing object structures and pushing down awareness of object contents to the storage layer, and this additional awareness enables many new capabilities and increased efficiencies. 
These will be the subject of the entries to follow.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-216161337562044223?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/216161337562044223/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=216161337562044223' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/216161337562044223'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/216161337562044223'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/02/on-object-based-storage.html' title='Object Storage, Part 1 - Introduction'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4680714108860180811</id><published>2009-01-29T20:52:00.000-08:00</published><updated>2009-01-29T21:06:52.501-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Musings'/><title type='text'>Software Development and Evolutionary Biology</title><content type='html'>I've finally figured out why, over time, open source will become the dominant form of software development:&lt;div&gt;&lt;ul&gt;&lt;li&gt;Open source is, well, open. 
Contributors can freely share knowledge, experiences, mistakes and achievements without restrictions.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Open source projects cross-pollinate, exchange code and rapidly change over time.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Open source projects tend to have significant diversity, with multiple competing projects that do the same or similar things.&lt;/li&gt;&lt;/ul&gt;From an evolutionary biology standpoint, we have more frequent sharing of genetic code, higher rates of mutation, and a broader diversity. Thus, these projects have a better chance of survival.&lt;br /&gt;&lt;br /&gt;Contrast this with closed source development projects:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Closed source is still open, in that people take their experiences, skills and knowledge with them when they move from company to company. But instead of information being shared on an hourly basis, the sharing of information tends to be on the time scale of years, and the degree of sharing is far lower.&lt;/li&gt;&lt;li&gt;Closed source projects resist "contamination" of their code base, and try to prevent any leakage of their code to the outside world. They also try to make the minimum number of changes necessary in order to maximize return on investment.&lt;/li&gt;&lt;li&gt;Closed source projects tend to avoid markets where there is significant diversity. Interestingly, when there is competition, the products involved tend to be far better than in areas that are dominated and controlled by one or two vendors.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;As a consequence, closed source projects tend to evolve slowly, become rigid and inflexible, and are unable to adapt to rapidly changing environments.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;When you mix open source and closed source together in an ecosystem, interesting things happen. The first is that the competitive pressure resulting from open source pushes closed source projects to act more like open source projects. 
Closed source projects also tend to initially be parasitic, using open source without contributing much back.&lt;br /&gt;&lt;br /&gt;There are exceptions to all of these generalizations, but this feels about right to me. It would be fascinating to see the results of research that looks at software development from the viewpoint of evolutionary biology.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4680714108860180811?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4680714108860180811/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4680714108860180811' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4680714108860180811'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4680714108860180811'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/software-development-and-evolutionary.html' title='Software Development and Evolutionary Biology'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-2585280616578698760</id><published>2009-01-29T18:25:00.001-08:00</published><updated>2009-01-29T21:07:41.382-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud 
Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='EMC'/><title type='text'>Jump Starting Off-Site Storage</title><content type='html'>During a discussion about Cloud Storage on the &lt;a href="http://storagemojo.com/2009/01/22/cloud-storage-symposium-impressions/#comments"&gt;StorageMojo&lt;/a&gt; blog, &lt;a href="http://media.seagate.com/center/storage-effect/"&gt;Pete Steege&lt;/a&gt; asked about the challenges of the initial load of data into a cloud storage provider.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;How are storage cloud companies handling the “first backup” issue? Multiple terabytes or petabytes that need to be migrated to the cloud initially?&lt;br /&gt;&lt;br /&gt;The incremental part of the process is a no-brainer.&lt;/blockquote&gt;&lt;br /&gt;One solution that we use at Bycast is to deploy two or more edge servers with attached storage at the customer's premises, and allow them to perform bulk ingest over the REST API (or via CIFS/NFS). When the ingest is complete or nearly complete, the object storage repository on disk can be physically shipped to be integrated into the "cloud", and subsequent transactions can be performed over the network.&lt;br /&gt;&lt;br /&gt;Another approach taken by EMC's Mozy backup service is what they call "data seeding". Here, you purchase a 2 TB USB drive from EMC, store your data onto the drive, then ship it back to EMC in order to get the process going. 
I couldn't find any current references to this capability on their web site, so this may no longer be a supported feature.&lt;br /&gt;&lt;br /&gt;With such hybrid models, you need the software intelligence to ensure that the data is always accessible via the cloud API, always protected from loss, corruption and unauthorized disclosure, and audited so that you know that all data that you ingested locally made it into the cloud.&lt;br /&gt;&lt;br /&gt;Despite the associated complexity, this is a very powerful approach, as one should never underestimate the bandwidth of a 747 full of disks or tapes.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-2585280616578698760?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/2585280616578698760/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=2585280616578698760' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2585280616578698760'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/2585280616578698760'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/jump-starting-off-site-storage.html' title='Jump Starting Off-Site Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3586338067897310573</id><published>2009-01-23T16:08:00.000-08:00</published><updated>2009-01-23T16:29:39.626-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='S3'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Reliability'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><title type='text'>When to use S3</title><content type='html'>In response to my previous post, &lt;a href="http://permabit.wordpress.com/"&gt;jeredfloyd&lt;/a&gt; of &lt;a href="http://www.permabit.com/"&gt;Permabit&lt;/a&gt; asked about when S3 would be useful as storage for our customers.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Do you feel S3 has the reliability and availability for your customers today? I love the concept, but I've so far been scared off by horror stories of downtime. Also, what about security concerns?&lt;/blockquote&gt;&lt;br /&gt;These are good questions, and I'm going to elaborate on these concerns and where we see S3 as providing value to our customers.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Bottom Line&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I wouldn't use or recommend S3 for anything other than a low-grade secondary replica location for redundancy purposes. Having said that, the levels of reliability and accessibility that I've seen are already higher than what my experiences have been with tape libraries.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Bring Your Own Security&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;From a security standpoint, I wouldn't put anything on S3 that hasn't been encrypted and wrapped with an integrity verification layer, as we do in StorageGRID. 
And if the data is encrypted, there is less of a concern about deleting it if you can't get to it any more. Just throw away the keys.&lt;br /&gt;&lt;br /&gt;As you can't implement secure wipe using their API, even if you overwrite the data, you would also want to be sure that you're not storing really sensitive information there, even with today's standard encryption algorithms and key strengths.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;S3 Isn't Inexpensive&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One of the things that I want to emphasize is that based on our analysis of their economics, if you are storing data for long periods of time, it's far cheaper to just add storage nodes with SATA shelves.&lt;br /&gt;&lt;br /&gt;Tape isn't cheaper until you're looking at 50+ TB libraries. For infrequently accessed data and redundancy copies (you need to make more when putting them on tape, since it's not as reliable as disk), it quickly becomes very economical for large capacity deployments.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Despite this, S3 Still Has Value&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Having said this, even with these concerns, I see several situations where S3 support brings real value for our customers:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;If you're really small (less than 50 TB), adding storage capacity is still pretty expensive as a percentage of your yearly budget because our customers typically add in 10 TB or larger increments. 
Using S3 as an overflow pool (keeping one or two copies locally on disk, and using S3 as your second or third copy) lets you defer that purchase for a little while, and when you do make that purchase, you can automatically migrate all the data on S3 off onto your new storage resource.&lt;/li&gt;&lt;li&gt;If it takes you too long to purchase hardware, or your budgetary cycle for capital purchases takes too long, or you face an unexpected load where there just isn't time to provision more storage, you can shift second or third copies off onto S3 to free up space, and expense it to the business as an opex or project cost.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;If you have a short-term storage need, and don't want to invest in hardware yet, just put it off onto S3. It will cost a little more per TB, but since you wouldn't be able to amortize the storage costs of in-house hardware across the typical three-year lifespan of that hardware, it works out to be cheaper in the end.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;If you're almost full, you've ignored the alarms telling you that you don't have enough space on other nodes to repair your storage redundancy if you lose a node, and you don't have any storage ready to replace a failed storage node, S3 would be a good "last resort" option for creating new replicas to restore your desired level of redundancy.&lt;/li&gt;&lt;/ol&gt;So, to summarize, I'd use it for a short-term storage resource to defer capital costs, a short-term emergency storage resource to keep you going, and for storage of short-term data. And in all cases, I wouldn't have the only copy in the grid on S3.&lt;br /&gt;&lt;br /&gt;Based on these use-cases, it would be of most value to smaller IT shops with smaller systems. 
As you get into larger archives and storage systems (200+ TB), many of these situations will never come up.&lt;br /&gt;&lt;br /&gt;Regardless of your size, having S3 as a choice as a storage tier gives administrators another tool to handle different situations, and that flexibility can be quite useful. Ultimately, it's up to them to decide if the costs (and bandwidth usage) make sense for them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-3586338067897310573?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3586338067897310573/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3586338067897310573' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3586338067897310573'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3586338067897310573'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/when-to-use-s3.html' title='When to use S3'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-1107662182110827828</id><published>2009-01-23T01:07:00.001-08:00</published><updated>2009-01-23T01:25:32.702-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='S3'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><title type='text'>Some Notes on Amazon S3</title><content type='html'>During our recent meetings, there were a fair number of questions and discussions about the economics of public cloud storage providers, such as Amazon's S3 service.&lt;br /&gt;&lt;br /&gt;This &lt;a href="http://news.ycombinator.com/item?id=422225"&gt;YCombinator discussion&lt;/a&gt; has lots of good information about pricing, usage and experiences of some of S3's supporters and detractors. It's well worth reading.&lt;br /&gt;&lt;br /&gt;Interestingly, thanks to a new &lt;a href="http://code.google.com/p/s3fs/wiki/FuseOverAmazon"&gt;S3 user space file-system FUSE module&lt;/a&gt;, Bycast has pretty much everything we need to provide an S3 tier of storage to our StorageGRID customers. Of course, such a capability would need to be productized, which would allow an administrator to have a place to configure the tier and securely enter and store their S3 credentials through our administrative interface, but thanks to the filesystem virtualizing the S3 API, all the hard work is already done.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-1107662182110827828?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/1107662182110827828/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=1107662182110827828' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1107662182110827828'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1107662182110827828'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/some-notes-on-amazon-s3.html' title='Some Notes on Amazon S3'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-6451363293938967328</id><published>2009-01-23T00:34:00.000-08:00</published><updated>2009-01-23T01:17:06.329-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='S3'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><title type='text'>Cloud Storage Protocol Standardization</title><content type='html'>During one of the panel discussions at the &lt;a href="http://www.snia.org/events/wintersymp2009/cloud"&gt;SNIA Cloud Storage Summit&lt;/a&gt;, the topic came up of why standards for data exchange and system management would be beneficial for cloud storage. 
While there are many different advantages, one of the areas that I spent a few minutes talking about was the development efficiency that is realized as a result of standards.&lt;br /&gt;&lt;br /&gt;When a protocol is standardized and adopted by multiple vendors as a way to connect systems or subsystems, the following things start to emerge:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A formal protocol specification&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Web pages describing the protocol&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Books about the protocol, or with chapters on it&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Example open-source implementations&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Standard interface libraries&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Conformance test suites&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Benchmarking suites&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Protocol analysers and recorders&lt;/li&gt;&lt;/ol&gt;In essence, an ecosystem starts to emerge around the protocol, and many small companies and individuals build expertise and tools that enable the rapid uptake of the protocol.&lt;br /&gt;&lt;br /&gt;It's been my observation that software developers and architects tend to have a major say in the selection of protocols, especially for subsystem interconnects, and that they tend to choose the protocol that makes their lives easiest. Thus, protocols that have all of these resources widely and inexpensively available quickly become the protocol of choice, resulting in a continued upward spiral of adoption, experience, tools and systems. &lt;br /&gt;&lt;br /&gt;We're starting to see this with XAM, with #1, #4 and #5 already available, and #2 and #6 in progress. 
And I'm sure that somewhere out there, someone's writing a book about XAM, or at least a chapter about it.&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In the cloud storage protocol arena, Amazon's S3 service has such a strong lead with its S3 HTTP protocol that many of these resources have already been built, despite it being a proprietary protocol. While most other cloud storage service providers have built similar HTTP protocols, with the IP ownership restrictions around Amazon's protocol still up in the air, there is a fair bit of uncertainty about whether that protocol will ever be usable with anything other than S3.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Which leads us back to the need for standardized protocols.&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/6451363293938967328/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=6451363293938967328' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6451363293938967328'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6451363293938967328'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/cloud-storage-protocol-standardization.html' title='Cloud Storage Protocol Standardization'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-245061134450541541</id><published>2009-01-23T00:29:00.000-08:00</published><updated>2009-01-23T00:52:51.695-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='SNIA'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><title type='text'>At the SNIA Winter Symposium</title><content type='html'>I've been busy attending the &lt;a href="http://www.snia.org/events/wintersymp2009/"&gt;SNIA Winter Symposium&lt;/a&gt; this week. In addition to the usual XAM workgroup meetings, I've also been participating in the &lt;a href="http://www.snia.org/events/wintersymp2009/cloud"&gt;Cloud Storage Summit&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;On Wednesday, the organizers were gracious enough to let me present a quick talk on Private Storage Clouds, where I covered the economic drivers behind cloud storage, the differences between public and private clouds, talked about how Bycast's StorageGRID software allows the creation of private clouds, and discussed some examples of customers where we are deployed and in production.&lt;br /&gt;&lt;br /&gt;While I didn't have as much time as I would have liked, I was able to cover a majority of the points that I had planned to discuss, and the session was well received.&lt;br /&gt;&lt;br /&gt;It's interesting to see how much surprise there was regarding how our system is being used by customers. Robin Harris of the Data Mobility Group called us the "&lt;a href="http://storagemojo.com/2009/01/22/cloud-storage-symposium-impressions/"&gt;most surprising company&lt;/a&gt;" based on what we have been doing. 
In contrast to most of the cloud storage deployments, many of our customers are placing mission critical data on our storage system, either for archive or primary storage, and they have no backups outside of the redundancy provided as an intrinsic part of StorageGRID.&lt;br /&gt;&lt;br /&gt;During some of the other sessions, the data being stored on many other cloud deployments was described as being "data that could be lost", or "the garbage dumpster".&lt;br /&gt;&lt;br /&gt;Thanks to an innovative use of WebEx, you can view all of the talks online at the &lt;a href="http://www.snia.org/events/wintersymp2009/cloud"&gt;cloud storage sessions&lt;/a&gt; page. I'd encourage you to have a look at these, as there is quite a lot of interesting material here, especially the views of the analysts from the Thursday sessions.</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/245061134450541541/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=245061134450541541' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/245061134450541541'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/245061134450541541'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2009/01/at-snia-winter-symposium.html' title='At the SNIA Winter Symposium'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-1857542969196966585</id><published>2008-12-23T05:26:00.000-08:00</published><updated>2008-12-23T05:29:17.514-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='HP'/><category scheme='http://www.blogger.com/atom/ns#' term='ADE'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Concurrency and Complexity</title><content type='html'>HP's &lt;a href="http://arstechnica.com/guides/other/hp-cloud-computing-interview.ars/1"&gt;Russ Daniels&lt;/a&gt; understands a critical issue facing the next generation of software development:&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;blockquote&gt;"If you think about whatever those skills are that are necessary to be able to program a computer, we don't understand what they are very well, but we know that they're not very widely distributed. So you end up with a relatively small portion of the population who can write a program successfully. Then you start introducing certain kinds of abstractions: if you introduce recursion, you lose some people; if you introduce pointers, you lose some people; if you introduce event-driven models, you lose some people; when you introduce concurrency, you lose virtually everybody.&lt;br /&gt;&lt;br /&gt;So we create these programming languages that support concurrency and then you find that there's a lock and an unlock around every method, because otherwise people hurt themselves. 
It presents an interesting dilemma—we need to be able to get more stuff done, we need to figure out how to do stuff in parallel, but we're pretty bad at doing that."&lt;/blockquote&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;One of the keys to solving this complexity problem is to make the mechanics of parallelism safe, automatic, and transparent to the programmer. Ultimately, this approach needs to extend far beyond just parallelism, as we see the same failings in the areas of security, fault tolerance, fault recovery, dynamic scaling, and resource economics.&lt;br /&gt;&lt;br /&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;An ADE Example&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;In order to see how this could work, let's take a quick look at an example of how ADE transparently provides concurrency and protection of resources that must be accessed serially:&lt;br /&gt;&lt;br /&gt;Let's imagine that we're tasked with building a simple LAN switch, and we want to provide IP address resolution for connected Ethernet hosts. If we want to send an IP packet to a given computer, we first need to find the MAC address of the computer with that IP address. This is performed using the Address Resolution Protocol (ARP), which was originally documented in &lt;a href="http://www.ietf.org/rfc/rfc0826.txt?number=826"&gt;IETF RFC 826&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;At a simplified level, the protocol works as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;A host which wants to talk with a specific IP address sends an ARP broadcast request asking for the MAC address to which traffic should be addressed.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;The switch intercepts the broadcast, and looks up the IP address in its local cache. 
If the IP to MAC address mapping is found, it responds to the originally requesting host with the ARP response.&lt;/li&gt;&lt;li&gt;If the mapping is not found in the local cache, the switch broadcasts the ARP request to all other ports.&lt;/li&gt;&lt;li&gt;If the switch receives an ARP response, it adds the IP to MAC mapping to its local cache, and forwards the response to the originally requesting host.&lt;/li&gt;&lt;/ol&gt;This is equivalent to a simple database insert/lookup problem, and in traditional concurrent programming models, errors can easily be introduced if care is not taken to lock the local cache while performing modifications.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In contrast, because ADE handles concurrency by forcing all component interactions to be atomic transactions, no locks are required and the system can be proven to be correct and deadlock free.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In this example, we are going to consider three actors: The ARP Lookup Process, the ARP Cache Process and the ARP Discovery Process. 
These actors have the following roles:&lt;/div&gt;&lt;div&gt;&lt;ul&gt;&lt;li&gt;The ARP Lookup Process receives ARP lookup requests from the network hosts, asks the ARP Cache Process if the mapping is known, asks the ARP Discovery Process for the mapping if not known, and responds to the requesting host.&lt;/li&gt;&lt;li&gt;The ARP Cache Process looks up mappings in the local cache, and adds mappings when asked.&lt;/li&gt;&lt;li&gt;The ARP Discovery Process sends ARP lookup requests to the network, and when a mapping is discovered, it asks the ARP Cache Process to remember it.&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div&gt;In this model, there would be one instance of the ARP Lookup Process per ARP lookup packet received on the network, one instance of the ARP Cache Process, and one instance of the ARP Discovery Process per cache miss.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;In a busy network, there could be dozens of ARP requests being processed, and many concurrent requests that may require changes to be made to the ARP cache. Even if ARP discovery is adding a new mapping at the same time that a new request is looking up a mapping, there is no possibility that the cache can become corrupt, because all access to the cache is automatically serialized by virtue of the entities involved and the messages exchanged between them.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;This is the key aspect of ADE that simplifies the creation of distributed systems. 
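&lt;br /&gt;&lt;br /&gt;To make the serialization argument concrete, here is a minimal Python sketch of the three roles. It is not ADE code: the names cache_actor, lookup, discover and run_demo are invented for illustration, threads and a queue stand in for ADE processes and messages, and the addresses are made up. The only point is that the table has a single owner and is changed solely by messages, so the application code takes no explicit locks.&lt;br /&gt;&lt;br /&gt;

```python
import queue
import threading


def cache_actor(inbox):
    """Sole owner of the ARP table; handles one message at a time."""
    table = {}
    while True:
        kind, ip, mac, reply = inbox.get()
        if kind == "stop":
            break
        if kind == "add":
            table[ip] = mac
        elif kind == "lookup":
            reply.put(table.get(ip))  # None signals a cache miss


def lookup(inbox, ip):
    """Ask the cache actor for a mapping and wait for its reply."""
    reply = queue.Queue()
    inbox.put(("lookup", ip, None, reply))
    return reply.get()


def discover(inbox, ip, mac):
    """Stand-in for ARP discovery: report a learned mapping to the cache."""
    inbox.put(("add", ip, mac, None))


def run_demo():
    inbox = queue.Queue()
    actor = threading.Thread(target=cache_actor, args=(inbox,))
    actor.start()
    miss = lookup(inbox, "10.0.0.5")  # nothing has been learned yet
    # Twenty discovery instances run concurrently; none touches the table.
    workers = [
        threading.Thread(
            target=discover,
            args=(inbox, "10.0.0.%d" % n, "00:00:00:00:00:%02x" % n),
        )
        for n in range(1, 21)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    hit = lookup(inbox, "10.0.0.5")  # queued behind every add message
    inbox.put(("stop", None, None, None))
    actor.join()
    return miss, hit
```

Calling run_demo() gives a miss for the first lookup and the learned address for the second. However the twenty discovery threads interleave, the table is only ever modified inside cache_actor.&lt;br /&gt;&lt;br /&gt;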
Because concurrency is automatic, when problems are decomposed into collections of interacting actors, any parallelism inherent in the problem will be automatically utilized.&lt;/div&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/1857542969196966585/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=1857542969196966585' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1857542969196966585'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/1857542969196966585'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/12/concurrency-and-complexity.html' title='Concurrency and Complexity'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-718033007783094060</id><published>2008-12-15T19:28:00.000-08:00</published><updated>2008-12-15T19:30:41.501-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><category scheme='http://www.blogger.com/atom/ns#' term='ADE'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed 
Computing'/><title type='text'>XAM Active Objects</title><content type='html'>The SNIA XAM standard provides a comprehensive object model and interface for the storage of dynamic data objects. The standard provides methods by which collections of data (known as XSets) can be created and deleted, data can be stored, retrieved and manipulated inside these XSets, and queries can be performed to locate XSets that match specified characteristics.&lt;br /&gt;&lt;br /&gt;While the first version of the standard was under development, I worked with Jared Floyd of Permabit to ensure that the XAM Job model was architected in such a way that it would include all of the components required to support active objects, like those supported in ADE.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;What are XAM Active Objects?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This is best illustrated by an example:&lt;br /&gt;&lt;br /&gt;Let us create a hypothetical XSet that includes an XStream (binary data) that contains the bytecodes defining a Java program. When this XSet is committed, it can automatically start executing. Execution of these active objects can either be performed by a sidecar system that attaches to the XAM Storage System (XSS) and becomes aware of new active objects via the standard XAM query facilities, or it can be an intrinsic part of the XSS.&lt;br /&gt;&lt;br /&gt;The Java program would be executed in the security context of the XSet it is contained within, and thus it can access local data stored within its own XSet, and optionally perform XAM operations based on its security credentials. This allows it to read other XSets, create new XSets and perform queries to discover XSets.&lt;br /&gt;&lt;br /&gt;For example, an active XAM object could remain resident within the storage system, performing queries for specific types of XSets (for example, PDF objects), and convert them into a newer version of the PDF format. 
Such a model would allow dynamic format conversion of archived content, just by loading in a new XSet. This model would also be useful for analysis, where a large number of XSets need to be analysed or data-mined in the background.&lt;br /&gt;&lt;br /&gt;Because XAM Storage Systems will often be distributed, multiple active objects can easily be parallelized across the compute infrastructure that makes up the system, and parallel computing patterns are easy to implement, as one active object can create XSets that act as child active objects (the equivalent of performing a fork in UNIX). As code can be bundled with data, this model will also enable the creation of data-driven MIMD and SIMD parallel data processing systems.&lt;br /&gt;&lt;br /&gt;XSets become process contexts, complete with inputs, code, state and outputs, with the full ability to discover and access data within the storage system, create new data, spawn child processes and report status information to external systems. Because XSets belong to user contexts, processing quotas, limits on computing resources, and other policies can be added using the same methods as are used for other XAM policies.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Part of the Standard?&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;All this could be implemented today as a vendor-specific extension to the standard. 
However, adding this capability to the XAM standard would require work to be done in the following areas:&lt;ul&gt;&lt;li&gt;Active object language profiles would have to be defined, since multiple languages including Java, .NET and interpreted languages such as Ruby could all be supported.&lt;/li&gt;&lt;li&gt;Language bindings would need to be standardized to allow the programs embedded in the active objects to perform XAM operations on the XSS they are resident in.&lt;/li&gt;&lt;li&gt;Standard job-style status reporting would be beneficial to allow standardized active object execution status monitoring.&lt;/li&gt;&lt;li&gt;Policies for computing-related aspects, such as resource usage and quotas, would need to be defined.&lt;/li&gt;&lt;li&gt;Requirements for security isolation would need to be defined.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;Given that none of these would require changes to the core XAM standard, this could easily be added as an optional part of the standard, much like Level 2 query, which provides full-text search within XStreams.&lt;br /&gt;&lt;br /&gt;In summary, the XAM standard provides a foundation upon which a rich data-driven distributed computing system can easily be created. 
This opens up many intriguing possibilities, and would be relatively easy to formally add to the standard.</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/718033007783094060/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=718033007783094060' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/718033007783094060'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/718033007783094060'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/12/xam-active-objects.html' title='XAM Active Objects'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3239806359861834837</id><published>2008-11-23T15:18:00.000-08:00</published><updated>2008-11-23T15:44:51.491-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Research'/><category scheme='http://www.blogger.com/atom/ns#' term='ADE'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><title type='text'>The ADE Process Model</title><content type='html'>To illustrate how ADE-based systems operate, it is best to use an example. 
In this entry, we will examine at a high level how a simple ADE program is run, from bootstrapping to shutdown.&lt;br /&gt;&lt;br /&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;ADE Versions&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Over the last ten years, there have been three major revisions of ADE:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;ADE V1 was developed in 1998, and was a minimal system designed to facilitate experiments with message-based computing. In this version, ADE was hosted as a library that ran on Mac OS.&lt;/li&gt;&lt;li&gt;ADE V2 was a result of continued improvements over the next two years, and included a pure message-based execution model and the ability to run directly on a processor without requiring a host operating system.&lt;/li&gt;&lt;li&gt;ADE V3 was created specifically for the Bycast StorageGRID platform during 2000 through 2004, and has been tuned to provide high levels of performance while retaining most of the advantages of the ADE approach to distributed computing. In this version, ADE was updated to use an Operating System abstraction layer, allowing it to run on a wide variety of UNIX-like and embedded operating systems.&lt;/li&gt;&lt;/ul&gt;In order to demonstrate the pure message-based execution model, this example is based on ADE V2.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Bootstrapping&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When ADE first starts, there are no processes present and no messages being sent. When in this state, there are no processes to send a message to, and there are no processes that could send a message. Thus, in order to transition the system into an executing state, we must "bootstrap" the system to the point where there is at least one process executing.&lt;br /&gt;&lt;br /&gt;Given that ADE processes are just messages, when I refer to a message as a process, I'm referring to a message that contains executable code as part of the message contents. 
And since messages are just self-describing data structures (known as "containers"), a message can be serialized to disk. As messages are created, manipulated and exchanged in serialized form, one of the key design criteria of the container data format was highly efficient serialization.&lt;br /&gt;&lt;br /&gt;We use this property to bootstrap the system. Once the ADE kernel is running, we load an (initially hand-created) message into memory. This message contains executable code for our bootstrap process, as well as the state for the process.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSnobJyqGjI/AAAAAAAAABs/M-IuCPQNIrw/s1600-h/message1.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 249px; height: 118px;" src="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSnobJyqGjI/AAAAAAAAABs/M-IuCPQNIrw/s400/message1.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5272000391996381746" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 1: An executable message (Capability ID 100)&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;Loading this message into memory involves the following steps:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;An entry is added to an in-memory database, keyed by the message capability. The message capability is a randomly generated 128-bit identifier, assigned by the system, that uniquely identifies the message.&lt;/li&gt;&lt;li&gt;The kernel creates a protected memory space for the message. This memory space provides the execution context.&lt;/li&gt;&lt;li&gt;The serialized message is loaded into this memory space.&lt;/li&gt;&lt;/ol&gt;At this point, we have a process in memory, ready to do work, but no messages to trigger any work to be done. In order to kick off execution, we need to load a second message into the system. 
This message is far simpler than the first message, as it just triggers the executable code in the first message. This message consists of a message capability, a source CID (set to zero for bootstrapping), a destination CID, which is set to the capability identifier for the first message, and a message type, which is a string that identifies which executable code in the destination should be invoked.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_FN1WQhtIvYA/SSnorrYp54I/AAAAAAAAAB0/sJj392l7qEc/s1600-h/message2.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 249px; height: 118px;" src="http://3.bp.blogspot.com/_FN1WQhtIvYA/SSnorrYp54I/AAAAAAAAAB0/sJj392l7qEc/s400/message2.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5272000675892029314" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 2: A message destined to CID 100&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;This is read as, "Message CID 200, from CID 0, to CID 100, invokes /create".&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Message Processing&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When this second message is loaded into memory, the presence of the destination capability identifier field triggers the message to be added to a scheduler queue for delivery to the specified destination capability. The ADE kernel sees this scheduler entry, and moves the triggering message into the memory space of the destination message, where it becomes part of the destination message. The scheduler then switches to the context of the destination message, and executes the code indicated by the triggering message. 
As this executable code has access to the message contents, it can now inspect the contents of the triggering message, and manipulate its own message state.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSno61xbCyI/AAAAAAAAAB8/7eVc4JRPpRs/s1600-h/message3.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 249px; height: 156px;" src="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSno61xbCyI/AAAAAAAAAB8/7eVc4JRPpRs/s400/message3.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5272000936378305314" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 3: Message CID 100, after receiving message CID 200.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;In our example, message CID 100 then executes the "create" method, and the executable code in the create method creates and sends a new message to itself (which is the only possible destination, and the only destination known to the process), invoking the "destroy" method. 
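&lt;br /&gt;&lt;br /&gt;The lifecycle in this example, including the "destroy" step, can be sketched in a few lines of Python. This is a toy model under loud assumptions, not ADE itself: the Kernel class, its dictionary-based messages, the bootstrap function and the sequential CID counter are all invented for illustration, whereas real ADE messages are serialized containers with randomly assigned 128-bit capabilities living in protected memory spaces.&lt;br /&gt;&lt;br /&gt;

```python
from collections import deque


class Kernel:
    """Toy stand-in for the ADE kernel: a message table plus a scheduler queue."""

    def __init__(self):
        self.table = {}         # in-memory database keyed by capability ID (CID)
        self.pending = deque()  # scheduler queue of messages awaiting delivery
        self.next_cid = 300     # hypothetical counter; ADE assigns random 128-bit CIDs
        self.log = []           # (cid, source, destination, type) of each delivery

    def load(self, message):
        if "dst" in message:
            # A destination field marks a triggering message: queue it for delivery.
            self.pending.append(message)
        else:
            # Otherwise the message is a process: register it under its capability.
            self.table[message["cid"]] = message

    def send(self, src, dst, mtype):
        cid = self.next_cid
        self.next_cid += 100
        self.load({"cid": cid, "src": src, "dst": dst, "type": mtype})

    def run(self):
        while self.pending:
            trigger = self.pending.popleft()
            process = self.table.get(trigger["dst"])
            if process is None:
                continue  # the destination no longer exists
            # The trigger becomes part of the destination's state while it runs.
            process["state"]["trigger"] = trigger
            self.log.append(
                (trigger["cid"], trigger["src"], trigger["dst"], trigger["type"])
            )
            process["code"][trigger["type"]](self, process)


def create(kernel, process):
    # Send a message to ourselves, the only destination this process knows.
    kernel.send(process["cid"], process["cid"], "/destroy")
    # Final step: clean the triggering message out of the process state.
    del process["state"]["trigger"]


def destroy(kernel, process):
    # Removing all state removes the message and its table entry.
    process["state"].clear()
    del kernel.table[process["cid"]]


def bootstrap():
    kernel = Kernel()
    # First message: the process itself (CID 100), carrying code and state.
    kernel.load({"cid": 100, "code": {"/create": create, "/destroy": destroy}, "state": {}})
    # Second message: CID 200, from CID 0, to CID 100, invokes /create.
    kernel.load({"cid": 200, "src": 0, "dst": 100, "type": "/create"})
    kernel.run()
    return kernel
```

After bootstrap() returns, the kernel's table and scheduler queue are both empty again, and its log records two deliveries: message 200 invoking /create, then message 300 invoking /destroy.&lt;br /&gt;&lt;br /&gt;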
The final step in the "create" method is to clean up the contents of the triggering message from the process message state.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SSnpDxDCC_I/AAAAAAAAACE/ucwIbhcbOmQ/s1600-h/message4.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 249px; height: 213px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SSnpDxDCC_I/AAAAAAAAACE/ucwIbhcbOmQ/s400/message4.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5272001089728809970" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;Figure 4: Message CID 100, with message CID 300 about to be sent.&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;This newly sent message in turn gets enqueued by the scheduler, then processed, causing the "destroy" method to be invoked in the destination process. The executable code for the "destroy" method removes the entire state of the message, which then results in the message itself being removed (and the entry being removed from the in-memory database), returning us to our original pre-bootstrap state.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;In Summary&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;This process model, of sending messages to other messages, executing code in destination messages, manipulating the state of the destination messages, and creating new messages, allows full computational capabilities, and provides a rich foundation for the creation of many advanced distributed computing models.&lt;br /&gt;&lt;br /&gt;In our next ADE post, we'll examine some of the design patterns and capabilities that emerge as a result of this process model.</content><link 
rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3239806359861834837/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3239806359861834837' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3239806359861834837'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3239806359861834837'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/11/ade-process-model.html' title='The ADE Process Model'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_FN1WQhtIvYA/SSnobJyqGjI/AAAAAAAAABs/M-IuCPQNIrw/s72-c/message1.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-803846058062996232</id><published>2008-11-19T11:09:00.000-08:00</published><updated>2008-11-19T11:30:15.716-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Hardware'/><category scheme='http://www.blogger.com/atom/ns#' term='Reliability'/><category scheme='http://www.blogger.com/atom/ns#' term='HP'/><title type='text'>SAS for Blades</title><content type='html'>Blade-based computing form factors offer many potential advantages for constructing storage subsystems using standard hardware, but until recently, the lack of cost-effective storage connectivity has hampered 
blades when compared to 1U and 2U servers with directly connected SCSI or SATA/SAS shelves.&lt;br /&gt;&lt;br /&gt;As I mentioned in an earlier post about &lt;a href="http://intotheinfrastructure.blogspot.com/2008/10/blades-for-storage.html"&gt;IBM's BladeCenter S&lt;/a&gt; storage attachment capabilities, this is starting to change as low-cost SAS storage connectivity switching is integrated into blade solutions.&lt;br /&gt;&lt;br /&gt;And now I am pleased to see that HP has introduced a new product, the &lt;a href="http://h18004.www1.hp.com/products/storageworks/3gbsas_switch/index.html"&gt;HP StorageWorks 3Gb SAS BL Switch&lt;/a&gt;. This product provides eight external-facing 3 Gb SAS ports, and if you install two of these switches in a c3000 chassis, you can attach up to 16 external SAS/SATA shelves. Assuming 12 TB of SATA storage per shelf, that allows you to have 192 TB of raw addressable storage per c3000. &lt;br /&gt;&lt;br /&gt;This is perfect for high density storage systems, as you end up with 6U for the c3000, 32U for the storage shelves, and 4U remaining for a KVM and networking equipment. As the c3000 series also allows you to install "Storage Blades", you can host databases locally to the blade. This makes an excellent platform for a scalable high-density StorageGRID deployment, as each site can start with two blades providing basic services, and as storage capacity is expanded, additional shelves and blades can be added as needed. And building out to 200 TB per rack is a pretty respectable density. One could easily use this hardware to build a 1 PB storage system based around five racks running StorageGRID, connected together using 10 Gb network uplinks.&lt;br /&gt;&lt;br /&gt;When comparing this to the IBM offering, there are both advantages and disadvantages. In order to allow an HP blade to access the SAS switch, you have to install a Smart Array P700m controller card, which represents an additional cost. 
With the IBM offering, the RAID controller is part of the SAS switch module itself. But given that the HP switch allows twice the SAS attachment density, and that having the controller card as part of the blade provides a greater degree of failure isolation, I'm inclined to prefer the HP solution.&lt;br /&gt;&lt;br /&gt;But regardless of the minor differences, the bottom line is that both systems are now far more viable for use as lower-cost storage infrastructure due to the elimination of the need for fibre-channel based connectivity to the storage shelves. This is a huge improvement in terms of costs, and one that we won't see again until FCoE (or SAS over Ethernet) becomes widely deployed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-803846058062996232?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/803846058062996232/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=803846058062996232' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/803846058062996232'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/803846058062996232'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/11/sas-for-blades.html' title='SAS for Blades'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-6455297351658440898</id><published>2008-11-18T00:54:00.000-08:00</published><updated>2008-11-18T01:10:26.273-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Research'/><category scheme='http://www.blogger.com/atom/ns#' term='ADE'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><title type='text'>ADE: A Decade of Distributed Computing</title><content type='html'>The Asynchronous Distributed Environment (ADE) is a message-based framework for the creation of distributed grid computing systems, including the Bycast StorageGRID. With over 40,000 node-years of production runtime, this framework represents one of the more mature distributed computing environments.&lt;br /&gt;&lt;br /&gt;I originally developed ADE as a testbed for distributed computing concepts, and published a paper describing the system, titled "The ADE Environment: A Macroscopic Object Architecture for Software Componentization and Distributed Computing" at the MacHack conference back in 1998. As an environment for experimentation, it was very successful, allowing the rapid exploration of many patterns for creating massively concurrent distributed systems, and for refining the environment to improve the programming model and associated infrastructure.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The ADE Process Model&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;ADE takes a rather unique approach to the traditional CSP process model: processes are messages. Thus, instantiating a process is as easy as sending a message that includes executable code. Each message has a unique "capability" identifier that allows other messages to be sent to it. 
When a message is received by a process, the executable code corresponding to the received message is run, which allows the process to manipulate its state (the process message) and to optionally generate additional messages. &lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSKDicOoAeI/AAAAAAAAABk/orVowCNsG8U/s1600-h/ADE+Messages.jpg"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 301px; height: 400px;" src="http://2.bp.blogspot.com/_FN1WQhtIvYA/SSKDicOoAeI/AAAAAAAAABk/orVowCNsG8U/s400/ADE+Messages.jpg" border="0" alt="" id="BLOGGER_PHOTO_ID_5269919141693227490" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;An example of a more complex process trace can be found as part of this &lt;a href="http://www.graphviz.org/bugs/b84.html"&gt;Graphviz .dot file&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;This approach has several major advantages:&lt;ol&gt;&lt;li&gt;Parallelism is implicit and automatic, as dependencies are expressed as message exchange relationships. All forms of serial and parallel computing can thus be expressed as directed acyclic graphs.&lt;/li&gt;&lt;li&gt;Processes can easily be migrated from one system to another, as they are just messages. Lightweight and automated migration permits experimentation with alternate approaches to distributed processing, such as migrating processes to a data source or resource.&lt;/li&gt;&lt;li&gt;As processes are just messages, the entire execution state of a system can be captured by checkpointing all messages across a given cut line, and by retaining historical message states, execution can be run backwards, and "anti-messages" can undo processing.&lt;/li&gt;&lt;li&gt;Execution state can easily be logged and inspected for debugging and visualization. 
By inspecting the real-time message graph, deadlock and livelock can be automatically detected, as can the critical path for performance optimization.&lt;/li&gt;&lt;/ol&gt;And as one can imagine, this is just the tip of the iceberg.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Maturing Environment&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Much has changed over the last decade, and ADE has transformed from a research system into an industry-proven technology. As ADE was originally based on ATM networking, we rewrote the networking and node-to-node messaging to use TCP/IP, and the process model was changed to allow statically linked code to be associated with processes, instead of including it with the message. As storage systems require extremely high execution performance, we optimized for message processing speed and efficiency, and some of the less-used features, such as process migration, were retired from our production builds.&lt;br /&gt;&lt;br /&gt;The result has been a high-performance, yet flexible system that facilitates the rapid development and testing of distributed systems. By focusing on allowing distributed software to be easily developed, visualized and tested, we've been able to rapidly innovate and add functionality to our system. 
And ultimately, this has been the most significant advantage of ADE.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-6455297351658440898?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/6455297351658440898/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=6455297351658440898' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6455297351658440898'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6455297351658440898'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/11/ade-decade-of-distributed-computing.html' title='ADE: A Decade of Distributed Computing'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_FN1WQhtIvYA/SSKDicOoAeI/AAAAAAAAABk/orVowCNsG8U/s72-c/ADE+Messages.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3665411646021410849</id><published>2008-11-10T14:59:00.000-08:00</published><updated>2008-11-10T15:11:22.347-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' 
term='EMC'/><category scheme='http://www.blogger.com/atom/ns#' term='Web 2.0'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Welcome, EMC</title><content type='html'>Today, EMC announced their &lt;a href="http://www.emc.com/products/detail/software/atmos.htm"&gt;Atmos&lt;/a&gt; product, previously code-named "Maui". It is an interesting offering, and one that we are quite familiar with... given that object-based storage is what Bycast has been developing since 2000, and has had deployed and operational in customer sites since 2002.&lt;br /&gt;&lt;br /&gt;Let's do a quick rundown of the core features that Bycast &lt;a href="http://www.bycast.com/products/storagegrid_overview.asp"&gt;StorageGRID&lt;/a&gt; offers that are also offered by Atmos:&lt;ul&gt;&lt;li&gt;Object based storage? Check&lt;/li&gt;&lt;li&gt;Metadata with objects? Check&lt;/li&gt;&lt;li&gt;Multi-site? Check&lt;/li&gt;&lt;li&gt;Multi-tenancy? Check&lt;/li&gt;&lt;li&gt;File-system interface? Check&lt;/li&gt;&lt;li&gt;Web-services API? Check&lt;/li&gt;&lt;li&gt;Policy-driven replication? Check&lt;/li&gt;&lt;li&gt;Compression? Check&lt;/li&gt;&lt;li&gt;Object-based De-dupe? Check&lt;/li&gt;&lt;li&gt;Object versioning? Check&lt;/li&gt;&lt;li&gt;MAID-style spin-down? Check&lt;/li&gt;&lt;li&gt;Web-based admin? Check&lt;/li&gt;&lt;/ul&gt;There are also a couple of significant features found in StorageGRID that are not present in Atmos:&lt;ul&gt;&lt;li&gt;Encryption? Unknown&lt;/li&gt;&lt;li&gt;Object integrity protection? Unknown&lt;/li&gt;&lt;li&gt;Clustered gateways? Unknown&lt;/li&gt;&lt;li&gt;Storage hardware independent? Nope&lt;/li&gt;&lt;li&gt;Storage vendor independent? Nope&lt;/li&gt;&lt;li&gt;Support for tape and other high-latency media? Nope&lt;/li&gt;&lt;li&gt;Six years of operation in the field? Nope&lt;/li&gt;&lt;li&gt;Mature, customer-proven product? 
Nope&lt;/li&gt;&lt;/ul&gt;Over the last five years of deploying large-scale distributed object-based storage systems, we've learned a lot about what things work well, and where things go wrong. Based on the initial feature set, we know that Atmos will be a successful offering for EMC. There will, of course, be the usual teething pains, not unlike those that happened with Centera, but that is the nature of all 1.0 software.&lt;br /&gt;&lt;br /&gt;So, to our friends at EMC, I say, Welcome. We're glad to see object-based storage becoming mainstream.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-3665411646021410849?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3665411646021410849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3665411646021410849' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3665411646021410849'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3665411646021410849'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/11/welcome-emc.html' title='Welcome, EMC'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7059219735823201401</id><published>2008-11-03T22:44:00.000-08:00</published><updated>2008-11-03T23:12:59.623-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Bycast'/><category scheme='http://www.blogger.com/atom/ns#' term='Information Visualization'/><category scheme='http://www.blogger.com/atom/ns#' term='Graphing'/><category scheme='http://www.blogger.com/atom/ns#' term='User Interface'/><title type='text'>On Graphing Data</title><content type='html'>Over the years, I've had to do a fair bit of data analysis for network monitoring and software instrumentation. Computers, especially when monitoring themselves, can easily generate vast amounts of data, and when you are looking for the one anomaly or correlation, the only way to quickly find it is to represent the information visually. And for data that changes over time, that means generating graphs of the data.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Little History&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Back in 2000, when we started building our web-based management and monitoring tool at Bycast, we spent a lot of time looking at different third-party graphing toolkits. This was long before AJAX and mature client-side JavaScript engines, and we simply weren't able to find any that met even the following basic requirements:&lt;ol&gt;&lt;li&gt;Low-latency chart generation&lt;/li&gt;&lt;li&gt;Anti-aliased rendering&lt;/li&gt;&lt;li&gt;Data binning&lt;/li&gt;&lt;li&gt;Display of minimum, maximum and average values&lt;/li&gt;&lt;/ol&gt;Having looked at many poorly rendered charts, I set very high visual standards for our chart rendering classes. 
And since many of our graphs would be displaying data collected over weeks, months and years, a single graph could often require the processing of hundreds of thousands to millions of data points. As one might imagine, this was a non-trivial problem.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Min-Max-Average&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Attempting to plot these values directly would never be efficient or responsive for the end user, so we grouped data points into time-based bins, calculated the minimum, maximum and average value of each bin, and rendered the bin values. Since each chart had the same number of bins, this allowed us to optimize our data processing and storage, while still allowing high resolution display of data, and allowing users to easily zoom into areas of interest.&lt;br /&gt;&lt;br /&gt;This is best illustrated with an example:&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SQ_x2b0-cKI/AAAAAAAAABc/_tZ_YMrhu9A/s1600-h/2008-11-03+Chart.png"&gt;&lt;img style="display:block; margin:0px auto 10px; text-align:center;cursor:pointer; cursor:hand;width: 378px; height: 154px;" src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SQ_x2b0-cKI/AAAAAAAAABc/_tZ_YMrhu9A/s400/2008-11-03+Chart.png" border="0" alt=""id="BLOGGER_PHOTO_ID_5264692406904582306" /&gt;&lt;/a&gt;&lt;br /&gt;This graph, which shows system memory usage over time, has a dark green line, which indicates the average memory usage, and the light shaded region shows the minimum and maximum range within each bin.&lt;br /&gt;&lt;br /&gt;This allows deviations from the average to be easily identified visually. For example, one can see that memory usage dropped significantly on the evening of September 28th, and at one point, fell to almost 210 MBytes. 
This would not be visible with a chart that only displayed the average value, as most charts tend to do.&lt;br /&gt;&lt;br /&gt;We've found that displaying this information is very important. Often it is the deviations from the average that are the most important to focus on, and this method of displaying them allows one to quickly identify and zoom in on the areas of interest.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Graphing Guidelines&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;I have long been looking for a well-documented set of guidelines for how to render graphs, but until now, I had not found anything that fit the bill.&lt;br /&gt;&lt;br /&gt;In August 2008, the &lt;a href="http://www.mscui.com/Default.aspx"&gt;Microsoft Health Common User Interface Group&lt;/a&gt; released a new guidance document, titled &lt;a href="http://www.mscui.com/DesignGuide/TablesGraphsDisplay.aspx"&gt;Displaying Graphs and Tables&lt;/a&gt;. This document is excellent, and I would encourage everyone involved with information visualization related to graphing to read it. It is exactly what I had been looking for, and is very well written.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Some Additional Graphing References&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Here are some references that I have read and found useful when designing graph-based information visualizations:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://books.google.com/books?id=cdYEAQAACAAJ"&gt;Beautiful Evidence&lt;/a&gt;, by Edward R. Tufte&lt;br /&gt;&lt;a href="http://books.google.com/books?id=AVEKAAAACAAJ"&gt;The Elements of Graphing Data&lt;/a&gt;, by William S. 
Cleveland&lt;br /&gt;&lt;a href="http://books.google.com/books?id=_kRX4LoFfGQC"&gt;The Grammar of Graphics&lt;/a&gt; by Leland Wilkinson&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7059219735823201401?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7059219735823201401/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7059219735823201401' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7059219735823201401'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7059219735823201401'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/11/on-graphing-data.html' title='On Graphing Data'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_FN1WQhtIvYA/SQ_x2b0-cKI/AAAAAAAAABc/_tZ_YMrhu9A/s72-c/2008-11-03+Chart.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-6392052971360864026</id><published>2008-10-27T22:21:00.000-07:00</published><updated>2008-10-27T22:50:39.857-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Design Patterns'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Microsoft'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Microsoft Azure is Cloud Storage</title><content type='html'>Microsoft today announced their Cloud Computing initiative, the &lt;a href="http://www.azure.com/"&gt;Azure Services Platform&lt;/a&gt;. While many of the commentators have been comparing it with Amazon's EC2 offerings, and with VMware-based hosting, Azure is as much about cloud storage as it is about computing.&lt;br /&gt;&lt;br /&gt;Quoting from Microsoft's &lt;a href="http://port25.technet.com/archive/2008/10/27/developers-developers-developers.aspx"&gt;Port 25 Blog&lt;/a&gt;:&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Both Windows Azure applications and on-premises applications can access the Windows Azure storage service, and both do it in the same way: using a RESTful approach. The underlying data store is not Microsoft SQL Server, however. In fact, Windows Azure storage isn't a relational system, and its query language isn't SQL. Because it's primarily designed to support applications built on Windows Azure, it provides simpler, more scalable kinds of storage. Accordingly, it allows storing binary large objects (blobs), provides queues for communication between components of Windows Azure applications, and even offers a form of tables with a straightforward query language, Chappell says.&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;This is worth emphasizing. Azure provides global cloud-based object storage. And it takes it one step further, by providing active objects such as queues and tables. The presence of queues, message busses and other persistent data structures is a real game-changer, as they form the location-independent "glue" by which to hold together large-scale loosely-coupled applications that are best suited for cloud-based hosting. 
This directly competes with Amazon's S3, and by creating a platform that runs .Net and other interpreted languages directly without the weight of a full OS running in a VM, it should be able to scale much more elegantly.&lt;br /&gt;&lt;br /&gt;For storage object access, the API Microsoft has adopted is very similar to the HTTP API used by Bycast for the StorageGRID platform, so this continues the trend of standardization around HTTP for object storage and access.&lt;br /&gt;&lt;br /&gt;Their API documentation can be found at the link below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://go.microsoft.com/fwlink/?LinkId=131258"&gt;http://go.microsoft.com/fwlink/?LinkId=131258&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Also of interest from the announcement are Microsoft's &lt;a href="http://www.dotnetservicesruby.com/"&gt;.Net Services for Ruby&lt;/a&gt;, which provide first-class access to the Azure services for Ruby applications.&lt;br /&gt;&lt;br /&gt;Now all we need is a XAM VIM that talks to Azure. Their containers map closely to XSets, blobs to XStreams, and stype data can be placed in tables and thus be queried. They've even provided methods to handle cache coherency for multiple simultaneous writers. 
Very interesting...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-6392052971360864026?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/6392052971360864026/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=6392052971360864026' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6392052971360864026'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/6392052971360864026'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/10/microsoft-azure-is-cloud-storage.html' title='Microsoft Azure is Cloud Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-5121983188895038122</id><published>2008-10-23T14:03:00.000-07:00</published><updated>2008-10-23T14:08:00.486-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Design Patterns'/><category scheme='http://www.blogger.com/atom/ns#' term='XAM'/><category scheme='http://www.blogger.com/atom/ns#' term='Blog Responses'/><title type='text'>Discussions on the Web</title><content type='html'>&lt;b&gt;&lt;a href="http://www.communities.hp.com/online/blogs/information-faster/archive/2008/10/19/xam-and-content-based-search.aspx"&gt;XAM and 
content-based search&lt;/a&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A post about the advantages of including support for full-text search in XAM implementations, and my response regarding the pros and cons of providing this functionality at the storage layer instead of the application layer.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://steve-yegge.blogspot.com/2008/10/universal-design-pattern.html"&gt;The Universal Design Pattern&lt;/a&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A discussion of a commonly used design pattern where objects have properties that are both inherited and overridden, and my response regarding its similarities to the XAM object storage model, and the use of that model for persistent storage of serialized objects created when using this design pattern.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-5121983188895038122?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/5121983188895038122/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=5121983188895038122' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5121983188895038122'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/5121983188895038122'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/10/discussions-on-web.html' title='Discussions on the Web'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-3078384949890459178</id><published>2008-10-19T15:37:00.001-07:00</published><updated>2008-10-19T16:33:46.101-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Archiving'/><category scheme='http://www.blogger.com/atom/ns#' term='Scalability'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Web 2.0'/><title type='text'>Web 2.0 and Archiving</title><content type='html'>A few days ago, &lt;a href="http://www.facebook.com/note.php?note_id=30695603919"&gt;Facebook announced that they had reached the milestone of having 10 billion images stored in their servers&lt;/a&gt;. This translates to over 40 billion files under management, as each image is stored at four different resolutions. In total, all of these files require 1 PB of storage space, and most likely consume 2 PB of physical storage, assuming their data is mirrored at a minimum of two sites for availability purposes.&lt;br /&gt;&lt;br /&gt;If we take those 40 billion files and 1 PB of storage consumed, we can figure out that the average file size is around 25 Kbytes. All of these files need to be stored, managed, distributed and replicated — exactly the solution space for an archive. And this is why I believe that the Web 2.0 problem space has been often viewed as an archiving problem.&lt;br /&gt;&lt;br /&gt;But the Web 2.0 problem is larger. It is not the number of objects stored that makes this an engineering challenge, but the number of objects accessed: Facebook serves 15 billion images per day, thus serving more files each day than the number of unique images they have stored! At peak load, they are serving over 300,000 images per second. 
Even though a majority of the images accessed each day fall in a small (&lt;0.01%) subset of the total image set, the images that are popular change daily, and the storage system must handle these "hot files" efficiently in order for the system not to collapse under the load.&lt;br /&gt;&lt;br /&gt;Thus, while it is true to say that Web 2.0 sites require an archive (for all those infrequently or never again accessed objects), due to the retrieval requirements, an archive is just one part of the infrastructure required to serve a Web 2.0 workload.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-3078384949890459178?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/3078384949890459178/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=3078384949890459178' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3078384949890459178'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/3078384949890459178'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/10/web-20-and-archiving.html' title='Web 2.0 and Archiving'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-9180251250229992592</id><published>2008-10-07T16:58:00.000-07:00</published><updated>2008-10-07T17:05:18.170-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Research'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><title type='text'>Extraordinary Concurrency as the Only Game in Town</title><content type='html'>DARPA released a new report today providing a detailed analysis of the challenges involved in scaling high-performance computing up to the exascale levels, three orders of magnitude higher than today's petascale supercomputers.&lt;br /&gt;&lt;br /&gt;The report is available for download at the URL below:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.cse.nd.edu/research/tech_reports/"&gt;TR2008-13: Exascale Computing Study: Technology Challenges in Achieving Exascale Systems&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;This report covers many fascinating issues, ranging from advancements in semiconductor technologies to interconnects to packaging and all the way up the stack to the software that would run on such a computing system.&lt;br /&gt;&lt;br /&gt;What jumped out at me is the degree to which concurrent programming models have become so critical to scaling computing, and the degree to which this is still an open problem. Single-threaded programming has reached the end of the performance road, and the only way to improve is to parallelize. 
This is directly in agreement with the research in concurrent programming and runtime systems that I have been running for the past decade, and I'll be sharing more thoughts related to my findings and future research over the coming months.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Resilient Exascale Systems&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;A computing system with millions of loosely coupled processing elements running billions of concurrent processes will be continuously encountering failures at the hardware, interconnect and software levels. A successful software environment must automatically and transparently recover from such faults without passing the burden of complexity on to the programmer.&lt;br /&gt;&lt;br /&gt;Ultimately, the only successful approach will be a dispersal-driven replicated checkpoint-and-vote architecture which allows a trade-off to be made between reliability and efficiency. This approach offloads the complexity of error detection and recovery from the programmer, and allows the system to automatically optimize system efficiency (a higher voting degree trades compute efficiency for time efficiency, and a lower voting degree trades time efficiency for compute efficiency). &lt;br /&gt;&lt;br /&gt;The presence of such a resiliency layer also allows the use of fabrication techniques that have far higher defect densities and error rates than traditional system fabrication, as near-perfection is no longer required. This is key to allowing feature size and voltage to continue to be reduced, to providing high-density arrays of compute elements, and to supporting larger dies and stacked wafers.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Storage for Exascale Systems&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One of the most fundamental changes we need to make is to stop separating storage resources from computing resources. 
CPU power is continuing to increase, and storage density is continuing to increase, but the bandwidth and latency between the two are not keeping pace.&lt;br /&gt;&lt;br /&gt;Instead of moving the data to the compute element, we need to move the computation to the data. The first step is to replace the traditional demand-paged memory hierarchy with generic compute elements that can act as programmable data movers, create vastly larger register stores, and consider such data movement as a subset of a more general message passing approach. Instead of just sending data to fixed computational processes, we need to turn processes into messages themselves, thus allowing processes to also be sent to the location(s) where data is stored.&lt;br /&gt;&lt;br /&gt;Ultimately, persistent storage (for checkpoints/state) and archival storage need to be automated and under the control of dedicated system software, not specified as part of the software. Checkpoint information needs to be streamed to higher-density storage, as does parallel I/O used to read and write datasets being processed. 
This allows us to move to a model where storage is just process state (and thus, ultimately, just messages), and files as we know them today are just views into the computing system, much like reports are views into a database.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-9180251250229992592?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/9180251250229992592/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=9180251250229992592' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/9180251250229992592'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/9180251250229992592'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/10/extraordinary-concurrency-as-only-game.html' title='Extraordinary Concurrency as the Only Game in Town'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-8746477117696515744</id><published>2008-10-02T15:47:00.000-07:00</published><updated>2008-10-07T17:10:21.599-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='IBM'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Hardware'/><category 
scheme='http://www.blogger.com/atom/ns#' term='Reliability'/><title type='text'>Blades for Storage</title><content type='html'>Over the last decade, I've had a significant interest in the benefits of blade server architectures. In early 2000, I worked on a development project that used CompactPCI cards from ZiaTech (later acquired by Intel) where we used the CompactPCI bus as a high-performance low-latency system-to-system interconnect. Several years later, we also evaluated blade systems from IBM and HP to see if we could leverage some of the benefits of blade systems for StorageGRID deployments. But, at every point, there were major downsides, namely, cost, bandwidth and storage interconnectivity.&lt;div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Thus, until now, I've never found a product that provides the storage connectivity that allows a blade system to compete economically with DAS-attached shelves linked to 1U servers. That changed this week, with IBM announcing the new &lt;a href="http://www-03.ibm.com/systems/bladecenter/hardware/chassis/blades/index.html"&gt;BladeCenter S Chassis&lt;/a&gt;.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Enabling Features of the BladeCenter S&lt;br /&gt;&lt;/span&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;There are three features in this product that allow it to be a perfect foundation for software-based storage systems:&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Feature #1 - SAS/SATA RAID modules, which allow you to hang up to eight SAS/SATA shelves off each BladeCenter chassis. With 12 TB of SATA storage per shelf, that allows you to have 96 TB of raw addressable storage per BladeCenter. 
This feature is unique to the IBM offering at this time, and is the key enabling feature of the product for storage software solutions.&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;Feature #2 - Internal SAS/SATA storage, with two bays, each holding 6 3.5" drives (up to 6 TB SATA, 3 TB SAS). This is perfect for system and database disks, and allows all of the control storage to be co-located with the compute blades.&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;Feature #3 - 10GigE switch modules and blade NICs give you the bandwidth you need to provide high-performance storage services to external clients without bottlenecks.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;&lt;span class="Apple-style-span" style="font-weight: bold;"&gt;Scalable Hardware for Storage Software&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;With the BladeCenter chassis starting at less than $2,500, this allows a common architecture and hardware platform that extends from entry-level systems consisting of two HA blades with internal storage all the way up to multi-chassis systems managing hundreds of terabytes of storage. This also reduces support costs, as you can leverage a common pool of replacement parts.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;The blade architecture allows the compute hardware to be field replaceable, which reduces MTTR. And as the attached storage is assignable across blades, there is the possibility to design a system that has a hot standby blade that dynamically reassigns the storage associated with the failed blade, and resumes providing services.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;If I were to design a new software storage product, this would be the foundation hardware I would choose. 
The software would run in VMs, granted, but this hardware finally provides the cost/performance/supportability tradeoffs that I view as critical for such a product to succeed, and provides sufficient built-in storage to allow a common hardware platform to extend down to entry-level systems.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-8746477117696515744?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/8746477117696515744/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=8746477117696515744' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8746477117696515744'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/8746477117696515744'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/10/blades-for-storage.html' title='Blades for Storage'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4807874230451573451</id><published>2008-09-23T17:08:00.000-07:00</published><updated>2008-09-23T17:13:14.762-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Usability'/><category scheme='http://www.blogger.com/atom/ns#' 
term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Search Engines'/><category scheme='http://www.blogger.com/atom/ns#' term='Trust'/><category scheme='http://www.blogger.com/atom/ns#' term='Google'/><category scheme='http://www.blogger.com/atom/ns#' term='User Interface'/><title type='text'>Google Trust</title><content type='html'>Search providers such as Google, Yahoo and Microsoft are in a unique position to provide indicators of trustworthiness of a given web site, as they act as a trusted intermediary between end-users and their desired destinations. Currently, there are no clear methods by which an end-user can assess if a given site in a search result is indeed controlled by the organization that it claims to represent. If there were a way that a user could identify which sites had verified identities, this would result in vast improvements in the trust relationships between end-users and the organizations behind any given web site.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The User Experience&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Let us consider a hypothetical user experience: A user navigates to Google and enters a company name. They are presented with a listing of search results, often including the official company web site, other web sites for companies with a similar name, sites reviewing products by the company and sometimes sites masquerading as the official site.&lt;br /&gt;&lt;br /&gt;In the search results listing, each site where the identity of the organization that controls the site has been verified includes a special icon known as a "trust mark". 
This icon indicates that Google has established a chain of trust that allows the identity of the organization responsible for the content on that site to be verified.&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_FN1WQhtIvYA/SNmFBQGsIEI/AAAAAAAAAAs/-IrBzno-uv4/s1600-h/Picture+1.png" style="text-decoration: none;"&gt;&lt;img style="text-decoration: underline;display: block; margin-top: 0px; margin-right: auto; margin-bottom: 10px; margin-left: auto; text-align: center; cursor: pointer; " src="http://1.bp.blogspot.com/_FN1WQhtIvYA/SNmFBQGsIEI/AAAAAAAAAAs/-IrBzno-uv4/s320/Picture+1.png" border="0" alt="" id="BLOGGER_PHOTO_ID_5249373097226412098" /&gt;&lt;/a&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: x-small;"&gt;Figure 1: An example UI from Safari indicating the validity of a certificate.&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="text-align: center;"&gt;&lt;span class="Apple-style-span" style="font-size: 10px;"&gt;The green check icon is a good example of a visual representation of a trust mark.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;div style="text-align: center;"&gt;&lt;br /&gt;&lt;/div&gt;The presence of the trust mark may be sufficient for the user to navigate to the site in confidence, or they may click on the trust mark, showing a page containing legal information about the entity, including their location (using Google Maps, of course). This information would especially be useful for disambiguating different  companies with similar names.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;The Technology&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Standard web certificates are already used for secure transactions and providing information about the authenticity of a secured web site. But these are limited to the secure sections of web sites, such as pages for authentication and payment processing. 
Most web sites do not use SSL/TLS for the bulk of their web site due to the computational cost of processing HTTPS transactions when compared to standard HTTP.&lt;br /&gt;&lt;br /&gt;However, the same certificates used to provide HTTPS could also be used for indicating a degree of trust. By placing the certificate as a file in the root path of the web site, the Google crawler could retrieve a "certificates.txt" file, much like the current "robots.txt" file. As most certificates contain the site's domain name, Google would be able to verify the chain of trust of the certificate, check to make sure that the URL it was crawling matched the URL in the certificate, and then display the trust mark and associated information.&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;/div&gt;&lt;div&gt;As this approach leverages existing infrastructure, does not require any new protocols, and allows web site operators with existing certificates to immediately use them for this purpose, this would facilitate rapid adoption of this technique.&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4807874230451573451?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4807874230451573451/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4807874230451573451' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4807874230451573451'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4807874230451573451'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/09/google-trust.html' title='Google 
Trust'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_FN1WQhtIvYA/SNmFBQGsIEI/AAAAAAAAAAs/-IrBzno-uv4/s72-c/Picture+1.png' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4635209071111868855</id><published>2008-09-15T15:38:00.000-07:00</published><updated>2008-09-15T15:40:46.845-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='VMs'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Lightning Between the Clouds</title><content type='html'>Today's VMWare announcements about &lt;a href="http://www.vmware.com/company/news/releases/vcloud_vmworld08.html"&gt;vCloud&lt;/a&gt; mark the first concrete announcement of a product that enables migrating VM sessions across corporate boundaries. With the success of VMotion at a technical level, the concept of outsourcing your DR centre was an obvious next step, and vCloud formalizes the ability to define these relationships in code.&lt;br /&gt;&lt;br /&gt;However, there are many significant technical and non-technical risks associated with such a service offering, including:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Storage Synchronization - In order to migrate VM sessions, the storage backing the VM session must also be synchronized across the two organizational entities. 
This plays well into EMC's hands to bundle location-independent storage along with VM functionality.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Security - VM sessions contain sensitive in-memory data, such as encryption keys, that (in a well-designed system) never makes it to disk. This is on top of the security issues associated with the storage associated with the sessions.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Multi-tenancy - An outsourcing provider will most likely be running VM sessions for multiple customers on a common infrastructure. Thus, network isolation, in addition to storage isolation and partitioning, become major issues that are not present when all resources are within a single enterprise.&lt;/li&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;Management - Resources need to be billed, QoS monitored, SLAs tracked and enforced, and loads predicted and managed. All this will have to work seamlessly across both the customer's and outsourcer's infrastructure. The back-office part of such a service is always under-estimated, and is critical to get right in order for a service to succeed.&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;Solving these problems is a pretty tall order, and I have to commend VMWare for their vision. 
This is version 2.0 of the VM revolution, and things are starting to get really interesting.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4635209071111868855?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4635209071111868855/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4635209071111868855' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4635209071111868855'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4635209071111868855'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/09/lightning-between-clouds.html' title='Lightning Between the Clouds'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-4547773720407021405</id><published>2008-09-12T13:23:00.000-07:00</published><updated>2008-09-12T13:30:32.219-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Security'/><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><category scheme='http://www.blogger.com/atom/ns#' term='Compliance'/><category scheme='http://www.blogger.com/atom/ns#' term='Time'/><title type='text'>Time for Compliance</title><content type='html'>Many aspects of compliance storage rely on 
trusted time. These include timestamps indicating when an object or file was stored, retention durations that indicate when files must not be deleted or modified, and audit records indicating when operations were performed against the storage system. All of these timestamps must be accurate, and, more importantly, must be resistant to attack in order to satisfy the multitude of compliance regulations, such as Sarbanes-Oxley and HIPAA.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;A Question of Time&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When evaluating such a storage system, here are ten good questions to ask your vendor:&lt;ol&gt;&lt;li&gt;How and when is the clock set?&lt;/li&gt;&lt;li&gt;Who can set or adjust the clock?&lt;/li&gt;&lt;li&gt;Are changes to the clock audited?&lt;/li&gt;&lt;li&gt;How much can the clock drift over time?&lt;/li&gt;&lt;li&gt;If the clock is synchronized, is the synchronization chain trustworthy?&lt;/li&gt;&lt;li&gt;Is clock synchronization traceable to NIST?&lt;/li&gt;&lt;li&gt;If clock synchronization is no longer possible, how does the system react?&lt;/li&gt;&lt;li&gt;When clock synchronization is regained, how does the system react? &lt;/li&gt;&lt;li&gt;What protections are present to prevent tampering with the clock at the system level?&lt;/li&gt;&lt;li&gt;What protections are present to prevent tampering with the clock at the network level?&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;b&gt;Two Architectures&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Generally, two architectures have emerged, one that involves a completely sealed system that is capable of maintaining accurate time with drift less than one minute per year for the life of the system, and one that involves network-based transactions that cryptographically prove that a given event happened at a given time.&lt;br /&gt;&lt;br /&gt;The advantages of the first architecture include strong resistance against tampering, and low maintenance requirements. 
However, the downside to such an architecture is the requirement for custom hardware, both to keep accurate time (the clocks in typical servers range from largely inaccurate to downright embarrassing), and to provide the means to physically secure the hardware from prying eyes and screwdrivers. Because this requires custom enclosures and maintenance contracts (who do you trust to have the keys to the rack?) this typically lends itself to solutions from larger storage hardware vendors. And, after all, if you are spending hundreds of thousands to millions of dollars on something, it had better be able to keep accurate time.&lt;br /&gt;&lt;br /&gt;The second architecture is unfortunately far more complex and difficult to design and implement correctly. In a software-only solution, very little can be relied upon to be trusted. After all, a standard x86 server is only one boot-disk away from unfettered tampering, and it's difficult to detect if you are running under a hypervisor. Thus, such systems must rely on complex network transactions to determine accurate times of events, often resulting in increased transactional latency. Unless these time transactions are designed and tested to ensure that a malicious time source or compromised node is unable to alter the timestamps and compliance durations, this can be a significant point of weakness.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Beware NTP&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;One protocol to watch out for is NTP. A malicious NTP server, combined with a poisoned DNS cache and the quick throw of a circuit breaker might result in all your compliance data being unprotected ahead of schedule, or even worse, automatically erased from the system. And given that NTP security is rarely used and not well regarded, it is almost a certainty that it forms a weak link in the chain of trust.&lt;br /&gt;&lt;br /&gt;Many systems that use NTP just use it to set the server and operating system clock, which they then trust blindly. 
For a given server, this clock can be easily altered; to obtain trusted timestamps, the system must use information from multiple sources that cannot all be easily compromised.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Time is of the Essence&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;Time is often overlooked when evaluating compliance storage, but is a fundamental aspect of the compliance process. After all, in a court of law, if the timestamps of events cannot be proven to be accurate, and retention durations cannot be shown to be enforced, that expensive compliance system may end up being even more expensive.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-4547773720407021405?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/4547773720407021405/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=4547773720407021405' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4547773720407021405'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/4547773720407021405'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/09/time-for-compliance.html' title='Time for Compliance'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' 
src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7576367555429635145</id><published>2008-09-09T18:19:00.000-07:00</published><updated>2008-09-12T13:29:51.511-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Wikipedia'/><category scheme='http://www.blogger.com/atom/ns#' term='Distributed Computing'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Raining on the Cloud</title><content type='html'>A thorn in my side, as of late, has been the Wikipedia article on &lt;a href="http://en.wikipedia.org/wiki/Cloud_computing"&gt;Cloud Computing&lt;/a&gt;. Describing yet another newly coined buzzword for distributed computing, this article contains many examples of the worst of Wikipedia, and reminds me of some of the articles I have been subjected to by SoA fundamentalists (and before them, that of the CORBA-cultists, etc).&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Cloud, as in Network Cloud&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;The concept of Cloud Computing originated as an analogy to the network cloud, a mainstay of whiteboard and Visio diagrams everywhere. Thus, in order to understand what it means, one must consider what a network cloud means. Fortunately, this is simple to answer and relatively un-contentious: all a network cloud means is "the stuff we don't have to worry about". It's infrastructure. It's the stuff that we can let the network and/or the networking people figure out how to make work, and by ignoring those details, allows us to focus on the problem at hand.&lt;br /&gt;&lt;br /&gt;If we take this concept of the network cloud and apply it to computing, we end up with "The practice of using known resources to provide computational services as a component of solving a larger problem." 
Just like that network cloud in the diagram indicates that we don't care how the packets get from site A to B, cloud computing allows us to not worry about how and where computation is performed.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Distributed Computing, Renamed, Yet Again&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;When viewed from this perspective, cloud computing is just yet another flavour of distributed computing, one where computational services are provided over a network, typically the Internet. The fact that the users of these services do not have to own, control, manage or even be aware of how the service is provided, is important, but not ground-breaking. The only key difference is that the service contract and information hiding resulting from a well-defined and managed service allows application complexity to be built on top of the services without having to worry about their implementation or operation.&lt;br /&gt;&lt;br /&gt;When it all works, that is...&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7576367555429635145?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7576367555429635145/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7576367555429635145' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7576367555429635145'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7576367555429635145'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/09/raining-on-cloud.html' title='Raining on the Cloud'/><author><name>David 
Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-6060498081821905291.post-7878120688087706748</id><published>2008-09-03T16:48:00.000-07:00</published><updated>2008-09-03T16:54:25.942-07:00</updated><title type='text'>Vanishing into the Infrastructure</title><content type='html'>Throughout our history, it is our infrastructure that we build our civilization upon. From roads to electrical power grids, from the telephone network to the soon-to-be-ubiquitous Internet, it is the technologies that we no longer think about that enable our way of life. For it is when technology fades into the infrastructure that things become interesting:&lt;ul&gt;&lt;li&gt;This is when the next generation of technology emerges.&lt;/li&gt;&lt;li&gt;This is where the majority of the dollars under the adoption curve are found.&lt;/li&gt;&lt;li&gt;And most importantly, this is when the benefits to human existence become widespread and far-reaching.&lt;/li&gt;&lt;/ul&gt;Fading into the infrastructure is neither easy nor predictable. After all, it involves technical and business challenges that far exceed the complexities of any original innovation.
But for those who thrive on these challenges, it offers opportunities to make a far-reaching difference.&lt;br /&gt;&lt;br /&gt;Welcome.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/6060498081821905291-7878120688087706748?l=intotheinfrastructure.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://intotheinfrastructure.blogspot.com/feeds/7878120688087706748/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6060498081821905291&amp;postID=7878120688087706748' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7878120688087706748'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6060498081821905291/posts/default/7878120688087706748'/><link rel='alternate' type='text/html' href='http://intotheinfrastructure.blogspot.com/2008/09/fading-into-infrastructure.html' title='Vanishing into the Infrastructure'/><author><name>David Slik</name><uri>http://www.blogger.com/profile/12631991927286374244</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='32' height='32' src='http://3.bp.blogspot.com/_FN1WQhtIvYA/SLXZ0xesweI/AAAAAAAAAAQ/_vrID88Aqls/S220/2008-08-27+Icon.jpg'/></author><thr:total>0</thr:total></entry></feed>
