2008-09-23

Google Trust

Search providers such as Google, Yahoo and Microsoft are in a unique position to provide indicators of trustworthiness for a given web site, as they act as a trusted intermediary between end-users and their desired destinations. Currently, there are no clear methods by which an end-user can assess whether a given site in a search result is indeed controlled by the organization it claims to represent. If there were a way for a user to identify which sites had verified identities, it would vastly improve the trust relationships between end-users and the organizations behind any given web site.

The User Experience

Let us consider a hypothetical user experience: a user navigates to Google and enters a company name. They are presented with a listing of search results, often including the official company web site, web sites for companies with a similar name, sites reviewing products by the company, and sometimes sites masquerading as the official site.

In the search results listing, each site where the identity of the organization that controls the site has been verified includes a special icon known as a "trust mark". This icon indicates that Google has established a chain of trust that allows the identity of the organization responsible for the content on that site to be verified.

Figure 1: An example UI from Safari indicating the validity of a certificate.
The green check icon is a good example of a visual representation of a trust mark.

The presence of the trust mark may be sufficient for the user to navigate to the site in confidence, or they may click on the trust mark to reveal a page containing legal information about the entity, including its location (using Google Maps, of course). This information would be especially useful for disambiguating different companies with similar names.

The Technology

Standard web certificates are already used to secure transactions and to provide information about the authenticity of a secured web site. But their use is limited to the secure sections of web sites, such as pages for authentication and payment processing. Most web sites do not use SSL/TLS for the bulk of their pages due to the computational cost of processing HTTPS transactions compared to standard HTTP.

However, the same certificates used to provide HTTPS could also be used to indicate a degree of trust. A site operator would place the certificate as a file in the root path of the web site, and the Google crawler could then retrieve this "certificates.txt" file, much like the current "robots.txt" file. As most certificates contain the site's domain name, Google would be able to verify the chain of trust of the certificate, check that the URL it was crawling matched the URL in the certificate, and then display the trust mark and associated information.
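
As a rough illustration, the crawler-side check could look something like the Python sketch below (using the third-party cryptography package). The "certificates.txt" name and its single-PEM-certificate format are assumptions from this post rather than an existing convention, and validation of the certificate's chain of trust against a CA bundle is omitted for brevity.

    import urllib.request
    from urllib.parse import urlparse

    from cryptography import x509
    from cryptography.x509.oid import NameOID

    def site_matches_certificate(site_url):
        """Return True if the crawled host appears among the certificate's names."""
        host = urlparse(site_url).hostname
        with urllib.request.urlopen("http://%s/certificates.txt" % host) as response:
            certificate = x509.load_pem_x509_certificate(response.read())

        # Gather every DNS name the certificate claims to cover.
        names = set()
        try:
            san = certificate.extensions.get_extension_for_class(x509.SubjectAlternativeName)
            names.update(san.value.get_values_for_type(x509.DNSName))
        except x509.ExtensionNotFound:
            pass
        for attribute in certificate.subject.get_attributes_for_oid(NameOID.COMMON_NAME):
            names.add(attribute.value)

        # The trust mark would only be shown when the crawled host matches a certified name.
        return host in names or any(
            name.startswith("*.") and host.endswith(name[1:]) for name in names
        )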

Because this approach leverages existing infrastructure, requires no new protocols, and allows web site operators with existing certificates to use them immediately for this purpose, it should facilitate rapid adoption of the technique.

2008-09-15

Lightning Between the Clouds

Today's VMware announcements about vCloud are the first concrete sign of a product that enables migrating VM sessions across corporate boundaries. With the success of VMotion at a technical level, outsourcing your DR centre was an obvious next step, and vCloud formalizes the ability to define these relationships in code.

However, there are many significant technical and non-technical risks associated with such a service offering, including:
  • Storage Synchronization - In order to migrate VM sessions, the storage backing each session must also be synchronized across the two organizational entities. This plays well into EMC's hands, allowing them to bundle location-independent storage along with VM functionality.
  • Security - VM sessions contain sensitive in-memory data, such as encryption keys, that (in a well-designed system) never makes it to disk. This is on top of the security issues surrounding the storage that backs those sessions.
  • Multi-tenancy - An outsourcing provider will most likely be running VM sessions for multiple customers on a common infrastructure. Thus, network isolation, in addition to storage isolation and partitioning, becomes a major issue that is not present when all resources are within a single enterprise.
  • Management - Resources need to be billed, QoS monitored, SLAs tracked and enforced, and loads predicted and managed. All of this will have to work seamlessly across both the customer's and the outsourcer's infrastructure. The back-office part of such a service is always underestimated, and is critical to get right in order for a service to succeed.
Solving these problems is a pretty tall order, and I have to commend VMware for their vision. This is version 2.0 of the VM revolution, and things are starting to get really interesting.

2008-09-12

Time for Compliance

Many aspects of compliance storage rely on trusted time. These include timestamps indicating when an object or file was stored, retention durations that indicate when files must not be deleted or modified, and audit records indicating when operations were performed against the storage system. All of these timestamps must be accurate and, more importantly, must be resistant to attack in order to satisfy the multitude of compliance regulations, such as Sarbanes-Oxley and HIPAA.
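
To make the dependency on time concrete, here is a minimal Python sketch of a retention check; the names are hypothetical, and the point is simply that the outcome hinges entirely on where "now" comes from.

    from datetime import datetime, timedelta, timezone

    def may_delete(stored_at, retention, now):
        """An object may only be deleted once its retention period has elapsed."""
        return now >= stored_at + retention

    stored_at = datetime(2008, 9, 12, tzinfo=timezone.utc)
    retention = timedelta(days=7 * 365)  # e.g. a seven-year regulatory hold

    # If "now" is read from a tamperable system clock rather than a trusted time
    # source, the hold can be defeated simply by rolling the clock forward.
    print(may_delete(stored_at, retention, now=datetime.now(timezone.utc)))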

A Question of Time

When evaluating such a storage system, here are ten good questions to ask your vendor:
  1. How and when is the clock set?
  2. Who can set or adjust the clock?
  3. Are changes to the clock audited?
  4. How much can the clock drift over time?
  5. If the clock is synchronized, is the synchronization chain trustworthy?
  6. Is clock synchronization traceable to NIST?
  7. If clock synchronization is no longer possible, how does the system react?
  8. When clock synchronization is regained, how does the system react?
  9. What protections are present to prevent tampering with the clock at the system level?
  10. What protections are present to prevent tampering with the clock at the network level?

Two Architectures

Generally, two architectures have emerged: one involves a completely sealed system capable of maintaining accurate time, with drift of less than one minute per year for the life of the system; the other relies on network-based transactions that cryptographically prove that a given event happened at a given time.

The advantages of the first architecture include strong resistance against tampering and low maintenance requirements. However, the downside to such an architecture is the requirement for custom hardware, both to keep accurate time (the clocks in typical servers range from largely inaccurate to downright embarrassing), and to provide the means to physically secure the hardware from prying eyes and screwdrivers. Because this requires custom enclosures and maintenance contracts (who do you trust to have the keys to the rack?), it typically lends itself to solutions from larger storage hardware vendors. And, after all, if you are spending hundreds of thousands to millions of dollars on something, it had better be able to keep accurate time.

The second architecture is unfortunately far more complex and difficult to design and implement correctly. In a software-only solution, very little can be relied upon to be trusted. After all, a standard x86 server is only one boot-disk away from unfettered tampering, and it's difficult to detect if you are running under a hypervisor. Thus, such systems must rely on complex network transactions to determine accurate times of events, often resulting in increased transactional latency. Unless these time transactions are designed and tested to ensure that a malicious time source or compromised node is unable to alter the timestamps and compliance durations, this can be a significant point of weakness.
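
To picture what such a network transaction binds together, consider the toy Python sketch below: a hypothetical external time authority signs the pair of event hash and time, and the storage system keeps the resulting token so the event time can be proven later. A real system would use public-key signatures from an independent authority rather than the shared HMAC key assumed here, but the structure is the same.

    import hashlib
    import hmac
    import json
    import time

    AUTHORITY_KEY = b"demo-key-held-only-by-the-time-authority"  # assumption for this toy

    def issue_timestamp_token(event_bytes):
        """What a (hypothetical) time authority would return for an event."""
        record = {"sha256": hashlib.sha256(event_bytes).hexdigest(),
                  "time": int(time.time())}
        payload = json.dumps(record, sort_keys=True).encode()
        record["signature"] = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
        return record

    def verify_timestamp_token(event_bytes, token):
        """Later, anyone holding the authority's key can check when the event occurred."""
        payload = json.dumps({"sha256": token["sha256"], "time": token["time"]},
                             sort_keys=True).encode()
        expected = hmac.new(AUTHORITY_KEY, payload, hashlib.sha256).hexdigest()
        return (hmac.compare_digest(expected, token["signature"])
                and token["sha256"] == hashlib.sha256(event_bytes).hexdigest())

    token = issue_timestamp_token(b"object written to compliance storage")
    print(verify_timestamp_token(b"object written to compliance storage", token))  # True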

Beware NTP

One protocol to watch out for is NTP. A malicious NTP server, combined with a poisoned DNS cache and the quick throw of a circuit breaker, might result in all your compliance data becoming unprotected ahead of schedule, or, even worse, being automatically erased from the system. And given that NTP security is rarely used and not well regarded, it is almost a certainty that NTP forms a weak link in the chain of trust.

Many systems that use NTP simply use it to set the server and operating system clock, which they then trust blindly. For a given server, this clock can be easily altered; to obtain trusted timestamps, a system must draw on information from multiple sources that cannot all be easily compromised.
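
A more defensible pattern, sketched below in Python with the third-party ntplib package, is to sample several independently operated servers (the host names here are placeholders) and to refuse to trust any of them when they disagree beyond a small tolerance.

    import statistics
    import ntplib  # third-party NTP client

    # Placeholder hosts; in practice these would be independently operated sources
    # that cannot all be compromised at once.
    TIME_SOURCES = ["time-a.example.net", "time-b.example.net", "time-c.example.net"]
    MAX_SPREAD_SECONDS = 2.0  # assumed tolerance before the samples are rejected

    def consensus_time():
        """Return a consensus Unix timestamp, or raise if the sources disagree."""
        client = ntplib.NTPClient()
        samples = []
        for host in TIME_SOURCES:
            try:
                response = client.request(host, version=3, timeout=5)
                samples.append(response.tx_time)  # server transmit time, Unix seconds
            except Exception:
                continue  # an unreachable source is skipped, never trusted by default
        if len(samples) < 2:
            raise RuntimeError("not enough independent time sources responded")
        if max(samples) - min(samples) > MAX_SPREAD_SECONDS:
            raise RuntimeError("time sources disagree; refusing to trust any of them")
        return statistics.median(samples)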

Time is of the Essence

Time is often overlooked when evaluating compliance storage, but is a fundamental aspect of the compliance process. After all, in a court of law, if the timestamps of events cannot be proven to be accurate, and retention durations cannot be shown to be enforced, that expensive compliance system may end up being even more expensive.

2008-09-09

Raining on the Cloud

A thorn in my side, as of late, has been the Wikipedia article on Cloud Computing. Describing yet another newly coined buzzword for distributed computing, the article contains many examples of the worst of Wikipedia, and reminds me of some of the articles I have been subjected to by SOA fundamentalists (and before them, the CORBA cultists, etc.).

Cloud, as in Network Cloud

The concept of Cloud Computing originated as an analogy to the network cloud, a mainstay of whiteboard and Visio diagrams everywhere. Thus, in order to understand what it means, one must consider what a network cloud means. Fortunately, this is simple to answer and relatively uncontentious: all a network cloud means is "the stuff we don't have to worry about". It's infrastructure. It's the stuff that we can let the network and/or the networking people figure out how to make work, and ignoring those details allows us to focus on the problem at hand.

If we take this concept of the network cloud and apply it to computing, we end up with "The practice of using known resources to provide computational services as a component of solving a larger problem." Just like that network cloud in the diagram indicates that we don't care how the packets get from site A to B, cloud computing allows us to not worry about how and where computation is performed.

Distributed Computing, Renamed, Yet Again

When viewed from this perspective, cloud computing is just another flavour of distributed computing, one where computational services are provided over a network, typically the Internet. The fact that the users of these services do not have to own, control, manage, or even be aware of how the service is provided is important, but not ground-breaking. The key difference is that the service contract and information hiding that come from a well-defined and managed service allow application complexity to be built on top of the services without having to worry about their implementation or operation.

When it all works, that is...

2008-09-03

Vanishing into the Infrastructure

Throughout our history, it is our infrastructure that we have built our civilization upon. From roads to electrical power grids, from the telephone network to the soon-to-be-ubiquitous Internet, it is the technologies that we no longer think about that enable our way of life. For it is when technology fades into the infrastructure that things become interesting:
  • This is when the next generation of technology emerges.
  • This is where the majority of the dollars under the adoption curve are found.
  • And most importantly, this is when the benefits to human existence become widespread and far-reaching.
Fading into the infrastructure is neither easy nor predictable. After all, it involves technical and business challenges that far exceed the complexities of any original innovation. But for those who thrive on these challenges, it offers opportunities to make a far-reaching difference.

Welcome.