When to use S3

In response to my previous post, jeredfloyd of Permabit asked when S3 would be useful as storage for our customers.

Do you feel S3 has the reliability and availability for your customers today? I love the concept, but I've so far been scared off by horror stories of downtime. Also, what about security concerns?

These are good questions, and I'm going to elaborate on these concerns and on where we see S3 providing value to our customers.

The Bottom Line

I wouldn't use or recommend S3 for anything other than a low-grade secondary replica location for redundancy purposes. Having said that, the levels of reliability and accessibility that I've seen are already higher than what I've experienced with tape libraries.

Bring Your Own Security

From a security standpoint, I wouldn't put anything on S3 that hasn't been encrypted and wrapped with an integrity verification layer, as we do in StorageGRID. And if the data is encrypted, there is less of a concern about deleting it if you can't get to it any more. Just throw away the keys.
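As a rough illustration of what an "integrity verification layer" can look like, here is a minimal sketch that tags already-encrypted bytes with an HMAC-SHA256 before upload and checks the tag after download. This is not how StorageGRID does it; the function names `seal` and `unseal` are made up for this example, and the encryption step itself is assumed to have happened elsewhere with a separate key.

```python
import hashlib
import hmac

# Length of an HMAC-SHA256 tag in bytes.
TAG_LEN = hashlib.sha256().digest_size


def seal(ciphertext: bytes, mac_key: bytes) -> bytes:
    """Prefix already-encrypted bytes with an HMAC-SHA256 tag.

    Hypothetical helper: the input is assumed to be ciphertext
    produced by some earlier encryption step.
    """
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return tag + ciphertext


def unseal(blob: bytes, mac_key: bytes) -> bytes:
    """Return the ciphertext if the tag verifies, else raise ValueError."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short to contain a tag")
    tag, ciphertext = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return ciphertext
```

If the object comes back from the remote store modified or truncated, `unseal` raises instead of handing back bad data. Discarding both the encryption key and `mac_key` is what makes "just throw away the keys" work as a deletion strategy.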

You can't implement a secure wipe using their API, even if you overwrite the data, so you would also want to be sure that you're not storing really sensitive information there, even with today's standard encryption algorithms and key strengths.

S3 Isn't Inexpensive

One of the things that I want to emphasize is that based on our analysis of their economics, if you are storing data for long periods of time, it's far cheaper to just add storage nodes with SATA shelves.

Tape isn't cheaper until you're looking at 50+ TB libraries. For infrequently accessed data and redundancy copies (you need to make more copies when putting them on tape, since it's not as reliable as disk), tape quickly becomes very economical for large-capacity deployments.
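To make the "long periods of time" point concrete, here is a small sketch of a break-even calculation between buying storage up front and paying a hosted service per TB per month. The function name, its parameters, and any numbers you feed it are placeholders, not figures from our analysis, and it ignores bandwidth and request charges.

```python
import math


def break_even_months(
    hardware_cost: float,
    hosted_cost_per_tb_month: float,
    capacity_tb: float,
    hardware_opex_per_month: float = 0.0,
) -> float:
    """Months after which owning hardware is cheaper than hosted storage.

    All inputs are hypothetical; plug in your own quotes.
    Returns math.inf if hosted storage never costs more per month
    than running the hardware.
    """
    monthly_hosted = hosted_cost_per_tb_month * capacity_tb
    monthly_saving = monthly_hosted - hardware_opex_per_month
    if monthly_saving <= 0:
        return math.inf
    return hardware_cost / monthly_saving
```

If the data will be kept for longer than the number of months this returns, buying the hardware wins; if it will be gone well before then, the hosted option is cheaper.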

Despite this, S3 Still Has Value

Having said this, even with these concerns, I see several situations where S3 support brings real value for our customers:
  1. If you're really small (less than 50 TB), adding storage capacity is still pretty expensive as a percentage of your yearly budget because our customers typically add in 10TB or larger increments. Using S3 as an overflow pool (keeping one or two copies locally on disk, and using S3 as your second or third copy) lets you defer that purchase for a little while, and when you do make that purchase, you can automatically migrate all the data on S3 off onto your new storage resource.
  2. If it takes you too long to purchase hardware, or your budgetary cycle for capital purchases takes too long, or you hit an unexpected load and there just isn't time to provision more storage, you can shift second or third copies off onto S3 to free up space, and expense it to the business as an opex or project cost.
  3. If you have a short-term storage need, and don't want to invest in hardware yet, just put it off onto S3. It will cost a little more per TB, but since you wouldn't be able to amortize the storage costs of in-house hardware across the typical three-year lifespan of that hardware, it works out to be cheaper in the end.
  4. If you're almost full, you've ignored the alarms telling you that you don't have enough space on other nodes to repair your storage redundancy if you lose a node, and you don't have any storage ready to replace a failed storage node, S3 would be a good "last resort" option for creating new replicas to restore your desired level of redundancy.
So, to summarize, I'd use it as a short-term storage resource to defer capital costs, as a short-term emergency storage resource to keep you going, and for storage of short-term data. And in all cases, I wouldn't have the only copy in the grid on S3.
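The overflow rule running through these cases (fill local disk first, spill extra replicas to S3, never leave S3 holding the only copy) can be sketched as a small placement function. This is an illustration only: `plan_copies` and its parameters are hypothetical and do not correspond to any StorageGRID setting.

```python
def plan_copies(desired_copies: int, local_slots: int) -> dict:
    """Decide how many replicas go on local disk and how many overflow to S3.

    desired_copies: total replicas wanted for an object.
    local_slots: how many replicas local storage can currently hold.
    Both names are made up for this sketch.
    """
    if desired_copies < 1:
        raise ValueError("need at least one copy")
    local = min(desired_copies, max(local_slots, 0))
    if local == 0:
        # Guard for the rule that S3 must never hold the only copy.
        raise ValueError("refusing to keep the only copies on S3")
    return {"local": local, "s3": desired_copies - local}
```

For example, `plan_copies(3, 1)` keeps one replica locally and sends two to S3, while `plan_copies(2, 5)` keeps everything local. Once new hardware arrives and `local_slots` grows, re-running the plan shows how many replicas can be migrated back.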

Based on these use-cases, it would be of most value to smaller IT shops with smaller systems. As you get into larger archives and storage systems (200+ TB), many of these situations will never come up.

Regardless of your size, having S3 as a choice of storage tier gives administrators another tool to handle different situations, and that flexibility can be quite useful. Ultimately, it's up to them to decide if the costs (and bandwidth usage) make sense for them.
