The Demise of Tape is Overrated

Christopher Poelker, in a ComputerWorld blog post titled "Is Tape Really Dead", makes a series of assertions about the superiority of disk over tape. To give credit where it is due, he is talking specifically about tape's role in backup, and he does conclude that tape still has a role to play. Unfortunately, many of the statements he makes along the way are simply inaccurate.

Chris states:

Everyone is aware of the limitations of tape solutions.

Tape | Disk
Sequential access | Random access
Relatively slow | Fast
Shipped offsite | Electronically vaulted
Once a day process | Periodic or continuous
High operational touch | Automated
Inexpensive media | More expensive

Let's look at each of these in turn:

Sequential Access vs. Random Access
What Chris is getting at here is seek latency. Both tape and disk provide full random access; disk is simply much faster at it than tape. However, as hard disk capacities have grown while access bandwidth has stayed largely constant, disks look more and more like tape from a software architecture standpoint. This is what is leading to the collapse of RAID 5 as a means of protecting data and, in my opinion, is what will ultimately lead to the death of disk. But more on this in a subsequent blog post.
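To make the capacity-versus-bandwidth point concrete, here is a minimal sketch that computes how long a single drive takes to read end to end. The capacities and the constant 100 MB/s sustained rate are hypothetical round numbers chosen for illustration, not measurements of any particular drive.

```python
def full_read_hours(capacity_gb: float, throughput_mb_s: float) -> float:
    """Hours needed to read a whole drive sequentially (decimal units: 1 GB = 1000 MB)."""
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600

# Hypothetical drives: capacity grows 8x while sustained throughput stays at 100 MB/s.
for capacity_gb in (250, 500, 1000, 2000):
    print(f"{capacity_gb:>5} GB: {full_read_hours(capacity_gb, 100):.1f} h")
```

At constant throughput, every doubling of capacity doubles the time for a full pass over the drive, which is the same pressure that stretches RAID 5 rebuild windows.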

Relatively slow vs. Fast
Tape systems, when properly used, can deliver extremely high throughput, into the 10 Gbit/sec range.

Individual tape drives can already stream sequentially accessed data at 120 MBytes/sec, faster than most hard disks, and LTO-5 will extend this lead further. When randomly accessing data spread across a tape or disk, the disk will outperform tape because of its lower seek latency. And, of course, if seek latency matters to you, you should be looking at flash.
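The sequential-versus-random distinction can be captured with a simple model in which every chunk read pays one seek before streaming. This is a sketch under stated assumptions: the 120 MB/s tape rate comes from the text above, while the 50 s tape locate time, the 8 ms disk seek, and the 100 MB/s disk rate are hypothetical round numbers; mount times, compression, and caching are ignored.

```python
import math

def effective_mb_s(chunk_mb: float, seek_s: float, stream_mb_s: float) -> float:
    """Average throughput when each chunk read is preceded by one seek."""
    return chunk_mb / (seek_s + chunk_mb / stream_mb_s)

def drives_needed(target_gbit_s: float, per_drive_mb_s: float) -> int:
    """Drives streaming in parallel to reach an aggregate rate (1 Gbit/s = 125 MB/s)."""
    return math.ceil(target_gbit_s * 125 / per_drive_mb_s)

# Hypothetical profiles: (average seek/locate time in seconds, streaming rate in MB/s).
TAPE = (50.0, 120.0)
DISK = (0.008, 100.0)

for chunk_mb in (1, 100, 100_000):
    tape = effective_mb_s(chunk_mb, *TAPE)
    disk = effective_mb_s(chunk_mb, *DISK)
    print(f"{chunk_mb:>7} MB per seek: tape {tape:8.2f} MB/s, disk {disk:7.2f} MB/s")

print("Tape drives for 10 Gbit/s at 120 MB/s each:", drives_needed(10, 120))
```

Small random reads are dominated by seek time, where disk wins by orders of magnitude; long sequential streams amortize the seek, where the faster streaming rate of tape wins. Aggregate rates in the 10 Gbit/s range come from running a modest number of drives in parallel.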

Shipped offsite vs. Electronically vaulted
Disk drives are far more vulnerable to damage than tapes and simply can't be shipped around the same way. "Electronic vaulting" often means expensive WAN data transfers, plus higher costs for power and equipment.
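A back-of-the-envelope way to size electronic vaulting is to estimate how long a given volume takes to cross the WAN. The sketch below assumes decimal units and a hypothetical 80% link efficiency; the 100 TB volume and 1 Gbit/s link are illustrative values only, and the carrier cost of such a link is not modeled.

```python
def transfer_days(data_tb: float, link_mbit_s: float, efficiency: float = 0.8) -> float:
    """Days to push data_tb terabytes (decimal) over a WAN link.

    efficiency is the assumed fraction of the nominal link rate actually achieved.
    """
    bits = data_tb * 1e12 * 8
    bits_per_s = link_mbit_s * 1e6 * efficiency
    return bits / bits_per_s / 86_400

# Hypothetical example: vault 100 TB over a 1 Gbit/s link running at 80% efficiency.
print(f"{transfer_days(100, 1000):.1f} days")  # prints 11.6 days
```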

Once a day process vs. Periodic or continuous
This would be true of a pure tape solution, but tape-based systems have been deployed alongside disk in storage hierarchies for decades.

High operational touch vs. Automated
Babysitting ten thousand disks isn't low operational touch either. Disks fail continuously, and a single wrong swap can destroy an entire RAID set. Many large archives run tens of thousands to hundreds of thousands of tapes with very little operator intervention, and modern automated libraries are highly reliable and fault tolerant.
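"Disks fail continuously" can be quantified with simple arithmetic. This minimal sketch assumes independent failures at a constant annualized failure rate (AFR); the 3% AFR and the 10,000-disk fleet are hypothetical inputs, and real failure rates vary by model, age, and environment.

```python
def expected_failures(num_disks: int, afr: float, days: float) -> float:
    """Expected drive failures over `days`, assuming independent failures
    at a constant annualized failure rate (AFR)."""
    return num_disks * afr * days / 365

# Hypothetical fleet: 10,000 disks with an assumed 3% AFR.
per_week = expected_failures(10_000, 0.03, 7)
per_year = expected_failures(10_000, 0.03, 365)
print(f"~{per_week:.1f} replacements per week, ~{per_year:.0f} per year")
```

Even at a modest failure rate, a fleet of this size needs several drive swaps every week, each one an opportunity for the kind of operator error described above.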

No dedupe vs. dedupe
This is false: dedupe is just another data compression technique, and it applies equally well to tape and to disk. Again, dedupe has been used on tape in backup and archiving systems for decades.
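To illustrate why dedupe is independent of the storage medium, here is a minimal block-level sketch. It assumes fixed-size 4 KiB chunks, SHA-256 digests, and an in-memory dictionary as the chunk store; many real products use variable-size (content-defined) chunking instead. The output is simply a set of unique chunks plus an ordered "recipe" of digests, and both can be written sequentially to tape as easily as to disk.

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one copy of each distinct chunk.

    Returns (store, recipe): store maps SHA-256 hex digest -> chunk bytes,
    recipe is the ordered list of digests needed to rebuild the data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # keep only the first copy of each chunk
        recipe.append(digest)
    return store, recipe

def restore(store, recipe) -> bytes:
    """Rebuild (rehydrate) the original data from the chunk store and recipe."""
    return b"".join(store[d] for d in recipe)

if __name__ == "__main__":
    data = b"A" * 4096 * 10 + b"B" * 4096 * 2  # highly redundant sample input
    store, recipe = dedupe(data)
    stored = sum(len(c) for c in store.values())
    print(len(data), stored)  # prints 49152 8192
    assert restore(store, recipe) == data
```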

Inexpensive media vs. More Expensive
We're thrown a bone here in the cost department, but what this overlooks is that, in addition to the consumables (disks and tapes), the disk subsystems themselves must be replaced far more frequently than tape libraries. With tape libraries, the drives can be swapped out and the tapes migrated to newer, higher-capacity media without replacing the entire library.

This also doesn't account for the far higher operating costs of power and cooling required by disk-based solutions.

Despite the near-continuous siren call of the "Tape is Dead" crowd, tape provides significant value, often more value per dollar than disk, and has a long life ahead of it. In many ways, it is spinning disk that should be more worried about its future in the coming decade.


andrew said...

Some interesting points; however, I would take issue with your dedupe claims on tape. The seek latency you mention makes this practically impossible. The only way tape can realistically be used is as a direct copy of deduped data with a hash index. Recovery is done by moving the contents back onto disk and then rehydrating.

Nice blog by the way...

David Slik said...

You are correct that latency is multiplied when one has to scatter-gather deduplicated blocks to reconstruct the requested data. However, this can be somewhat mitigated by placing similar content on the same tape, which reduces the latency.

Ultimately, it is a trade-off between storage efficiency and retrieval latency, and dedupe on tape only makes sense when low retrieval latency is not a key requirement.
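The effect of co-locating similar content can be sketched by counting how many distinct tapes must be mounted to restore one deduplicated object. The chunk names, the four-tape round-robin placement, and the 90-second mount cost are all hypothetical; locate time within a tape and drive contention are ignored.

```python
def tapes_to_mount(recipe, chunk_location):
    """Number of distinct tapes that must be mounted to restore one object."""
    return len({chunk_location[digest] for digest in recipe})

MOUNT_S = 90  # hypothetical mount-and-load time per tape, in seconds

recipe = [f"chunk{i}" for i in range(8)]

# Scattered: chunks landed round-robin across four tapes as they were first written.
scattered = {d: f"tape{i % 4}" for i, d in enumerate(recipe)}
# Grouped: related chunks were co-located on a single tape.
grouped = {d: "tape0" for d in recipe}

for name, placement in (("scattered", scattered), ("grouped", grouped)):
    mounts = tapes_to_mount(recipe, placement)
    print(f"{name}: {mounts} mount(s), ~{mounts * MOUNT_S} s of mount time")
```

Keeping each object's chunks together may mean storing some shared chunks on more than one tape, which is exactly the storage-efficiency cost of this trade-off.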