Among natural resources, data, the "new" natural resource, is as fresh as they come. Yet the relative newness of data as a true resource stands in contrast to the age of its predominant storage medium. Introduced in 1952, tape sits alone in the storage arena as a decades-old technology still central to this "new" resource.
A brief history of tape shows that nearly all of the critical data behind modern computational business has been carried through time on tape. Early in the data acquisition era, tape was the only medium with the throughput and capacity to hold the gigabytes of data being generated. Through the eighties and early nineties, tape remained the most reliable infrastructure for the most critical financial and scientific data. Like it or not, all of this data has been the "big data" of its generation. Today the cloud and off-premise managed infrastructure are gaining momentum for both consumers and enterprises.
Tape in the cloud is also gaining momentum. IDC has reported that more than 2% of all energy produced in the United States is used to power the "Open Compute" infrastructures behind cloud and social media. The cost of both powering and cooling these systems has led to entirely new specialties in the IT field. The Clipper Group reports that tape is up to 26 times less expensive than a spinning-disk infrastructure in a 9-year TCO study. The power alone in the disk TCO costs more than the entire tape infrastructure. MAID, the practice of powering down disk drives not in operation, has been less than successful in real applications.
To be fair to disk, it does have advantages in concurrency and time to data. But bearing in mind that industry analysts in several studies have determined that between 60 and 80% of all data is untouched after the first 90 days from creation, it is easy to see how concurrency and millisecond access times can be managed even while retaining critical data.
Looking at an analysis by Robert Fontana and Gary Decad [see diagram 1], a 50-year TCO analysis has to take into account that disk faces serious industry challenges in overcoming atomic-level limitations. Tape, at current densities, has no such atomic limitations for what can be extrapolated to be more than 50 years.
Another consideration in a 50-year TCO is the overall management of the data: planned migrations, dependability of data in the infrastructure, and any manual intervention needed during the period of ownership. A common misconception is that tape must be continuously handled to beat disk in a TCO calculation; in the past this was true, but only because of the scale of the infrastructures. Modern tape systems can easily manage 120 petabytes of online data in a single library string (calculated without compression, using a TS3500 library with TS1150 10 TB drives). Few infrastructures today require 120 PB of data nearline and available.
But what about the risks to data integrity? Every time I turn around, some off-premise export application or appliance claims that data is at far greater risk on tape than anywhere else. Poppycock! There, I said it. With the exception of physical transportation of media, tape is better than disk for both bit error rate and dependability. The bit error rate of TS1150 and LTO6 is 1 in 10^20, several orders of magnitude better than any disk drive available today. So how does that compare to the stated dependability of cloud providers like Amazon Glacier, which claims a durability of 99.999999999% (eleven 9's)? A single copy of data on tape offers 12 9's of durability. If we assume that at least 2 copies of the data are needed for a disk system to reach its 9's of durability, then the same number of copies on tape yields 24 9's of durability!
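The "nines" arithmetic behind that claim can be sketched in a few lines. This is a simple illustration under the assumption that copies fail independently and that "N nines" of durability corresponds to a loss probability of 10^-N; the figures are the ones quoted above, not vendor specifications.

```python
import math

def nines(p_loss: float) -> float:
    """Convert a loss probability into a count of 'nines' of durability."""
    return -math.log10(p_loss)

def combined_loss(p_loss: float, copies: int) -> float:
    """Loss probability when every independent copy must fail for data loss."""
    return p_loss ** copies

# Assumed figure from the text: a single tape copy at 12 nines of durability,
# i.e. a loss probability of 1e-12.
tape_single_copy = 1e-12

print(f"One tape copy:  {nines(tape_single_copy):.0f} nines")
print(f"Two tape copies: {nines(combined_loss(tape_single_copy, 2)):.0f} nines")
```

With two independent copies the loss probabilities multiply, which is why the nines add: 12 nines per copy becomes 24 nines for the pair.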
Speaking of data management, it is important to acknowledge up front that electronic data must be planned for migration. All modern digital data, including the majority of movie productions, should have a plan for easily migrating from generation to generation and from one format to another. Let's be very clear: modern data is not paper copies or film canisters, and it is not practical to think the digital industries will ever go back to those media, due to sheer volume alone. I am not going to lay out migration plans here, but I will point out that migration is not only about preservation but also about finances. EVERY storage medium must have a migration strategy; data will not survive without management, even on the traditional media of film and paper.
The chart below shows both the capacity and the cost per GB for storage media over the last 30 years, a good case for why planned migration, even with capital expense as a consideration, is financially astute.
*Note that in some instances the introduction date is the commercial availability date, not the earliest possible date of introduction.
It is also important to understand that in a 50-year TCO study and comparison, we assume that existing technologies will continue alongside new storage technologies, which enter at the extreme high end of performance-optimized storage.
For simplicity, this blog will also assume zero growth in the data being stored. This ensures we are looking at the future cost of present data, not accounting for the total storage cost of future data. I will also assume that the data stored today must be retained for 50 years with no defensible deletion.
How important are data migration strategies? Very. Long-term capacity-optimized storage planning is the only part of IT management that can be solved only through forward-looking expectations beyond the reasonable life of current technologies.
To calculate the TCO we will lay out a few clear assumptions:
- The $-per-GB rate will continue to decrease at a fairly constant rate, based on recent trends for both tape and disk, over the 50-year period
- Disk power consumption will go down as capacity increases, so migrations actually benefit the TCO
- Tape migration cycles are 10 years
- Disk migration cycles are 3 years
- The cost of migration is a fixed cost equal to 2x the per-GB TCO for the period of storage
- The initial measurement will be for 1 petabyte of data
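These assumptions translate into a simple straight-line model. The sketch below is illustrative only: the starting $-per-GB figures and the annual decline rate are hypothetical placeholders, not the numbers behind Tables 2 and 3, but the structure (re-buying media at each migration cycle, charging migration at 2x the period's per-GB cost) follows the assumptions above.

```python
def tco_per_gb(start_cost: float, annual_decline: float,
               migration_cycle: int, years: int = 50) -> float:
    """Straight-line 50-year TCO per GB, re-buying at each migration cycle.

    start_cost      -- $/GB for the medium at year 0 (assumed figure)
    annual_decline  -- fractional $/GB decrease per year (assumed figure)
    migration_cycle -- years between forced migrations
    Migration is charged at 2x the period's $/GB, per the assumption above.
    """
    total = 0.0
    for year in range(0, years, migration_cycle):
        cost_now = start_cost * (1 - annual_decline) ** year
        total += cost_now * 2  # acquire new media and migrate the old data
    return total

GB_PER_PB = 1_000_000  # decimal petabyte, matching the 1 PB measurement above

# Hypothetical starting points: tape cheaper per GB with a 10-year cycle,
# disk more expensive per GB with a 3-year cycle.
tape = tco_per_gb(start_cost=0.02, annual_decline=0.20, migration_cycle=10)
disk = tco_per_gb(start_cost=0.05, annual_decline=0.20, migration_cycle=3)

print(f"Tape 50-year TCO for 1 PB: ${tape * GB_PER_PB:,.0f}")
print(f"Disk 50-year TCO for 1 PB: ${disk * GB_PER_PB:,.0f}")
```

Even with prices falling at the same rate for both media, the 3-year disk cycle forces roughly three times as many migration purchases over 50 years, which is what drives the gap in the tables that follow.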
Table 2 shows the averaged TCO per GB for each future period.
Using the assumptions set forth and straight-line calculations, the total TCO for a 50-year retention is shown in Table 3.
Table 3: Cost of ownership by period for tape and disk.
Tape Migration Cycle is 10 years
Now take cost out of the picture: what is the pain in your organization for managing extreme long-term retention of data? Most IT professionals deal with issues as they arise. The problem with this way of thinking is that by the time migration of data at rest crosses the mind of the ever-busy IT professional, it may be too late. If the data was on disk, are the disks still accessible, and is the data still readable?
No matter how strongly it is claimed that disk can last 20 years, it is more likely that after no more than 5 years of inactivity the data on disk will not be fully accessible. Tape, on the other hand, can last 30 years without losing data access.
Just as paper has lasted for hundreds and even thousands of years, data on tape will continue to be the way to get extreme long life out of data. If racing sailboat sails can be made to last through the rigors of salt water, light, and heat, then the same materials in digital tape should be trusted to last 10-30 years in properly acclimatized environments.
Decad, Fontana, Metzler: The Impact of Areal Density and Millions of Square Inches (MSI) of Produced Memory on Petabyte Shipments for TAPE, NAND Flash, and HDD Storage Class