Among natural resources, data, the "new" natural resource, is about as fresh as they come. Yet the newness of data as a true resource has not gone unnoticed by the predominant medium for storing it. Introduced in 1952, tape sits alone in the storage arena as the one "new" technology with a genuinely long history behind it.
A brief history of tape shows that nearly all of the critical data behind modern computational business has been carried through time on tape. In the early days of data acquisition, tape was the only medium with the throughput and capacity to hold the gigabytes of data being generated. Through the eighties and early nineties, tape continued to be the most reliable infrastructure for the most critical financial and scientific data. Like it or not, all of this data has been the "big data" of its generation. Today the cloud and off-premise managed infrastructure are gaining momentum for both consumers and enterprises.
Tape in the cloud is also gaining momentum. IDC has reported that more than 2% of all energy produced in the United States is used to power the "Open Compute" infrastructures behind cloud and social media. The cost of powering and cooling these systems has led to entirely new specialties in the IT field. In a nine-year TCO study, the Clipper Group reports that tape is up to 26 times less expensive than a spinning-disk infrastructure; the power cost alone for disk exceeds the TCO of the entire tape infrastructure. MAID, the practice of powering down disk drives that are not in use, has been less than successful in real applications.
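To make the shape of that comparison concrete, here is a minimal sketch of a power-inclusive TCO calculation. Every number below is a placeholder assumption for a hypothetical 1 PB archive tier, not the Clipper Group's inputs or results; the point is only how power and cooling enter the total.

```python
# Minimal power-inclusive TCO sketch. All inputs are hypothetical placeholders,
# not the Clipper Group's figures; only the structure of the calculation matters.

def tco(acquisition, annual_power_kwh, kwh_cost, annual_maintenance, years=9):
    """Purchase price plus power/cooling and maintenance over the ownership period."""
    return acquisition + years * (annual_power_kwh * kwh_cost + annual_maintenance)

# Hypothetical 1 PB archive tier over a nine-year horizon (illustrative numbers only).
disk_tco = tco(acquisition=300_000, annual_power_kwh=120_000, kwh_cost=0.12, annual_maintenance=25_000)
tape_tco = tco(acquisition=150_000, annual_power_kwh=3_000, kwh_cost=0.12, annual_maintenance=5_000)

print(f"disk: ${disk_tco:,.0f}  tape: ${tape_tco:,.0f}  ratio: {disk_tco / tape_tco:.1f}x")
```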
To be fair to disk, it does have advantages in concurrency and time to data. But bearing in mind that several industry studies have found that between 60 and 80% of all data is untouched after the first 90 days from creation, it is easy to see how concurrency and millisecond access can still be provided for the data that needs it while the rest is retained on tape (a simple policy sketch follows below).
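As a toy illustration of that point, the snippet below sketches an age-based tiering policy: anything untouched for 90 days goes to tape, everything else stays on disk for millisecond access. The 90-day threshold mirrors the studies cited above; the tier names and function are hypothetical, not any vendor's API.

```python
from datetime import datetime, timedelta

# Toy age-based tiering policy: data untouched for 90 days moves to tape,
# recently accessed data stays on disk. Names here are illustrative only.

COLD_AFTER = timedelta(days=90)

def pick_tier(last_access: datetime, now: datetime) -> str:
    return "tape" if now - last_access > COLD_AFTER else "disk"

now = datetime.now()
print(pick_tier(now - timedelta(days=5), now))    # -> disk
print(pick_tier(now - timedelta(days=400), now))  # -> tape
```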
Looking at an analysis by Robert Fontana and Gary Decad [see diagram 1], a 50-year TCO has to take into account that disk faces serious industry challenges in overcoming atomic-level limitations on areal density. Tape, at current densities, faces no such atomic limitations for what can be extrapolated to be more than 50 years.
Another consideration in a 50-year TCO is the overall management of the data. Management includes planned migrations, the dependability of data in the infrastructure, and any manual intervention needed over the period of ownership. A common misconception is that tape has to be handled continuously for its TCO to beat disk; in the past this was true only because of the scale of the infrastructures. Modern tape systems can easily manage 120 petabytes of online data in a single library string (calculated without compression, using a TS3500 library and TS1150 10 TB drives). Few infrastructures today require 120 PB of data nearline and available.
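A quick back-of-the-envelope check of that 120 PB figure: TS1150 (JD) media holds 10 TB native, so the capacity works out to roughly 12,000 cartridge slots across the library string. The slot count below is an assumption back-solved from the stated capacity, not a TS3500 specification.

```python
# Sanity check on "120 PB uncompressed in a single library string".
cartridge_tb = 10      # TS1150 native cartridge capacity, no compression
slots = 12_000         # assumed cartridge slots in the string (back-solved, not a spec)

native_pb = cartridge_tb * slots / 1_000
print(f"Native capacity: {native_pb:.0f} PB")   # -> 120 PB
```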
But what about the risks to data integrity? Every time I turn around, I see some off-premise export application or appliance claiming that data is at far greater risk on tape than anywhere else. Poppycock!! There, I said it. With the exception of physical transportation of media, tape is better than disk on both bit error rate and dependability. The bit error rate of TS1150 and LTO-6 is on the order of one error in 10^20 bits, several orders of magnitude better than any disk drive available today. So how does that compare to the stated dependability of cloud providers like Amazon Glacier, which claims a durability of 99.999999999% (eleven 9's)? A single copy of data on tape offers 12 nines of durability. If we assume a disk system needs at least two copies of the data to reach its eleven 9's, then the same number of copies on tape yields 24 nines of durability!!!
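For readers who want to check the nines arithmetic, here is a small sketch. It assumes copies fail independently, which is the usual simplification behind these durability claims.

```python
import math

# Durability "nines" for N independent copies, each with the given single-copy nines.
def combined_nines(single_copy_nines: float, copies: int) -> float:
    loss_prob = 10.0 ** -single_copy_nines        # probability of losing one copy
    return -math.log10(loss_prob ** copies)       # probability of losing every copy

print(combined_nines(12, 1))   # one tape copy   -> 12 nines
print(combined_nines(12, 2))   # two tape copies -> 24 nines
print(combined_nines(11, 1))   # the 99.999999999% (eleven 9's) cloud claim
```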
Speaking of managing the data, it is important to acknowledge up front that electronic data must be planned for migration. All modern digital data, including the majority of movie productions, needs a way to manage migrations from generation to generation and from one format to another. Let's be very clear: modern data is not paper copies or film canisters, and it is not practical to think the digital industries will ever go back to those media, if only because of sheer volume. I am not going to lay out migration plans here, but I will point out that migration is not only about preservation, it is also financial. EVERY storage medium must have a migration strategy; data will not survive without management, even on the traditional media of film and paper.
The chart below shows both the capacity and the cost per GB of storage media over the last 30 years, a good case for why planned migrations, even with capital expense as a consideration, are financially astute. A rough illustration of the arithmetic follows after the note below.
*Note that in some instances the introduction date shown is the date of commercial availability, not the earliest possible introduction date.
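As a rough illustration of why re-writing an archive onto current-generation media pays off, consider how the cartridge count (and with it slots, floor space, and handling) shrinks as native capacity grows. The generation names, capacities, and the 50 PB archive size below are illustrative assumptions, not a product roadmap.

```python
# Illustrative only: how cartridge count shrinks when an archive is migrated
# to denser media. Capacities and the archive size are hypothetical.

archive_tb = 50_000                                   # a hypothetical 50 PB archive
generation_tb = {"older gen": 4, "previous gen": 6, "current gen": 10}

for name, cart_tb in generation_tb.items():
    cartridges = -(-archive_tb // cart_tb)            # ceiling division
    print(f"{name}: {cartridges:,} cartridges for {archive_tb / 1000:.0f} PB")
```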
In my next post I will use these discussion points to bring forward a 50-year TCO comparison of the industry-leading long-term retention infrastructures. In some instances these will include planned migrations and the associated benefits or drawbacks.
Decad, Fontana, Metzler: The Impact of Areal Density and Millions of Square Inches (MSI) of Produced Memory on Petabyte Shipments for TAPE, NAND Flash, and HDD Storage Class