The explosive growth of digital data has long driven demand for more storage, even at small businesses. With that growth have come requirements from governments and businesses alike to retain data for longer and longer periods. Long-term archiving of data falls into two categories: regulatory archives ("have to keep") and asset archives ("want to keep").
Current digital data growth drives organizations to find ways to save on storage costs for both near-term data (inactive data not yet archived) and archived data. The costs to be reduced include power, cooling, floor space, and administration. Beginning in 2008, pure disk vendors began pointing to disk as the archive medium of choice. With vendors citing decreasing cost per GB and a high compound annual growth rate for density, pure disk archive sales accelerated at an average of 20% per year through 2011.
By 2011, customers had realized that continuously expanding spinning disk to hold rarely touched data was not financially feasible. They quickly began to run out of power in their datacenters and found that they could no longer cool these environments. These customers began looking to traditional storage media such as tape, many of them settling on traditional backup/archive methodologies.
Tape usage began to change dramatically in 2010 with the design and introduction of the IBM Linear Tape File System (LTFS). LTFS is an open, standardized tape format that allows tapes to be used as a file system. Data on the tape appears to the host system just as data on any other POSIX file system does. This is significant for archive storage because upper-level software can change without affecting the ability to access the data. It also allows businesses to transport, share, or sell data without concern for the format used in the exchange.
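Because an LTFS-formatted tape appears as an ordinary POSIX file system, standard file APIs should be enough to browse it. The following is a minimal sketch in Python, assuming a tape is already mounted at a hypothetical mount point such as /mnt/ltfs. The path and the function name list_tape_files are illustrative only and are not part of any IBM tooling.

```python
import os
from pathlib import Path


def list_tape_files(mount_point):
    """Walk a mounted file system and return sorted (relative path, size) pairs.

    Nothing here is tape-specific: the same calls work on any POSIX
    file system, which is the property the LTFS format relies on.
    """
    root = Path(mount_point)
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            entries.append((str(path.relative_to(root)), path.stat().st_size))
    return sorted(entries)


if __name__ == "__main__":
    # Hypothetical mount point; substitute wherever the tape is mounted.
    for rel_path, size in list_tape_files("/mnt/ltfs"):
        print(f"{size:>12}  {rel_path}")
```

On a real tape, a full directory walk like this reads only the file metadata exposed by the file system; opening and reading file contents would use the usual open() calls in the same way.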
Tape has long been the king of backups and deep archives, and with the introduction of LTFS and enterprise-level management systems it meets the primary concerns of customers who require scalable tiered data. Customers gain easier scalability, improved I/O and transactional performance, improved data availability, and faster deployment and provisioning times, all with transparent use of LTFS-formatted tape.
Introducing LTFS Enterprise Edition (EE).
Based on IBM’s wildly successful LTFS Library Edition (LE), the latest addition to the LTFS product line combines the ease of use of LTFS with the scalability, manageability and performance of IBM General Parallel File System (GPFS). LTFS EE can be installed into an existing GPFS infrastructure or as a stand-alone x-Series GSS installation. GPFS has long led the research industry in high availability and high I/O capability, but it has used a traditional “backup” method for moving data to tape. This has meant that GPFS administrators had to work closely with users to decide when to move data out of the tiered disk storage and onto tape. It also meant that the GPFS administrator had to rely on, or become, an administrator for the tape application. The extra administration responsibilities have often kept smaller implementations from flourishing. LTFS EE resolves these complications and creates a transparent tape pool that works directly with GPFS policies.
LTFS EE also allows the tape tier of GPFS to be expanded to meet customer needs. As storage capacity, I/O and data availability requirements grow, the customer can easily expand the environment without changing the solutions already in place. Archive capacity is expanded by adding media and provisioning it in the LTFS pool of GPFS, without affecting the availability of data already in the pool. The same applies to the automation infrastructure, to which IBM can add capacity with minimal downtime.
I/O and data availability (concurrency of data access) can be expanded by adding additional tape drives and/or servers to the LTFS EE pool. The LTFS EE software load balances the tape drives and LTFS EE servers in the pool to ensure the highest possible data rate and data availability.
Transparent usage of the tape pool.
The migration of data from GPFS disk pools to the LTFS EE tape tier is handled using existing GPFS policy migration capabilities. Administrators of current GPFS clusters need only add policies that indicate how and when data is to be moved to the LTFS EE pool. The current capabilities of GPFS are robust enough to handle all tiering schemas. These schemas include, but are not limited to, migrating data by size, type, time unmodified or even file name. The schemas also allow data to be tiered, replicated and even duplicated in a manner that meets modern backup requirements.
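The selection criteria named above (size, type, time unmodified, file name) can be illustrated with a small sketch. This is plain Python written for illustration; it is not GPFS policy language and does not call any GPFS or LTFS EE interface. The names FileInfo, MigrationRule and select_for_tape, and the example paths, are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Tuple

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class FileInfo:
    path: str
    size_bytes: int
    mtime: float  # last-modified time, seconds since the epoch


@dataclass(frozen=True)
class MigrationRule:
    """Hypothetical rule; any criterion left at its default is ignored."""
    min_size_bytes: int = 0
    suffixes: Tuple[str, ...] = ()      # file "type", judged by extension
    min_days_unmodified: float = 0.0
    name_pattern: Optional[str] = None  # shell-style pattern, case-sensitive

    def matches(self, info: FileInfo, now: float) -> bool:
        name = info.path.rsplit("/", 1)[-1]
        if info.size_bytes < self.min_size_bytes:
            return False
        if self.suffixes and not name.lower().endswith(self.suffixes):
            return False
        if (now - info.mtime) < self.min_days_unmodified * SECONDS_PER_DAY:
            return False
        if self.name_pattern is not None and not fnmatchcase(name, self.name_pattern):
            return False
        return True


def select_for_tape(files, rule, now):
    """Return the paths of files the rule would move to the tape tier."""
    return [f.path for f in files if rule.matches(f, now)]


if __name__ == "__main__":
    now = 1_000 * SECONDS_PER_DAY  # fixed clock so the example is repeatable
    files = [
        FileInfo("/gpfs/proj/run1.dat", 5_000_000, now - 90 * SECONDS_PER_DAY),
        FileInfo("/gpfs/proj/run2.dat", 5_000_000, now - 2 * SECONDS_PER_DAY),
        FileInfo("/gpfs/proj/notes.txt", 1_000, now - 400 * SECONDS_PER_DAY),
    ]
    rule = MigrationRule(min_size_bytes=1_000_000, suffixes=(".dat",),
                         min_days_unmodified=30)
    print(select_for_tape(files, rule, now))
```

In this example only run1.dat qualifies: run2.dat was modified too recently and notes.txt is both too small and of the wrong type. In a real deployment the equivalent conditions would be expressed in GPFS policy rules rather than in application code.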
Policy-based migration means that customers see only the file system they are attached to, while data is moved to the storage appropriate to its value. The combined GPFS and LTFS EE namespace saves real dollars in storage and power, and it reduces administrative headaches by keeping data in a single namespace structure.
Ingesting new data or moving massive amounts of data, as when establishing a new data center.
The LTFS EE solution is the single most effective way to ingest large or continuous inflows of data. Any media in the standard LTFS format can be imported into the GPFS namespace, with the files either remaining on tape or being moved to a disk tier. The data structure is not changed from the standard LTFS format. The media is updated with the symlinks that allow its use within the GPFS namespace. Data and media in the GPFS/LTFS EE pool can be exported from the namespace and used to create a secondary location within the namespace or in any other LTFS system, without modification by LTFS EE or the receiving system. This allows deep archiving of data that may never be accessed, or that must be kept for a regulatory time frame, while preserving the ability to access the data without a complete re-import into the LTFS EE solution. The result is complete interoperability of the open-format data.
Smarter Storage means ease of usage.
GPFS plus LTFS EE creates a flat file system that allows applications to integrate seamlessly with the tiered storage solution. Most applications require some type of file system to store and manipulate data. What is often not taken into account is the amount of storage required to manage the data produced. This is true not only of traditional applications that manage finances, manufacturing, or sales and marketing, but also of newer applications and infrastructures such as data analytics applications.
Traditional applications usually hold self-contained data that grows by time frame or project. LTFS EE allows the data from these applications to be migrated to tape without the application or user being aware of it. The data remains accessible to the application with no modification to the data or the application.
Big data and analytics applications process huge amounts of data during analytic sessions but require even more storage to hold the data sets that are needed before processing. This means that more data is at rest than is being processed. Data at rest can now be managed seamlessly on tape, with only the data set being analyzed spinning on disk or held on flash.
Data is available throughout the entire namespace, meaning that disparate sites have access to data whether it is on disk or on LTFS EE. Data from disparate sites can be aggregated without the intervention of applications outside the namespace. This allows central monitoring, analytics and archiving with reduced administration and hardware costs.
The Right Choice
There are hundreds of solutions on the market today that claim to have the most scalability or to be ideally suited for archive. Most will claim to be the most cost-effective way to archive data. The right choice is the one that offers the best of all capabilities: scalability, performance, manageability, seamless archive capability, and storage management based on the value of the data. GPFS with LTFS EE is the only industry solution that meets all of these points with a proven, consistent track record. It is the first truly transparent tape solution for data management.