On 19 March 2010, IBM will release Tivoli Storage Manager V6.2, the next in a long line of enhancements to the leader in enterprise-wide data protection, unified recovery management and effective data reduction. Highlighting this release is the addition of source (client-side) data deduplication, tighter integration with TSM FastBack, enhanced support for virtual server environments, automatic deployment of Windows client upgrades, and improved automation and performance of back-end data management processing.
Source (Client-Side) Data Deduplication: Avoids sending chunks of data over the network that the TSM Server already manages, which speeds backups and reduces bandwidth requirements. This is an excellent backup solution for remote offices with a small number of servers, where adding a separate backup server cannot be justified.
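To make the idea concrete, here is a minimal Python sketch of source-side deduplication. It assumes fixed-size chunks and SHA-256 fingerprints, and the `DedupServer`, `backup` and `restore` names are hypothetical stand-ins, not the TSM client or server API. The client fingerprints its data locally, asks the server which fingerprints it has never seen, and sends only those chunks.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for this sketch


class DedupServer:
    """Hypothetical stand-in for a backup server's chunk index (not the TSM API)."""

    def __init__(self):
        self.chunks = {}  # fingerprint -> the single stored copy of that chunk

    def missing(self, fingerprints):
        # Answer the client's query: which fingerprints have never been stored?
        return {fp for fp in fingerprints if fp not in self.chunks}

    def store(self, new_chunks):
        self.chunks.update(new_chunks)


def backup(data: bytes, server: DedupServer):
    """Client-side dedup: fingerprint locally, send only chunks the server lacks.

    Returns (recipe, bytes_sent); the recipe is the ordered list of fingerprints
    (pointers) needed to rebuild the data later.
    """
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    recipe = [hashlib.sha256(c).hexdigest() for c in chunks]
    needed = server.missing(recipe)
    payload = {fp: c for fp, c in zip(recipe, chunks) if fp in needed}
    server.store(payload)  # only these bytes cross the network
    return recipe, sum(len(c) for c in payload.values())


def restore(recipe, server: DedupServer) -> bytes:
    # Follow the pointers back to the stored chunks, in order.
    return b"".join(server.chunks[fp] for fp in recipe)
```

Backing up the same data twice sends every unique chunk the first time and nothing the second time; only the small fingerprint query crosses the network.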
Tighter integration with TSM FastBack: Further delivering on Unified Recovery Management, TSM FastBack servers and TSM FastBack for Workstations clients can now be managed directly by the TSM Admin Center. From a single user interface, you can now manage your end-to-end data protection and recovery infrastructure.
Enhanced support for Virtual Server Environments: Support for the VMware vStorage API and for VSS snapshots in Microsoft Hyper-V gives customers more choices and greater control when protecting and restoring data on virtual servers.
Automatic deployment of Windows Client Upgrades: Configurable policies and schedules for pushing client updates will help reduce administration time as well as the risk of errors that can occur when manually updating a large number of client systems.
Improved Automation and Performance of Back-End Data Management Processes: The TSM Server can now run several data management processes simultaneously, including data migration, copy pool backup and copy active data. This can free up server resources for other tasks, such as additional or more frequent backup jobs.
The countdown is on... with only 2 weeks left until Pulse 2010, I wanted to give you an update on additional perks you'll have access to if you register and attend.
Meet the Experts! Talk one-on-one with Product Experts:
Booth 80: SAN Volume Controller and Tivoli Storage Productivity Center – storage virtualization, storage resource management, data discovery
Optimizing Infrastructure: Smarter Systems, Storage and Information Retention Zone
Booth 92: IBM Information Archive and IBM Smart Archiving Strategy
Booth 93: IBM XIV: Storage Reinvented
Booth 95: IBM System Storage DS8000 Series
Delivering Business Value with Smarter Services
Booth 79: IBM Storage Enterprise Resource Planner
Check out my previous blog, The Pulse Roadmap to Storage Expertise, for information on some of the sessions that you can attend. Use the online agenda tool to build your agenda and view all the sessions available (it requires only an IBM.com password; you do NOT have to be a registered Pulse attendee to create a Pulse schedule online).
Share Your Story
This year at Pulse 2010 we are scheduling videotaped interviews with clients who are willing to share what they are doing to achieve visibility, control, and automation in their infrastructure. We will be filming client videos at Pulse from Sunday, February 21, through Wednesday, February 24. The footage will be used to produce short videos that tell the story of the needs clients are addressing in their organizations. Our customers have been sharing their stories throughout 2009. Interested in participating? Notify me at firstname.lastname@example.org
With only 4 weeks until Pulse 2010 - The Premier Service Management Event - Optimizing the World's Infrastructure, I thought it might be helpful to provide some details about the sessions and activities that will be available to all of you storage and information infrastructure enthusiasts out there. Here are a few sessions that you can attend each day. Sign up for these sessions and others today (it requires only an IBM.com password; you do NOT have to be a registered Pulse attendee to create a Pulse schedule online)!
(Mon, 22/Feb)
The Data Juggernaut Meets IBM -- Storage & Information Infrastructure Track Kickoff
How Principal Financial Group Upgraded to TSM 6 in a Veritas Clustered Environment
Sprint Storage Virtualization Success with SVC
How France Telecom Benefits from SVC Management and Thin Provisioning
TSM 6 Upgrade Experience at Brookshire Grocery Company
How Pacific Northern Gas and Tourism Australia achieved near instant recovery while reducing costs and risks with TSM and TSM FastBack
How A Major Dutch Insurance Company Got the Most from Its Storage Environment with SVC and TPC
How OhioHealth and VCU Health Systems Leverage IBM Data Protection Software and Storage Systems to Scale for Growth
A Technical Look Inside IBM's Next-Generation Archive Appliance -- the IBM Information Archive
AT&T Automates Server and Storage Provisioning with Tivoli Provisioning Manager
Reduce your Data Storage Footprint to help Survive the Data Tidal Wave
Implementing TSM FastBack at the US Department of State
The Oakwood Healthcare System's Virtualization Story
Shipping Portal INTTRA Supports the Global Supply Chain with a World-Class IT Infrastructure from IBM
Solving the Business Challenge with Excellence: An IT TotalSolutions Approach Success Case
Go to the online agenda tool to see additional Storage and Information Infrastructure sessions that may interest you. There are also sessions in the Expo Theater Stream. Register and attend Pulse to take full advantage of everything that will be offered.
Data Reduction Chapter 9: Surviving the tidal wave of data with IBM data reduction solutions
I hope everyone had a safe and enjoyable holiday, and I’m looking forward to an exciting and prosperous new year. I’d like to take this opportunity to summarize the topics I’ve been covering in this series of data reduction blogs, and give new readers links to the specific topics that you might be interested in.
Please ask yourself these questions:
Am I experiencing a tidal wave of data that is making it difficult to meet my backup windows and adequately protect all data? Are there increasing service level requirements and corporate governance mandates? Do I need to manage more data for longer periods of time with flat or shrinking budgets? Chapter 1 and Chapter 2
Am I creating large amounts of duplicate data by performing periodic full backups? Chapter 3
Do I know what data I have, where it is, and whether I need to keep it? Do I have pools of orphan, temporary or non-business data taking up valuable space? Chapter 4
Can I automatically migrate older, less-frequently accessed data to secondary tiers of storage to help reduce the overall costs of capacity and the amount of active data that I have to manage? Chapter 5
Am I taking advantage of cost-effective archiving technologies, again to reduce primary storage requirements and help meet information retention mandates? Chapter 6
How can data deduplication solutions help to reduce my data storage footprint? Which deduplication approach is right for me? Chapter 7 and Chapter 8
Through this series, we’ve shown that IBM is the only vendor with a comprehensive set of data reduction solutions that can be applied at multiple points throughout the data creation and management lifecycle. IBM’s broad portfolio of data reduction solutions gives us the freedom to solve your data storage and management issues with the most effective technology for your particular situation. And IBM is continuing to invest in research and development to further develop and deliver the advanced features our customers are requesting.
Data Reduction Chapter 8: Deduplication with Tivoli Storage Manager 6, FastBack and ProtecTIER
So far in this series, we’ve detailed the challenges that the tidal wave of data is placing on storage administrators, and why a smarter, more holistic and comprehensive approach to data reduction is needed to survive in a way that lets you do more with less.
We covered eliminating the largest source of duplicate data (full backups) and automating the migration, archiving and deletion of older data. Then, in chapter 7, we covered the basics of data deduplication. Now we’ll detail the differences between IBM’s deduplication offerings, and when to best use each.
Let’s talk first about the deduplication capabilities of Tivoli Storage Manager (TSM). This feature is included at no additional charge for TSM 6 Extended Edition customers. This solution can help to reduce recovery times by enabling you to store more backup data and recovery points on disk rather than tape. It works with the data from all sources – via normal backups, data imported via the TSM API, as well as archive and HSM data. TSM deduplicates your disk-based data pools as a post-process, so there is no impact on backup performance. After running, it automatically reclaims the storage that has been freed up.
TSM already eliminates the most common cause of duplicate data, full backups, so the reduction ratios you can expect from TSM’s deduplication are fairly modest: the average is about 40%. But when combined with its progressive incremental backup approach and built-in data compression, TSM’s effective data reduction rate is extremely competitive with any other solution on the market, as detailed in a commissioned report written by Enterprise Strategy Group (ESG) (registration required).
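Because each technique works on whatever the previous one left behind, reduction rates compound multiplicatively rather than adding up. The sketch below uses the 40% average quoted above plus a hypothetical 50% compression ratio picked only for illustration (it is not a figure from the ESG report), and it assumes each ratio applies independently to what the previous step left.

```python
def remaining_fraction(*reductions: float) -> float:
    """Fraction of data still stored after applying successive reductions (each 0..1)."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1.0 - r)  # each step shrinks only what the previous step left
    return remaining


# 40% deduplication (average quoted above) followed by a hypothetical 50% compression.
left = remaining_fraction(0.40, 0.50)
print(f"stored: {left:.0%}, overall reduction: {1 - left:.0%}")
```

With these example numbers, 30% of the original data remains, a 70% overall reduction, even though neither step alone reaches that figure.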
Announced today, Tivoli Storage Manager FastBack v6.1 also includes target-side data deduplication to help reduce the capacity required in the FastBack backup repository, adding to its value as the leading near-instant recovery solution on the market for business critical Windows servers and remote/branch offices. Also announced today was Linux support and tighter integration with the Tivoli Storage Manager Integrated Solutions Console (ISC), delivering on IBM’s vision of true enterprise-wide Unified Recovery Management.
IBM System Storage ProtecTIER is a technology leader in performance, scalability, data integrity and reliability. In true apples-to-apples comparisons, this solution is the fastest on the market in real customer environments. A single ProtecTIER system can easily scale in both performance (1000 MB/sec) AND capacity (1 PB of deduplicated data). ProtecTIER is one of the few solutions that doesn’t rely on a hash algorithm alone; it performs a byte-level comparison to confirm that data really is a duplicate, for enterprise-class data integrity. And ProtecTIER is built from IBM best-of-breed components rather than the inexpensive OEM parts found in competitive products.
ProtecTIER has been proven in very large production environments and is supported worldwide by IBM’s services operations. The TS7650 ProtecTIER Deduplication Family ranges from small (7TB) to medium (18TB) to large-scale (36TB) appliances. And the TS7650G gateway offerings allow you to add the storage of your choice, up to 1PB. Active-Active cluster configurations also provide high availability capabilities.
Review - Choosing TSM or ProtecTIER for Data Deduplication
While TSM works very well in ProtecTIER environments, you wouldn’t use both TSM deduplication and ProtecTIER deduplication simultaneously. That would require twice as much work for no additional benefit. So when should you choose one over the other? Both solutions offer the benefits of target side deduplication: greatly reduced storage capacity requirements (especially when using TSM’s progressive incremental backup). You’ll have lower operational costs, energy usage and Total Cost of Ownership. You also get faster recoveries with more data on disk.
Use TSM 6 built-in data deduplication when you want deduplication operations to be completely integrated within TSM; when you want the benefits of deduplication without the cost of separate hardware or software (it ships free with TSM 6 Extended Edition); or when you want end-to-end data lifecycle management with minimized data store requirements.
Use ProtecTIER when:
• You need the highest performance, up to 1000 MB/sec or more
• You have a large amount of data and need scalable capacity and performance
• You need inline deduplication to avoid the operational impact of post-processing
• You are deduplicating across multiple TSM (or other backup) servers
• You don’t have TSM and are performing weekly full backups
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 9, where we’ll summarize IBM’s holistic approach to data reduction and show you how we can help you survive the tidal wave of data.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."
As discussed in earlier chapters, data deduplication is a hot technology that is used to reduce data storage capacity requirements. If you make smart choices in your backup and data management processes, you might not need data deduplication. But if you keep all of your inactive and unimportant data on your production storage systems, and use backup software that forces you to perform repetitive full backups of all that static data, then data deduplication can provide you with a huge benefit.
The basic idea behind data deduplication is to store just one copy of any data object, and place pointers to the single copy wherever duplicates are eliminated. Some solutions do this at a file level, so that the files have to be exactly the same to be deduplicated. This is often called single-instance storage (SIS). Other solutions deduplicate data at a fixed or variable block length. IBM’s solutions use a blended approach based on the size of the data—file-based for smaller files, and variable block for larger files.
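The difference between fixed-block and variable-block deduplication is easiest to see in code. This is a minimal Python sketch of content-defined (variable-block) chunking with a polynomial rolling hash, plus a `chunk_file` helper that mimics the blended approach (whole file below a size cutoff). All constants and function names are hypothetical choices for illustration, not IBM's actual parameters or algorithm.

```python
WINDOW = 16            # bytes in the rolling-hash window (hypothetical)
DIVISOR = 64           # cut when hash % DIVISOR == 0, giving chunks of roughly this size
MIN_CHUNK, MAX_CHUNK = 16, 512
SMALL_FILE = 1024      # hypothetical cutoff below which a file is handled whole
BASE, MOD = 257, (1 << 31) - 1


def fixed_chunks(data: bytes, size: int = DIVISOR) -> list:
    # Fixed blocks: boundaries depend only on offsets.
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data: bytes) -> list:
    """Content-defined chunking: boundaries depend on the bytes, not on offsets."""
    chunks, start, h = [], 0, 0
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte sliding out of the window
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and h % DIVISOR == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def chunk_file(data: bytes) -> list:
    """Blended approach: whole-file for small files, variable blocks for large ones."""
    return [data] if len(data) < SMALL_FILE else variable_chunks(data)
```

The payoff shows up when a single byte is inserted at the start of a file: every fixed block shifts and no longer matches, while content-defined boundaries resynchronize after the first chunk or two, so almost all variable chunks are still recognized as duplicates.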
Most deduplication solutions run a checksum algorithm against the selected data to create a hash signature, then check to see if that signature has ever been seen before. If it has, the data is discarded and a pointer to the already stored data is put in its place. A small number of high-end solutions perform a complete byte-level differential comparison of the data to remove all potential for “data collisions,” where two distinct data blocks may share the same hash signature.
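The risk of trusting a hash signature alone can be shown with a toy store. In this minimal Python sketch, `ChunkStore` is a hypothetical class (not a product API) that keeps one copy per chunk and returns pointers. With `verify_bytes=False` it trusts the signature; with `verify_bytes=True` it also compares the bytes, as the high-end solutions described above do. The demo deliberately plugs in a weak fingerprint so that a collision actually happens.

```python
import hashlib


def sha256_fp(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


class ChunkStore:
    """Toy dedup store: keep one copy of each chunk and hand back pointers to it."""

    def __init__(self, fingerprint=sha256_fp, verify_bytes: bool = False):
        self.fingerprint = fingerprint
        self.verify_bytes = verify_bytes
        self.index = {}  # fingerprint -> list of distinct chunks sharing that fingerprint

    def put(self, chunk: bytes):
        """Store a chunk (or find its existing copy) and return a pointer to it."""
        fp = self.fingerprint(chunk)
        bucket = self.index.setdefault(fp, [])
        if not self.verify_bytes:
            # Hash-only: a matching signature is trusted, so the new chunk is discarded.
            if not bucket:
                bucket.append(chunk)
            return (fp, 0)
        for slot, existing in enumerate(bucket):
            if existing == chunk:  # full byte-by-byte comparison confirms the duplicate
                return (fp, slot)
        bucket.append(chunk)  # same signature, different bytes: keep both
        return (fp, len(bucket) - 1)

    def get(self, pointer) -> bytes:
        fp, slot = pointer
        return self.index[fp][slot]
```

With a weak fingerprint such as the byte sum, `b"ab"` and `b"ba"` collide: the hash-only store silently hands back the wrong data for the second chunk, while the verifying store keeps both.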
Data deduplication can and does occur at many points in the data creation and management life cycle. In general, these points of deduplication can be broken into source-side, where the data is created, and target-side, where it is stored and managed. Backup applications, for example, can perform source-side deduplication by not transferring data that has previously been backed up over the LAN or WAN, saving on bandwidth.
On the target side, the most popular use of deduplication is in virtual tape libraries, or VTLs. These disk-based systems emulate tape libraries and drives, but apply deduplication to store equivalent amounts of data on disk very cost-effectively while providing performance advantages over tape. Performing deduplication on tape-based systems is considered to be a bad idea, given the portable nature of tapes and the need to recycle them over time; it would be very difficult to guarantee that you maintain the original data for all of the pointers that are out there.
Today, IBM offers two compelling data deduplication solutions. The Extended Edition of Tivoli Storage Manager 6 includes deduplication capabilities to eliminate duplicate data that has been backed up from multiple production systems. Again, TSM’s progressive-incremental backup methodology does not create massive amounts of duplicate data, so the deduplication is only effective when the same data exists on different systems.
The other solution is the IBM System Storage ProtecTIER® family of deduplication systems for reducing data coming from multiple sources, including Tivoli Storage Manager servers, backups from other backup systems, or archive software solutions.
A lot of customers ask when they should use TSM deduplication and when they should use ProtecTIER. I’ll cover this question in detail in my next blog, but the simple answer is:
Use TSM deduplication when you have a single TSM server; you want to improve TSM recovery times by storing more backup data on disk; or there isn’t a large amount of duplicate data across the systems protected by multiple TSM servers.
Use the IBM System Storage ProtecTIER TS7650 Deduplication solutions when: you have multiple TSM servers; you have other sources of backup and archive data; or you are using other (non-IBM) backup products that perform periodic full backups.
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 8, where I’ll talk about choosing between Tivoli Storage Manager and IBM System Storage ProtecTIER for your data deduplication needs in greater depth.
I’m back with the next installment on ideas for helping you to reduce the amount of storage capacity you need for an ever-increasing amount of data, and the amount of time you spend managing it. The last chapter covered transparently automating the migration of data from primary storage to secondary systems. An extension of this thought is archiving.
Archiving is another important data reduction technique for certain types of data. One example is financial reporting data (such as weekly, monthly, quarterly and annual data) that needs to be retained for future trend analysis, regulatory requirements or auditing, but does not need to consume valuable disk space where live data should reside. Historical medical records and customer statements also often fit into this category.
Archiving is for long-term record retention. It differs from backup in two ways: it keeps files for a specific amount of time (whereas backup keeps a certain number of versions of a file), and it removes the data from the primary production storage systems completely.
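The two retention models can be contrasted in a few lines. This minimal Python sketch uses hypothetical helper names (`backup_keep`, `archive_keep`), not TSM policy syntax, and it does not model the removal of archived data from primary storage. Backup-style retention keeps the newest N versions regardless of age; archive-style retention keeps each copy until it reaches a set age, however many copies that is.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Copy:
    name: str
    created: date


def backup_keep(versions: list, max_versions: int) -> list:
    """Backup-style retention: keep the newest N versions, regardless of age."""
    return sorted(versions, key=lambda c: c.created, reverse=True)[:max_versions]


def archive_keep(copies: list, retain_days: int, today: date) -> list:
    """Archive-style retention: keep each copy until it is older than retain_days."""
    return [c for c in copies if today - c.created <= timedelta(days=retain_days)]
```

For three quarterly reports dated March 2008, June 2009 and December 2009, a two-version backup policy keeps the two newest, while a one-year archive policy evaluated on 1 March 2010 keeps the June and December 2009 copies, because its decision is based on age, not on a version count.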
Key features of IBM archiving solutions include:
Long-term storage on cost-effective media.
Point-in-time copies that provide revision history and enable auditability.
Data deduplication to remove redundant copies of data.
Retention period and “retention hold” policy enforcement.
Fast expiration processing.
Using IBM archiving solutions for records retention can help you:
Speed file-server recovery times by moving archived files and file archive copies to a hierarchy of lower-cost storage.
Reduce backup times and resource usage by focusing on active files only.
Locate historical information easily using archived files that are indexed with descriptive metadata.
IBM offers a choice of solutions for archiving, depending on customer preferences and the applications involved.
Tivoli Storage Manager 6 includes an archiving capability directly integrated into its client backup software. It is policy based, allowing the administrator to set retention times. If the requirement for how long a file must be retained changes, all the administrator has to do is update the policy, and the solution will retroactively update the already archived files; there is no need to restore and re-archive, as some competitive offerings require. Tivoli Storage Manager also offers the option of integrating data from many different applications into your archive repository, and the archive repository can be a virtualized pool of heterogeneous storage systems.
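One way to get retroactive policy changes is to have each archived object refer to its policy rather than carry a fixed expiration date. The minimal Python sketch below illustrates that design choice under that assumption; `ArchivePolicy`, `ArchivedObject` and `expire` are hypothetical names and do not describe TSM's internal implementation.

```python
from datetime import date, timedelta


class ArchivePolicy:
    """Hypothetical policy object; an administrator can change retain_days at any time."""

    def __init__(self, retain_days: int):
        self.retain_days = retain_days


class ArchivedObject:
    def __init__(self, name: str, archived_on: date, policy: ArchivePolicy):
        self.name = name
        self.archived_on = archived_on
        self.policy = policy  # a reference to the policy, not a copied expiration date

    def expires_on(self) -> date:
        # Computed from the current policy every time, so a policy change
        # applies to objects that were archived before the change.
        return self.archived_on + timedelta(days=self.policy.retain_days)


def expire(objects: list, today: date) -> list:
    """Return the objects that are still retained on `today`."""
    return [o for o in objects if o.expires_on() > today]
```

An object archived on 1 January 2009 under a 365-day policy has expired by 1 March 2010. If the administrator raises the policy to seven years first, the same object is retained, and nothing had to be restored and re-archived.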
IBM Information Archive, which contains a specialized version of Tivoli Storage Manager called IBM System Storage™ Archive Manager, is a standalone archive appliance that ingests data directly from more than 40 applications including messaging, healthcare and medical imaging, design and engineering, document management, and others.
Database archiving with IBM Optim and Tivoli Storage Manager
IBM Optim™ Data Growth Solution is a unique database archiving solution that transparently migrates unneeded records from database tables to secondary storage. Like Tivoli Storage Manager’s space management and archive solutions, Optim provides database and storage administrators with a range of cost and performance benefits.
There are also benefits to using Tivoli Storage Manager in conjunction with Optim, which works seamlessly with Tivoli Storage Manager’s application program interface (API) to move archived database records directly into Tivoli Storage Manager’s storage hierarchy.
Optim can also be used with other file-based backup/restore products; however, this involves a two-step process to first archive the data and then back it up. When used with Tivoli Storage Manager, Optim automatically archives database records and then uses the API to store/archive data in a Tivoli Storage Manager storage pool hierarchy. With any other file-based backup/restore product, Optim uses standard file operations to store/archive data in a disk-based file system, and then the backup product can back up the file to supported backup media.
Using Optim and Tivoli Storage Manager together allows you to:
Archive data directly to disk or tape or have Optim use Tivoli Storage Manager to automatically migrate it to tape.
Back up Optim archive data incrementally to a Tivoli Storage Manager storage pool that can be managed by Tivoli Storage Manager for local availability, disaster recovery or remote vaulting.
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 7, where we’ll talk about data deduplication and compression as the next options in an effective, holistic approach to reducing your overall data storage footprint.
Data Reduction Chapter 5 - Automated Data Migration
In previous chapters, we’ve talked about the need to reduce your data storage footprint in order to help survive the tidal wave of data, and the first steps in doing so include eliminating unnecessary duplication of data, and then categorizing your data so you can make smarter decisions on where to store it, and for how long.
In this chapter, we take the next step by automating these data management policies through three distinct processes: migration, archival, and expiration. The net result of these processes is to remove unneeded data from your production storage systems, which will reduce or delay your need to acquire more expensive hardware and reduce administrative costs, all without impacting key operational processes.
In the old days of computing and storage management, the concept of transparently moving data from one tier of storage to another was called hierarchical storage management, or HSM. Given IBM’s heritage in mainframes, we still use that term today. More recently, this concept morphed into Information Lifecycle Management (ILM), but it’s the same basic principle – move older, less-frequently accessed data off your most expensive storage devices onto slower, less costly storage media.
HSM and ILM solutions work transparently in the background, automatically selecting and moving files from primary to secondary tiers of storage based on the policy criteria that you set, such as file size or length of time since a file has been opened. They leave a pointer, or stub file, where the data was originally stored so that users and applications don’t need to worry about where the data was moved; the software transparently reroutes the request for any moved files. These solutions automatically move data to the proper media based upon policies you set, freeing up valuable disk space for active files and providing automated access to the migrated files when needed.
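The select, migrate, stub and recall cycle described above can be sketched as a toy two-tier volume in Python. The policy (minimum idle time and minimum size) and every name here (`HsmVolume`, `FileEntry`) are hypothetical and only illustrate the concept; they are not the TSM for Space Management or TSM HSM for Windows interfaces. In this sketch a recall moves the data back to primary storage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    size: int
    last_access: float            # seconds since the epoch
    data: Optional[bytes] = None  # None means only a stub remains on primary storage


class HsmVolume:
    """Toy two-tier volume: policy-driven migration with stub files and transparent recall."""

    def __init__(self, min_age_s: float, min_size: int):
        self.min_age_s = min_age_s  # hypothetical policy: migrate files idle at least this long
        self.min_size = min_size    # ...and at least this many bytes
        self.primary = {}           # path -> FileEntry (stubs have data=None)
        self.secondary = {}         # path -> contents held on the lower-cost tier

    def write(self, path: str, data: bytes, now: float) -> None:
        self.primary[path] = FileEntry(len(data), now, data)

    def migrate(self, now: float) -> list:
        moved = []
        for path, entry in self.primary.items():
            idle = now - entry.last_access
            if entry.data is not None and idle >= self.min_age_s and entry.size >= self.min_size:
                self.secondary[path] = entry.data
                entry.data = None  # leave a stub: metadata stays, contents move
                moved.append(path)
        return moved

    def read(self, path: str, now: float) -> bytes:
        entry = self.primary[path]
        if entry.data is None:  # stub: recall transparently from the secondary tier
            entry.data = self.secondary.pop(path)
        entry.last_access = now
        return entry.data

    def primary_bytes(self) -> int:
        return sum(e.size for e in self.primary.values() if e.data is not None)
```

The caller of `read` never needs to know whether a file was migrated; the stub keeps the path valid and the recall happens behind the scenes.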
Data migration solutions help customers get control of, and efficiently manage, data growth and its associated storage costs by providing automated space management. These solutions should provide the following key features:
• Storage pool “virtualization” helps maximize utilization of the managed storage resources.
• Restore management is optimized based on the location of the data in the hierarchy.
• Migration is transparent to the users and to applications.
• Migrations are scheduled to minimize network traffic during peak hours.
• Automatic migrations occur outside the backup window.
• By setting proper threshold limits, annoying ‘out of disk space’ messages can be eliminated.
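Threshold limits usually come as a pair: migration starts when usage crosses a high threshold and stops once usage drops to a low threshold, so the system does not thrash around a single limit. This minimal Python sketch assumes least-recently-used files go first; the function name, its parameters and the percentages in the example are hypothetical.

```python
def threshold_migrate(files: dict, capacity: int, high_pct: float, low_pct: float):
    """Migrate least-recently-used files once usage exceeds high_pct of capacity,
    stopping as soon as usage is at or below low_pct.

    files maps path -> (size_in_bytes, last_access_time).
    Returns (list of migrated paths, resulting usage in bytes).
    """
    used = sum(size for size, _ in files.values())
    if used <= capacity * high_pct / 100:
        return [], used  # below the high threshold: nothing to do
    target = capacity * low_pct / 100
    migrated = []
    # Oldest last-access time first.
    for path, (size, _) in sorted(files.items(), key=lambda kv: kv[1][1]):
        if used <= target:
            break
        migrated.append(path)
        used -= size
    return migrated, used
```

On a 1000-byte volume with thresholds of 80% and 60%, 850 bytes of data triggers migration of the oldest file, and migration stops at 550 bytes instead of emptying the disk.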
The IBM Tivoli Storage Manager (TSM) family includes two solutions for automating the migration of data between multiple tiers of storage. TSM 6 for Space Management is for AIX, HP-UX, Solaris and Linux data, while TSM HSM for Windows is for Windows servers.
Tivoli Storage Manager data migration solutions not only help you clean up your primary storage systems to help them run more efficiently, they can also be used to easily move data to new storage technologies as they are deployed. Migrating files to Tivoli Storage Manager also helps expedite restores, because there is no need to restore migrated files in the event of a disaster.
The benefits of Hierarchical Storage Management or Information Lifecycle Management include:
• Improve response times of file servers by off-loading inactive data
• Slow or even stop the growth of your production storage environment
• Use existing storage assets more efficiently
• Reduce backup times and resource usage by focusing on active files only
• Eliminate manual file system clean-up activities
In the next chapter, we’ll look at HSM’s big brother – archiving.
Data Reduction Chapter 4: Categorize your data for migration & deletion
In the last chapter, we discussed eliminating one of the biggest causes of data growth: the duplication of large amounts of data every time you perform a full backup. In this chapter, we’ll explore the benefits of determining what types of data you have and categorizing them so that you can manage them most effectively. This will help you set up policies to migrate less frequently accessed data to lower-cost tiers of storage, and to delete the data that you no longer need or want. By cleaning out your production storage, you will shorten your backup cycles and improve application performance.
The next option for reducing the data storage footprint is to assess the different types of data and where they are in the data life cycle. If your organization is like most, you have all your unstructured data in flat file systems, which are probably full of data that you rarely, if ever, need to access. This may include data you are no longer required by law or policy to keep but that you haven’t deleted, such as old e-mails and memos, which could prove costly if discovered in legal proceedings.
The goal is to identify what data can be moved to less expensive tiers of storage, and what data can be deleted entirely from the environment. This will reduce the need to buy more primary storage capacity and make it easier to manage and protect what you have. Backup and restore performance will improve, and it will be easier to prove that you are meeting data retention and expiration policies.
IBM offers IBM Tivoli Storage Productivity Center for Data for this purpose. This solution reports on where your data is, sorted by access or saved dates, who owns it, the application that created it, and numerous other filters. From the intelligence you gain from these reports, you can set meaningful policies in your data management software to automatically take the appropriate action on data that shouldn’t be clogging up your primary systems. Tivoli Storage Productivity Center for Data can also help identify and eliminate duplicate data, orphan data, temporary data and non-business data.
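To show the kind of report such a tool produces, here is a minimal Python sketch that walks a directory tree, totals bytes by age bucket and groups identical files by content hash. It is not how Tivoli Storage Productivity Center for Data works internally. The bucket boundaries are hypothetical, it uses modification time because access times are often not maintained, it skips ownership (which is platform-specific), and it reads each file whole, which is fine only for a sketch.

```python
import hashlib
import os
import time
from collections import defaultdict

# Hypothetical age buckets: (upper limit in days, label)
AGE_BUCKETS = [(30, "< 30 days"), (365, "30-365 days"), (float("inf"), "> 1 year")]


def scan(root: str, now: float = None):
    """Return (bytes per age bucket, groups of paths with identical contents)."""
    now = time.time() if now is None else now
    bytes_by_age = defaultdict(int)
    by_hash = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            age_days = (now - st.st_mtime) / 86400
            label = next(lbl for limit, lbl in AGE_BUCKETS if age_days < limit)
            bytes_by_age[label] += st.st_size
            with open(path, "rb") as fh:
                by_hash[hashlib.sha256(fh.read()).hexdigest()].append(path)
    duplicates = [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
    return dict(bytes_by_age), duplicates
```

A report like this is the input for the policies described above: the older buckets are candidates for migration or deletion, and each duplicate group is a candidate for cleanup.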
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 5, where we’ll talk about automating the migration, archival and expiration of your data.
The "Ask the Experts Online Jam" is a valuable opportunity for you to connect with 75+ real-world IBM experts on 30+ Tivoli products. These experts, many from IBM development, are recruited to answer your questions for a concentrated period of 12 hours (8 a.m. to 8 p.m. US Eastern).
Step 1: You have a question, usually a fairly technical one.
Step 2: You find the expert best suited to answer it by browsing the experts by predefined category and specific product.
Step 3: You fill in a field in the "Ask the Experts Online Jam" web application to submit the question.
Step 4: You receive an e-mail answer to your question(s), and the Ask the Experts Jam web application is updated for other members to see.
Ask questions of over 75 IBM experts on the following 30+ topics:
Datacenter Management tools: IBM Tivoli Monitoring, IBM Tivoli Composite Application Manager for Transactions and WebSphere/J2EE, Tivoli Application Dependency Discovery Manager, Tivoli Provisioning Manager, Tivoli Service Request Manager
Network, Service Assurance and Events: Tivoli Netcool Impact, Tivoli Netcool Performance Flow Analyzer, Tivoli Netcool Performance Manager, Tivoli Netcool/OMNIbus, Tivoli Network Manager, Tivoli Network Manager (Precision and NetView/d)
Asset Management: Asset Management for IT and Enterprise, Enterprise Asset Management Trends and IBM Maximo Industry Solutions
Security: Tivoli Access Manager, Tivoli Identity Manager, Tivoli Federated Identity Manager, Tivoli Access Manager for Enterprise Single Sign-On, Tivoli Compliance Insight Manager, Tivoli Directory Server, Tivoli Key Lifecycle Manager, Tivoli Security Information and Event Manager, Tivoli Security Policy Manager
Storage: Tivoli Storage FlashCopy Manager on AIX and Windows, Tivoli Storage Manager, Tivoli Storage Productivity Center, Tivoli Storage Manager (TSM) FastBack
z/OS: NetView for z/OS, OMEGAMON
Tivoli Security for System z: Tivoli zSecure Suite
I will personally be available from 8 a.m. to 2 p.m. covering IBM Tivoli Storage FlashCopy Manager on Windows, but many other storage experts will be available for the entire 12 hours. Please join us!
Data Reduction Chapter 3: Avoiding data duplication
In chapter 2, we noted that one of the largest contributors to data growth is data backup and recovery software that forces you to perform periodic full backups. To recap, when you perform a full backup this weekend, you’re duplicating almost everything you backed up last weekend.
Not only does that take a lot of storage capacity, but it also takes a long time, and these problems only get worse as you create more new data. (It’s no wonder that data deduplication products are so popular; they were designed to eliminate all of this duplicate data. And when they claim to reduce your backup storage footprint by 90 percent or more, this is exactly the data that they’re talking about.)
But what if you never had to perform a full backup again after the initial one? If you always backed up only new and changed data, you wouldn’t be creating all that duplicate data that needs an expensive deduplication solution to undo. Shorter backup windows, less storage required, and reduced storage acquisition costs would all be benefits of eliminating that weekly full backup. So would faster restore times, since the backup data wouldn’t need to be re-hydrated in order to be useful.
IBM has smarter solutions that can help prevent the need to perform full backups. The products in the IBM Tivoli® Storage Manager portfolio of recovery management solutions all provide incremental-forever backups.
IBM Tivoli Storage Manager backs up the files that have changed since the last backup; for larger files, such as huge databases, it can perform sub-file backups, copying only the sections of the file that changed.
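Sub-file backup can be illustrated by comparing per-block fingerprints of the new version of a file against those recorded at the last backup. This minimal Python sketch assumes fixed 4 KB blocks and SHA-256; it shows the general idea only and is not TSM's adaptive sub-file backup algorithm. The function names are hypothetical.

```python
import hashlib

BLOCK = 4096  # hypothetical block size for this sketch


def block_hashes(data: bytes) -> list:
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest() for i in range(0, len(data), BLOCK)]


def subfile_delta(previous_hashes: list, current: bytes) -> dict:
    """Return only the blocks of `current` whose fingerprint differs from the last backup."""
    delta = {}
    for idx, h in enumerate(block_hashes(current)):
        if idx >= len(previous_hashes) or previous_hashes[idx] != h:
            delta[idx] = current[idx * BLOCK:(idx + 1) * BLOCK]
    return delta


def apply_delta(base: bytes, delta: dict, new_length: int) -> bytes:
    """Rebuild the current version from the base backup plus the changed blocks."""
    blocks = [base[i:i + BLOCK] for i in range(0, len(base), BLOCK)]
    for idx, chunk in delta.items():  # indexes arrive in ascending order
        while len(blocks) <= idx:
            blocks.append(b"")
        blocks[idx] = chunk
    return b"".join(blocks)[:new_length]  # truncation handles files that shrank
```

Flipping one byte in a 16 KB file yields a delta of a single 4 KB block, and appending data yields only the new tail block; the base backup plus that delta rebuilds the current version exactly.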
IBM Tivoli Storage Manager FastBack takes it to the next level, by backing up only the individual blocks of data that change as they are written to disk; and because it performs backups without impacting applications, it can perform more frequent backups, which means less data at risk of loss.
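Block-level change tracking means recording which blocks are touched as writes happen, so that a backup copies exactly those blocks without rescanning the volume. The toy `TrackedVolume` below is a hypothetical Python sketch of that idea under an assumed 512-byte block size; it does not represent FastBack's actual driver or on-disk format.

```python
BLOCK = 512  # hypothetical block size


class TrackedVolume:
    """Toy volume that records which blocks change as writes happen."""

    def __init__(self, num_blocks: int):
        self.blocks = [bytes(BLOCK) for _ in range(num_blocks)]
        self.dirty = set()  # indexes of blocks written since the last backup

    def write(self, offset: int, data: bytes) -> None:
        # Split the write across block boundaries and mark each touched block dirty.
        pos = 0
        while pos < len(data):
            idx, within = divmod(offset + pos, BLOCK)
            n = min(BLOCK - within, len(data) - pos)
            blk = bytearray(self.blocks[idx])
            blk[within:within + n] = data[pos:pos + n]
            self.blocks[idx] = bytes(blk)
            self.dirty.add(idx)
            pos += n

    def incremental_backup(self) -> dict:
        """Copy only the blocks written since the last backup, then reset tracking."""
        snapshot = {idx: self.blocks[idx] for idx in sorted(self.dirty)}
        self.dirty.clear()
        return snapshot
```

A 20-byte write that straddles a block boundary marks two blocks, the next backup copies just those two, and a backup with no intervening writes copies nothing. Because the work per backup is proportional to the amount changed, backups can run far more often.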
These are the common backup methodologies and how they compare on backup and restore processing:
Full + incremental
Backup: This requires a full backup and then incremental backups over time, usually a full backup each weekend with incremental backups for the following six days. Only data that has changed since the previous day's backup is transferred to tape. Then, at the end of the week, another full backup must be run.
Restore: The full backup must be restored, and then each day's incremental backup applied to it. This means that if you have a full backup and three incremental backups of the same file, that file will be restored four times. This wastes time and money, and introduces risk.
Full + differential
Backup: This requires a full backup and then differential backups over time, usually a full backup each weekend with differential backups for the following six days. Each differential backs up all data that has changed since the last full backup. If you assume a 10 percent daily change rate, then you will back up 100 percent (full) on the first day, 10 percent on the second, 20 percent on the third, 30 percent on the fourth, 40 percent on the fifth, 50 percent on the sixth, and 60 percent on the seventh. That means that you are backing up 310 percent of your data every week! Over four weeks, that adds up to more than 12 times your production capacity for just a month of backups.
Restore: You would restore the full backup and then only the most recent differential taken on or before the date you are restoring to. This is faster and more reliable than the Full + incremental model, but at the cost of much more storage capacity.
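The weekly totals for the two methods above follow from simple arithmetic. This Python sketch assumes, as the example above does, that each day's 10 percent of changes touches data not changed earlier in the week, so differentials grow linearly; the function name and scheme labels are illustrative only.

```python
def weekly_backup_volume(daily_change_pct: float, scheme: str, days_after_full: int = 6) -> float:
    """Percent of production capacity written in one week: one full plus daily backups.

    Simplified model: each day changes a new daily_change_pct of the data,
    so changes accumulate without overlap.
    """
    full = 100.0
    if scheme == "incremental":
        # Each day copies only that day's changes.
        return full + daily_change_pct * days_after_full
    if scheme == "differential":
        # Day k after the full copies everything changed since the full: k * daily change.
        return full + sum(daily_change_pct * k for k in range(1, days_after_full + 1))
    raise ValueError(f"unknown scheme: {scheme}")
```

`weekly_backup_volume(10, "differential")` returns 310.0 (100 + 10 + 20 + 30 + 40 + 50 + 60), compared with 160.0 for full + incremental. Over four weeks the differential scheme writes about 4 × 310 = 1,240 percent of production capacity.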
Progressive incremental
Backup: This requires a full backup the first time you back up, and then only incremental backups. There are no extra transfers of data, which saves network bandwidth and transfer time, makes backup and restore faster, and can save thousands of dollars in disk and tape costs.
Restore: You select the point in time that you want to restore from, and then each necessary file is restored just once. This is much faster than with the other two methods.
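The difference in restore work between full + incremental and full + differential can be made concrete by listing which backup sets a restore has to read. The sketch assumes a hypothetical schedule with a full backup on every seventh day (day 0, 7, 14, ...) and one incremental or differential backup on each other day; labels such as `full@7` are illustrative only.

```python
def restore_chain(scheme: str, target_day: int, full_every: int = 7) -> list[str]:
    """Backup sets that must be read to restore to target_day (day 0 = first full).

    Hypothetical schedule: a full on every multiple of full_every, one daily backup otherwise.
    """
    if target_day < 0:
        raise ValueError("target_day must be >= 0")
    last_full = (target_day // full_every) * full_every
    if scheme == "incremental":
        # The full, then every incremental taken since it, applied in order.
        return [f"full@{last_full}"] + [f"inc@{d}" for d in range(last_full + 1, target_day + 1)]
    if scheme == "differential":
        # The full, then only the latest differential (if the target is not the full itself).
        chain = [f"full@{last_full}"]
        if target_day > last_full:
            chain.append(f"diff@{target_day}")
        return chain
    raise ValueError(f"unknown scheme: {scheme}")
```

Restoring to day 10 reads four backup sets under full + incremental (`full@7`, `inc@8`, `inc@9`, `inc@10`) but only two under full + differential (`full@7`, `diff@10`). A progressive incremental restore has no such chain, because the server's database points directly at the version of each file that was current at the chosen point in time.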
An internal enterprise-class relational database enables Tivoli Storage Manager to perform progressive incremental backups because it tracks each individual file and knows exactly how your computer looked on each day. When a restore is required, only the version of the file needed is restored. Unlike other file-based backup solutions that require you to run periodic (usually weekly) full backups to ensure reasonable recovery times, Tivoli Storage Manager's unique progressive incremental backup methodology never requires you to run another full backup after the first one is done to set the base. The result can be a savings of many terabytes of backup capacity every month.
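The point-in-time lookup that such a database makes possible can be sketched with a per-file list of backup times. This is a simplified illustration, not Tivoli Storage Manager's schema: it assumes integer backup times recorded in increasing order for each file, and it uses `None` as a hypothetical marker for a file that was deleted from the client. `VersionCatalog` and its methods are hypothetical names.

```python
import bisect
from collections import defaultdict
from typing import Optional


class VersionCatalog:
    """Hypothetical per-file version index standing in for the server's database."""

    def __init__(self) -> None:
        self._times: dict[str, list[int]] = defaultdict(list)               # backup times per file
        self._versions: dict[str, list[Optional[str]]] = defaultdict(list)  # version id, or None if deleted

    def record(self, path: str, backup_time: int, version_id: Optional[str]) -> None:
        """Record a new version of path (None marks the file as deleted at backup_time)."""
        times = self._times[path]
        if times and backup_time <= times[-1]:
            raise ValueError("versions must be recorded in increasing time order")
        times.append(backup_time)
        self._versions[path].append(version_id)

    def version_at(self, path: str, when: int) -> Optional[str]:
        """Return the version of path that was current at time `when`, if any."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, when)  # number of versions recorded at or before `when`
        return self._versions[path][i - 1] if i else None

    def restore_set(self, when: int) -> dict[str, str]:
        """Every file that existed at `when`, mapped to the single version to restore."""
        result = {}
        for path in self._times:
            version = self.version_at(path, when)
            if version is not None:
                result[path] = version
        return result
```

If `/a.txt` is backed up on days 1 and 3 and `/b.txt` is backed up on day 1 and deleted on day 4, then `restore_set(2)` returns `{"/a.txt": "a-v1", "/b.txt": "b-v1"}` and `restore_set(5)` returns `{"/a.txt": "a-v2"}`: each file is restored once, at the right version, with no full backup to replay.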
The analysis shown in the figure above starts with 2TB of data and adds or changes 200GB per day. The assumption is that a full backup has already been performed to set the base.
Full + differential, in yellow, shows that once per week a full backup is performed, and then on each day between the full backups, all data that is new or changed since the last full backup is copied. In this scenario, 26TB of capacity would be needed to store one month of backups.
Full + incremental, in blue, shows that once per week a full backup is performed, and then on each day between the full backups, only the data that is new or changed since the previous day's backup is copied. In this scenario, 14TB of capacity would be needed to store one month of backups.
Tivoli Storage Manager's progressive incremental approach, in red, never requires subsequent full backups. As a result, only 7TB of capacity is needed to store one month of backups in this scenario.
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 4, where we'll cover the discovery and categorization of data to help move it intelligently throughout its lifecycle.
Data Reduction Chapter 2: Surviving the tidal wave of data - options for data reduction
In chapter 1, we discussed the struggles that storage administrators are having with the tidal wave of data. In this chapter, we'll begin talking about how data reduction technologies can help you survive and even thrive in the face of these challenges.
IBM takes a holistic approach to data reduction, unlike competitors that offer point solutions to problems that they may in fact be causing. For example, a huge contributor to data growth is the repeated duplication of large amounts of data every time you perform a full backup.
So, one option is to avoid data growth from unnecessary data duplication by backing up only data that has changed since the last backup. This addresses the cause of the problem, not the symptom. For example, if you have a 5 percent per week data change rate, 95 percent of your data didn't change this week. If you perform a full backup of that data this weekend, you're duplicating almost everything you backed up last weekend. Not only does that take a lot of storage capacity, but it also takes a long time, and these problems only get worse as you create more new data. It's no wonder that data deduplication products are so popular: they were designed to eliminate all this duplicate data. And when they claim to reduce your backup storage footprint by 95 percent or more, this is exactly the data that they're talking about.
Another option is to determine what types of data you have and categorize them so that you can manage them most effectively, by moving less frequently accessed data to lower-cost tiers of storage and by deleting data that you no longer need or want. This can shorten your backup cycles and improve application performance, as well as reduce or delay the need to buy more primary storage capacity.
A third option is to put automated processes in place, based on policies that meet business requirements and/or service level agreements, to migrate, archive and delete data. Several actions can be taken on your data files based on criteria such as age, time since last access, or the application that created them. These automated solutions can include:
- Transparent migration of data from production storage systems to a hierarchy of secondary systems; the data remains online and available without any modifications to applications.
- Archival of data, removing it completely from production systems and storing it in secure storage where retention policies can be set and managed.
- Expiration of data, deleting it from all storage once it is no longer needed or to meet corporate governance policies.
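A policy of this kind boils down to evaluating each file against ordered rules. The Python sketch below is hypothetical: the thresholds (30 days idle, one year old, seven years old), the `payroll` exemption, and the rule order are placeholders for whatever your business requirements and service level agreements dictate, and it only decides the action rather than moving any data.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    KEEP = "keep"
    MIGRATE = "migrate"  # move to a lower-cost tier; data stays online
    ARCHIVE = "archive"  # remove from production; keep under retention policy
    EXPIRE = "expire"    # delete from all storage


@dataclass
class FileInfo:
    path: str
    age_days: int
    days_since_access: int
    application: str


@dataclass
class Policy:
    # Hypothetical thresholds; real values come from business requirements and SLAs.
    migrate_after_idle_days: int = 30
    archive_after_age_days: int = 365
    expire_after_age_days: int = 7 * 365
    never_expire_apps: frozenset[str] = frozenset({"payroll"})


def decide(f: FileInfo, p: Policy) -> Action:
    """Apply the rules from most to least drastic; the first match wins."""
    if f.age_days >= p.expire_after_age_days and f.application not in p.never_expire_apps:
        return Action.EXPIRE
    if f.age_days >= p.archive_after_age_days:
        return Action.ARCHIVE
    if f.days_since_access >= p.migrate_after_idle_days:
        return Action.MIGRATE
    return Action.KEEP
```

Checking the most drastic action first keeps an old file from being merely migrated when it is already due for archival or expiration, while the exemption list lets files created by a regulated application be archived instead of deleted.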
The last option is to compress and deduplicate the data you end up putting into your data protection and retention systems. Data deduplication is the most popular technology in this category, and we'll discuss it and the other technologies mentioned above in greater detail in future chapters of this blog.
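Compression and deduplication can be combined in a content-addressed store. This minimal Python sketch assumes fixed-size chunks identified by SHA-256 digests and compressed with zlib; production deduplication typically uses variable-size (content-defined) chunking and a persistent index, and `DedupStore` is a hypothetical name. The `get` method shows the re-hydration step that deduplicated data needs before it can be used.

```python
import hashlib
import zlib


class DedupStore:
    """Hypothetical content-addressed store: each unique chunk is compressed and kept once."""

    def __init__(self, chunk_size: int = 4096) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}  # digest -> compressed chunk
        self.logical_bytes = 0              # bytes written by clients before reduction

    def put(self, data: bytes) -> list[str]:
        """Store data; return its recipe (ordered chunk digests) for later reassembly."""
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:   # duplicate chunks are stored only once
                self.chunks[digest] = zlib.compress(chunk)
            recipe.append(digest)
        self.logical_bytes += len(data)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Re-hydrate: decompress and reassemble the chunks named in the recipe."""
        return b"".join(zlib.decompress(self.chunks[d]) for d in recipe)

    def stored_bytes(self) -> int:
        """Physical bytes actually kept after deduplication and compression."""
        return sum(len(c) for c in self.chunks.values())
```

Storing the same buffer twice adds no new chunks the second time; only its recipe (the list of digests) is new, which is exactly the duplicate data from repeated full backups that this technology removes after the fact.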
Data Reduction Chapter 1: The challenges posed by the tidal wave of data
We're storing and using more data than ever before. The volume of data is growing exponentially, government regulations are expanding and competitive pressures are increasing, forcing us to retain more of our data for longer periods of time. But our budgets are flat or being cut. And as we become more dependent on digital information, the costs of losing any of it are increasingly painful. The bottom line, of course, is that we need to do a better job of managing our data assets, and as these assets grow and our budgets shrink, we need to do more with less. So we need smarter solutions.
Storage administrators are on the front lines of the Tidal Wave of Data battle. Some of the challenges from data growth that administrators are struggling with include:
- It takes longer to perform backups; often not completing within backup window allowances; some data is not being adequately protected
- It takes longer to perform recoveries; increased downtime equals lost revenue opportunity; data that isn't protected can't be recovered
- Can't keep buying more storage; running out of floor space / electrical & cooling capacity; administration and management costs are exploding
- New data sources are complicating the problems; new applications coming online; mergers and acquisitions are increasing the number of supported systems
IBM can help you build a dynamic storage management infrastructure that will enable you to cope with all of these challenges. We have solutions to help reduce your data storage footprint, and the goals that we set out in these solutions are: to reduce your capital and operational costs; to improve your application availability and service levels; and to help you mitigate the risks associated with losing data and a rapidly changing environment.
With these solutions you should: need less storage; have less data to manage; experience less downtime; and be more competitive. To learn more, please visit the Data Reduction Solutions web page and stay tuned for Chapter 2, where we will outline a holistic and comprehensive approach to data reduction.
Get ready for Pulse 2010, February 21-24 at the MGM Grand Hotel in Las Vegas. Pulse 2010 will be one of the most important storage and service management conferences of the year, and one that will deliver the information you need to hear directly from your peers, our partners and your IBM Storage team. The conference will include an impressive storage management agenda covering everything from emerging storage technologies, architectures, and backup and recovery to archiving and managing storage in virtualized data centers and server environments. Once again we are very excited to have your peers share best practices from multiple industries, geographies and companies of various sizes.
As your business and data centers continue to evolve, we continue to evolve and adapt our storage and information infrastructure management solutions to meet your growing needs and facilitate your journey to a dynamic storage infrastructure with innovative products and services that matter to your bottom line. Pulse 2010 provides us the opportunity to showcase our commitment to you, and you will see firsthand how IBM's increased investment in storage development has produced an aggressive and exciting roadmap that will expand and enhance our capabilities.
Detailed communications on the hotel and Call for Presentations will be coming your way shortly. The key to a successful event is your participation, and we hope you play an active role in the agenda. Please visit the Pulse 2010 website for more details.