Data Reduction Chapter 3: Avoiding data duplication
In chapter 2, we noted that one of the largest contributors to data growth is data backup and recovery software that forces you to perform periodic full backups. To recap, when you perform a full backup this weekend, you're duplicating almost everything you backed up last weekend.
Not only does that take a lot of storage capacity, but it also takes a long time, and these problems only get worse as you create more new data. (It's no wonder that data deduplication products are so popular; they were designed to eliminate all of this duplicate data. And when they claim to reduce your backup storage footprint by 90 percent or more, this is exactly the data they're talking about.)
But what if you never had to perform a full backup again after the initial one? If you always backed up only new and changed data, you wouldn't be creating all that duplicate data that needs an expensive deduplication solution to undo. Shorter backup windows, less storage required, and reduced storage acquisition costs would all be benefits of eliminating that weekly full backup. So would faster restore times, since deduplicated data wouldn't need to be rehydrated in order to be useful.
IBM has smarter solutions that eliminate the need to perform full backups. The products in the IBM Tivoli® Storage Manager portfolio of recovery management solutions all provide incremental-forever backups.
- IBM Tivoli Storage Manager backs up the files that have changed since the last backup; for larger files, such as huge databases, it can perform sub-file backups, copying only the sections of the file that changed.
- IBM Tivoli Storage Manager FastBack takes it to the next level by backing up only the individual blocks of data that change as they are written to disk; and because it performs backups without impacting applications, it can back up more frequently, which means less data at risk of loss.
- IBM Tivoli Continuous Data Protection for Files continuously protects the data on desktop and laptop computers, again copying only the files that are new or changed.
These are the common backup methodologies and how they compare on backup and restore processing:
Full + incremental
Backup: This requires a full backup and then incremental backups over time, usually a full backup each weekend with incremental backups for the following six days. Only data that has changed since the day before is transferred to tape. Then, at the end of the week, another full backup must be run.
Restore: The full backup must be restored, then each day's incremental data applied to it. This means that if you have a full backup and three incremental backups of the same file, it will be restored four times. That wastes time and money, and introduces risk.
Full + differential
Backup: This requires a full backup and then differential backups over time, usually a full backup each weekend with differential backups for the following six days. Each differential copies all data that has changed since the last full backup. If you assume a 10 percent daily change rate, you will back up 100 percent (the full) on the first day, 10 percent on the second, 20 percent on the third, 30 percent on the fourth, 40 percent on the fifth, 50 percent on the sixth, and 60 percent on the seventh. That means you are backing up 310 percent of your data every week, and you'll need more than 12 times your production capacity for just one month of backups.
Restore: You restore the full backup and then the last differential taken up to the date you are restoring to. This is faster and more reliable than the full + incremental model, but at the cost of much more storage capacity.
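The weekly transfer arithmetic for the full + differential model can be sketched in a few lines of Python. This is a minimal illustration; the 10 percent daily change rate is the assumption stated above.

```python
# Fraction of total production data that changes each day (assumption from the text).
daily_change = 0.10

# Day 1: a full backup copies 100 percent of the data.
full = 1.0

# Days 2-7: each differential re-copies everything changed since the full,
# so the amount transferred grows by one day's change rate every day.
differentials = [daily_change * day for day in range(1, 7)]

# Total data moved in one week, as a fraction of production capacity.
weekly = full + sum(differentials)
print(f"{weekly:.0%} of production data backed up per week")
```

At a 10 percent change rate this comes to 310 percent of production capacity transferred every week, which is what drives the large multi-week storage requirements.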
Progressive incremental
Backup: This requires a full backup only the first time you back up, and then incremental backups from then on. There are no redundant transfers of data, which saves network bandwidth and transfer time, makes backup and restore faster, and can save thousands of dollars in disk and tape costs.
Restore: You select the point in time that you want to restore from, and the necessary files are restored just once. This is much faster than either of the other two methods.
An internal enterprise-class relational database enables Tivoli Storage Manager to perform progressive incremental backups: it tracks each individual file and knows exactly how your computer looked on each day. When a restore is required, only the needed version of each file is restored. Unlike other file-based backup solutions that require you to run periodic (usually weekly) full backups to ensure reasonable recovery times, Tivoli Storage Manager's unique progressive incremental backup methodology never requires another full backup after the first one sets the base. The result can be a savings of many terabytes of backup capacity every month.
The analysis shown in the figure above starts with 2TB of data and adds or changes 200GB per day. The assumption is that a full backup has already been performed to set the base.
- Full + differential, in yellow, shows that once per week a full backup is performed, and then on each day between the full backups, all data that is new or changed since the last full backup is copied. In this scenario, 26TB of capacity would be needed to store one month of backups.
- Full + incremental, in blue, shows that once per week a full backup is performed, and then on each day between the full backups, only the data that is new or changed since the last backup is copied. In this scenario, 14TB of capacity would be needed to store one month of backups.
- Tivoli Storage Manager's progressive incremental approach, in red, never requires subsequent full backups. As a result, only 7TB of capacity is needed to store one month of backups in this scenario.
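The month-long comparison above can be approximated with a short script. This is a sketch under stated assumptions, not product behavior: a 28-day month, one full backup per 7-day week for the first two models, the progressive model's single base full counted in its total, and the 2TB base / 200GB-per-day change rate from the scenario. The function names are illustrative.

```python
def full_plus_incremental(base_gb, change_gb, days):
    """One full backup per 7-day week, plus incrementals on the other six days."""
    weeks = days // 7
    return weeks * (base_gb + 6 * change_gb)

def full_plus_differential(base_gb, change_gb, days):
    """One full per week; each differential re-copies everything since that full."""
    weeks = days // 7
    diff_per_week = sum(change_gb * day for day in range(1, 7))  # 1x..6x daily change
    return weeks * (base_gb + diff_per_week)

def progressive_incremental(base_gb, change_gb, days):
    """A single initial full to set the base, then only changed data, forever."""
    return base_gb + days * change_gb

if __name__ == "__main__":
    base, daily, month = 2000, 200, 28  # GB, GB/day, days (assumed 28-day month)
    for name, model in [("full + differential", full_plus_differential),
                        ("full + incremental", full_plus_incremental),
                        ("progressive incremental", progressive_incremental)]:
        print(f"{name}: {model(base, daily, month) / 1000:.1f} TB")
```

With these assumptions the three models come to roughly 24.8TB, 12.8TB, and 7.6TB, broadly in line with the 26TB, 14TB, and 7TB shown in the figure, which evidently assumes a slightly different month length.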
To learn more, please visit the Data Reduction Solutions web page, and stay tuned for chapter 4, where we'll cover the discovery and categorization of data to help move it intelligently throughout its lifecycle.