Those are great questions.
Additionally, you should consider asking yourself these questions
that relate to, "What's the Value of this Data to the Business?"
1. Do you have a plan for recovery of that data if it is lost or corrupted?
2. How fast is that data growing and how are you dealing with the growth?
3. How are you providing increasing service levels with lower cost?
By attending the Storage and Information Infrastructure track at
Pulse 2010, you'll find the answers to the questions I've added
along with answers to any additional questions you may have
concerning your storage, data, and information management.
Take a look at the video below and see how Tivoliman Tames the Data Juggernaut.
New Product Announced Dec. 15, 2009
IBM Tivoli Storage Manager FastBack for Workstations is an automated, continuous data protection and recovery software solution for desktop and laptop computers, with central management for thousands of systems and integration with other Tivoli Storage Management offerings. Here is the URL for this bookmark: http://www-01.ibm.com/software/tivoli/products/storage-mgr-fastback-workstation/
Data Reduction Chapter 8: Deduplication with Tivoli Storage Manager 6, FastBack and ProtecTIER
So far in this series, we've detailed the challenges that the tidal wave of data is placing on storage administrators, and how a smarter, more holistic and comprehensive approach to data reduction is needed to survive in a way that lets you do more with less.
We covered eliminating the largest source of duplicate data (full backups) and automating the migration, archiving and deletion of older data. Then, in chapter 7, we covered the basics of data deduplication. Now we’ll detail the differences between IBM’s deduplication offerings, and when to best use each.
Let's talk first about the deduplication capabilities of Tivoli Storage Manager (TSM). This feature is included at no additional charge for TSM 6 Extended Edition customers. It can help reduce recovery times by enabling you to store more backup data and recovery points on disk rather than tape. It works with data from all sources: normal backups, data imported via the TSM API, and archive and HSM data. TSM deduplicates your disk-based data pools as a post-process, so there is no impact on backup performance. After running, it automatically reclaims the storage that has been freed up.
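If you're curious what turning this on looks like in practice, here is a minimal sketch that drives the TSM administrative command-line client (dsmadmc) from Python. The admin ID, password, and pool name are hypothetical placeholders, and the pool needs to be backed by a FILE device class for deduplication to apply; treat this as an illustration rather than an official procedure.

```python
import subprocess

# Hypothetical placeholders: substitute your own admin ID, password and pool name.
ADMIN_ID = "admin"
ADMIN_PW = "secret"
POOL = "BACKUPPOOL_FILE"  # must be a sequential-access FILE device class pool

def dsmadmc(command: str) -> None:
    """Run one TSM administrative command through the dsmadmc CLI."""
    subprocess.run(
        ["dsmadmc", f"-id={ADMIN_ID}", f"-password={ADMIN_PW}", command],
        check=True,
    )

# Switch deduplication on for the pool, then kick off the post-process
# duplicate identification; TSM reclaims the freed space afterwards.
dsmadmc(f"UPDATE STGPOOL {POOL} DEDUPLICATE=YES")
dsmadmc(f"IDENTIFY DUPLICATES {POOL} DURATION=60")
```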
TSM already eliminates the most common cause of duplicate data – full backups – so the reduction ratios you can expect from TSM’s deduplication solution are fairly modest – the average is about 40%. But when combined with its progressive incremental backup approach and built-in data compression, TSM’s effective data reduction rate is extremely competitive with any other solution on the market, as has been detailed in a commissioned report written by Enterprise Strategy Group (ESG), available here (fair warning – registration required – sorry):
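To see how these reductions compound, here is a purely illustrative back-of-the-envelope calculation. The 40% deduplication figure comes from the paragraph above; the compression and progressive-incremental figures are hypothetical placeholders, not measured results.

```python
# Illustrative only: independent reduction stages compound multiplicatively.
# Each value is the fraction of data remaining after that stage.
progressive_incremental = 0.30   # hypothetical: no repeated full backups
compression = 0.50               # hypothetical: roughly 2:1 compression
deduplication = 0.60             # about 40% reduction, per the text above

remaining = progressive_incremental * compression * deduplication
print(f"Data actually stored: {remaining:.0%} of the naive full-backup footprint")
# -> Data actually stored: 9% of the naive full-backup footprint
```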
Announced today, Tivoli Storage Manager FastBack v6.1 also includes target-side data deduplication to help reduce the capacity required in the FastBack backup repository, adding to its value as the leading near-instant recovery solution on the market for business-critical Windows servers and remote/branch offices. Also announced today were Linux support and tighter integration with the Tivoli Storage Manager Integrated Solutions Console (ISC), delivering on IBM's vision of true enterprise-wide Unified Recovery Management.
IBM System Storage ProtecTIER is a technology leader in performance, scalability, data integrity and reliability. In true apples-to-apples comparisons, this solution is the fastest on the market in real customer environments. A single ProtecTIER system can easily scale in both performance (1,000 MB/sec) and capacity (1 PB of deduplicated data). ProtecTIER is one of the few solutions that doesn't rely on a hash algorithm; instead it performs a byte-level differential comparison to verify that data really is a duplicate, delivering enterprise-class data integrity. And ProtecTIER is built from IBM best-of-breed components rather than the inexpensive OEM parts found in competitive products.
ProtecTIER has been proven in very large production environments and is supported worldwide by IBM’s services operations. The TS7650 ProtecTIER Deduplication Family ranges from small (7TB) to medium (18TB) to large-scale (36TB) appliances. And the TS7650G gateway offerings allow you to add the storage of your choice, up to 1PB. Active-Active cluster configurations also provide high availability capabilities.
Review - Choosing TSM or ProtecTIER for Data Deduplication
While TSM works very well in ProtecTIER environments, you wouldn't use both TSM deduplication and ProtecTIER deduplication simultaneously. That would require twice as much work for no additional benefit. So when should you choose one over the other? Both solutions offer the benefits of target-side deduplication: greatly reduced storage capacity requirements (especially when using TSM's progressive incremental backup), lower operational costs, energy usage and total cost of ownership, and faster recoveries with more data on disk.
Use TSM 6 built-in data deduplication when: you want deduplication operations completely integrated within TSM; you want the benefits of deduplication without the cost of separate hardware or software (it ships at no additional charge with TSM 6 Extended Edition); or you want end-to-end data lifecycle management with minimized data storage requirements.
Use ProtecTIER when:
• You need the highest performance, up to 1,000 MB/sec or more
• You have a large amount of data and need scalable capacity and performance
• You need inline deduplication to avoid the operational impact of post-processing
• You are deduplicating across multiple TSM (or other backup) servers
• You don't have TSM and are performing weekly full backups
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 9, where we’ll summarize IBM’s holistic approach to data reduction and show you how we can help you survive the tidal wave of data.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."
Have you played around with IBM Tivoli Storage FlashCopy Manager on Windows yet? If not, maybe it's time to take a look.
When you think of FlashCopy Manager, think of snapshots. FlashCopy Manager provides fast, application-aware backups and restores by leveraging advanced snapshot technologies. I have been writing software as a developer for IBM Tivoli Storage Manager for almost 20 years now, and this technology is one that is changing the industry. Yes, snapshots have been around for a while, but it isn't until the last few years that applications have really started to embrace them, and in some cases even require them for their backup needs. There is just too much data to process, too much overhead to back it up, and too little time. People want their applications to serve email and provide access to database tables, not spend their precious cycles on backups. FlashCopy Manager helps address these issues.
FlashCopy Manager follows on the heels of IBM Tivoli Storage Manager for Copy Services (TSM for CS), which provided snapshot support for Microsoft SQL Server and Microsoft Exchange Server using Microsoft's Volume Shadow Copy Service (VSS). The really cool thing is that you do not need a TSM Server in order to use FlashCopy Manager to manage your snapshots. It will work completely stand-alone if you want. But if you already have a TSM Server, you can use it to extend the power of FlashCopy Manager even more.
What is VSS? VSS is Microsoft's snapshot architecture. It provides the infrastructure for applications, storage vendors, and backup vendors to perform snapshots in a federated and efficient way. Microsoft considers VSS and snapshots important enough to require that any new software release coming out of Redmond be able to be backed up and restored using VSS. If you are running Microsoft Exchange Server or Microsoft SQL Server, you should take a look at snapshots. Microsoft has been supporting snapshots with Exchange and SQL for years, but Microsoft Exchange Server 2010 is kicking it up a notch: it supports backups only through VSS. Yes, you heard that right, Microsoft does not support legacy-style (streaming) backups with Exchange Server 2010. So if you are planning a move to Exchange Server 2010, it really behooves you to start looking at Microsoft's Volume Shadow Copy Service (VSS), how it works, and the benefits and complexities it brings with it.
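If you want a quick peek at which applications on a Windows box have registered themselves with VSS (Exchange and SQL Server each install a VSS writer), here is a small sketch that simply shells out to the built-in vssadmin tool. It assumes you are running it from an elevated prompt on a VSS-capable Windows system.

```python
import subprocess

# List the VSS writers registered on this Windows system (run from an
# elevated prompt). Applications such as Exchange and SQL Server register
# writers so backup products can quiesce them before taking a snapshot.
result = subprocess.run(
    ["vssadmin", "list", "writers"],
    capture_output=True, text=True, check=True,
)

for line in result.stdout.splitlines():
    if "Writer name:" in line:
        print(line.strip())
```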
Microsoft's Volume Shadow Copy Service (VSS) is complex and involves multiple moving parts, so it will pay to invest some time in understanding it. I have put together some links that will help you get started:
As discussed in earlier chapters, data deduplication is a hot technology used to reduce data storage capacity requirements. If you make smart choices in your backup and data management processes, you might not need data deduplication. But if you keep all of your inactive and unimportant data on your production storage systems, and use backup software that forces you to perform repetitive full backups of all that static data, then data deduplication can provide you with a huge benefit.
The basic idea behind data deduplication is to store just one copy of any data object, and place pointers to the single copy wherever duplicates are eliminated. Some solutions do this at a file level, so that the files have to be exactly the same to be deduplicated. This is often called single-instance storage (SIS). Other solutions deduplicate data at a fixed or variable block length. IBM’s solutions use a blended approach based on the size of the data—file-based for smaller files, and variable block for larger files.
Most deduplication solutions run a checksum algorithm against the selected data to create a hash signature, then check to see if that signature has ever been seen before. If it has, the data is discarded and a pointer to the already stored data is put in its place. A small number of high-end solutions perform a complete byte-level differential comparison of the data to remove all potential for “data collisions,” where two distinct data blocks may share the same hash signature.
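To make the mechanics concrete, here is a toy Python sketch of block-level deduplication (illustrative only, and not how any IBM product is actually implemented). Each fixed-size chunk is hashed, and a byte-level comparison confirms the match before a pointer is stored in place of the data.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative fixed block size

class DedupStore:
    """Toy deduplicating store: keeps exactly one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}    # hash signature -> chunk bytes (the single copies)
        self.pointers = []  # the "file": an ordered list of signatures

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            sig = hashlib.sha256(chunk).hexdigest()
            stored = self.chunks.get(sig)
            if stored is None:
                self.chunks[sig] = chunk      # new data: keep one copy
            elif stored != chunk:
                # Genuine hash collision: two different chunks, same signature.
                # The byte-level comparison in this branch is what catches it.
                raise ValueError("hash collision detected")
            self.pointers.append(sig)         # duplicates cost only a pointer

    def read(self) -> bytes:
        return b"".join(self.chunks[sig] for sig in self.pointers)

store = DedupStore()
store.write(b"A" * 8192 + b"B" * 4096)  # two identical "A" chunks plus one "B" chunk
assert store.read() == b"A" * 8192 + b"B" * 4096
print(f"unique chunks stored: {len(store.chunks)}")  # -> 2 (not 3)
```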
Data deduplication can and does occur at many points in the data creation and management life cycle. In general, these points of deduplication can be broken into source-side, where the data is created, and target-side, where it is stored and managed. Backup applications, for example, can perform source-side deduplication by not sending data over the LAN or WAN that has already been backed up, saving bandwidth.
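As a rough illustration of the source-side idea (a sketch of the general technique, not any particular product's protocol): the client hashes its chunks, asks the target which signatures it has never seen, and ships only those chunks across the network.

```python
import hashlib

def source_side_backup(chunks, server_known_sigs):
    """Return only the chunks the target does not already have.

    chunks: list of byte strings produced by the backup client.
    server_known_sigs: set of signatures already stored on the target.
    """
    to_send = []
    for chunk in chunks:
        sig = hashlib.sha256(chunk).hexdigest()
        if sig not in server_known_sigs:
            to_send.append((sig, chunk))   # new data has to cross the wire
            server_known_sigs.add(sig)
        # else: the target already has it, so nothing is sent and bandwidth is saved
    return to_send

known = set()
first = source_side_backup([b"payroll.db", b"logo.png"], known)   # both chunks sent
second = source_side_backup([b"payroll.db", b"memo.txt"], known)  # only the new chunk sent
print(len(first), len(second))  # -> 2 1
```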
On the target side, the most popular use of deduplication is in virtual tape libraries, or VTLs. These disk-based systems emulate tape libraries and drives, but apply deduplication to store equivalent amounts of data on disk very cost-effectively while providing performance advantages over tape. Performing deduplication on tape-based systems is considered to be a bad idea, given the portable nature of tapes and the need to recycle them over time; it would be very difficult to guarantee that you maintain the original data for all of the pointers that are out there.
Today, IBM offers two compelling data deduplication solutions. The Extended Edition of Tivoli Storage Manager 6 includes deduplication capabilities to eliminate duplicate data that has been backed up from multiple production systems. Again, TSM’s progressive-incremental backup methodology does not create massive amounts of duplicate data, so the deduplication is only effective when the same data exists on different systems.
The other solution is the IBM System Storage ProtecTIER® family of deduplication systems for reducing data coming from multiple sources, including Tivoli Storage Manager servers, backups from other backup systems, or archive software solutions.
A lot of customers ask when they should use TSM deduplication and when they should use ProtecTIER. I’ll cover this question in detail in my next blog, but the simple answer is:
Use TSM deduplication when you have a single TSM server; you want to improve TSM recovery times by storing more backup data on disk; or there isn’t a large amount of duplicate data across the systems protected by multiple TSM servers.
Use the IBM System Storage ProtecTIER TS7650 Deduplication solutions when: you have multiple TSM servers; you have other sources of backup and archive data; or you are using other (non-IBM) backup products that perform periodic full backups.
To learn more, please visit the Data Reduction Solutions web page and stay tuned for chapter 8, where I’ll talk about choosing between Tivoli Storage Manager and IBM System Storage ProtecTIER for your data deduplication needs in greater depth.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."
You've probably heard your mother say "you never get a second chance to make a first impression". So, since today marks my first entry into the blogosphere, I wanted to hit a home run, providing not only some interesting perspective, but also some hard facts that readers can use to potentially save some time and money.
If you have been paying much attention to developments in storage and computing infrastructure in the last few years, you have noticed a significant trend toward virtualization. Servers aren't servers any more, they are virtual machines. Tapes aren't tapes any more, they are virtual tape libraries like the IBM TS7650 ProtecTIER Deduplication Appliance. And in the area of disk virtualization, the most widely adopted approach is the IBM SAN Volume Controller (SVC).
Up until now, disk virtualization has largely been an enterprise-scale affair. Storage managers who are tasked with taking care of hundreds of TBs, and often PBs, of disk have for years turned to SVC to help eliminate the pain of migrating data between arrays. For these administrators, disk virtualization with SVC has also helped provide a common set of management interfaces and procedures across storage from different vendors, and has helped to create a common set of services like thin provisioning, snapshotting, and mirroring across different tiers of storage.
Not every storage manager, though, is responsible for PBs, or even hundreds of TBs, of storage. Most administrators are just looking for an affordable and easy-to-manage means of satisfying the next request for more storage on Exchange, or SAP, or... About a month ago, IBM introduced some important changes in its mid-range disk virtualization product, SVC EE, designed with these storage managers in mind.
Perhaps the best way to describe these changes is with a picture. One of the challenges with traditional disk arrays is that they are relatively inflexible. Think about it... the arrays that have a lot of function (thin provisioning, excellent snapshotting, mirroring, etc.) are generally large, monolithic things that can take up a lot of real estate and burn a lot of power before you get to the first byte of storage. On the other hand, the arrays that are more modular -- allowing incremental growth -- generally don't offer the best software capabilities. And what's more, all of them generally charge an arm and a leg for the software capabilities they do offer.
The important thing IBM did was to package its virtual controller software in an affordable form factor and price it in such a way that administrators of mid-sized environments can build and grow their storage infrastructure modularly. Do you need more disk capacity for a new application? Add an IBM DS3400 SAS disk enclosure. Do you have plenty of capacity but just want more performance or connectivity? Add an SVC 8A4 controller pair. Do you have plenty of performance but just want more capacity for archiving? Add a DS3400 SATA disk enclosure. With this sort of modular approach to scaling, the incremental cost of adding capacity can be greatly reduced.
Regardless of how you choose to grow your virtual disk system, there is a valuable set of services included in the base software license (i.e., at no extra charge). They include:
Transparent data migration from other arrays in your datacenter to improve application availability
Thin provisioning so you can get more effective use out of your storage assets
FlashCopy (IBM's name for snapshot copies) to cut down the time required for application backup and cloning. This is the newest addition to the list of included features. Prior to a month ago, SVC EE FlashCopy carried a separate price.
Although I have used IBM DS3400 disk enclosures in my example, a virtual disk system of unlimited size can be constructed using any number of IBM DS3400, DS4000 or DS5000 family disk systems. SVC EE can also virtualize up to 250 disks from other IBM or non-IBM disk systems.
Lower incremental cost for adding capacity. Efficient SAS and SATA disks. A valuable set of software functions included in the base price. Common management from the smallest configuration to the largest. Would that help save some time and money?