As an IBM marketing manager, my job includes writing about storage technology. This post is about more than technology, though. It’s about a new breakthrough capability for managing storage costs and service levels.
I recently met with IBM Distinguished Engineer, Mike Sylvia, who has been working on a Business Transformation project to enable automated right tiering for storage in IBM data centers. Right tiering is the notion that data should be hosted on the optimal storage tier to balance cost and performance requirements.
Mike explained that applications tend to be hosted on top tier storage. When he analyzed actual usage patterns, Mike found most data can be effectively hosted on lower cost storage. Mike’s project put numbers to a problem that is often hidden from view and, until now, nearly impossible to solve.
Hosting data on the wrong storage tier turns out to be a huge efficiency problem. Mike predicts IBM will save $13 million over 3 years in one data center, by periodically moving data to the right tier. During the pilot, users saw their cost for storage drop by 50% per TB on average. This is big.
Like many advancements, IBM’s automated right tiering capability is accomplished by integrating existing technology. Mike Sylvia’s project combines storage virtualization, storage management automation and analytics. Today, IBM offers the technology in a bundled solution called SmartCloud Virtual Storage Center.
How does it work?
Step 1: IBM’s storage virtualization controller collects detailed usage metrics about storage it manages throughout the data center, without impacting application performance.
Step 2: IBM’s Storage Analytics Engine studies usage patterns over time to understand performance requirements.
Step 3: Storage tier recommendations are generated in reports that can be shared with application owners and IT management.
Step 4: Storage virtualization enables online data migration, with no disruption to applications or users.
Repeat: Usage patterns change over time, of course, so right tiering becomes an ongoing process.
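For readers who think in code, the loop above can be sketched in a few lines of Python. This is only an illustration of the idea, not how IBM’s analytics engine works: the tier names, the IOPS-per-TB thresholds, and the `Volume`, `recommend_tier` and `right_tier_report` names are all assumptions made up for the example, and a real engine would look at far more than a simple average.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical tiers, most expensive first. Each threshold is the minimum
# average IOPS per TB that justifies the tier (illustrative numbers only).
TIERS = [("tier-1", 500.0), ("tier-2", 100.0), ("tier-3", 0.0)]


@dataclass
class Volume:
    name: str
    current_tier: str
    capacity_tb: float
    iops_samples: list[float]  # Step 1: usage metrics collected over time


def recommend_tier(volume: Volume) -> str:
    """Step 2: study observed usage and pick the lowest-cost tier that fits."""
    density = mean(volume.iops_samples) / volume.capacity_tb
    for tier, min_density in TIERS:
        if density >= min_density:
            return tier
    return TIERS[-1][0]


def right_tier_report(volumes):
    """Step 3: report only the volumes that should move to another tier."""
    report = []
    for v in volumes:
        target = recommend_tier(v)
        if target != v.current_tier:
            report.append((v.name, v.current_tier, target))
    return report


volumes = [
    Volume("erp-db", "tier-1", 2.0, [1500, 1800, 1200]),
    Volume("archive", "tier-1", 10.0, [50, 20, 80]),
]
for name, source, target in right_tier_report(volumes):
    print(f"{name}: move from {source} to {target}")
```

Step 4, the online migration itself, is the job of the storage virtualization layer and is left out here. Running the report again on fresh samples is the "repeat" part.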
Why does it work?
Automated right tiering delivers the efficiency benefits of Information Lifecycle Management without the headaches and hidden costs. Automated right tiering has significant benefits for both data owners and IT leaders, so everyone wins.
For example, application and database owners can gain the following benefits:
Applications can move to top tier storage when they need it, without waiting for a maintenance window.
Average storage costs drop significantly, without a drop in services.
IT leaders benefit, too. For example:
Storage tier decisions are based on analysis of actual usage patterns, not predictions. Storage performance management tasks are eliminated.
Data can quickly and easily be moved back to its original storage tier if requested, without incurring an outage.
IBM automated right tiering works with most storage systems, so deployment is nondisruptive.
The technology that enables automated right tiering has significant additional benefits, such as the ability to eliminate scheduled outages for storage system maintenance.
Problem solved. How has your organization addressed the storage right tiering challenge?
Watch a video of Mike Sylvia describing his automated right tiering project at the IBM Edge conference in June 2012.
Listen to the 2-part webcast “Storage Virtualization - IBM SVC – Benefits” by IBM storage virtualization expert and master inventor Barry Whyte, from April 2012.
Visit IBM’s Virtualized SAP Demo and other smarter solutions at VMworld, August 26-30, 2012, in San Francisco.
IBM has bundled automated right tiering technology into a new solution called SmartCloud Virtual Storage Center, available through IBM sellers and Business Partners.
What do you think of when you see the name Riverbed? For those of you not familiar, Riverbed is an IBM partner and the leader in Wide Area Network (WAN) optimization. These days, Riverbed offers more than just WAN optimization solutions. Riverbed products improve IT infrastructure, speed up application performance, reduce bandwidth utilization, and offer solutions to securely leverage cloud storage. For enterprises looking to implement strategic initiatives such as virtualization, consolidation, cloud computing, and disaster recovery, Riverbed delivers optimum performance for globally connected enterprises without compromising the end user experience.
When organizations consolidate IT and move to cloud environments, the distance created between users and their data often results in high latency and reduced bandwidth. Riverbed WAN optimization, network performance management, and cloud storage solutions enable enterprises to overcome these drawbacks. Riverbed makes it easy to understand, optimize, and accelerate IT, so that organizations can build a fast, fluid, and dynamic IT architecture.
Steelhead® appliances from Riverbed, Virtual Steelhead(TM), and Steelhead Mobile can increase network throughput and application performance by up to 100 times. Riverbed Cascade® provides enterprise-wide network and application visibility and analysis for both enterprise customers and service providers. Riverbed Whitewater® cloud storage gateways revolutionize data protection by leveraging cloud storage. And Stingray Traffic Manager® provides unprecedented scale and flexibility to deliver applications across the widest range of environments. All in all, Riverbed offers end to end solutions to analyze, accelerate and optimize an organization’s IT infrastructure without compromising performance for the end user no matter how far away they reside from the data center.
Stop by the Riverbed booth E105 at IBM PULSE 2012 to see the latest in IT performance solutions.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."
Riverbed and IBM enjoy a strong partnership which, thanks in part to Riverbed’s Whitewater cloud storage gateways, extends to IBM’s storage management software ecosystem. Whitewater leverages public cloud storage to reduce backup and administration costs, improve disaster recovery readiness and provide secure off-site storage for critical backup data, providing LAN-like access to public cloud storage in a drop-in appliance.
What does this mean for the Riverbed/IBM partnership? A seamless integration with existing IBM Tivoli Storage Manager backup infrastructure and cloud-storage providers, paving the way to extracting more value from existing storage, application and network investments. Tivoli Storage Manager administrators can leverage Whitewater’s local caching and public cloud storage abilities to propel them into the next generation of storage and disaster recovery, leaving classic disk- and tape-based devices (and their operational and maintenance costs) behind. Together, Riverbed and IBM offer a best-of-breed solution which slashes costs and enables almost unlimited scalability, taking full advantage of the flexibility and cost savings offered by storage-cloud services.
Riverbed will be demonstrating how fast it can move TSM data to public cloud storage at IBM Pulse 2012 in Las Vegas, March 4-6. At the show, come by booth E-105 to ask for a Whitewater demonstration and learn more about how Riverbed can optimize and extend your TSM environment as well as accelerate your WAN with the Riverbed Steelhead product family.
"The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions."
Every year I try to publish a set of storage trends that I believe most IT shops are trying to address, and for which technologies exist to help. Here are my thoughts for 2012...
1) Storage breakthroughs nipping the “Digital Dark Age” in the bud
Since the early 1990s, an increasing proportion of data created and used has been in the form of digital data. Today, the world produces more than 1.8 zettabytes of digital information a year.
Yet digital storage can in many ways be more perishable than paper. Disks corrode, bits “rot” and hardware becomes obsolete. This presents a real concern of a “Digital Dark Age” where digital storage techniques and formats created today may not be viable in the future as the technology originally used becomes antiquated.
We’ve seen this happen. Take the floppy disk, for example: a storage tool that was so ubiquitous that people still click on its enduring icon to “save” their word processing, presentation or spreadsheet documents, yet most Millennials have never seen one in person.
But new research shows storage mediums can be vastly denser than they are today. New form factors such as solid state disks will help us provide more stable longer-term preservation of data, and the promise of "the cloud" allows access to data anywhere, anytime.
Recently, IBM researchers combined the benefits of magnetic hard drives and solid-state memory to overcome challenges of growing memory demand and shrinking devices. Called Racetrack memory, this breakthrough could lead to a new type of data-centric computing that allows massive amounts of stored information to be accessed in less than a billionth of a second.
This storage research challenges previous theoretical limits to data storage, ensuring our digital universe will always be preserved.
2) Data curation will provide structure in the midst of the data deluge
Now that we have the capability to preserve our digital universe, we need to find a way to make it useful. We need to take the next step past data preservation to data curation.
Data curation is the active and ongoing management of data through its lifecycle. This smarter data categorization adds value to data that will help glean new opportunities, improve the sharing of information and preserve data for later re-use.
Social media is a great example of the power of curated data. Sites like Facebook, Google+, Pinterest, etc. compile our digital lives and give their users a platform to organize their content.
However, there's also a lot of work involved in selecting, appraising and organizing data to make it accessible and interpretable. The key is bringing data sets together, organizing them and linking them to related documents and tools. If data can be stored in a way that provides context, organizations can find new and useful ways to use that data.
3) Storage analytics will open new business insights
With data curation giving organizations a platform to better utilize their data, analytics will help turn that data into intelligence and, ultimately, knowledge.
With the information that historical trending analytics and infrastructure analytics provide, you can index and search in a more intelligent way than ever before. By doing analytics on stored data, in backup and archive, you can draw business insight from that data, no matter where it exists.
The application of IBM Watson technology for healthcare provides a good example. Watson collects data from many sources and is able to analyze the meaning and context. By processing vast amounts of information and using analytics, it can suggest options targeted to a patient's circumstances and can assist decision makers, such as physicians and nurses, in identifying the most likely diagnosis and treatment options for their patients.
Through intelligent storage and data retrieval systems, we can learn more with the information we have today to improve service to customers or open new revenue streams by leveraging data in new ways.
4) Storage becomes a celebrity: new business needs are pushing storage into the spotlight
As our digital and data-driven universe expands, certain industries are able to reach new levels of innovation by having the capacity to house, organize and instantaneously access information.
For example, Hollywood is known for its big budget blockbusters, but it’s the big storage demands required by new formats such as digital, CGI, 3D and high definition that are impacting not just the bottom line, but studios’ ability to produce these types of movies. Data sets for movies have become so large that they are now at the petabyte level.
Filmmakers are beginning to trade in film reels for SSDs as just one day’s worth of filming can generate hundreds of terabytes of data. The popularity of these high data-generating formats means studios are looking for new storage technologies that can handle the demand.
The healthcare industry may be facing an even bigger data dilemma than the entertainment business. Take a look at the Institute University of Leipzig, in Germany, which has a major genetic study called LIFE to examine disease in populations. LIFE is cataloging genetic profiles of several thousand patients to pinpoint gene mutations and specific proteins. This process alone generates multiple terabytes of data.
Even one 300-bed hospital may generate 30 terabytes of data per year. Those figures will only grow with higher-resolution medical imaging, and new tools or services such as making electronic healthcare records
5) Intervention...The Data
In this era of Big Data, more is always better, right? Not so, especially when every byte of data costs money to store and protect.
Businesses are turning into data hoarders and spending too much time and money collecting useless or bad data, potentially leading to misguided business decisions. This practice can be changed with simple policy decisions and by implementing capabilities that already exist in smarter storage technologies, but companies are hesitant to delete any data (and many times duplicate data) due to the fear of needing specific data down the line for business analytics or compliance purposes.
Part of the solution starts with eliminating the copies. Nearly 75% of the data that exists today is a copy (IDC). By deleting and disabling redundant information, organizations are investing in data quality and availability for content that matters to the business. Consider the effect of unneeded data, costing money by replicating throughout an organization’s information systems. This outdated data can also potentially be accessed for fraud.
the quality of data is not costly—not getting it right is.
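To make "eliminating the copies" a little more concrete, here is a minimal sketch of one common way to spot exact copies: hash the content and group items that share a digest. It is an illustration only, not a description of any particular IBM product. The `find_duplicates` name and the sample data are invented for the example, and real deduplication usually works on blocks or chunks rather than whole objects.

```python
import hashlib
from collections import defaultdict


def find_duplicates(blobs):
    """Group names whose content is byte-for-byte identical.

    blobs: dict mapping a name to its content as bytes.
    Returns a list of groups, each a list of names sharing the same content.
    """
    by_digest = defaultdict(list)
    for name, data in blobs.items():
        by_digest[hashlib.sha256(data).hexdigest()].append(name)
    return [names for names in by_digest.values() if len(names) > 1]


# Hypothetical sample: two copies of the same report and one unique file.
sample = {
    "finance/q3-report.doc": b"quarterly numbers",
    "sales/q3-report-copy.doc": b"quarterly numbers",
    "hr/handbook.doc": b"employee handbook",
}
for group in find_duplicates(sample):
    print("identical content:", ", ".join(group))
```

Whether a detected copy can actually be deleted is still a policy decision, which is exactly the point made above.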
In response to: Enabling Private IT for Storage Cloud -- Part II (management controls)
To see a transcript of the live chat held on Friday, September 30th about this topic visit this link:
And don't forget to listen to the 'open mic' conversation about Storage Hypervisors with IBM's Ron Riffe, the author of this blog series, and ESG analyst, Mark Peters:
This is part 3 of a 3-part post on how somebody responsible for a private storage environment could save their company a pile of money by implementing cloud storage techniques. Part I introduced the concept of a storage hypervisor as a first step in transitioning traditional IT into a private cloud storage environment. Part II explained how a storage service catalog, self-service provisioning, and usage-based chargeback can be used to drive down cost. In this 3rd post, I’m going to share some thoughts that should help you be smarter about choosing a storage hypervisor.
The first step is to remind ourselves what we’re trying to accomplish with a storage hypervisor. From our experience deploying over 7000 storage hypervisors, the starting point for most folks is improved efficiency and data mobility. Remember, the basic idea behind hypervisors (server or storage) is that they allow you to gather up physical resources into a pool, and then consume virtual slices of that pool until it’s all gone (this is how you get the really high utilization). The kicker comes from being able to non-disruptively move those slices around. In the case of a storage hypervisor, people are looking for the freedom to move a slice (or virtual volume) from tier to tier, from vendor to vendor, and more recently, from site to site all while the applications are online and accessing the data.
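If it helps to see the pool-and-slice idea in code, here is a toy model. It is a sketch under loud assumptions: the `StoragePool` class, its methods and the array names are invented for illustration, and it ignores everything a real storage hypervisor does to keep I/O flowing while the data is copied.

```python
class StoragePool:
    """Toy storage hypervisor: arrays contribute capacity, volumes consume it."""

    def __init__(self):
        self.free_gb = {}    # array name -> unallocated capacity in GB
        self.placement = {}  # volume name -> (array name, size in GB)

    def add_array(self, array, capacity_gb):
        self.free_gb[array] = capacity_gb

    def create_volume(self, volume, size_gb, array):
        if self.free_gb[array] < size_gb:
            raise ValueError(f"{array} has only {self.free_gb[array]} GB free")
        self.free_gb[array] -= size_gb
        self.placement[volume] = (array, size_gb)

    def migrate(self, volume, target):
        """Move a virtual volume to another array.

        The volume keeps its name, so the host's view of it does not change;
        only the backing array does.
        """
        source, size_gb = self.placement[volume]
        if self.free_gb[target] < size_gb:
            raise ValueError(f"{target} has only {self.free_gb[target]} GB free")
        self.free_gb[target] -= size_gb
        self.free_gb[source] += size_gb
        self.placement[volume] = (target, size_gb)


pool = StoragePool()
pool.add_array("vendor-a-tier1", 1000)   # hypothetical arrays
pool.add_array("vendor-b-tier2", 5000)
pool.create_volume("payroll-db", 500, "vendor-a-tier1")
pool.migrate("payroll-db", "vendor-b-tier2")  # tier to tier, vendor to vendor
print(pool.placement["payroll-db"])
```

The design point is that the application only ever refers to "payroll-db". Which array holds the bits is the pool's business.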
To pull off this level of mobility – in servers or storage – it’s important that the hypervisor not be dependent on the underlying physical hardware for anything except capacity (compute capacity in the case of a server hypervisor like VMware, storage capacity in the case of a storage hypervisor). Think about it… Wouldn’t it be odd to have a pair of VMware ESX hosts in a cluster, one running on IBM hardware and one on HP hardware, and be told that you couldn’t vMotion a virtual machine between the two because some feature of your virtual machine would just stop working? If you tie a virtual machine to a specific piece of hardware in order to take advantage of the function in that hardware, it sort of defeats the whole point of mobility. The same thing applies to storage hypervisors. Virtual volumes that are dependent on a particular physical disk array for some function, say mirroring or snapshotting for example, aren’t really mobile from tier to tier or vendor to vendor anymore.
But it’s more than just a philosophical issue; there’s real money at stake (you may want to read what comes next a couple of times). In Part II of this post I discussed using a storage service catalog as a means of defining specific service level needs for your different categories of data. These service levels covered the gamut from capacity efficiency and I/O performance (for you techies that’s RAID levels, thin provisioning, use of solid state disks, etc.), to data access resilience and disaster protection (multi-pathing, snapshotting, mirroring…). The reason so many datacenters have an overabundance of tier-1 disk arrays on the floor is because, historically, if you wanted to take advantage of things like thin provisioning, application-integrated snapshot, robust mirroring for disaster recovery, high performance for database workloads, access to solid-state disk, etc… you had to buy tier-1 ‘array capacity’ to get access to these tier-1 ‘storage services’ (did you catch the subtle difference?). Now, I don’t have anything against tier-1 disk arrays (my company sells a really good one). In fact, they have a great reputation for availability (a lot of the bulk in these units is sophisticated, redundant electronics that keep the thing available all the time). But with a good storage hypervisor, tier-1 ‘storage services’ are no longer tied to tier-1 ‘array capacity’ because the service levels are provided by the hypervisor. Capacity…is capacity…and you can choose any kind you want. Many clients we work with are discovering the huge cost savings that can be realized by continuing to deliver tier-1 service (from the hypervisor), only doing it on lower-tier disk arrays. As I noted in Part II of this post, we’ve seen clients shift their mix of ‘array capacity’ from 70% tier-1 to 70% lower-tier arrays while continuing to deliver tier-1 ‘storage services’ to their data. This YouTube video describes an example of that at Sprint.
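To get a feel for what a 70/30 to 30/70 shift can mean, here is a back-of-the-envelope calculation. The prices are pure assumptions picked for the arithmetic (10 units per TB for tier-1 capacity, 4 units per TB for lower-tier capacity), not IBM or client figures, and `blended_cost_per_tb` is a name invented for this sketch.

```python
def blended_cost_per_tb(tier1_share, tier1_price, lower_price):
    """Average price per TB for a mix of tier-1 and lower-tier array capacity."""
    if not 0.0 <= tier1_share <= 1.0:
        raise ValueError("tier1_share must be between 0 and 1")
    return tier1_share * tier1_price + (1.0 - tier1_share) * lower_price


# Hypothetical prices per TB in arbitrary currency units.
TIER1_PRICE = 10.0
LOWER_PRICE = 4.0

before = blended_cost_per_tb(0.70, TIER1_PRICE, LOWER_PRICE)  # 70% tier-1
after = blended_cost_per_tb(0.30, TIER1_PRICE, LOWER_PRICE)   # 70% lower-tier
savings = 1.0 - after / before

print(f"before={before:.2f} after={after:.2f} savings={savings:.0%}")
```

With these made-up prices the blended cost per TB falls from about 8.2 to about 5.8, roughly a 29% drop. Plug in your own prices; the wider the gap between tiers, the bigger the effect.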
Smart idea #1: Be careful to understand what, if any, dependency a storage hypervisor has on the capability of an underlying disk array to deliver function to your virtual volumes (like thin provisioning, compression, snapshotting, mirroring, etc.)
Next thought. There are three rather interrelated solution categories in the area of dealing with outages and protecting data.
- Disaster avoidance (“I know the hurricane is coming, let’s move the datacenter further inland”)
- Disaster recovery (“oh oh, the hurricane hit, and my datacenter is dead”)
- Data protection (“oops, I goofed up my data and I need to recover”)
IT managers we talk to have been successfully dealing with disaster recovery (for the techies, that’s array mirroring along with recovery automation tools like VMware Site Recovery Manager (SRM), IBM PowerHA, or others) and data protection (that’s array snapshotting along with specific connectors for databases, email systems, etc., as well as connectors to enterprise backup managers like Tivoli Storage Manager) for years. This third area of disaster avoidance has really gained steam because storage hypervisors now allow you to access the same data at two locations, giving you the ability to do an inter-site application migration with things like VMware vMotion, PowerVM Live Partition Mobility (LPM), or others. When you are expecting a disaster, disaster avoidance lets you transparently get out of the way. But it doesn’t magically keep all the other unexpected bad things from happening. You’ll still want to be prepared with disaster recovery and data protection. And if you are implementing a storage hypervisor, you shouldn’t be forced to choose.
Smart idea #2: Remembering smart idea #1, be sure that your storage hypervisor has its own ability to provide for disaster avoidance (inter-site mobility), disaster recovery (mirroring that’s integrated with recovery automation tools) and data protection (snapshotting that’s integrated with applications and backup tools).
One final thought. A storage hypervisor isn’t an island unto itself. Like a server hypervisor, it exists in a broader datacenter. Administrators need to be able to see it in the context of the disk arrays it manages, the servers (or virtual machines) that use its virtual volumes, the applications that need backups or clones, the disaster recovery automation that’s coordinating recovery of servers, storage, networks… You get the picture. When the challenges of day-to-day operations happen (and they do happen most every day)…
- …the storage network planner needs to look at the logical data path from a virtual machine to its physical server, through the SAN switch, to the storage hypervisor and finally to the physical disk array. He’ll need that storage hypervisor to be integrated with a SAN topology tool.
- …an application owner calls up with a performance issue that he’s blaming on ‘the storage’. You’ll need to be able to isolate performance across the whole data path (including the part of the path that goes through the storage hypervisor).
- …an application owner wants a consistent snapshot of his application to use as a backup copy (or a test clone). You’ll need a connector that talks to both the application and the storage hypervisor to identify the virtual volumes that need to be snapshotted, prepare the application for the snapshot, and then provide the application owner with an inventory of snapshots he can use to recover from.
- …you make the move toward cloud techniques in your private datacenter – implementing a storage service catalog, self-service provisioning, and usage-based chargeback. You’ll need a storage hypervisor that can be auto provisioned and that can provide the metrics on who is using how much storage.
Smart idea #3: Make a list of all the day-to-day operational things you do today with physical storage, and the things you hope to automate in the future, and be careful to understand if your storage hypervisor is sufficiently instrumented and integrated – or if it’s creating a new island to be separately managed.
And now a word from our sponsors :-) IBM offers the world’s most widely deployed storage hypervisor. With over 7000 deployments, hundreds in the newer inter-site disaster avoidance configuration, we’ve had a lot of opportunity to build some depth. As you evaluate using cloud storage techniques in your private enterprise, you’ll find things I talked about in this blog series available in IBM products today. They can help you save your company a pile of money (and make you look like a genius while you’re doing it).
- Storage hypervisor platform: IBM System Storage SAN Volume Controller (SVC)
- Storage hypervisor management, storage service catalog, and self-service provisioning: Tivoli Storage Productivity Center Standard Edition (TPC SE)
- Usage-based chargeback: Tivoli Usage and Accounting Manager
Thanks for staying with me through this blog series – hope you find it valuable!
The conversation continues!
This is part 2 of a 3-part post on how somebody responsible for a private storage environment could save their company a pile of money by implementing cloud storage techniques. Part I introduced the concept of a storage hypervisor as a first step in transitioning traditional IT into a private cloud storage environment. In this 2nd post, I’m going to explain some of the key storage cloud management controls that can be used to drive down cost.
Storage services are standardized. When it comes to shopping, I avoid (at almost all costs) actually going to the store. You can keep all the time and frustration of traffic, fighting for a parking place, wandering aimlessly through aisles of choices, and standing in checkout lines. I’ll take the simplicity and speed of a good online catalog any day.
The idea of shopping from a catalog isn’t new and the cost efficiency it offers to the supplier isn’t new either. Public storage cloud service providers seized on the catalog idea quickly as both a means of providing a clear description of available services to their clients, and of controlling costs. Here’s the idea… I can go to a public cloud storage provider like Amazon S3, Nirvanix, Google Storage for Developers, or any of a host of other providers, give them my credit card, and get some storage capacity. Now, the “kind” of storage capacity I get depends on the service level I choose from their catalog. These folks each offer a small number of service level options. Amazon S3, for example, offers Standard Storage or Reduced Redundancy Storage (can you guess which one costs less?).
Most of today’s private IT environments represent the complete other end of the pendulum swing – total customization. Every application owner, every business unit, every department wants to have complete flexibility to customize their storage services in any way they want. This expectation is one of the reasons so many private IT environments have such a heavy mix of tier-1 storage. Since there is no structure around the kind of requests that are coming in, the only way to be prepared is to have a disk array that could service anything that shows up. Not very efficient… There has to be a middle ground.
Enter the private storage cloud with its storage service catalog. In the consultative service engagements we’ve done, we have found that most private enterprises have something like fifteen-ish distinct data types (things like database, e-mail, video, shared files, home directories, etc). A simple storage service catalog would describe the specific service levels needed by each of these data types. Let’s take “Database” and build out the scenario.
The first thing you’ll need is a place to create your catalog of storage services. IBM Tivoli Storage Productivity Center Standard Edition is a good option (man, what a mouthful – let’s just call it TPC SE for short… hmm, I’ll probably get fired for that :-) You’re going to use the wizard to create a new “Database” catalog entry.
Now, for each catalog entry, there are a variety of service levels that can be defined that cover things like capacity efficiency, I/O performance, data access resilience, and disaster protection. By this point you’re probably rolling your eyes because you know your application owners… and they’re going to want every byte of their data to have the highest available service in each of these areas (and you wonder why you have so much tier-1 storage). A little bit further into this post we’re going to talk about the wonder of usage-based chargeback, but we’re getting ahead of ourselves. For now, let’s assume you’re having a coherent conversation with your application owners and are able to define realistic needs for your database data. Maybe something like this…
From there, you’re back to the wizard. Actually defining the attributes of the catalog entry is a little mundane (lots of propeller head knobs and dials to turn), but once you’re done – you’re done! – and life gets really efficient. So, let’s get the mundane stuff out of the way. First are the capacity efficiency and I/O performance attributes (be sure and notice that for “Database” we are telling the catalog we want virtual volumes – from a storage hypervisor. There will be a test in a paragraph or so :-)
Then the data access resilience attributes.
And finally the disaster protection attributes.
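If you don’t have the wizard in front of you, the three groups of attributes can be pictured as a simple record. The sketch below is an assumption-heavy illustration: the `CatalogEntry` class, its field names and the values chosen for “Database” are made up for this example and are not TPC SE’s actual fields or defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """One storage service catalog entry (hypothetical field names)."""
    name: str
    # Capacity efficiency and I/O performance
    virtual_volume: bool      # provisioned from the storage hypervisor
    thin_provisioned: bool
    raid_level: str
    # Data access resilience
    multipath: bool
    # Disaster protection
    snapshot: bool
    remote_mirror: bool


# Illustrative values only; a real entry comes out of the conversation
# with the application owners.
DATABASE = CatalogEntry(
    name="Database",
    virtual_volume=True,
    thin_provisioned=True,
    raid_level="RAID 10",
    multipath=True,
    snapshot=True,
    remote_mirror=True,
)

print(DATABASE)
```

The record is frozen on purpose: once a service level has been negotiated, a request should pick an entry by name, not tweak its attributes.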
I told you it was a little mundane. But now come the exciting results that really drive cost out of the environment and save a huge amount of administrative time.
First is capital expense. You’re running mostly tier-1 disk arrays today. You have just finished defining the fifteen-ish catalog entries your company is going to use. Some, like “Database”, call for storage services that are often associated with tier-1 disk arrays. Most others don’t. With a little intelligent forecasting, you should be able to determine exactly how much tier-1 storage capability you really need, and how much lower-tier storage you can start using. We’ve seen clients shift their mix from 70% tier-1 to 70% lower-tier storage (pretty significant capital expense shift). And if the thought of moving all that existing data from tier-1 to a lower tier makes you shudder, refer back to Part I of this post and look again at the data mobility provided by a good storage hypervisor (Test: did you notice that for “Database” we told the catalog we wanted virtual volumes – from a storage hypervisor…).
The second big savings is in operational expense (keep reading).
Storage provisioning is self-service.
Most public storage services are targeted at end users like you and me who bring our credit card and provision some storage. Private storage clouds are a little different. Administrators we talk to aren’t generally ready to let all their application owners and departments have the freedom to provision new storage on their own without any control. In most cases, new capacity requests still need to stop off at the IT administration group. But once the request gets there, life for the IT administrator is sweet!
Here comes the request from an application owner for 500GB of new “Database” capacity (one of the options available in the storage service catalog) to be attached to some server. After appropriate approvals, the administrator can simply enter the three important pieces of information (type of storage = “Database”, quantity = 500GB, name of the system authorized to access the storage) and click the “Go” button (in TPC SE it’s actually a “Run now” button) to automatically provision and attach the storage. No more complicated checklists or time consuming manual procedures.
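The shape of that request is simple enough to sketch. The code below is a hypothetical illustration of the three inputs plus the approval step. It is not the TPC SE interface, and the `ProvisioningRequest` and `provision` names, the catalog contents and the host name are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical catalog entry names; a real catalog would hold full definitions.
CATALOG = {"Database", "E-mail", "Shared files"}


@dataclass
class ProvisioningRequest:
    storage_type: str   # which catalog entry, e.g. "Database"
    quantity_gb: int    # how much capacity
    host: str           # system authorized to access the storage
    approved: bool = False


def provision(request):
    """Validate a request and return a description of what would be provisioned."""
    if request.storage_type not in CATALOG:
        raise ValueError(f"{request.storage_type!r} is not in the service catalog")
    if request.quantity_gb <= 0:
        raise ValueError("quantity must be positive")
    if not request.approved:
        raise PermissionError("request has not been approved")
    # A real tool would now allocate the volume and attach it to the host.
    return (f"{request.quantity_gb} GB of {request.storage_type} storage "
            f"attached to {request.host}")


request = ProvisioningRequest("Database", 500, "dbserver01", approved=True)
print(provision(request))
```

Everything else (RAID level, mirroring, snapshots) comes from the catalog entry, which is why three fields are enough.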
Storage is paid per use.
It’s the little-appreciated but incredibly powerful tool in the quest to drive cost out of the environment. When end users are aware of the impact of their consumption and service level choices, they tend to make more efficient choices. Conversely (we all know what happens here), when there’s no correlation between service level choices and end user visibility to cost… well… you have a lot of tier-1 storage on the floor.
A chargeback tool like IBM Tivoli Usage and Accounting Manager (TUAM) completes the story we have been building…
- You negotiate a set of storage service levels (like “Database”) with your application owners and business units.
- You create the storage service catalog entry for “Database”
- Your end users request some new “Database” capacity be assigned to a particular server.
- You push the “Run now” button and the capacity is auto-provisioned.
- Your end user receives an invoice (complete with individual line items for each class of service in which they are consuming capacity).
- You’re in the cloud now!
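The invoice step at the end of that list is easy to picture in code. This is a minimal sketch with invented rates and names, not how TUAM computes charges: `RATES_CENTS_PER_GB`, `build_invoices` and the usage records are all assumptions for illustration. Amounts are kept in whole cents to avoid rounding surprises.

```python
from collections import defaultdict

# Hypothetical monthly rates in cents per GB for each class of service.
RATES_CENTS_PER_GB = {"Database": 50, "Shared files": 10}


def build_invoices(usage_records):
    """Turn (owner, service_class, gb) records into one invoice per owner.

    Each invoice has a line item per class of service plus a total, in cents.
    """
    invoices = defaultdict(lambda: {"lines": defaultdict(int), "total_cents": 0})
    for owner, service_class, gb in usage_records:
        cost = gb * RATES_CENTS_PER_GB[service_class]
        invoices[owner]["lines"][service_class] += cost
        invoices[owner]["total_cents"] += cost
    return invoices


usage = [
    ("payroll", "Database", 500),
    ("payroll", "Shared files", 2000),
    ("marketing", "Shared files", 1000),
]
for owner, invoice in build_invoices(usage).items():
    for service_class, cents in invoice["lines"].items():
        print(f"{owner}: {service_class}: ${cents / 100:.2f}")
    print(f"{owner}: total ${invoice['total_cents'] / 100:.2f}")
```

Seeing the “Database” line item next to the “Shared files” line item is what nudges owners toward the cheaper class when they don’t really need the expensive one.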
Stay tuned for Part III of this post, where I’ll explore some technical thoughts you’ll want to consider when picking a storage hypervisor. The conversation is building!
Earlier this week, fellow IBM blogger Tony Pearson joined the conversation with a perspective on Storage Hypervisor integration with VMware. And IBM blogger Rich Vining added a perspective on VMware Data Protection with a Storage Hypervisor. To cap it off, we just completed our first live group chat with over 30 IT managers, industry analysts, independent bloggers, and IBM storage experts. Join the conversation!
The virtual dialogue on this topic will continue in another live group chat on September 30, 2011 from 12 noon to 1pm Eastern Time.
To borrow a phrase from a fellow blogger… Interest from customers on cloud storage is very, very hot, and that’s been keeping us very, very busy. The interest underscores the fact that public storage cloud providers have sent a “cost shockwave” through the industry and customers are taking notice.
While CIOs may still be too concerned about security and service levels to put much real corporate information in the public cloud, they have taken notice that these service providers are offering storage capacity at prices that are often lower than what they are paying for their own private storage. Sure, a service provider theoretically has more economy of scale and so could demand a better price from their hardware vendors, but they also have some profit margin to build into their “service”. There has to be more to it. The customers I talk to are wondering what these service providers are doing to operate at those costs – and if any of their techniques can be applied in a private storage environment.
The situation raises the question “what is it that differentiates these public storage clouds from the traditional private storage environments that most clients operate?” From our experience with customers, there are four significant differences.
- Storage resources are virtualized from multiple arrays, vendors, and datacenters – pooled together and accessed anywhere.
(as opposed to physical array-boundary limitations)
- Storage services are standardized – selected from a storage service catalog.
(as opposed to customized configuration)
- Storage provisioning is self-service – administrators use automation to allocate capacity from the catalog.
(as opposed to manual component-level provisioning)
- Storage usage is paid per use – end users are aware of the impact of their consumption and service level choices.
(as opposed to paid from a central IT budget)
In this post, I’m going to try to explain these four concepts in sufficient detail that somebody responsible for a private storage environment could walk away with some practical recommendations that could save their company a pile of money. Most of this isn’t really original (the concepts have been around for a while), but so few enterprises operate this way that the person who introduces their company to these ideas often looks like a genius (and who doesn’t like that!!). It’s a long topic, so I’ve broken it into 3 posts.
In Part I of this post:
I’ll explain the value of virtualizing storage resources. Hint: you’ve likely already done it to your server resources with some sort of server hypervisor like VMware vSphere, or IBM PowerVM, or Microsoft Hyper-V… so now let’s look at what you get from doing it to your storage resources with a storage hypervisor.
In Part II of this post:
I’m going to explain how public storage clouds use management controls like service catalogs, self-service provisioning, and pay-per-use to drive down their costs. I’ll also try to offer some practical ideas for using these techniques in a private enterprise setting to gain similar efficiencies.
In Part III of this post:
I’m going to explore some technical thoughts you’ll want to consider when picking a storage hypervisor.
Ready to jump in?
Storage resources are virtualized. Do you remember back when applications ran on machines that really were physical servers (all that “physical” stuff that kept everything in one place and slowed all your processes down)? Most folks are rapidly putting those days behind them. Server hypervisors and the virtual machines they manage have improved efficiency (no more wasted compute resources), freed up mobility, and ushered in a whole new “cloud” language.
Well, the same ideas apply to storage. As administrators catch on to these ideas, it won’t be long before we’ll be asking the question “Do you remember back when virtual machines used disks that really were physical arrays (all that “physical” stuff that kept everything in one place and slowed all your processes down)?”
In August, Gartner published a paper that observed “Heterogeneous storage virtualization devices can consolidate a diverse storage infrastructure around a common access, management and provisioning point, and offer a bridge from traditional storage infrastructures to a private cloud storage environment” (there’s that “cloud” language). So, if I’m going to use a storage hypervisor as a first step toward cloud enabling my private storage environment, what differences should I expect? (good question, we get that one all the time!)
Perhaps the most obvious expectations are improved efficiency and data mobility. The basic idea behind hypervisors (server or storage) is that they allow you to gather up physical resources into a pool, and then consume virtual slices of that pool until it’s all gone (this is how you get the really high utilization). The kicker comes from being able to non-disruptively move those slices around. In the case of a storage hypervisor, you can move a slice (or virtual volume) from tier to tier, from vendor to vendor, and now, from site to site, all while the applications are online and accessing the data. This opens up all kinds of use cases that have been described as “cloud”. One of the coolest is inter-site application migration. Just recently, a hurricane hit the eastern coast of the United States. If your datacenter had been in the projected path of that hurricane and if you had implemented both a server hypervisor (let’s say VMware vSphere for your Intel servers and IBM PowerVM for your Power systems), and a storage hypervisor platform (let’s say IBM SVC), then here’s what you might have said: “Hey, the hurricane is coming, let’s move operations to another datacenter further inland…” IBM SVC Stretched-cluster allows you to access the same data at both locations, giving you the ability to do an inter-site VMware vMotion and PowerVM Live Partition Mobility (LPM) move – non-disruptively. As far as the end users are concerned, their applications are running in a private cloud. For you… you avoided a disaster and got to sleep well that weekend.
But storage hypervisors are more, much more than just virtual slices and data mobility. Remember, we’re trying to think like a service provider who is driving cost out of the equation. Sure, we’re getting high utilization from allocating virtual slices, but are we being as smart as we could be about allocating those slices? A good storage hypervisor helps you be smart.
- Thin provisioning: You have a client that asks for 500GB of new capacity. You’re going to give it to him as thin provisioned virtual capacity which is a fancy way of saying you’re not going to actually back it with real physical storage until he writes real data on it. That helps you keep cost down.
- Compression: Same guy also asks to keep several snapshot copies of his data for recovery purposes. You’re going to start by giving him thin provisioned capacity for those snapshots, but you’re also going to compress whatever data those snapshots produce – again adding to your efficiency.
- Agnostic about vendors: Because you’re providing virtual storage resources from a storage service catalog (we’ll talk more about that in Part II of this post), you have the freedom to shift the physical storage you operate from all tier-1 to a more efficient mix of lower tiers, and while you’re doing it you can create a little competition among as many disk array vendors as you like to get the best price / support.
- Smart about tiers: If you shut your eyes real tight and think about the concept of a “virtual” disk that is mobile across arrays and tiers, you’ll quickly start asking questions about having the storage hypervisor watch for I/O patterns on blocks within that virtual disk that would benefit from higher tier capacity, like solid-state (SSD) or flash disk for example. A good storage hypervisor will automate the detection of such patterns and move hot data blocks to these highest tiers of storage if you have them.
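Two of the ideas in the list above, thin provisioning and hot-block detection, are easy to picture with a toy model. The sketch below is only an illustration under simplifying assumptions: the class, its methods, and the access-count threshold are hypothetical, and a real storage hypervisor tracks I/O over time windows and returns zeros for unwritten blocks rather than empty bytes.

```python
from collections import Counter


class ThinVolume:
    """Toy thin-provisioned volume: physical blocks are allocated on first write."""

    def __init__(self, virtual_blocks):
        self.virtual_blocks = virtual_blocks  # size promised to the client
        self.data = {}                        # block index -> payload, written blocks only
        self.io_counts = Counter()            # block index -> number of accesses

    def _check(self, block):
        if not 0 <= block < self.virtual_blocks:
            raise IndexError(f"block {block} is outside the volume")

    def write(self, block, payload):
        self._check(block)
        self.data[block] = payload  # physical space is consumed only here
        self.io_counts[block] += 1

    def read(self, block):
        self._check(block)
        self.io_counts[block] += 1
        return self.data.get(block, b"")  # simplification: unwritten blocks read as empty

    def physical_blocks_used(self):
        return len(self.data)

    def hot_blocks(self, threshold):
        """Blocks accessed at least `threshold` times: candidates for a faster tier."""
        return sorted(b for b, n in self.io_counts.items() if n >= threshold)


vol = ThinVolume(virtual_blocks=1000)  # the client "sees" 1000 blocks
vol.write(7, b"orders")
vol.write(42, b"index")
for _ in range(5):
    vol.read(42)

print(vol.physical_blocks_used())   # physical blocks actually backed
print(vol.hot_blocks(threshold=5))  # blocks worth promoting to SSD
```

The client was promised 1000 blocks, yet only the blocks he actually wrote consume physical capacity, and only the heavily read one is flagged for the expensive tier.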
Are you getting the picture of why so many enterprises are beginning to agree with Gartner that a storage hypervisor can be a great first step in transitioning traditional IT into a private cloud storage environment? Application owners come to you for storage capacity because you’re responsible for the storage at your company. In the old days, if they requested 500GB of capacity, you allocated 500GB off of some tier-1 physical array – and there it sat. But then you discovered storage hypervisors! Now you tell that application owner he has 500GB of capacity… What he really has is a 500GB virtual volume that is thin provisioned, compressed, and backed by lower-tier disks. When he has a few data blocks that get really hot, the storage hypervisor dynamically moves just those blocks to higher tier storage like SSDs. His virtual disk can be accessed anywhere across vendors, tiers and even datacenters. And in the background you have changed the vendor storage he is actually sitting on twice because you found a better supplier. But he doesn’t know any of this because he only sees the 500GB virtual volume you gave him. It’s “in the cloud”.
Stay tuned for Part II of this post…
Join the conversation!
The virtual dialogue on this topic will continue in a live group chat on September 23, 2011 from 12 noon to 1pm Eastern Time. Join some of the Top 20 storage bloggers, key industry analysts and IBM Storage subject matter experts to discuss storage hypervisors and get questions answered about improving your private storage environment.