What’s the difference between Big Data and simply having a lot of data? Big Data has so much volume, variety, or velocity that you have to rethink basic functions such as data storage, management, and protection. Often, you don’t notice a problem with Big Data until something breaks. More organizations are exploring Big Data, and a discussion about infrastructure needs can help drive project success.
Last week at IBM InterConnect 2015, Bernard Shen from Re-Store shared his experiences implementing Big Data solutions for business and scientific environments. Bernard began with a discussion of the similarities between scientific and business workloads when they run at large scale. As business data grows into the petabyte range, much can be learned from scientific workloads that use scale-out supercomputer environments. There are differences, to be sure (file sizes, use cases, and so on), but the similarities are much greater.
When data grows to multiple petabytes, traditional data management and data protection paradigms break down. For example, the time required to find, back up, or restore files can be unacceptable using ordinary processes and infrastructures. Supercomputer infrastructures provide a better model because they’re designed to deliver excellent service levels regardless of data size, using grid architectures that expand without architectural bottlenecks.
In multi-petabyte environments, small decisions can have big financial impacts. Bernard shared an example of a life sciences project that needed multiple petabytes of storage. After understanding how the data would be used, Re-Store recommended a flash system for production workloads, funded by storing inactive data on lower-cost storage. The solution delivers better performance than expected and saves millions of dollars compared to the original proposal. Sometimes it pays to get a second opinion.
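To make the economics concrete, here is a minimal back-of-the-envelope sketch in Python. The capacity, the active-data share, and the per-terabyte prices are hypothetical assumptions chosen for illustration, not the figures from Bernard’s project:

```python
# Back-of-the-envelope tiering math. Every number below is a hypothetical
# assumption for illustration only -- not figures from the actual project.

TOTAL_TB = 5000            # ~5 PB of project data (assumed)
ACTIVE_FRACTION = 0.10     # share of data that is hot/production (assumed)

PRICE_DISK_PER_TB = 1000      # $/TB, all-disk proposal (assumed)
PRICE_FLASH_PER_TB = 3000     # $/TB, flash tier (assumed)
PRICE_CAPACITY_PER_TB = 100   # $/TB, low-cost tier such as tape or NL-SAS (assumed)

all_disk_cost = TOTAL_TB * PRICE_DISK_PER_TB
tiered_cost = (TOTAL_TB * ACTIVE_FRACTION * PRICE_FLASH_PER_TB
               + TOTAL_TB * (1 - ACTIVE_FRACTION) * PRICE_CAPACITY_PER_TB)

print(f"All-disk proposal:  ${all_disk_cost:,.0f}")
print(f"Flash + capacity:   ${tiered_cost:,.0f}")
print(f"Difference:         ${all_disk_cost - tiered_cost:,.0f}")
```

The point is not the specific numbers, but the pattern: when only a small fraction of the data is active, a flash tier for that fraction can be funded many times over by moving everything else to cheaper storage.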
Bernard also shared best practices for backup and restore in very large environments. Re-Store has experience with remote mirroring, snapshots, incremental backup, and hierarchical space management (HSM). By combining these techniques, clients can reduce total costs without sacrificing data protection goals.
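To picture how such techniques can be combined, here is a simplified sketch of a policy that routes files to different protection mechanisms based on how recently they changed. The thresholds and tier names are purely illustrative assumptions, not Re-Store’s actual practice:

```python
from datetime import datetime, timedelta

# Hypothetical age thresholds -- illustrative only, not Re-Store's actual policy.
HOT = timedelta(days=1)
WARM = timedelta(days=90)

def protection_tier(last_modified, now=None):
    """Return the protection technique(s) applied to a file of a given age."""
    now = now or datetime.utcnow()
    age = now - last_modified
    if age < HOT:
        return "frequent snapshots + incremental backup"   # hot production data
    if age < WARM:
        return "incremental backup + remote mirroring"      # warm data
    return "HSM migration to low-cost storage"               # cold data leaves primary disk

# Example: a file touched three hours ago vs. one untouched for 200 days
print(protection_tier(datetime.utcnow() - timedelta(hours=3)))
print(protection_tier(datetime.utcnow() - timedelta(days=200)))
```

The idea is that no single mechanism has to protect everything; each class of data gets the cheapest technique that still meets its recovery goals.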
If your Big Data project uses all-disk storage, you may be spending too much and getting too little in return. Are you ready for a different approach? You could be spending more on innovation and less on infrastructure.
Learn about Big Data infrastructure solutions from Re-Store and IBM at Caris Life Sciences and the University of Colorado. Watch the InterConnect General Session keynotes via InterConnectGO. To stay connected, follow the conversation using the hashtags #IBMInterConnect and #IBMStorage on Twitter.
About the Author
Mike Barton is a Worldwide Storage Marketing Manager at IBM. Prior to 2007, Mike was a Technical Manager and Principal IT Specialist for IBM, and a Sales Rep and Principal IT Specialist for Sybase. He holds ITIL Foundation and Gartner Group TCO certifications. Mike has been with IBM for over 15 years and has over 25 years of information technology experience. The opinions expressed herein are his own.