By Steve Hurley, co-author, IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights
Hadoop has become an important part of the enterprise analytic environment. According to IDC, enterprises spent $12.6 billion on Big Data solutions in 2013. Hadoop can be used as an archive for the tremendous amounts of data generated in today’s complex environments, as an environment for exploratory analytics, as a flexible platform for ETL (extract, transform, load), or for any number of other uses. Consider these workload types:
Enterprise performance management (plan and analyze)
Operational intelligence (sense and respond)
Exploration and discovery (model and predict)
The range of possibilities is vast, because Hadoop provides a framework for working with tremendous quantities of data in a highly parallel and scalable manner.
IBM InfoSphere BigInsights builds on the open source Hadoop foundation by adding a number of valuable features, such as BigSheets and BigSQL. These features shorten time to value and allow existing applications to take advantage of Hadoop through existing interfaces, such as SQL.
Many enterprises that have been exploring BigInsights within development environments are now deploying into production. And, as BigInsights environments scale out, the challenges of getting the hardware environment right become clear.
Infrastructure matters. The potential impact of infrastructure grows as the scale of a solution grows. Design decisions made early on can be difficult to undo and have long-term impacts on performance and manageability. Imbalances in storage, network, compute, or memory across a BigInsights cluster can create bottlenecks that drive down performance and diminish the value of a BigInsights solution. Those imbalances can also drive up costs as unnecessary capacity is purchased to overcome those bottlenecks. A BigInsights infrastructure that grows without a clear scale-out plan quickly becomes unmanageable.
The IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights addresses the challenges of architecting a BigInsights infrastructure. In May of 2014, this reference architecture was updated to support the x3650 M4 BD, boosting performance by up to 25%. The reference architecture takes a modular approach by predefining management nodes, data nodes, and edge nodes. These nodes are interconnected according to a predefined network architecture.
The reference architecture recommends the following IBM System x servers:
(Note: Big Data Reference Architectures and support are also offered with other ISVs)
These high-performance servers are based on the Intel Xeon Processor E5-2650 v2, with 8 cores operating at 2.6 GHz. This processor sits in the sweet spot between price and performance and is ideal for a Big Data solution. These servers support processors with up to 12 cores if needed.
The IBM System x3650 M4 BD server offers a cost-effective, high-capacity storage solution with an energy-smart design, leadership virtualization, and powerful systems management. The standout feature of the x3650 M4 BD server is its support for up to 14 3.5-inch drives (each up to 4 TB, 56 TB in total) in a space-efficient 2U rack design.
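The per-node figure above is simple arithmetic (14 drives at 4 TB each gives 56 TB raw), and the same arithmetic can be extended to a rough cluster estimate. The following is only a minimal sketch: the function names are hypothetical, it assumes every drive bay is used for HDFS data, it uses the HDFS default replication factor of 3, and the 25% reserve for temporary and intermediate data is an illustrative assumption rather than a figure from the reference architecture.

```python
def raw_capacity_tb(drives_per_node: int = 14, tb_per_drive: float = 4.0) -> float:
    """Raw disk capacity of one data node, in TB (14 x 4 TB = 56 TB)."""
    return drives_per_node * tb_per_drive


def usable_cluster_capacity_tb(
    data_nodes: int,
    replication_factor: int = 3,       # HDFS default replication
    reserved_fraction: float = 0.25,   # assumed reserve for temp/intermediate data
) -> float:
    """Rough usable HDFS capacity for a cluster of identical data nodes, in TB."""
    if data_nodes < 0:
        raise ValueError("data_nodes must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    raw_total = data_nodes * raw_capacity_tb()
    # Remove the reserved share first, then divide by the number of block copies.
    return raw_total * (1.0 - reserved_fraction) / replication_factor


if __name__ == "__main__":
    print(f"Raw capacity per node: {raw_capacity_tb():.0f} TB")
    print(f"Usable capacity, 9 data nodes: {usable_cluster_capacity_tb(9):.0f} TB")
```

Under these assumptions, raw capacity substantially overstates what a cluster can actually hold, which is worth keeping in mind when sizing data nodes against a target data volume.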
These predefined nodes provide a baseline configuration that can be easily modified to meet the per-node capacity, compute, and memory needs of a wide variety of workloads. The predefined network architecture is designed to maximize and balance data flow within the cluster. As a BigInsights solution grows, additional nodes can be added as needed to meet new requirements.
Additionally, IBM Platform Cluster Manager provides a GUI-based framework for effectively managing a BigInsights infrastructure environment. Platform Cluster Manager enables administrators to easily roll out changes, such as BIOS or operating system updates, across the entire hardware environment. The health and performance of the infrastructure are easily monitored through Platform Cluster Manager, which can also aid in the deployment of new nodes within a BigInsights hardware environment.
Though the hardware challenges of a BigInsights cluster can quickly grow, the IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights can help an enterprise get off on the right architectural foot, meet future scale-out requirements, and preserve the balanced hardware environment that is critical to optimal BigInsights performance.
For more information, read the IBM Redpaper, IBM System x Reference Architecture for Hadoop: InfoSphere BigInsights, and visit Big Data Solutions on System x.