How IBM Champions tackle big data, part 1 of 2
Delaney Turner Delaney.Turner@ca.ibm.com | Tags: information-insights ibmsoftware
How are big data initiatives managed? Who sets the direction? How do the business and IT sides of the equation collaborate? You'll learn a lot about big data down at Information On Demand in October, but for some early answers to these questions I spoke to two IBM Champions with deep expertise in Big Data. Alex Philp is Founder and Chief Technology Officer of TerraEchos. Ivo-Paul Tummers is CEO of Jibes.
Part 1 of our discussion is below. We'll have part two next week.
It seems that with every announcement we invent a new word to describe the amount of data out there. At some point, these numbers get so big as to be abstract. Ivo-Paul, what are your clients seeing? How much more data are they working with now than two or three years ago?
Ivo-Paul Tummers: Typically our customers see their data volumes growing 20-30% per year. One of the reasons is that large amounts of data are generated purely by keying it in or through physical transactions. But an increasing amount of data is generated by machines and by devices and people moving through networks. For example: geo data and automatic scanning. Also, we are attributing a lot of data in the execution of processes in transactions, on top of what used to be much more lightweight data sets. This is just the growth of the data that is generated by internal company systems. When we also consider data from the ecosystems of our customers, this growth and diversity are even higher.
Alex, is that what you’re seeing with your clients as well?
Alex Philp: It’s definitely explosive growth. Our customers right now represent large U.S. government organizations. We’re dealing more and more with an explosion of unstructured data. With audio and full-motion video, we’re getting into petabytes a day. The big debate is: if we’re collecting it, where are we storing it? What are we analyzing and how are we sharing it? Our niche in the big data problem is the filtering and distilling on the real-time side – figuring out how to operate on the data in real time and deciding what to store for analysis later.
The other aspect of big data is the variety – unstructured data is the next source of insight and analytics – video, audio, things that don’t come in rows and columns. Could you go into more detail about how you handle big data for your clients? What capabilities are you using?
Alex Philp: Our customers live on networks, and so we’re trying to structure the conversations with clients around the “3VI Over Network” model. They’re dealing with data coming from the sensor web or “The Internet of Things.” Many of our clients are trying to figure out what part of their IT infrastructure can handle this data. How do we store it? What metadata management systems do we use? How do we index and pre-index video, or elements of data coming out of video? How do we separate the stuff that’s meaningful from the stuff that’s not so meaningful? Then we really get into deciding on the right analytics in the right place. It’s finding the right algorithm against data at rest and data in motion, both offline and online. We’re trying to come up with workflows that increase our customers’ efficiency so they can be proactive in using and exploiting the information. It’s multi-faceted. It’s hardware, software and analytics and workflow optimization.
Ivo-Paul, what’s your take on the variety aspect of big data?
Ivo-Paul Tummers: First, I agree completely with Alex. We’re more in the retail space. The diversity of the data is indeed a bigger challenge than the sheer volume. But the real value is in combining your internal data with as many data sets as possible. The reasoning behind this is simple: your internal data only helps you to get insight into your own performance, not your relative performance. Another factor is that markets are increasingly customer-centric (as opposed to product-centric), which is reflected in the dominance and success of brands and dominant e-tailers like Amazon and Asos. They compete on customer experience and trust. Capturing customer data when it is relevant is key. We see many companies that recognize this changed reality, but they don’t always know how to compete in this new arena. Luckily, we do see that at a first assessment there is actually more data than people think there is. And you can already make a first simple step in unlocking this potential by connecting these data sets. For example: mailing lists, customer counter records, webshop logs, cash register data, Google Analytics data and geographic data can be combined. However, we are still lacking the data quality. Fixing this is not necessarily complex, but it is a lot of work and slows you down. So, bottom line: the amount of data and the number of nodes (connected datasets) is a real value driver. The real challenge is to turn all this data into actionable intelligence.
The last aspect of big data is velocity. I’ve often said that an organization can only move as quickly as its data and today data moves very quickly indeed. Where is the data moving most quickly in your clients' organizations? What challenges does this increased speed pose to their decision-making?
Ivo-Paul Tummers: This is a bit of a paradox: more data to process in increasingly shorter time frames. Across our customer base we see that in retail the pace at which information flows is high. And the value of the data decreases fast, so you need to act fast. This is retail through all verticals, so from banking to clothing. The challenge is in getting the data processed into useful information and routing this to the right people at the right moment. In order to do this, connecting datasets is just a precondition, and the real work is in filtering, classifying, enriching, routing and actually applying the information. We see this in many of our customer processes, ranging from buying decisions (impacting your stock position and working capital) to driving conversion by targeted offerings to specific customers. The speed at which information travels through the value chain has a deep impact and changes the dynamics of even the most stable internal business processes.
Alex, what are you seeing?
Alex Philp: We’re seeing these trends spill into the commercial sector as well. We’re being challenged to come up with hardware and network configurations that can anticipate 100-gigabit throughput. It’s not uncommon for customers to say, “We want to get 80 to 90 to 100 gigabits in and out of a machine every second. How do we do that?” We’re working with IBM and others to find the right combination of hardware – FPGA, signal processors, GPU, CPU, Power architectures – and the right network configurations to get that out. We might be dealing with anywhere from 300 to 500 megabytes of data. So we need to figure out how to run computationally intensive algorithms – a Bayesian or Markov algorithm – on data in motion. How do you filter down to the core? The interesting discussion I’m having with customers now gets back to what Ivo-Paul was talking about – the interdisciplinary approach to data and analytics.
Second, a lot of customers are asking us to tunnel horizontally through their data, connect up attributes and link them to data at rest. Some people call this “perpetual analytics.” We can’t afford to stop, so our models are being tuned by our observations. Also, everyone wants to see into the future, so customers are trying to get into predictive analytics. We’re combining databases with data warehouses with real-time analytics and trying to find that sweet spot. Not everything is in motion or high-velocity, but you need to balance both attributes in that “3VI Over Network” equation.