Information visualization: The flipside of information analytics
Delaney Turner (Delaney.Turner@ca.ibm.com) | Tags: ibmsoftware
The following is the sixth and final installment in our series on Advanced Data Visualization. Over the past three months, IBM visualization experts have explored new and emerging visual techniques and the underlying technologies you can deploy to better understand your data and turn insights into better business outcomes.
Frank van Ham is a well-known research scientist and an IBM Master Inventor with over a decade of experience in designing and deploying interactive information visualization. Some of his past projects include Many Eyes, a site for collaborative visualization, and SequoiaView, a visual disk browser. Dr. van Ham currently works with the IBM Business Analytics division on integrating visualization into IBM's product portfolio.
Thanks to digital sensors, storage and processors, we now live in a world that produces and stores a staggering amount of data, the vast majority of it in digital form. We often hear that all of this data has the power to transform many information-heavy industries, from health care to finance. However, most of this data is not useful in itself. The hardest challenge in dealing with so-called 'big data' is not about scale or infrastructure, but about finding ways to refine it into useful information. In this blog post I will argue that it takes the combination of unique strengths from both humans and machines to successfully tackle this problem, and that visualization is the medium that ties the two together.
One possible route to attack 'big data' is to use the same computing power that has allowed us to gather all this data in the first place. We can use computer algorithms to refine this data for us and then present it in an understandable way. There's a plethora of algorithms that allow us to extract higher-level features from raw data. For example, clustering algorithms allow us to identify larger groups of items that share a common property, statistical methods allow us to describe the data in a set in terms of more abstract features, and data mining methods allow us to extract frequently co-occurring events. Commonly, we refer to the collection of all of these methods as 'data analytics'. Analytics is what allows us to whittle down a large set of low-level factual observations into a smaller set of observations; however, it does not always provide us with information directly. Computers generally excel at fast and accurate data processing, but lack the context and creative skills to assemble these processed results into actionable information. In fact, solely relying on pure analytics to make decisions is dangerous for a couple of reasons:
The following two charts try to illustrate this using two simple examples. Anscombe's quartet (left) presents four datasets that all have an equal number of points and virtually equal means and variances for both variables, as well as virtually equal regression lines and correlations. Relying solely on aggregate statistics to describe these datasets without visually inspecting them would be grossly misleading. On the right I've plotted a small set of two-dimensional points, some closer together than others. Suppose I want to know how many clusters of points are in this dataset. A human would probably say "it depends," while a clustering algorithm would say "3" (and yet another clustering algorithm might state "10"). Deferring critical decisions to ‘blind’ analytics is dangerous in general, and in all but the most straightforward cases you probably want a human in the loop to verify the results.
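To make the Anscombe example concrete, here is a minimal Python sketch that computes the aggregate statistics for the four datasets. The data values are the standard published quartet; the `summarize` helper and the `QUARTET` and `SUMMARIES` names are my own for illustration and are not taken from any particular library.

```python
from math import sqrt

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the aggregate statistics usually quoted for a 2-D dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares regression of y on x
    return {
        "n": n,
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),  # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four rows come out nearly identical: mean x of 9, variance of x of 11, mean y of about 7.5, a correlation of about 0.82 and a fitted line close to y = 3 + 0.5x. Nothing in those numbers hints that one dataset is a curve and another is dominated by a single outlier; only the plot shows that.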
Unlike computer systems, the human brain is capable of putting information into context, making lateral jumps that connect two seemingly unimportant observations, and providing creative hypotheses for an observed feature. This is (still) what makes us smarter than computers. In an ideal world, we would have humans place the results coming in from analytic processes in context and feed their interpretations of the results back into the processes themselves. Or in another analogy: data mining algorithms provide us the tools to do the digging quickly, but deciding where to dig and what to do with the results is still very much a human decision. Or to quote Shyam Sankar in this TED talk: “You cannot algorithmically data mine your way to the answer. There is no ‘Find Terrorist’ button.”
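The cluster-counting question from the earlier example shows where that human decision enters. The sketch below is an illustration under my own assumptions, not the algorithm or the data behind the chart: it uses made-up points and a simple rule that links any two points closer than a chosen threshold, then counts the connected groups. The `POINTS` list and the `count_clusters` function are hypothetical names.

```python
from itertools import combinations
from math import dist

# Hypothetical 2-D points: three loose groups, with neighbours one unit apart.
POINTS = [
    (0, 0), (1, 0), (0, 1), (1, 1),  # group A
    (6, 0), (7, 0), (6, 1),          # group B
    (3, 6), (4, 6), (3, 7),          # group C
]


def count_clusters(points, threshold):
    """Count groups of points linked by chains of pairs within `threshold`."""
    parent = list(range(len(points)))  # union-find forest, one tree per group

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(points))})


if __name__ == "__main__":
    for threshold in (0.5, 1.5, 6.0):
        print(threshold, count_clusters(POINTS, threshold))
```

With these points, a threshold of 0.5 reports 10 clusters, 1.5 reports 3, and 6.0 reports 1. The algorithm is equally confident in each answer; which notion of "close" is meaningful for the problem at hand is something a person has to supply.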
To realize this tight coupling between human operator and analytic tool, we need a medium that is suited to transferring information between the two quickly and efficiently. Humans in general have a hard time interpreting large amounts of abstract information in numerical form because we have only limited working memory and are not naturally used to working with numerical representations. Instead, we have evolved to take in most of the information about the world around us in a visual manner. As a result, the parts of our brains that do visual processing are well suited to spotting outliers and detecting patterns, without having to specify in advance what the patterns are.
Information visualization is a medium that uses computer algorithms to transform abstract data into visual imagery in a smart way, such that we can take advantage of our specialized "hardware". It allows us to quickly understand what is in a set of data and how the numbers relate. Note that I’ve deliberately designated visualization as a medium, not as a technique. Just like any medium, it takes skill to use it to communicate effectively and in a pleasant manner. In this post I’ve argued that visualization should be used to communicate analytic results from a computer to a human before they are used as a basis for decisions. Other uses of the same medium involve communication of data from one human to another, for example to present data to another stakeholder. We will dig deeper into this interesting area of information processing in a number of future blog posts, so stay tuned!
Continue exploring visual analytics on IBM Many Eyes