Data Visualization: Simple solutions for complex data
Delaney Turner | Delaney.Turner@ca.ibm.com | Tags: ibmsoftware information-insights
The following is the second in a new six-part series on Advanced Data Visualization. Over the next three months, IBM visualization experts will explore new and emerging visual techniques and the underlying technologies you can deploy to better understand your data to transform insights into better business outcomes. You can read the first post here
Graham Wills is the lead architect for IBM’s visualization engine. He has two decades of experience in research and implementation of visualization systems in areas including statistical models, geo- and temporal visualization, large-scale networks and coordinated views. He has published widely in the field and his recent book, Visualizing Time, is currently available on Amazon.
Data is not simply becoming larger every day; it is also becoming more complex. It is rare that any serious project concerns just one table, and it is common that the data contains structures and information that are non-tabular in nature. A common visualization requirement is to take a set of complex data sources and produce visualizations that allow domain experts (who are unlikely to be visualization experts, statisticians or computer science majors) to see patterns, make deductions, and take effective action.
Einstein’s famous suggestion is to “make everything as simple as possible, but not simpler." Another similar quotation of his is “Any fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction.” The goal of visualization is to make the complex intuitive – simple. It’s not an easy goal and, while it doesn’t require an Einstein, it does require a bit of thought and experimentation. Here are some suggestions:
Use Analytics to Focus on the Important
So I turned to analytics to take the words I had read in, keep only the spoken words, and then assign an “anger score” to each word used. Neutral words scored zero, strong words like “mad” scored high, and peaceful words like “calm” were given a negative score. I rolled all these up to give a simple resulting table – character x word count x anger.
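The roll-up described above can be sketched in a few lines of Python. This is only an illustration, not the analysis actually used for the chart: the lexicon values, the function name `score_script`, and the choice to sum word scores per character are all assumptions, and the input is assumed to be already reduced to spoken lines.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: positive = angry, negative = calming.
# Words that are not listed are treated as neutral (score 0).
ANGER_LEXICON = {"mad": 3, "furious": 3, "calm": -2}


def score_script(lines, lexicon=ANGER_LEXICON):
    """Roll spoken lines up into rows of (character, word_count, anger).

    `lines` is an iterable of (character, spoken_text) pairs; stage
    directions are assumed to have been filtered out beforehand.
    Anger is the sum of the per-word scores (an assumption).
    """
    counts = defaultdict(int)
    anger = defaultdict(float)
    for character, text in lines:
        for word in re.findall(r"[a-z']+", text.lower()):
            counts[character] += 1
            anger[character] += lexicon.get(word, 0)
    return [(c, counts[c], anger[c]) for c in sorted(counts)]
```

Each returned row corresponds to one row of the character x word count x anger table.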
So now I wanted a simple visualization that shows this data. A bar chart of character x word count colored by anger would do, but it was a little dry, and I wanted to appeal to people who have a strong domain knowledge about TEXT – not numbers, so I settled on the following Wordle chart:
Bar charts are fantastic for showing differences in values very clearly – aligned bars are the best choice for quantitative judgments. But I wasn’t interested in that – I wanted a display that showed QUALITATIVE effects, and this chart gives that. In the Wordle, you cannot tell who has more words – “Four” and “Ten”, for example, look very similar – but for analysis of a play, that is not a bad thing. The audience will not be able to make an exact count – the overall impression is what is important. The fact that they are roughly the same is a better way of stating the data than saying one has a small number more than the other. In the above figure the font heights represent the root of the number of lines, and the more saturated the red, the more angry that character is; characters who try to calm things down are shown in blue, with “Foreman” the most calming of the roles.
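The two visual mappings described above (size from a root of the count, color from the anger score) might be expressed as follows. This is a sketch under stated assumptions: "root" is taken to mean square root, the base size of 10 is arbitrary, and the white-to-red and white-to-blue ramps are one plausible reading of "more saturated"; the function names are hypothetical.

```python
import math


def font_height(count, base=10.0):
    """Font height proportional to the square root of the count."""
    return base * math.sqrt(count)


def anger_color(anger, max_abs):
    """Map an anger score to an (r, g, b) tuple with channels in 0..255.

    Positive scores move from white towards saturated red, negative
    scores towards saturated blue; a score of zero stays white.
    `max_abs` is the largest absolute score in the data set.
    """
    if max_abs <= 0:
        return (255, 255, 255)
    t = max(-1.0, min(1.0, anger / max_abs))  # clamp to [-1, 1]
    fade = round(255 * (1 - abs(t)))
    return (255, fade, fade) if t >= 0 else (fade, fade, 255)
```

Using a square root compresses large differences, which suits the qualitative reading the chart is meant to support: a character with four times the lines gets text only twice as tall.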
Build on Known Learned Representations
People can and do understand complex systems. If you need to make a complex chart, then one possibility is to build on what people already know. Very often domain experts are familiar with certain types of graphic or visualization, and building on that pre-existing knowledge allows you to show something more complex that can still be immediately understood. Genome browsers are a good example of such a visualization. They show tremendous quantities of data, often at multiple scales, with plenty of ancillary data. Yet researchers in the area are very familiar with them, so visualizations based on the representations used in genome browsers can be understood by those experts more readily.
Another visualization that most people are familiar with is the map, such as the one shown below.
Maps often contain many hundreds of thousands of points, joined together to form polygons that cover an extent. We often have different elements (or layers) for different features. Because we are used to them, their complexity has become second nature to us, so when we look at a hybrid Google map with layers of satellite imagery, roads, traffic, features of interest and so on, we are not overwhelmed and can use it to make important decisions (such as “can I walk along the river and get coffee on the way?”).
The visualization shown here uses just two layers. One is a set of state polygon data, where each state is colored by the state’s population as given by the 2010 census, on a heat scale. This type of chart is known as a “choropleth chart”. We are very familiar with this chart as it is used to show all sorts of region-based data, from election voting patterns to global population data.
A second element (or “layer”, in map parlance) shows points at the state centers. The points are sized and colored by the population in 1960. By compositing these two layers we can see that California has grown significantly, whereas New York state has declined a lot over the half-century. Michigan too has shrunk, relatively speaking, and Florida has grown. Using the combined two sets of population data and the geo-locations, we might theorize a national shift to the south.
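The "relatively speaking" comparison that the two layers invite can also be computed directly. The sketch below is an assumption about how one might quantify it, not something taken from the chart itself: it compares each state's share of the total population in the two censuses, and the function name `share_change` is hypothetical.

```python
def share_change(pop_then, pop_now):
    """Return {state: change in that state's share of the total}.

    `pop_then` and `pop_now` map state names to populations for the
    earlier and later census. Only states present in both are used,
    so the two totals cover the same set of states.
    """
    common = [state for state in pop_then if state in pop_now]
    total_then = sum(pop_then[s] for s in common)
    total_now = sum(pop_now[s] for s in common)
    return {
        s: pop_now[s] / total_now - pop_then[s] / total_then
        for s in common
    }
```

A positive value means the state holds a larger share of the population than before; a negative value means it has shrunk relative to the rest, even if its absolute population rose.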
Two Charts are Better than One
Another solution is to divide and conquer. Rather than creating a single very complex chart, we can break the data into different parts and show multiple charts, one for each section of the data. We use interactivity or a visual cue to link the two together. For example, if we had four columns of data on the states, we might show two scatterplots with a pair of fields in each one, and color and label the points consistently between the two charts. We could even add a third field (such as population in 2010) to both charts used as size. If we used the same mapping, we would both add a fifth field to our analysis, and, at the same time, enhance the linking by making it easier to spot the same states in different charts (California is the big, red one … etc.)
A single chart that tries to show six different variables is likely to fail unless we are very clever, but by showing two unique and two common variables in each of two separate charts, we can show the same information with more visual simplicity. If we could combine them cleanly into one chart, we would do so, but such complexity is often hard to understand, and the technique of using multiple views and linking them works well.
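The linking idea in the two-scatterplot example can be sketched without committing to any plotting library: build the color and size mappings once, then reuse them in both charts. Everything here is illustrative; the function name, the field names passed in, and the small palette are assumptions, and the output is a plain list of point descriptions that a charting tool of your choice could draw.

```python
def linked_scatterplots(rows, pair_a, pair_b, size_field, key="state"):
    """Build two scatterplot specs that share one color and size mapping.

    rows: list of dicts, one per state.
    pair_a, pair_b: (x_field, y_field) for the first and second chart.
    size_field: field mapped to point size in BOTH charts.
    """
    # Illustrative palette; it cycles if there are more keys than colors.
    palette = ["red", "blue", "green", "orange", "purple"]
    keys = sorted(row[key] for row in rows)
    color = {k: palette[i % len(palette)] for i, k in enumerate(keys)}
    largest = max(row[size_field] for row in rows)

    def chart(x_field, y_field):
        return [
            {
                "label": row[key],
                "x": row[x_field],
                "y": row[y_field],
                "color": color[row[key]],           # identical in both charts
                "size": row[size_field] / largest,  # identical in both charts
            }
            for row in rows
        ]

    return chart(*pair_a), chart(*pair_b)
```

Because the color and size of a given state are computed once and shared, a reader who spots the big red point in one chart can find the same state in the other without reading labels.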
As a final thought on this technique, it has the valuable feature of making it easier to combine visualizations from different domains. If we have a lot of correlated statistical data on a network, for example, we can create one set of charts for the stats part, and another view for the network display, and then link them together to create a set of views that break the complexity down into smaller, simpler units. Minard’s famous graphic of Napoleon’s march (well worth a web search if you are unfamiliar with it) is an example of this linked charts technique: a map visualization linked to a time series chart.
It is always easy to “add more stuff” in any powerful tool. We can take a simple chart, throw in more elements, color, size and symbol mappings and soon have a chart that allegedly shows more data, but actually hides it in a sea of consuming complexity. Good designers know that the secret to a compelling presentation is to create simple images that capture complexity. This article presents three ways to achieve that goal and allow people to make good decisions from complex data.
Continue exploring visual analytics on IBM Many Eyes
Why stop the insight with this article? Visit IBM’s hub of visual analytics, IBM Many Eyes, and join over 100,000 like-minded visualization enthusiasts, academics and professionals. The Many Eyes web community democratizes data visualization by providing a simple three-step process to create and interact with a visualization using your data set. Then share or embed your visualization across the web or your social network.