Guest post from Graham Wills, IBM RAVE Chief Architect and Statistician
How do you visualize many variables all in the same format?
It's a common question. Data is complex, multifaceted and it's rare that a real-life situation can be summarized by a few simple fields in a database or columns in a table.
I recently looked at a data set that was submitted for the American Statistical Association's “DataExpo” competition. The survey data has more than 150 fields for each respondent, and the goal is very general and exploratory. As I started looking, I wondered about ways of showing many fields in the same visualization.
In a previous article I showed how hard it is to use two different encodings for data (like shape and color) and make sense of it. It requires us to carefully look through the information and parse it one piece at a time.
So although combining several encodings may not be useful for large amounts of data, it may be useful for smaller amounts where we do have the time to look through carefully. As a trial data set, I used data I was familiar with – on a set of cars from the ‘70s and ‘80s.
The figure to the right is a packing layout; the glyphs (one per car) are simply packed together using a simple “put the biggest in the center and work outwards” algorithm. Algorithms like this are the basis of many layouts like the Tag Cloud and Bubble charts shown on IBM ManyEyes.
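As a rough illustration of the “put the biggest in the center and work outwards” idea, here is a minimal Python sketch. It is not the algorithm used for the figure: it assumes every glyph can be treated as a circle, and it walks each glyph outwards along a spiral until it no longer overlaps anything already placed. The function name pack_circles and the step parameter are made up for this example.

```python
import math

def pack_circles(radii, step=0.5):
    """Greedy packing: largest circle at the origin, the rest spiral outwards."""
    # Visit glyphs from largest to smallest radius.
    order = sorted(range(len(radii)), key=lambda i: radii[i], reverse=True)
    placed = []  # tuples of (original index, x, y, radius)
    for i in order:
        r = radii[i]
        t = 0.0
        while True:
            # Archimedean spiral: distance from the center grows with the angle.
            dist = step * t / (2 * math.pi)
            x, y = dist * math.cos(t), dist * math.sin(t)
            # Accept the first spot that overlaps nothing placed so far.
            if all(math.hypot(x - px, y - py) >= r + pr
                   for _, px, py, pr in placed):
                placed.append((i, x, y, r))
                break
            t += 0.1
    # Return centers in the original input order.
    positions = [None] * len(radii)
    for i, x, y, _ in placed:
        positions[i] = (x, y)
    return positions
```

Calling pack_circles([1.0, 3.0, 2.0, 1.5]) puts the circle of radius 3.0 at the origin and the smaller ones around it. The result is far from the tightest possible packing, but it shows why the largest glyphs end up in the middle.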
In the figure we have made the following mappings:
· Size: Car weight – heavier cars are shown with bigger symbols
· Color: Fuel economy in miles per gallon – red means very economical; blue, uneconomical
· Shape: U.S. cars are circles; European cars are squares; Asian cars are plus signs
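The three mappings above can be sketched in a few lines of Python. This is only an illustration, not how the figure was produced: the field names (weight, mpg, origin), the size limits and the simple blue-to-red blend are all assumptions made for the example.

```python
def scale(value, lo, hi):
    """Linearly rescale value from [lo, hi] to [0, 1], clamped to that range."""
    if hi == lo:
        return 0.5
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))

# Shape encodes the region of origin, as in the list above.
SHAPES = {"US": "circle", "Europe": "square", "Asia": "plus"}

def encode(car, weight_range, mpg_range, min_size=5.0, max_size=30.0):
    """Map one car record (hypothetical field names) to size, color and shape."""
    w = scale(car["weight"], *weight_range)
    m = scale(car["mpg"], *mpg_range)
    # Heavier cars get bigger symbols.
    size = min_size + w * (max_size - min_size)
    # Blend from blue (uneconomical) to red (very economical) as an RGB triple.
    color = (round(255 * m), 0, round(255 * (1 - m)))
    return {"size": size, "color": color, "shape": SHAPES[car["origin"]]}
```

With a made-up record such as {"weight": 4000, "mpg": 10, "origin": "US"} and ranges (2000, 4000) for weight and (10, 40) for MPG, the car comes out as the largest possible circle, colored pure blue.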
With a little inspection of the view we can see some effects:
· Asian cars are not big
· Heavy cars are not fuel efficient
· European cars are more fuel efficient for their weight than U.S. cars
So, three fields used in the display. Not bad, but not close to 150. The problem with using different graphic properties for a shape is that there are not enough graphic properties. We could use color, size, shape, texture, and variations on those, for example, by encoding hue, saturation and brightness separately. But even then, we are going to run out soon.
A different approach was proposed in a paper by Herman Chernoff in 1973. Although proposed more in the spirit of a thought experiment, visualizations based on this paper have been created and implemented in a number of charting systems.
The basic idea is that you represent each row of data as a complex glyph (symbol) with many individual parts, and then apply graphic effects to each of the parts. Chernoff suggested faces as suitable complex glyphs that people might be able to rapidly analyze, and used scaling transformations to encode several fields of data – so eye size might map to one field, nose to another, head width and height to two more, and so on.
For the cars data set, the next visual shows a variation on Chernoff faces done using IBM’s Rapidly Adaptive Visualization Engine (RAVE) software. This software doesn’t have specific code for Chernoff faces, but it allows the definition of complex glyphs and allows the user to map data to different properties of each part of the glyph, making it suitable for a Chernoff-like representation.
I’m showing just a subset of the whole data here (the latest model year of cars) and color is used for the country of origin (U.S. in green, Europe in red, Asia in cyan). The mapping of the parts is as follows:
· Left Eye Size: Acceleration
· Right Eye Size: Horsepower
· Mouth Size: Miles Per Gallon
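To make the part-by-part mapping concrete, here is a small Python sketch. It does not use RAVE or reproduce its glyph definitions; it simply assumes each row is a dictionary with hypothetical keys acceleration, horsepower and mpg, and rescales each field across all rows into a relative size for the matching part of the face.

```python
# Which data field drives which part of the face (taken from the list above).
PART_FIELDS = {"left_eye": "acceleration", "right_eye": "horsepower", "mouth": "mpg"}

def face_glyphs(rows, min_scale=0.2, max_scale=1.0):
    """Return one dict of part scale factors per row of data."""
    # Find the observed range of every mapped field across all rows.
    ranges = {}
    for field in PART_FIELDS.values():
        values = [row[field] for row in rows]
        ranges[field] = (min(values), max(values))
    glyphs = []
    for row in rows:
        parts = {}
        for part, field in PART_FIELDS.items():
            lo, hi = ranges[field]
            # Position of this row within the field's range, from 0 to 1.
            t = 0.5 if hi == lo else (row[field] - lo) / (hi - lo)
            parts[part] = min_scale + t * (max_scale - min_scale)
        glyphs.append(parts)
    return glyphs
```

A drawing routine would then multiply the default eye and mouth sizes by these factors. Keeping min_scale above zero means even the lowest value still produces a visible feature.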
Sadly for U.S. drivers, they have some of the smallest smiles as their fuel economy is relatively bad. There isn’t an obvious relationship between fuel economy (smile size) and the other size fields, but we do see an interesting pattern in eye sizes.
Although mostly balanced, U.S. cars seem often “right eye dominant,” showing more horsepower without an equivalent increase in acceleration. Also, there is one Asian car that has huge acceleration from very little horsepower. Perhaps adding car weight would explain this discrepancy?
In this figure I show all the cars in my data set. I made the glyph for the U.S. cars very slightly larger (0.001%) than the others so the layout algorithm would force them into the center, and similarly made the Asian ones very slightly smaller so the algorithm generated a helpful grouping effect.
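The size trick can be sketched as follows, assuming a layout that places glyphs in descending order of size, starting at the center. The names NUDGE and placement_order are invented for this example, and the 0.001% adjustment is written as a factor of 1.00001.

```python
# Tiny per-region size factors: invisible to the eye, decisive for the sort.
NUDGE = {"US": 1.00001, "Europe": 1.0, "Asia": 0.99999}

def placement_order(cars, base_size=1.0):
    """Order cars for a biggest-first layout, using the nudged glyph size."""
    sized = [(base_size * NUDGE[car["origin"]], car) for car in cars]
    # Python's sort is stable, so cars from the same region keep their input order.
    sized.sort(key=lambda pair: pair[0], reverse=True)
    return [car for _, car in sized]
```

Because every face starts at the same base size, the nudge alone decides the order: U.S. cars are placed first (nearest the center), then European, then Asian.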
This is a lot of faces to look at – too many to try and sort through one at a time looking for patterns. It shows the limitation of this approach – when we increase the number of columns, we have to reduce the number of rows. There’s no such thing as a free lunch.
Chernoff faces are a bit of fun. Experiments show they are not really a great way of showing data. However, the basic principle is still valid: so long as there are not too many items, we can encode data into graphical properties of parts of a complex shape, and annotate them and use these shapes to convey a lot of information.
In this final example, I have gone back to the smaller subset of cars and changed the representation, using bars instead of eyes and a circle for MPG, and adding text to annotate the symbols.
These glyphs could serve as a building block for many types of layouts – an interactive list, sorted by criteria of the user’s choosing (like car price); as symbols in a scatterplot; or as nodes in a node-and-edge graph layout showing connections between cars based on social media discussions about them, or co-purchasing information.
Sometimes a funny face can lead to an important idea, and perhaps even a serious business insight.
Continue exploring advanced visualization on IBM Many Eyes
Why stop the insight with this article? Visit IBM’s visualization hub, IBM Many Eyes, and join over 100,000 like-minded visualization enthusiasts, academics and professionals. Many Eyes V2 will launch at the end of March with several new enhancements that continue to deliver on the site’s heritage of advancing visualization, including:
· Comprehensive site redesign that includes an updated site layout and presentation. Plus, new affinity areas to find and navigate visualizations by industry or topic, such as finance, healthcare and risk.
· Addition of the Expert Eyes blog dedicated to helping you learn how to create effective and engaging visualizations that provide maximum insight and tell a story. IBM visualization luminaries and IBM Researchers from the Center for Advanced Visualization will contribute regular thought leadership and perspectives.
· New visualization options, including a heatmap and view-in-context visualization built on IBM’s RAVE. RAVE, a declarative language based on the IBM patented ‘Grammar-of-Graphics’ approach, provides an intuitive way to create a visualization by describing what the visualization should look like, not how to create it. With Many Eyes, RAVE does the work behind the scenes and you create your visualization in three steps.
Discover the newest version of Many Eyes beginning March 25 by visiting ibm.com/manyeyes
Graham Wills is the lead architect for IBM’s visualization engine. He has two decades of experience in research and implementation of visualization systems in areas including statistical models, geo- and temporal visualization, large-scale networks and coordinated views. He has published widely in the field and his recent book, Visualizing Time, is currently available on Amazon.