Guest post by Noah Iliinsky, IBM visualization luminary.
This is a continuation of a series of posts covering the Four Pillars of Visualization. If you haven't done so already, please read the introductory post and the post, "Purpose: the bedrock of an effective visualization."
Now that we have determined our purpose (the why of this visualization) we can start thinking about what we want to visualize. Our task is to include the relevant data (that which we know is useful) and to leave the rest out. We have mandates for both inclusion and exclusion.
To figure out what to include, we look to our purpose to tell us the most important data points and relationships. It's possible that your purpose is stated specifically enough to guide some very specific choices in your data. However, this may not be true. In that case, you can work from statements of goals towards statements of data. (In fact, this is slightly preferred, as a great purpose includes declarations of what actions should be enabled, not just what should be displayed.) Working from the goals in your purpose, you can complete the following sorts of statements:
We want this visualization to enable the following actions/decisions: _____
To do this, it needs to be able to answer these questions: _____
To answer those questions, we need to display these data types: _____
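One way to make these statements concrete is to write them down as a small structure and use it to trim a dataset to the fields the visualization needs. The sketch below is only an illustration: the ContentPlan class, the select_fields helper, the example purpose, and the sample records are all hypothetical, not drawn from any real project or dataset.

```python
from dataclasses import dataclass


@dataclass
class ContentPlan:
    """Hypothetical record of the three fill-in statements."""
    actions: list[str]      # decisions the visualization should enable
    questions: list[str]    # questions it must answer to enable them
    data_fields: list[str]  # data types needed to answer those questions


def select_fields(records, plan):
    """Keep only the fields named in the plan; everything else is left out."""
    return [
        {name: record[name] for name in plan.data_fields if name in record}
        for record in records
    ]


# Made-up example purpose and data, for illustration only.
plan = ContentPlan(
    actions=["Decide which product line gets more marketing budget"],
    questions=["Which product line earned the most revenue last quarter?"],
    data_fields=["product", "quarterly_revenue"],
)

records = [
    {"product": "A", "quarterly_revenue": 120, "headcount": 40, "office": "NYC"},
    {"product": "B", "quarterly_revenue": 95, "headcount": 55, "office": "SEA"},
]

focused = select_fields(records, plan)
print(focused)
```

The headcount and office fields may be accurate and interesting, but they do not answer the stated question, so they never reach the display.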
As you're selecting data to display, resist the urge to show everything all at once: remember that extra information is the same thing as noise. The more data you put in front of your customer, the longer it takes them to find the specific data that they want. This is challenging. Most of us were told for years, "show your work!" We want to demonstrate our rigor and to show off our big data. Unfortunately, showing more than is necessary is a common default. (One good solution is to create a few focused and related visualizations, rather than one huge view-of-everything visualization.)
Here are two examples of why focus is often better.
This fantastic map shows one dot for every person in the 2010 US census, color-coded by race.
Looking at the whole country, we can see some strong trends, but no specific data. If we're looking for a single data point, or even information on an entire city, it's going to be very hard to find in this view of everything.
Now, one approach to clarifying the data is to add annotations and labels, but as we can see, that doesn't really help the situation.
Instead, let's consider a more focused view. Here's a view zoomed in on Brooklyn.
At this zoom level we can clearly see interesting details: specific streets that are dividing lines or gathering places, neighborhoods that are more integrated or more segregated, single blocks that are pockets of one racial group within another. We've added value by showing less. This is powerful.
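In data terms, zooming in like this amounts to filtering points to a region of interest before drawing them. Here is a minimal sketch of that idea, not a description of how the census dot map itself was built: the within_box helper, the sample points, and the bounding box (a rough approximation of Brooklyn's extent) are all assumptions made for illustration.

```python
def within_box(points, south, north, west, east):
    """Return only the points whose latitude and longitude fall inside the box."""
    return [
        p for p in points
        if south <= p["lat"] <= north and west <= p["lon"] <= east
    ]


# Made-up sample points; "group" stands in for whatever attribute is color-coded.
points = [
    {"lat": 40.68, "lon": -73.95, "group": "a"},   # inside the box
    {"lat": 34.05, "lon": -118.24, "group": "b"},  # far outside the box
    {"lat": 40.65, "lon": -73.95, "group": "c"},   # inside the box
]

# Rough, assumed bounding box around Brooklyn (degrees of latitude/longitude).
BROOKLYN_BOX = dict(south=40.57, north=40.74, west=-74.04, east=-73.83)

zoomed = within_box(points, **BROOKLYN_BOX)
print(len(zoomed), "of", len(points), "points kept")
```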
Here's one more graph, one of my favorite examples. On the one hand, it could be considered boring: it's just a bar graph of two data points. Just two. Snoozefest, right? But this graph shows that, in the 4th quarter of 2011, Apple's iPhone revenue was larger than all of Microsoft's.
The graph could have shown all kinds of things -- market cap of the companies over time, breakdown of revenue by product lines, or any number of other data types -- but it's much more compelling as it is here: the iPhone, a product that didn't exist 5 years before this graph, is bigger than Microsoft. That's an incredible story, and using more data would have diluted the impact of that simple point.
So that's how you pick the right content. Understand your most important data and relationships, include them, and don't clutter the space with data that isn't relevant.
Up next: picking the right structure to reveal your data.
For further discussion on this topic, download my recent whitepaper, “Choosing visual properties for effective visualizations.” In the whitepaper, I discuss the many design decisions you need to make before creating a visualization, decisions that affect its ability to communicate knowledge accurately and efficiently. The paper addresses one key aspect of the design process: how to choose an appropriate visual property (position, shape, size, color, and others) to encode the different types of data that will be presented in the visualization.
Why stop the insight with this article? Visit IBM’s visualization hub, Many Eyes, to join more than 100,000 like-minded visualization enthusiasts, academics and professionals, and to find additional insights from Noah Iliinsky and other IBM visualization luminaries.
Noah Iliinsky strongly believes in the power of intentionally crafted communication. He has spent the last several years thinking, writing and speaking about best practices for designing visualizations, informed by his graduate work in user experience and interaction design. Noah Iliinsky is the co-author of Designing Data Visualizations, and technical editor of, and a contributor to, Beautiful Visualization.