Consultant, IBM Center for Applied Insights
If your enterprise is working with Big Data, or at least beginning to dip a toe in the water, and you're not thinking about the concept of "signal", you're about to make a big mistake. Identifying the signal is what will enable you to leverage Big Data effectively. If you don't identify it, you're going to spend a lot of time and money chasing red herrings.
Stephen Few recently penned a great blog post on the concept of "signal detection". He proposes a definition of signal vs. noise:
When we rely on data for decision making, what qualifies as a signal and what is merely noise? In and of themselves, data are neither. Data are merely facts. When facts are useful, they serve as signals. When they aren’t useful, data clutter the environment with distracting noise.
For data to be useful, they must:
- Address something that matters
- Promote understanding
- Provide an opportunity for action to achieve or maintain a desired state
When any of these qualities are missing, data remain noise.
I like this definition. It goes hand in hand with the concept of Marketing Science that we proposed earlier this year. Insights (aka signal) are only valuable insofar as they drive business outcomes. And if you're developing insights that influence action within your enterprise, you had better make sure that what you're looking at is actually signal.
This is where Big Data presents challenges. In his post, Few makes the absolutely correct point that data are noisy. And when data increase dramatically in volume, velocity, and variety (aka they get BIG), that noisiness grows right along with everything else. All of a sudden, it becomes that much harder to correctly identify signal. As Few points out:
Finding a needle in a haystack doesn’t get easier as you’re tossing more and more hay on the pile.
If you listen to some of the discussion around Big Data, you could easily walk away thinking that if you can capture it, all you need to do is run it through some sophisticated analytic software and "boom" you've got new insights.
The problem with this approach is that pesky noise. As you start dealing with huge data sets, it becomes relatively easy to find "statistically significant noise". You may think you're looking at signal, but instead you're just finding random patterns in the noise that happen to look like signal. This is what can happen when analysts are given lots of data and told to go find something.
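To make "statistically significant noise" concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than anything from Few's post or our own work: the function names, the 1,000 made-up "metrics", the 0.05 threshold, and the use of a normal approximation (the Fisher z-transformation) for the p-value. Every metric is pure random noise, unrelated to the outcome by construction, yet roughly one in twenty of them would be expected to clear the significance bar anyway.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def approx_p_value(r, n):
    """Approximate two-sided p-value for r using the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_signals(n_metrics=1000, n_obs=200, alpha=0.05, seed=42):
    """Count random metrics that look 'significantly' related to a random outcome."""
    rng = random.Random(seed)
    # The outcome is pure noise, so no metric can truly explain it.
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_obs)]  # also pure noise
        if approx_p_value(pearson_r(metric, outcome), n_obs) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_signals()
    print(f"{hits} of 1000 pure-noise metrics passed the p < 0.05 test")
```

The more metrics you test, the more of these false positives you collect, which is exactly the "more hay on the pile" problem.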
How do you combat this? Part of it, as Few points out, is having data analysts who have a deep understanding of how to detect signal and of the associated challenges that Big Data presents. The other part is how you approach data analysis in the first place.
Again, I'll reference our Marketing Science framework and propose that by applying a scientific approach to data collection and analysis, you improve your ability to correctly identify signal. Instead of randomly looking for patterns in the data, you develop hypotheses and then test and refine them, which lets you focus on signal that (a) is more likely to actually be signal and (b) will help drive the business forward.
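One simple way to picture the "hypothesize, then test" step is to treat any pattern found during exploration as a hypothesis and then test it on data that played no part in finding it. The Python sketch below is only an illustration under assumed names and numbers (`explore_then_confirm`, 1,000 random metrics, a 50/50 data split, a normal-approximation p-value); it is not the Marketing Science framework itself. Because every metric here is pure noise, most of the patterns "discovered" in the first half of the data should fail to show up again in the second half.

```python
import math
import random


def p_value(xs, ys):
    """Approximate two-sided p-value for the Pearson correlation of xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    r = cov / math.sqrt(var_x * var_y)
    # Fisher z-transformation with a normal approximation.
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def explore_then_confirm(n_metrics=1000, n_obs=400, alpha=0.05, seed=7):
    """Find 'signals' in one half of the data, then re-test them on the other half."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise
    half = n_obs // 2
    discovered = 0  # patterns that look significant during exploration
    confirmed = 0   # patterns that are still significant on held-out data
    for _ in range(n_metrics):
        metric = [rng.gauss(0, 1) for _ in range(n_obs)]  # pure noise
        if p_value(metric[:half], outcome[:half]) < alpha:
            discovered += 1  # this becomes a hypothesis to test
            if p_value(metric[half:], outcome[half:]) < alpha:
                confirmed += 1
    return discovered, confirmed


if __name__ == "__main__":
    discovered, confirmed = explore_then_confirm()
    print(f"discovered: {discovered}, confirmed on held-out data: {confirmed}")
```

A real signal would be expected to survive the second test far more often than noise does, which is what makes the extra step worth the effort.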
We've seen some really interesting and impactful results internally with the Marketing Science framework. We've developed insights that both drive business outcomes and challenge conventional thinking. I'll be highlighting a few of these examples in future blog posts. In the meantime, I'd love to get your feedback on what challenges you've experienced with identifying signal within Big Data.