Guest post by Tina Groves, Big Data Product Strategist for Business Intelligence.
A common mistake when analyzing operational data is double counting. By operational data, I mean data which contains redundancies and dependencies which haven’t been normalized -- in other words, data commonly found in Excel extracts or application tables. Even accountants and those with the training to navigate this problem can be challenged by double counting, particularly when the data being analyzed joins data from different sources or when the business context requires interpretation.
How serious is double counting? In a world being measured more and more, contracts which reference metrics as part of a guarantee or implied promise are causing corporations to assess their liability and risk. For example, a ruling of the Federal Energy Regulatory Commission, Feb 2012, upheld PJM’s assertion that EnerNOC double-counted its customers’ participation in certain demand response programs. This dispute lasted months, battering EnerNOC’s stock price and resulting in resignations by EnerNOC’s COO and CFO.
What is Double Counting?
Double counting describes a situation in which results are overstated due to referencing a transaction more than once. Double counting does not imply that duplicate records exist.
In my experience, double counting typically occurs when:
Data values are repeated in a column e.g., the same customer number is found in multiple rows
Multiple identifiers for the same “noun” or entity exist -- e.g., one customer has multiple customers numbers, possibly due to different applications, a system migration or different locations being assigned different numbers
A contextual element or business rule is overlooked or not represented in the data
A complex relationship in the data exists, making relating or joining data in calculations challenging
Most of these problems are related to the technical aspects of data management and, as a result, requires a technical approach to solve them.
Setting the Stage
Before discussing how to address double counting issues, let’s first become familiar with our sample data, 6 rows of invoice data for July and August 2013. This data will be used for the double counting lessons.
Table 1 Basic Invoice Data
Some of the questions which can be easily answered with this data are as follows:
The Double Counting Challenge
Using these six rows, I’ll illustrate common double counting challenges in trying to answer “how many customers?” in the next series of posts. Providing accurate counts, in context of the business question, is necessary to calculate business metrics such as average deal size, commission payouts or market growth.
About the author
Tina Groves leverages her 25 years of experience in analysis event processing and information-driven applications to lead the big data product strategy for IBM Business Intelligence. She particularly enjoys sharing with customers how these new technologies can improve decision-making. A long-time product manager, she has received corporate recognition for new product introductions and earned two patents. Follow her on Twitter @tinagroves and on LinkedIn.