Guest post by Tina Groves, Big Data Product Strategist for Business Intelligence.
This post is the last in a series about common double counting errors. It focuses on the more complex situations that can inadvertently result in double counting and requires additional technical skill.
If you missed the earlier posts, here are the links:
Post 1 The Rookie Analyst Series: Introduction
Post 2 The Rookie Analyst Series: Lessons 1 and 2
Lesson 3: Business Context
In Lesson 2, the answer to “How many customers” shifted from 5 to 3. ‘3’ would likely be perceived as the better answer by most business users but, again, is it the right answer?
Often, the business context will affect the definition of an entity. In our question “How many customers?”, Customer is the entity. How Customer is defined will vary somewhat by role across the organization. Using Table 2 from Lesson 2, let’s look at how business context affects the answer:
So, which number is the right number? Well, they are all correct, in context.
From a technical perspective, the CEO’s answer of ‘2’ is the most challenging to calculate; the financial accounting answer of ‘5’ is the easiest. Lesson 4 will provide some guidance on how to provide the CEO’s version.
Define entities in context of the stakeholders within your organization.
Document the business rules and the data elements which support each definition
A quick consistency test is to present each stakeholder several possible answers to the question “How many X are there?” [where X is the entity]. If the same answer is not selected, then the definition will require further rigour.
Align the level of detail or grain of the data to the answer before aggregating or counting. e.g., The sales manager is interested in regional data; whereas, the CEO is interested in global data. So, the data is first grouped by their level of detail before applying a calculation.
Lesson 4: Complex Data Relationships
In Lesson 3, three possible answers were identified to our question “How many customers?” Two of the answer are straight-forward to calculated with the data presented in Table 2. To calculate the CEO’s answer of ‘2’ requires additional data.
1. Recursive Relationships
In our example, Acme AG has its headquarters in Germany and offices in the USA and Korea. Partners Plus is based out of the USA and has an office in Korea. Table 4 illustrates how this information might be found in a database table or spreadsheet.
Defining the relationship in this manner makes counting the number of parent companies straight-forward:
COUNT DISTINCT [Customer Name] WHERE [Parent Customer Number] IS NULL = 2
The example in Table 4 is intentionally simplistic - each parent customer has only one office per country. Typically, parent entities have many levels of children. e.g., A company has multiple offices and subsidiaries which, in turn, have their own offices and subsidiaries
Expanding our example in Table 4 to include local subsidiaries might look like this:
Now, answering “How many customers?” becomes more difficult. e.g., How many customers are in Korea? Is the answer ‘1’ or ‘2’?
This type of relationship, where an entity is comprised of multi-level children, is described as a recursive relationship or sometimes described as a Bill of Materials (BOM) relationship. Other examples include organizational structures where employees report to another employee, a course is a prerequisite for another course or a product is comprised of parts which in turn are comprised of other parts.
2. Many-to-Many Relationships
An example of a many-to-many relationship is between Products and Customers: Products are purchased by one or more Customers; Customers purchase one or more Products.
Let’s consider how this many-to-many relationship affects calculating a 5% commission for sales managers where one sales manager is responsible for North American (NA) sales and the other, for OEM sales which sells product BLK-40-5. The first row in Table 2 shows that product BLK-40-5 was sold in NA for an invoice amount of $1,275.00 (150 units @ $8.50).
When calculating the commission, should the $1,275.00 be attributed 100% to both managers, in essence doubling the amount of bonus paid? Or, would this amount be shared between the two in some manner?
Using the invoice data in Table 2, Invoices are used to relate customer to product. For each invoice, there is one Customer and one Product. However, in our example, relating Customers to Product is complicated by the recursive relationship within Customer, as shown in this table:
Most people would object to total number of customers is ‘1’. Most would expect the total to be the sum of the regional totals which, due to the many-to-many relationship, is an incorrect assumption.
Who knew that 6 rows of data could represent so many challenges? Imagine analyzing thousands or millions of transactions.
As organizations increase their reliance on analytics to measure corporate performance, the level of analytical skill within the organization also increases.
To mitigate mistakes by the rookie analyst, consider these recommendations: