Recently I came across a Harvard Business Review (HBR) article (Oct. 2012) with the following stat: 2.5 exabytes of data are created each day, and that number doubles roughly every 40 months. An exabyte is one billion gigabytes. The article illustrates the point with Walmart, estimating that 2.5 petabytes of data are generated every hour from customer transactions alone.
Now, Walmart is a large Teradata customer, with terabyte-scale data warehouses going back to 1992. According to Gigaom (Mar. 2013), Walmart's operational system stood at 2.5 petabytes back in 2008 and is now likely in the double digits of petabytes.
If I assume the HBR and Gigaom figures are at least in the ballpark of reality, then doing the math leads me to a few observations about Walmart's data management policy. I'll share my first observation.
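Here's the math, as a back-of-the-envelope sketch in Python using only the figures above. One loud assumption: I'm applying HBR's worldwide 40-month doubling rate to Walmart's own warehouse, which may grow at a different pace.

```python
# Sanity-check the HBR and Gigaom figures against each other.
# Assumption: Walmart's data grows at the HBR *global* rate
# (doubling every 40 months); its actual growth may differ.

GLOBAL_EB_PER_DAY = 2.5      # HBR: exabytes created worldwide per day
WALMART_PB_PER_HOUR = 2.5    # HBR: petabytes per hour from Walmart transactions
PB_PER_EB = 1000             # 1 exabyte = 1,000 petabytes (decimal units)

# Walmart's share of the world's daily data creation
walmart_eb_per_day = WALMART_PB_PER_HOUR * 24 / PB_PER_EB
print(f"Walmart/day: {walmart_eb_per_day:.2f} EB "
      f"({walmart_eb_per_day / GLOBAL_EB_PER_DAY:.1%} of global creation)")

# Extrapolate the 2008 warehouse size (Gigaom: 2.5 PB) to 2013
# using the 40-month doubling period
months = (2013 - 2008) * 12
warehouse_2013_pb = 2.5 * 2 ** (months / 40)
print(f"2008 warehouse of 2.5 PB -> ~{warehouse_2013_pb:.1f} PB by 2013")
```

Two things fall out: Walmart's transactions alone would account for roughly 2.4% of the world's daily data creation, and a 2.5-petabyte warehouse doubling at the global rate would only reach about 7 petabytes by 2013. If Gigaom's "double digits of petabytes" estimate is right, Walmart's data has been growing faster than the worldwide average.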
In August 2013, Walmart cut its outlook for the rest of 2013, citing the impact of a shaky recovery, higher payroll taxes, and gasoline prices on its customers in the U.S. The firm also experienced cost-control issues overseas.
Might Walmart, or any retailer, be able to improve its sales forecasts and merchandising mix by keeping and analyzing more data with big data technology such as Hadoop MapReduce? Only Walmart knows, but I'm guessing: likely.
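To make that concrete, here is a minimal sketch of the kind of job I have in mind, written as a Hadoop Streaming mapper and reducer in Python. The input layout (comma-separated store_id, date, amount per transaction line) is hypothetical, not Walmart's actual schema; the point is that a simple aggregate like daily sales per store, a basic input to any sales forecast, is exactly what MapReduce does well at petabyte scale.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming job: total daily sales per store.
# Input lines are assumed to be well-formed CSV: store_id,date,amount
import sys

def mapper():
    # Emit "store_id,date <TAB> amount" for each transaction line
    for line in sys.stdin:
        store_id, date, amount = line.rstrip("\n").split(",")[:3]
        print(f"{store_id},{date}\t{amount}")

def reducer():
    # Hadoop delivers mapper output sorted by key, so totals can be
    # accumulated in a single streaming pass
    current_key, total = None, 0.0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total:.2f}")
            current_key, total = key, 0.0
        total += float(value)
    if current_key is not None:
        print(f"{current_key}\t{total:.2f}")

if __name__ == "__main__":
    # Dispatch on a "map" or "reduce" argument so one script serves
    # as both -mapper and -reducer in the streaming job
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

The appeal of the streaming model is that the same pair of scripts you can test locally with `cat sample.csv | sales.py map | sort | sales.py reduce` will, under Hadoop, fan out across a cluster and chew through transaction logs far larger than any single machine could hold.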