Big Data and Its Impact on Statistical Analysis
Timothy Powers | Tags: spss analytics mobile big-data predictive-analytics skills business-analytics statistics
Guest post from Jing Shyr, Chief Statistician & Distinguished Engineer, IBM Business Analytics
It's the age-old question: why did the chicken cross the road?
With one chicken, the answer is easy to compute.
But, what if there were millions of chickens crossing the road? And each chicken had a mobile device and was tweeting out its opinions, desires, likes/dislikes, photos, and detailed descriptions of what it had for breakfast that morning. Oh, and what if that road was being monitored by millions of sensors?
With current statistical techniques, it's no longer easy to quickly understand why each chicken decided to cross that road and, more importantly, predict when they might cross again.
I'm discussing these trends at the American Statistical Association's Joint Statistical Meetings in San Diego this week.
Lack of Skills
Having been around the analytics industry for many years, I find it refreshing to see that businesses are taking statistics and data mining results and injecting them directly into the business (and directly into the business process itself). The Catch-22 is that while more and more organizations are realizing the benefits of analytics, professionals who understand how to both capture and analyze the tsunami of data created daily are hard to find. That work still requires training and a unique skill set.
A recent McKinsey Global Institute report indicates that over the next seven years the need for highly skilled business intelligence workers in the U.S. alone will dramatically exceed the available workforce – by as much as 60 percent.
It's nice to see that many universities around the world are expanding and strengthening analytics curricula (many with IBM's help) to meet the growing demand for skilled analytics professionals. Read more about IBM's work with Northwestern University, Yale School of Management and DePaul University, among others.
I often imagine a business analyst presenting results to an executive the same way I present to my students. When teaching a lesson on modeling, I often ask, "Do you see what I see?" Everyone stares with blank looks on their faces and says, "No! What do you see?"
Herein lies part of the problem. To help counteract the skills shortage, we have to make the software easier to use and force the software to be consumable versus strictly scientific. Communicating results is just as important as the results themselves. I strongly believe that statistical software needs to go through a revolution of its own and become as intuitive as a smartphone.
And speaking of smartphones...
Most statistical software produces an incredible number of very large tables and charts, making the output extremely difficult to comprehend in a mobile environment. I torture my eyes every time I try to read a report on my BlackBerry.
Consumability means anywhere, anytime and through any device. It's time we hold statistical software to a higher standard.
Let me get back to the chickens for a moment.
The volume, velocity and variety of data today are seemingly overwhelming traditional statistical software. Not to be cliché, but Big Data is giving the statistics industry big problems.
Previously, if we wanted to analyze any data, we would follow the same logical flow: decide what we want to predict or classify, then build a model by bringing in all the predictors (independent variables). The number of predictors was often well below 100.
Today, however, we are dealing with thousands of different variables, which makes traditional statistical analysis a serious hurdle. Machine capacity is no longer sufficient, and many algorithms have been outpaced by the volume of data.
The challenge calls for a new process of data reduction before modeling. It also calls for new computation algorithms that can handle millions of records and fields quickly in a distributed environment, without passing the data back and forth multiple times.
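To make that idea concrete, here is a minimal Python sketch of one way it could work. It is not how IBM SPSS does it, just an illustration under stated assumptions: each partition of the data is read exactly once to build small per-field summaries (Welford's running mean and variance), the summaries are then merged without touching the raw records again (the pairwise combination formula from Chan et al.), and near-constant fields are dropped before modeling. The names `Summary`, `summarize_partition`, `merge_partitions`, `screen_fields`, and the `min_variance` threshold are all hypothetical, and a variance screen is only one of many possible data reduction steps.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    """Running count, mean, and sum of squared deviations (M2) for one field."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's update: absorb one value at a time, no second pass needed.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "Summary") -> "Summary":
        # Combine two partial summaries (Chan et al.) without the raw data.
        n = self.n + other.n
        if n == 0:
            return Summary()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return Summary(n, mean, m2)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 when fewer than two values were seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def summarize_partition(rows):
    """One pass over a partition: build one Summary per field."""
    summaries = {}
    for row in rows:
        for field, value in row.items():
            summaries.setdefault(field, Summary()).add(value)
    return summaries


def merge_partitions(a, b):
    """Merge two per-field summary dictionaries from different partitions."""
    fields = set(a) | set(b)
    return {f: a.get(f, Summary()).merge(b.get(f, Summary())) for f in fields}


def screen_fields(partitions, min_variance=1e-8):
    """Return the fields whose overall variance exceeds min_variance (hypothetical cutoff)."""
    partials = [summarize_partition(p) for p in partitions]  # could run on separate nodes
    total = reduce(merge_partitions, partials, {})
    kept = sorted(f for f, s in total.items() if s.variance > min_variance)
    return kept, total
```

In a real distributed setting, `summarize_partition` would run where each chunk of data lives, and only the small summary dictionaries would travel over the network. Calling `screen_fields` on a list of partitions, each a list of dictionaries mapping field names to numbers, returns the surviving field names along with the merged summaries.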
Most importantly, we don't need to be chicken when it comes to Big Data.
Creating new statistical techniques for Big Data will get us all to the other side of the road, and you'll never have to ask why.
For more information:
· Join our upcoming webinar and see all the new features in the upcoming IBM SPSS Statistics release (Aug. 14 at 12:00 pm EDT)