The democratization of data we hear so much about these days has become a reality in part because of packaged solutions like IBM SPSS Statistics and IBM SPSS Modeler. These solutions have certainly changed both the speed at which decisions are made and how information-rich the data points behind those decisions are. Complex techniques like regression analysis, once the sole domain of a few experts within the business (assuming you were lucky enough to have that expertise in-house in the first place), can now be run by all, even those who struggle with single-digit multiplication tables. We are all familiar with this storyline.
But there's something new on the block that's causing a stir. Enter R. R shows great promise in adding even more high-end analytic arrows to the organization's quiver. But what's an organization to do with that potential? Jane Hendricks, IBM Product Marketing Manager - Predictive Analytics and a former analyst herself, gives you a few things to consider as you weigh what R can do for your organization. Here's Jane's blog post:
Everyone wants to be considered an expert and apply their creativity. And I think that’s why the analytical community has embraced R with the passion that it has. The fact that it’s open source and, therefore, “free” doesn’t hurt. There was a time when true analytics – data mining, text mining, predictive modeling, simulation – could only be done by a few highly trained experts. Everyone else used Excel or – let’s be honest here – their trusty calculator. And the analyst would laugh when a pivot table was presented as an analytical novelty in the boardroom.
Those days are gone.
There are tools that make high-end analytics more accessible. Excel users are now demanding more than pivot tables. They have heard the “Big Data” message and they want to get on the advanced analytics train as fast as they can. For the analyst, this is a very scary world. “Real” analysts fear that giving direct predictive power to marketing, human resources, customer service and even finance can only mean one thing: analytical mayhem.
R allows the analyst to take control back. R is a very powerful programming language for manipulating and analyzing data. Because it is a programming language, a user has to not only understand their data and what they want to get out of it, but also be comfortable with commands, cryptic error messages and documentation written by programmers for programmers. (For example, the help page for print explains that print “prints its argument and returns it invisibly (via invisible(x)).”)
Let’s look at something quite simple here: Reading a file. Just that. Nothing more.
Here is how you read a file using R:
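# Read a comma-separated file and name its three columns
A <- read.table("x.data", sep=",",
                col.names=c("year", "my1", "my2"))
nrow(A)        # Count the rows in A
# You might find these useful, to "look around" a dataset --
summary(A)     # Per-column summary statistics (base R)
library(Hmisc) # This requires that you've installed the Hmisc package
describe(A)    # Hmisc's more detailed per-variable description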
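If you want to try this without an x.data file on hand, the following is a minimal, self-contained sketch using only base R. The temporary file and the three sample rows are invented purely for illustration; they are not the data behind the original example.

```r
# Hypothetical sample data: three rows of year, my1, my2 (values invented for illustration)
path <- tempfile(fileext = ".data")
writeLines(c("2001,10,1.5",
             "2002,20,2.5",
             "2003,30,3.5"), path)

# Same call as in the post, pointed at the temporary file instead of "x.data"
A <- read.table(path, sep = ",",
                col.names = c("year", "my1", "my2"))

nrow(A)  # Count the rows in A
str(A)   # Show each column's name and type
```

Even in this small form, the user has to know the separator, supply the column names by hand and remember which function inspects what.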
Now, contrast that with the following:
(figure 1 - IBM SPSS Statistics - below)
(figure 2 - IBM SPSS Modeler - below)
Figure 1 is IBM SPSS Statistics. This was accomplished with a single menu selection. Figure 2 is IBM SPSS Modeler. There, the user can open as many files as they want with simple drag and drop.
I can look at my data right on the screen, delete a column and more with simple menu options and often with a single click of a mouse. And if I make a mistake, I can easily “undo” that action, which, by the way, can be exceedingly useful.
Of course, if I wanted to write a completely new algorithm in either SPSS Statistics or SPSS Modeler, it wouldn’t be easy. These are, after all, packaged solutions. The reality is that for most real-world situations, what is within these commercial packages is more than enough. Very few use the full power of either one of these packaged solutions and most rely on a few “go-to” procedures for their day-to-day work.
There are certainly commercial packages that are using R to create predictive capabilities. Users can also use R with Excel. To do so, the user often (if not always) needs to install R or at the least be able to update it as needed (since there is no guarantee of version compatibility), learn the packages, and learn to code to be truly effective. When a vendor announces they have R integration, that’s typically code for code.
IBM has no issue with R code itself either. IBM recognizes that power users exist and respects the fact that there is some super-cool R “stuff” out there, so users can run their favorite R code within the GUI. But what’s nice is that you don’t have to use programming for everything. You can be selective: use R where it makes sense, and use what’s already “baked in” where you don’t know the R equivalent or don’t have the time (or the inclination) to learn it, or even just to figure out the syntax.
Both IBM SPSS Statistics and IBM SPSS Modeler pre-date R. They have been and continue to be used by thousands across industries, in academia and around the world. In these SPSS packaged solutions there are data preparation tools galore, hundreds of statistical functions, a staggering number of predictive algorithms, charts, graphs and all varieties of output, and more. But that’s just the view of a single analyst. When you think about what it takes to make analytics part of the business, the capabilities required for collaboration, repeatability and deployment are vast. Trying to build out that discipline on top of an open source, command line programming environment can make that “free” R very (very) expensive.
It’s really not a terribly new story. The tug of war between the power of the command line and the (perceived) limitations of a GUI environment is an old one. And now I think I’ll formally date myself as a dinosaur. But in the interest of analytics, well – why not.
While I am a marketer now, I grew up as an analyst. Back in those days, I used a command line, Unix-based package and my trusty vi editor to analyze and report on survey data. Massive amounts of survey data. BIG DATA is what one would say today. I could do magic. However, I could never get any help. There just weren’t enough people around who could do magic as well as I could.
When my organization abandoned the command line in favor of the GUI, I was naturally upset. Now, everyone could do the same magic that only I had power over.
The truth, though, is that the GUI allowed my organization to replicate this magic for many customers at a much lower cost. And I found that the time I had been spending poring through obscure documentation and trolling ListServs to find help with my magic was much better spent actually looking at the results that were being produced, understanding how they mattered, and explaining it all to others.
The ecstasy of R is that complete flexibility that allows you to believe that you can do anything and everything (for free!). The agony is that there really is no free lunch – and R is no exception.
IBM SPSS Statistics
IBM SPSS Modeler
Download the trial version of IBM SPSS Statistics from the AnalyticsZone.
"R and SPSS software: Everyone wins" whitepaper