Counterintuitive Data Science Methods May Yield Keener Analytical Insights

Mathematics is not a hermetic metaphysical pursuit, but rather a field where researchers craft and tweak algorithmic approaches suited to various problem domains. The best mathematicians know it's a dead end to develop new approaches with any of these limitations: no real-world applications, an inordinate appetite for computing resources, or complexity so great that almost no one else understands or knows how to apply them. The best statistical-analysis algorithms provide tools for the collective discovery of quantitative relationships, preferably, where science comes into the picture, of an empirical nature.

However, sometimes the traditional approaches get in the way of data-driven insight extraction. The underlying algorithms can just as easily obscure key quantitative relationships as reveal them. New branches of the mathematical arts often emerge to help scientists see patterns that would otherwise remain dark. Think of Newton, modern physics, and the pivotal impact of the calculus. Think of Mandelbrot, modern chaos theory, and fractal dimensionality.

As more scientists incorporate big data into their working methods, they're going to reassess whether the mathematical and statistical algorithms in their data-science toolkits are as effective at petascale as they are in "small data" territory. One key criterion is whether machine-learning algorithms can continue to calculate "good enough" predictions from data at extreme scales. One way to define "good enough" is "efficiently executable with available big-data platforms in an acceptable timeframe while delivering actionable results." In that regard, I recently came across an excellent article presenting a new mathematical approach for tuning otherwise "inferior" machine-learning algorithms for big data.
Within the context of the article, the author, Brian Dalessandro, essentially defines "inferior" as any algorithm that degrades the quality of the training-set data used to tune the statistical model. What was most noteworthy about the discussion was its counterintuitive thrust: an algorithm that is inferior on one attribute (e.g., data quality) can be superior on others (e.g., predictive accuracy, efficient linear scaling, cost-effectiveness on big-data platforms).

Dalessandro outlines an approach that relies on stochastic gradient descent (SGD) and feature-hashing algorithms to reduce the "dimensionality" (i.e., the number of features/variables) being modeled. From a statistical-analysis standpoint, the dimensionality-reduction approach increases one type of modeling error ("optimization error") in order to reduce the other types ("estimation error" and "approximation error") that contribute to overall modeling accuracy. Dalessandro makes it clear why this algorithmic approach is suited to big data: "By choosing SGD, one introduces more optimization error into the model, but using more data reduces both estimation and approximation errors. If the data is big enough, the tradeoff is favorable to the modeler." Essentially, it's favorable to the modeler in analytical problem domains, such as natural language processing, in which the approach's optimization errors are not showstoppers.

He also mentions other benefits, such as enabling more complex feature/variable sets to be modeled within constrained memory resources and providing a more privacy-friendly way to store and use personal data. But he also notes a tradeoff: the approach introduces more chaos into the modeling results.

Though highly arcane, this is the soul of practical data science: fitting the mathematical, statistical, and algorithmic approaches to the problem at hand and adapting them to the big-data resources at our disposal.
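To make the two techniques concrete, here is a minimal sketch, not Dalessandro's actual implementation, of feature hashing (the "hashing trick") feeding an SGD-trained logistic regression, in plain Python. The bucket count, learning rate, epoch count, and toy data are all illustrative assumptions; the point is that memory stays fixed no matter how large the vocabulary grows, while each SGD step is a cheap, noisy update rather than an exact full-batch optimization.

```python
import math
import random
import zlib

def hash_features(tokens, n_buckets=64):
    """The hashing trick: map an open-ended token vocabulary into a
    fixed-size vector, capping dimensionality at n_buckets."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % n_buckets] += 1.0
    return vec

def sgd_logistic(samples, n_buckets=64, lr=0.5, epochs=50, seed=0):
    """Logistic regression trained by stochastic gradient descent:
    one gradient step per example (more optimization error than a
    full-batch solver, but far cheaper at scale)."""
    rng = random.Random(seed)
    w = [0.0] * n_buckets
    data = [(hash_features(toks, n_buckets), y) for toks, y in samples]
    for _ in range(epochs):
        rng.shuffle(data)  # stochastic: visit examples in random order
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            for i, xi in enumerate(x):
                w[i] -= lr * (p - y) * xi  # log-loss gradient, one example
    return w

def predict(w, tokens, n_buckets=64):
    x = hash_features(tokens, n_buckets)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

# Hypothetical toy data: label 1 for "positive" word lists
train = [(["good", "movie"], 1), (["bad", "movie"], 0),
         (["good", "plot"], 1), (["bad", "plot"], 0)]
w = sgd_logistic(train)
```

Note that hash collisions, where distinct features land in the same bucket, are exactly the extra "chaos" in the modeling results that the tradeoff accepts: with enough data and enough buckets, the damage from collisions is small relative to the gains in scalability.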
Like any engineering discipline, this involves making tradeoffs among algorithmic approaches. It's applied math on the proverbial steroids.

Connect with me on Twitter @jameskobielus

