Thomas Edison said genius was "one percent inspiration and 99 percent perspiration."
A similar thing could probably be said about data mining projects.
Whether you want to segment your customers or uncover possible fraud, you need to select a mining algorithm that will provide the best results. But most of the perspiration comes before you can apply the algorithm. You need to design data attributes that provide the relevant input for the problem you're trying to solve. These attributes are called predictor variables. Predictor design requires both business understanding and the ability to quickly perform various kinds of data transformations and aggregations.
For instance, if you want to segment your customers according to their buying behavior, you may need to build aggregations such as total revenue or profit. You may also need complex transformations to calculate the revenue for different product groups or time periods. Predictor design consumes a huge portion of the overall project effort, and it typically requires advanced SQL skills.
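To make the effort concrete, here is a minimal sketch of the kind of hand-written SQL that such predictors usually require. It uses Python's built-in sqlite3 module only so that the example runs anywhere; the sales table, its columns, and the sample rows are all assumptions made for illustration and do not come from InfoSphere Warehouse.

```python
import sqlite3

# Hypothetical sales table; every table and column name here is an
# illustrative assumption, not part of any IBM product.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "customer_id INTEGER, product_group TEXT, sale_year INTEGER, revenue REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (1, "books", 2008, 40.0),
        (1, "music", 2009, 15.0),
        (1, "books", 2009, 25.0),
        (2, "music", 2009, 60.0),
    ],
)

# One output row per customer: total revenue, plus revenue broken out
# by product group and by time period using conditional aggregation.
PREDICTOR_SQL = """
SELECT customer_id,
       SUM(revenue) AS total_revenue,
       SUM(CASE WHEN product_group = 'books' THEN revenue ELSE 0.0 END) AS revenue_books,
       SUM(CASE WHEN product_group = 'music' THEN revenue ELSE 0.0 END) AS revenue_music,
       SUM(CASE WHEN sale_year = 2009 THEN revenue ELSE 0.0 END) AS revenue_2009
FROM sales
GROUP BY customer_id
ORDER BY customer_id
"""

predictors = conn.execute(PREDICTOR_SQL).fetchall()
for row in predictors:
    print(row)
```

Every additional product group or time period adds another CASE expression, which is why this kind of query grows quickly and becomes tedious to maintain by hand.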
IBM InfoSphere Warehouse offers new data preparation tooling that helps you speed up data transformation and predictor design tasks considerably. The tooling takes a new approach: you create predictors in a descriptive style, without having to write SQL code or use complex ETL tools. You describe the predictors for your algorithms, and the tooling generates SQL scripts and data flows that perform the transformations for you at the push of a button.
The tooling provides:
- sample contents and multi-variate distributions of your source data
- automatic determination of discrete boundaries
- flexible selection of aggregation levels
- support for finding split categories
- previews of your transformation results
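As an illustration of what determining discrete boundaries means, the sketch below splits a numeric predictor into equal-frequency bins using only the Python standard library. Equal-frequency binning is one common discretization technique and is an assumption here; the source does not say which method the tooling applies. The sample values and the names boundaries and bins are hypothetical.

```python
import bisect
import statistics

# Hypothetical revenue values for eight customers (illustrative only).
revenues = [10, 20, 30, 40, 50, 60, 70, 80]

# Three cut points divide the values into four bins of roughly equal size.
boundaries = statistics.quantiles(revenues, n=4)

# Assign each value the index of the bin it falls into (0 = lowest bin).
bins = [bisect.bisect_right(boundaries, value) for value in revenues]

print(boundaries)
print(bins)
```

The resulting bin index can then serve as a discrete predictor in place of the raw numeric value.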
You can get more information about the data preparation tooling in the InfoSphere Warehouse 9.7 Information Center:
http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.datatools.datamining.doc/c_dp_datapreparationoverview.html

Simone Daum
Business Intelligence and Data Mining