- preconfigured masking policies for common business data types
- reusable, repeatable masking routines which accelerate solution deployment
- capabilities to simulate realistic data in situations where data type and format must be preserved
- the performance, scalability, and reusability of InfoSphere DataStage
- support for masking complex file types, including mainframe and EBCDIC
Once installed, the Data Masking (DM) stage appears like any other stage on the Designer palette. The screenshot below illustrates how it may look in a sample job. In this case, we are simply moving data from one file to another and applying the masking rules as the only transformation, but this job could include any series of transformations, aggregations, pivots and the like. The DM stage can set the masking policy for any number of fields, so in most use cases you need only one such stage in a job. Of course, not every field requires a mask, and those fields can simply pass through this stage unaffected.
The DM stage also includes validation checking. For instance, if you are masking a Social Security Number, you may want to reject any data that doesn't conform to a standard SSN pattern. In those cases, the user can set a property to send that data down a reject link (not drawn in this particular job). Alternatively, the job can be set to abort on those conditions or simply pass the data through unaffected. Together, these three options give the developer explicit control over how exceptions are handled.
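To make the three exception-handling choices concrete, here is a minimal Python sketch of the idea. It is not how the DM stage is implemented or configured: the `OnInvalid` names, the `process` function, the simple `NNN-NN-NNNN` pattern, and the toy mask that keeps the last four digits are all assumptions made for illustration.

```python
import re
from enum import Enum


class OnInvalid(Enum):
    """Hypothetical names for the three behaviors described above."""
    REJECT = "reject"              # route the row to a reject output
    ABORT = "abort"                # stop the job
    PASS_THROUGH = "pass_through"  # emit the value unmasked


# Assumed "standard pattern" for this sketch: NNN-NN-NNNN
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")


def mask_ssn(value):
    # Toy mask for illustration only: hide everything but the last four digits.
    return "XXX-XX-" + value[-4:]


def process(values, on_invalid):
    """Mask valid SSNs; handle invalid ones according to on_invalid."""
    output, rejects = [], []
    for index, value in enumerate(values):
        if SSN_PATTERN.fullmatch(value):
            output.append(mask_ssn(value))
        elif on_invalid is OnInvalid.REJECT:
            rejects.append(value)
        elif on_invalid is OnInvalid.ABORT:
            raise ValueError(f"row {index} does not match the SSN pattern")
        else:
            output.append(value)  # pass through unaffected
    return output, rejects
```

Calling `process(["123-45-6789", "n/a"], OnInvalid.REJECT)` in this sketch masks the first value and places the second in the reject list, which mirrors the role of the reject link in a job.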
- output column: lists all columns in the record stream and allows the developer to choose which ones require a masking policy
- masking policy: any of a series of policies for the obfuscation of data, including National ID for a variety of countries, Credit Card Number, Random, Repeatable, etc.
- mask policy options: depending on the policy selected, the relevant options for configuration of that policy
The developer simply works through the drop-down lists to select the columns and policies that are required. In the screenshot to the right, the masking policy is set to "Hash Lookup", which draws substitute values for one or more columns from predefined lookup tables. This feature is important where the customer requires that a particular data value always maps to the same substitute value. The pack includes substitute data for several reference sets, including first name, last name, company name, and address.
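The repeatable behavior described above can be sketched in a few lines of Python. This is only an illustration of the general hash-lookup idea, not the pack's actual algorithm: the `hash_lookup` function, the keyed SHA-256 hash, the demo key, and the six-entry name table are all assumptions.

```python
import hashlib
import hmac

# Hypothetical substitute table; the real pack ships its own reference sets.
FIRST_NAMES = ["Alex", "Blake", "Casey", "Dana", "Evan", "Fran"]


def hash_lookup(value, table, key=b"demo-key"):
    """Map a source value to a table entry, always the same one for the same input."""
    # Keyed hash of the source value (the key is an assumption for this sketch).
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the hash to a position in the substitute table.
    index = int.from_bytes(digest[:8], "big") % len(table)
    return table[index]


# The same input yields the same substitute on every call and every run.
first = hash_lookup("John", FIRST_NAMES)
second = hash_lookup("John", FIRST_NAMES)
print(first == second)
```

Because the substitute depends only on the input value and the key, the same customer name masks to the same replacement across files and job runs, which is what keeps joins and lookups on masked data consistent. Different inputs can land on the same substitute, more often when the table is small.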
If your organization faces data privacy challenges when moving data between systems, I'd enjoy discussing with you the unique benefits DataStage can bring to those scenarios. As always, feel free to drop me a line anytime.