Unleashing the Power in DataStage 8.5
Issue 2 - Caching in the Transformer
This is the second in a series of blogs on the new features of InfoSphere DataStage 8.5. In this series I'll provide some detail on the new functionality and explain areas where I think customers will receive significant benefit. The first issue, "Looping Transformer" is available at the following link. This blog will build on some of those concepts, so you may want to review that post before reading this one.
- SaveInputRecord: stores an input row to back of the cache
- GetInputRecord: retrieves a saved row from the front of the cache
- The input data set is sorted by fund id and valuation date in ascending order. We have an unknown number of records for each fund. The requirement is to output the five most recent valuations for any fund and if there are not at least five, do not output any.
- There is a varying number of clients (N) related to each salesperson. The requirement is to label each such client detail record with a label that reads "1 of N".
- An input file contains multiple bank accounts for each customer. The requirement is to show the percentage of the total balance for each individual account record.
- input data is read and stored in the cache via the "SaveInputRecord" function
- key breaks are tested via the "LastRowInGroup" and "LastRow" functions
- a summing balance is calculated while the group is the same and the total balance field is set at key break
- if there is a key break, the "RowsAtBreak" variable is set to the number of records on the cache and we'll skip to step 6
- if there is no key break, the Loop Condition will test false (since RowsAtBreak will be 0). This will cause the next input record to be read and processed (note the loop pictured on the top left)
- finally, the transformer will loop while "@ITERATION <= RowsAtBreak" and read the cache records (this is the loop pictured on the bottom right)
- for each output record processed, the percentage is calculated as Balance/TotalBalance.
- In step 5, we noted that RowsAtBreak must be "0" until we are ready to process output data. This will prevent the transformer from entering the Loop construct since @ITERATION is equal to "1" before we enter the loop. Since 1 <= 0 will test false, the transformer will not enter the loop construct and instead will read the second input record. Assuming we then hit a key break, RowsAtBreak will be set to 2 and we will then iterate through the loop twice.
- The second item to clarify is a simpler one. In the loop variable we call the GetSavedInputRecord function. When this is called, the output derivations for "DSLink2" redirect to the row pulled from cache. This allows us to code a single output derivation (for instance "DSLink2.Balance/TotalBalance") which would work for either a cached row or one coming direclty from the input link.
At this point, we've solved scenario 3. You probably can surmise the variations that would be required to solve scenario 1 and 2... they only require a few small tweaks to this model.