Hi. Welcome to my blog. If we haven't had the chance to meet yet, let me introduce myself. I'm Tony Curcio, the Product Manager for DataStage, Information Services Director and the Balanced Optimizer product within Information Server. In this role, I have the chance to meet many of our customers, understand the types of projects they are applying the technology to, receive feedback from their experiences and then use that information to prioritize the product roadmap. It is a very rewarding job - and one that I am very passionate about. In my posts, I hope to provide an inside look at data integration as a general practice and at what we are specifically doing at IBM to help our customers.
This will be the first in a series of posts related to some of the new features we've included in InfoSphere DataStage 8.5. As I've had the opportunity to speak with many of our customers who participated in the beta program and are now upgrading, I've been able to hear firsthand some of the "oohs and aahs" when folks begin working with these new features.
In this series I'll provide some detail on the new functionality and explain areas where I think customers will receive significant benefit. I'll be very interested in hearing back on this blog (or privately - see my contact details) about other scenarios where you are benefiting from these enhancements.
Looping Transformer - The One Inch Punch
I'll quote a bit of Wikipedia here for those not familiar with the One Inch Punch....
I like that definition for the Looping Transformer for a couple of reasons.
- Extremely close distances - very applicable, since this is actually built into the Transformer Stage that developers have been using for years; you can begin using it in any existing job, and there are only a few new things to learn
- Explosive power - this feature allows very complex data integration challenges to be solved elegantly from a design perspective, and efficiently from a resource perspective
- One inch - well, that's about the size of the stage on the canvas (sorry, I'm likely to insert corny jokes from time to time)
Last week at the DataStage Deep Dive session at the Information OnDemand conference I presented a few scenarios where the looping transformer moves certain patterns from complex to simple. I'll review my favorite one in this post (one of the others will be included in my next post on "Caching in the Transformer").
Variable Length Records with Embedded Payloads
The following example comes from a customer I was speaking with last month. They had a very interesting challenge related to a variable length string that included multiple record types and payloads embedded in the data. Here's a sample record that I drew up:
You can see there is a series of record types (the first being "A"), payload lengths ("005") and payloads ("abcde"). In the output the customer wanted to convert that data to the following:
What makes this a challenging problem is the fact that the length is defined in the data and the number of segments can vary tremendously (record 2 may have 100 payloads in that string). In DataStage 8.5, the looping transformer handles this very easily by simply introducing a loop condition. Here's the transformer logic for solving this:
If you are a DataStage user, you'll probably recognize most of what's on the screen and then do a double-take on the new "loop condition" section on the canvas. You'll notice there is a loop test (a "while" condition) that controls how many times the output logic should be iterated over. The logic that has been circled includes a new variable named "@ITERATION", which is a counter indicating which pass through the loop this is. One other item that will appear new is the Loop variables - basically the same as Stage variables, but these get evaluated on each pass through the loop. The rest of this logic is old hat ... basic substringing and calculations to delineate the record payloads. The test for RemainingRecord <> "" allows us to exit the loop once all bytes in the string have been consumed.
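To make the loop pattern concrete, here is a minimal sketch in Python of the same "consume until empty" logic. This is only an illustration, not the DataStage implementation: the helper name, the second segment ("B003xyz") and the assumption that every segment is a 1-character record type followed by a 3-digit length are my own, based on the "A005abcde" sample above.

```python
def split_payloads(record: str):
    """Walk a variable-length record of the form
    <1-char type><3-digit length><payload>... and emit one
    (record_type, payload) pair per embedded segment.
    Mirrors the Transformer's  while RemainingRecord <> ""  loop."""
    remaining = record
    rows = []
    while remaining != "":                  # loop test: bytes left to consume
        rec_type = remaining[0]             # e.g. "A"
        length = int(remaining[1:4])        # e.g. "005" -> 5
        payload = remaining[4:4 + length]   # e.g. "abcde"
        rows.append((rec_type, payload))
        remaining = remaining[4 + length:]  # consume this segment
    return rows

# Example using the sample values from the post plus a made-up second segment:
print(split_payloads("A005abcdeB003xyz"))
# -> [('A', 'abcde'), ('B', 'xyz')]
```

Each pass of the `while` loop plays the role of one @ITERATION in the transformer, and `remaining` corresponds to the RemainingRecord loop variable.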
You can see this is a very intuitive way to solve this particular issue, and it incrementally adds only a single loop condition. It avoids the several other transformers and funnels the customer is using in their current implementation. The savings therefore apply not only to the initial design experience and runtime performance, but also to the ongoing maintenance of this job as related requirements in the organization change.
As I mentioned earlier, if after you review this info you recognize an opportunity to optimize a similar data challenge in your enterprise, I'd enjoy discussing it.
If you are interested in learning more about Information Server or DataStage 8.5, I'd recommend a few items to read:
- The ibm.com "what's new in 8.5" site
- Vince McBurney has several blogs on ITtoolbox
- And of course our InfoCenter, which will give you a lot of detail on the suite in total