As I mentioned before, there seem to be two strategies for monitoring
in general, and I have had this discussion with many z sysprogs. There
is the group that says the fastest resolution is to wait for a failure to
happen and then start working on fixing the problem, which they see as the
best way to use their skills. The latest discussion I had on this
was with a group of customers in Toronto last month. I was talking about
finding loopers running on the z and some new techniques
in the OMEGAMON XE on z/OS v4.20 release: some creative use of monitoring
buckets to pick loopers off as early as possible. Doing
this could save MIPS, and therefore money, versus spending
the cycles on something that is going to be cancelled eventually.

That generated a whole group discussion on what happens if you generate
false positives, meaning that a looping job is flagged when there isn't one.
One view was that it is better to let a failure happen than to generate
these false positives, where the staff would be working on a perceived
problem and not the real thing.
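
To make the false-positive trade-off concrete, here is a minimal sketch in Python. It is not the OMEGAMON technique I talked about in Toronto. It assumes a simple heuristic (a job showing high CPU and no I/O for several monitoring intervals in a row), and the names Sample, suspected_looper, cpu_threshold and min_consecutive are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One monitoring interval for a job (hypothetical fields)."""
    cpu_pct: float  # CPU busy percentage during the interval
    io_count: int   # I/O operations completed during the interval


def suspected_looper(samples: List[Sample],
                     cpu_threshold: float = 90.0,
                     min_consecutive: int = 3) -> bool:
    """Flag a job only when its most recent `min_consecutive` samples all
    show high CPU with no I/O (an assumed heuristic, not a product rule).

    Requiring several intervals in a row means a slower alert but fewer
    false positives from jobs that are merely busy for a moment.
    """
    if len(samples) < min_consecutive:
        return False  # not enough history yet, so do not alert
    recent = samples[-min_consecutive:]
    return all(s.cpu_pct >= cpu_threshold and s.io_count == 0 for s in recent)


# A busy batch job: high CPU, but it is still doing I/O.
busy_batch = [Sample(95, 1200), Sample(97, 1100), Sample(96, 1300)]
# A job that stops doing I/O and pins the CPU for three intervals.
looping = [Sample(40, 50), Sample(99, 0), Sample(99, 0), Sample(100, 0)]

print(suspected_looper(busy_batch))
print(suspected_looper(looping))
```

Raising min_consecutive makes the check more conservative, which is the knob the "wait for the failure" camp would turn up and the proactive camp would turn down.
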
Some were for being proactive and had actually seen results with the looper
situation, and others were pretty steadfast in only working on failures. The
discussion also touched on how different IT organizations are set up, as there
are often strict areas of control between, say, a group focused on
performance and a group focused on automation. Some have a performance
group only for z and a different performance group for distributed and networks.
Other groups take a siloed approach to their area of control:
things outside their domain are not in their control and hence "not our job."
That discussion also brought up the point that tying automation
and performance together for proactive management would span several departments,
which may have other priorities.

In general, the attendees thought it important that
the z platform be included in an enterprise-wide strategy for achieving
the business goals that are imposed on IT. They know they need to work
across the silos, but there are only so many hours in the week.

I know many customers have established performance automation routines
using OMEGAMON products: when a situation uncovers a problem,
either a proactive recovery routine is started or the IT
staff is notified to take a specified action to prevent an outage or a
degradation of the system. OMEGAMON products provide the "eyes"
into the systems and do not need a message to be issued to kick off an event.
Sister products like NetView on z/OS and System Automation actually
perform the automated actions. But how, in a siloed group, does this
get accomplished for the benefit of all?

Again, they all had their different domains and areas of responsibility, and
it was interesting to see how people reacted during the discussion. I have
already discussed how to integrate IT staff who have
different domain responsibilities by getting them to share the same view
of the problem on a single pane of glass at a console. Even then, some of the
attendees saw no value in this. Some viewed it as a loss of control over
their domain. I still think this saves both time and
money and could be leveraged effectively by defining a business process.

The next step would be to integrate the performance tooling into a
business process that can sense a problem, isolate it,
diagnose it, and then repair it as one integrated process.
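
As a rough sketch of what "sense, isolate, diagnose, repair" could look like when treated as one process, here is a small Python example. Everything in it is an assumption for illustration: the Incident record, the event fields (system, symptom, job), the symptom lookup table and the action strings are hypothetical and do not come from any OMEGAMON, NetView or System Automation interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Incident:
    """Record handed from stage to stage (hypothetical structure)."""
    source: str                    # system where the event was sensed
    symptom: str                   # what the monitor reported
    component: str = ""            # filled in by isolate()
    cause: str = ""                # filled in by diagnose()
    actions: List[str] = field(default_factory=list)  # filled in by repair()


def sense(event: Dict[str, str]) -> Incident:
    """Turn a raw monitoring event into an incident."""
    return Incident(source=event["system"], symptom=event["symptom"])


def isolate(incident: Incident, event: Dict[str, str]) -> Incident:
    """Narrow the incident down to the job named in the event, if any."""
    incident.component = event.get("job", "unknown")
    return incident


def diagnose(incident: Incident) -> Incident:
    """Map a known symptom to a likely cause, or defer to a person."""
    known = {"high CPU, no I/O": "suspected looping job"}  # assumed table
    incident.cause = known.get(incident.symptom, "needs SME review")
    return incident


def repair(incident: Incident) -> Incident:
    """Pick an action: automation for known causes, a person otherwise."""
    if incident.cause == "needs SME review":
        incident.actions.append("notify on-call SME")
    else:
        incident.actions.append(
            f"request automation to cancel {incident.component}")
    return incident


def handle(event: Dict[str, str]) -> Incident:
    """Run the four stages in order as a single process."""
    return repair(diagnose(isolate(sense(event), event)))


result = handle({"system": "SYSA", "symptom": "high CPU, no I/O",
                 "job": "PAYROLL1"})
print(result.cause, result.actions)
```

The point of the sketch is that each stage could be owned by a different department, while the incident record they pass along is the shared view of the problem.
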
Many companies are using business process modelers to streamline actions
and gain a competitive advantage by giving a line of business manager
precise control based on business needs.
When you think about your IT business processes (Change, Configuration, Problem,
Performance, Availability), how integrated are the IT staff and the process
for resolving events in these areas, whether they come from the z platform
or from distributed platforms?

The group in Toronto all identified areas where this would help
with their jobs, but they all asked how, in one week, you can do
your job and still have time left over to talk about an automated process. There
has got to be an easier way. I guess that will be the next discussion.
Mike E Goodman Ends with z
Mike Goodman Tags: tivoli performance proactive z/os management omegamon service

Mike Goodman Tags: omegamon tivoli smcz tep green screen
So, old dogs and new tricks. It seems obvious that when something
goes bump in the night and is detected, this information can be
sent to many different personas. The different IT personas
can then be using many different UI presentation services, from Web 2.0 graphics
to a green screen. The point here is that the bump in the night
is actually detected and can generate an event to notify the different
personas. But if a problem has occurred or, better yet, is about to occur,
and there are more screens to watch, will the IT person actually catch it
without delay?

I was presenting recently at a conference in Toronto when a discussion
came up about whether it is better to be proactive and try to catch
system problems before they happen, at the risk of generating a false positive, or just
let the problem happen and then work on the recovery. I will come back to
that, as I was actually surprised by the discussion it generated. Any thoughts?

Right now there are a growing number of presentation services that each have
a particular view into the IT resources of the enterprise. Tivoli has the Tivoli
Enterprise Portal, STG is delivering z/OSMF, AIM has CICS Explorer, and other
vendors are also delivering Web 2.0 type consoles. As an event happens, the
problem could show up in any of these displays over the next few years. I
believe that the GUIs will grow with more options and more choices in the
years to come. These changes seem to be happening very quickly
with each turn of the internet technology crank. When
events are funneled to the personas, how do the personas get back to the
original cause of the problem?

There are really two things here that I believe save time and money
when you're in IT doing PD (problem determination) work, trying to find a
problem. The first is, if an event shows up in different GUIs, how do you
correctly communicate to the IT support team the actual problem that has
occurred? The second is how first-line or second-line support then gets to
the root cause.

The strategy with the Tivoli z portfolio is that events can be sent to the
Tivoli Enterprise Portal, which can be used as a consolidated view of many different
products, as a services dashboard. So instead of having to watch many screens
to keep an eye on the z, you only have to look at one. If an event occurs, it
shows up in the workspace view of the portal. For a business to have one
group watch just the z and another group watch just the distributed
environment duplicates process and staff, adds to the complexity of managing
silos, and doesn't help with any end-to-end view of service delivery.

From an SME who is doing the normal day job of supporting several
different z products, the question I am asked is: so what? I will still do my job
by dropping to a green screen for performance criteria and so on. Having
a z event show up in a portal doesn't do me any good; I need it in a green screen.
So a persona watching the enterprise events from the portal needs to call the
SME and discuss the problem. This wastes time, adds delay,
and in some cases slows the response of the whole organization.

With the Tivoli portal, we have added a capability so that if an event shows
up in a workspace, you can launch from that GUI workspace to the green screen
that is associated with the problem. We call this launch in context.
So instead of having to pick up a phone and explain the event
between the SME and perhaps the help desk or customer service,
an SME with access to the Tivoli Enterprise Portal can launch
from the workspace right to the root cause that made the event show up on the portal
to begin with. How much money could the business lose in those 10 minutes of talk?

I don't believe that SMEs watch screens as part of the day job; they
rely on them for problem resolution, based on how they do the PD work. One of the
major advantages is that you can proactively put situation monitors into the z systems and
subsystems, have those events sent to the portal dynamically, and then launch in
context back to the root cause. This should prevent a lot of hunting and pecking for
where or what generated the event in the first place.

Sending an event to many different personas as an
indication of a failure, whether it is an IT resource problem or perhaps even a
business process failure, and then getting quickly from the enterprise view to root cause
will save time and money. So a strategy of providing information only in a green
screen or only in GUIs is not a good strategy; a tool
should be able to do both with out-of-the-box capability. Portal-based GUIs are important
and growing in importance, but green screens are just as
important for resolving problems and doing PD work. So the bridge from the GUIs to ANY green
screen, together with launch in context, can keep the different IT personas all
focused on the same event and speed recovery. That is a major part of the
OMEGAMON strategy and how it saves the company money.