Recovering from BSDS or log failures during restart
When the bootstrap data set (BSDS) or part of the recovery log for Db2 is damaged or lost and that damage prevents restart, you need to recover from that situation. What you do to recover varies based on the particular circumstances.
If the problem is discovered at restart, begin with one of the following recovery procedures:
- Recovering from active log failures
- Recovering from archive log failures
- Recovering from BSDS failures
If the problem persists, return to the procedures in this section.
When Db2 recovery log damage terminates restart processing, Db2 issues messages to the console to identify the damage and issues an abend reason code. (The SVC dump title includes a more specific abend reason code to assist in problem diagnosis.) If the explanations for the reason codes indicate that restart failed because of some problem that is not related to a log error, contact IBM® Software Support.
To minimize log problems during restart, the system requires two copies of the BSDS. Dual logging is also recommended.
There are two basic approaches to recovery from problems with the log:
- Restart Db2, bypassing the inaccessible portion of the log and rendering some data inconsistent. Then recover the inconsistent objects by using the RECOVER utility, or re-create the data by using REPAIR. Use the methods that are described following this procedure to recover the inconsistent data.
- Restore the entire Db2 subsystem to a prior point of consistency. This method requires that you have first prepared such a point; for suggestions, see Preparing to recover to a prior point of consistency. Methods of recovery are described under Recovering from unresolvable BSDS or log data set problem during restart.
Bypassing the damaged log
Even if the log is damaged, and Db2 is started by circumventing the damaged portion, the log is the most important source for determining what work was lost and what data is inconsistent.
Bypassing a damaged portion of the log generally proceeds as follows:
- Db2 restart fails. A problem exists on the log, and a message identifies the location of the error. The following abend reason codes, which appear only in the dump title, can be issued for this type of problem. This is not an exhaustive list; other codes might occur.
- 00D10261
- 00D10262
- 00D10263
- 00D10264
- 00D10265
- 00D10266
- 00D10267
- 00D10268
- 00D10329
- 00D1032A
- 00D1032B
- 00D1032C
- 00E80084
The following figure illustrates the general problem:
- Db2 cannot skip over the damaged portion of the log and continue restart processing. Instead, you restrict processing to only a part of the log that is error free. For example, the damage shown in the preceding figure occurs in the log RBA range between X and Y. You can restrict restart to all of the log before X; then changes later than X are not made. Alternatively, you can restrict restart to all of the log after Y; then changes between X and Y are not made. In either case, some amount of data is inconsistent.
- You identify the data that is made inconsistent by your restart decision. With the SUMMARY option, the DSN1LOGP utility scans the accessible portion of the log and identifies work that must be done at restart, namely, the units of recovery that are to be completed and the page sets that they modified.
Because a portion of the log is inaccessible, the summary information might not be complete. In some circumstances, your knowledge of work in progress is needed to identify potential inconsistencies.
- You use the CHANGE LOG INVENTORY utility to identify the portion of the log to be used at restart, and to tell whether to bypass any phase of recovery. You can choose to do a cold start and bypass the entire log.
- You restart Db2. Data that is unaffected by omitted portions of the log is available for immediate access.
- Before you allow access to any data that is affected by the log damage, you resolve all data inconsistencies. That process is described under Resolving inconsistencies resulting from a conditional restart.
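The choice between restricting restart to the log before X and restricting it to the log after Y can be sketched as a small calculation over RBA ranges. The following Python is illustrative only and is not part of any Db2 utility: the names `RestartOption` and `restart_options` and the sample RBA values are hypothetical. It only shows which range of changes is not made under each choice, as described in the steps above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RestartOption:
    """One way to restrict restart to an error-free part of the log (hypothetical model)."""
    description: str
    usable_start: int   # first RBA that restart processes (inclusive)
    usable_end: int     # end of the range that restart processes (exclusive)
    skipped_start: int  # first RBA whose changes are not made (inclusive)
    skipped_end: int    # end of the range whose changes are not made (exclusive)


def restart_options(log_start, log_end, damage_x, damage_y):
    """Return the two choices for a log [log_start, log_end) damaged in [damage_x, damage_y)."""
    if not (log_start <= damage_x < damage_y <= log_end):
        raise ValueError("damaged range must lie inside the log range")
    # Choice 1: use only the log before X; changes later than X are not made.
    before_x = RestartOption(
        "restrict restart to the log before X",
        log_start, damage_x, damage_x, log_end,
    )
    # Choice 2: use only the log after Y; changes between X and Y are not made.
    after_y = RestartOption(
        "restrict restart to the log after Y",
        damage_y, log_end, damage_x, damage_y,
    )
    return [before_x, after_y]


if __name__ == "__main__":
    # Sample RBA values are invented for illustration.
    for option in restart_options(0x1000, 0x9000, 0x4000, 0x5000):
        print(
            f"{option.description}: "
            f"changes in RBA range {option.skipped_start:X}-{option.skipped_end:X} are not made"
        )
```

In either case the skipped range is not empty, which is why the inconsistencies must be resolved before the affected data is made available.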
Where to start
The specific procedure depends on the phase of restart that was in control when the log problem was detected. On completion, each phase of restart writes a message to the console. You must find the last of those messages in the console log. The next phase after the one that is identified is the one that was in control when the log problem was detected. Accordingly, start at:
- Recovering from failure during log initialization or current status rebuild
- Recovering from a failure during forward log recovery
- Recovering from a failure during backward log recovery
As an alternative, determine which, if any, of the following messages was last received and follow the procedure for that message. Other DSN messages can also be issued.
Message ID | Procedure to use |
---|---|
DSNJ001I | Recovering from failure during log initialization or current status rebuild |
DSNJ100I | Recovering from unresolvable BSDS or log data set problem during restart |
DSNJ107I | Recovering from unresolvable BSDS or log data set problem during restart |
DSNJ119I | Recovering from unresolvable BSDS or log data set problem during restart |
DSNR002I | None. Normal restart processing can be expected. |
DSNR004I | Recovering from a failure during forward log recovery |
DSNR005I | Recovering from a failure during backward log recovery |
DSNR006I | None. Normal restart processing can be expected. |
Other | Recovering from failure during log initialization or current status rebuild |
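For teams that keep runbook logic in scripts, the table above can be expressed as a simple lookup. This is a minimal sketch, not an IBM-provided tool: the names `PROCEDURE_BY_MESSAGE` and `procedure_for` are hypothetical, message IDs are written in their full form with the trailing `I` type code, and a result of `None` stands for "normal restart processing can be expected".

```python
LOG_INIT = "Recovering from failure during log initialization or current status rebuild"
UNRESOLVABLE = "Recovering from unresolvable BSDS or log data set problem during restart"
FORWARD = "Recovering from a failure during forward log recovery"
BACKWARD = "Recovering from a failure during backward log recovery"

# Last message received -> procedure to use (None: no procedure is needed).
PROCEDURE_BY_MESSAGE = {
    "DSNJ001I": LOG_INIT,
    "DSNJ100I": UNRESOLVABLE,
    "DSNJ107I": UNRESOLVABLE,
    "DSNJ119I": UNRESOLVABLE,
    "DSNR002I": None,
    "DSNR004I": FORWARD,
    "DSNR005I": BACKWARD,
    "DSNR006I": None,
}


def procedure_for(last_message_id):
    """Return the procedure name for the last message received, or None."""
    key = last_message_id.strip().upper()
    # Any message that is not listed falls under the "Other" row of the table.
    return PROCEDURE_BY_MESSAGE.get(key, LOG_INIT)


if __name__ == "__main__":
    for message_id in ("DSNR004I", "DSNR006I", "DSNX000I"):
        print(message_id, "->", procedure_for(message_id))
```

The lookup only selects a starting point. The decision to follow a procedure remains with operations management.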
Another procedure (Recovering from a failure resulting from total or excessive loss of log data) provides information to use if you determine (by using Recovering from failure during log initialization or current status rebuild) that an excessive amount (or all) of Db2 log information (BSDS, active, and archive logs) has been lost.
The last procedure, Resolving inconsistencies resulting from a conditional restart, can be used to resolve inconsistencies introduced while using one of the restart procedures in this information. If you decide to use Recovering from unresolvable BSDS or log data set problem during restart, you do not need to use Resolving inconsistencies resulting from a conditional restart.
Because of the severity of the situations described, the procedures identify Operations management action, rather than Operator action. Operations management might not be performing all the steps in the procedures, but they must be involved in making the decisions about the steps to be performed.