
DataStage parallel jobs fail with fork() failed, Resource temporarily unavailable

Troubleshooting


Problem

InfoSphere Information Server DataStage parallel jobs are intermittently failing with the error message: fork() failed, Resource temporarily unavailable

Cause

This error occurs when the operating system cannot create all of the processes the job needs at runtime. There is no single cause for this problem.

On Unix or Linux platforms the most common reasons are:

  • The maximum process limit has been reached for the kernel
  • The maximum open file limit has been reached
  • Swap space allocation or pre-allocation has been exceeded
On Windows, this issue can be related to running out of desktop heap or other system resources.

Resolving The Problem

On Unix or Linux

  • Adjust the maximum process limit

The exact methods for identifying and modifying process and file limits will vary between the different Unix and Linux platforms. The system administrator for the operating system should be engaged to assist with this.

As an example, the following command can be used on the AIX platform to see the current value of maxuproc:

lsattr -E -l sys0 | grep maxuproc

On Linux, check the nproc setting in /etc/security/limits.conf.
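To see the limits actually in effect, a quick check such as the following can be run as the DataStage user; the limits.conf entries shown in the comments are illustrative values, not IBM-prescribed settings:

```shell
# Show the per-user limits most often implicated in fork() failures:
ulimit -u   # maximum user processes (the "nproc" limit)
ulimit -n   # maximum open file descriptors (the "nofile" limit)

# Illustrative /etc/security/limits.conf entries for a DataStage
# administrator account (the user name 'dsadm' is site-specific):
#   dsadm  soft  nproc   4096
#   dsadm  hard  nproc   4096
#   dsadm  soft  nofile  10240
#   dsadm  hard  nofile  10240
```

Note that limits.conf changes take effect only for new login sessions, so the DataStage engine may need to be restarted from a fresh session to pick them up.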

A reasonable setting for environments running large jobs is maxuproc = 4096. To tune this value, monitor (over time) the number of processes running daily, and then set the value accordingly. Here is some sample shell script code that you can use to monitor the number of processes belonging to the 'dsadm' user. The script takes a measurement every 5 seconds for 360 iterations (about 30 minutes).

#!/bin/sh
# Sample the number of processes owned by the 'dsadm' user every
# 5 seconds for 360 iterations (about 30 minutes).
COUNTER=360
: > /tmp/dsadm_count.txt

while [ $COUNTER -gt 0 ]; do
    COUNTER=$((COUNTER - 1))
    sleep 5
    date >> /tmp/dsadm_count.txt
    # The [d] pattern keeps the grep process itself out of the count
    ps -ef | grep -c '[d]sadm' >> /tmp/dsadm_count.txt
done
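Once the log has been collected, the peak process count can be pulled out of it: count lines are purely numeric while date lines are not, so a simple filter works. The sample log below is fabricated for illustration:

```shell
# Fabricated sample in the same alternating date/count format that
# the monitoring script writes:
cat > /tmp/dsadm_count_sample.txt <<'EOF'
Mon Jun 16 10:00:00 UTC 2018
412
Mon Jun 16 10:00:05 UTC 2018
487
Mon Jun 16 10:00:10 UTC 2018
455
EOF

# Count lines are purely numeric; keep only those and report the peak:
grep -E '^[0-9]+$' /tmp/dsadm_count_sample.txt | sort -n | tail -1
# prints 487
```

Compare the peak against the configured process limit to judge how much headroom the environment has.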

  • Reduce the number of processes

If you are unable to adjust the limits, or the limits are already at their maximum and the error still occurs, then the number of running processes may simply be too large for this environment. In that case, try to reduce the number of processes to work around the problem. Reducing the number of processes may decrease performance but can allow jobs to complete. To reduce the number of processes, reduce the number of logical nodes in the configuration file (defined by the environment variable APT_CONFIG_FILE), and also make sure that the environment variable APT_DISABLE_COMBINATION is not set to True.
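For example, a configuration file with only two logical nodes starts half as many player processes per operator as a four-node file. The sketch below follows the standard parallel configuration file syntax; the hostname and paths are placeholders that must be replaced with site-specific values:

```
{
    node "node1"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ibm/ds/resource" {pools ""}
        resource scratchdisk "/ibm/ds/scratch" {pools ""}
    }
    node "node2"
    {
        fastname "etl_host"
        pools ""
        resource disk "/ibm/ds/resource" {pools ""}
        resource scratchdisk "/ibm/ds/scratch" {pools ""}
    }
}
```

Point APT_CONFIG_FILE at the reduced file for the affected jobs rather than changing the project-wide default.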

On Microsoft Windows

Refer to Technote 1684610 for a list of general recommendations for this platform.


Document Information

Modified date:
16 June 2018

UID

swg21393359