What to look for if DB2 crashes or hangs due to transparent LDAP?
When DB2 crashes or hangs, it usually writes a trap file showing the call stack that led up to the failure. In this case the trap usually occurs outside of DB2 code, and the stack shows frames such as the following:
0x00002AAACE1F49CF _nss_ldap_ent_context_release + 0x002f (/lib64/libnss_ldap.so.2)
0x00002AAACE1F6A75 _nss_ldap_endgrent + 0x0015 (/lib64/libnss_ldap.so.2)
Alternatively, review the OS logs for similar entries. /var/log/messages may also show:
May 7 08:23:33 db2test db2sysc 0: nss_ldap: could not get LDAP result - Can't contact LDAP server
May 7 08:23:37 db2test db2pd: nss-ldap: do_open: do_start_tls failed:stat=-1
May 7 08:23:38 db2test db2pd: nss_ldap: reconnected to LDAP server ldap://db2test.ca.db2test1.com/
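A search for these messages can be scripted. The following is a minimal sketch, not an IBM-provided tool: the function names are hypothetical, and the pattern is only an assumption based on the two spellings ("nss_ldap:" and "nss-ldap:") that appear in the sample messages above.

```python
import re

# Matches both spellings seen in the sample messages: "nss_ldap:" and "nss-ldap:".
# This pattern is an assumption drawn from those samples, not a complete list.
NSS_LDAP_PATTERN = re.compile(r"\bnss[_-]ldap:", re.IGNORECASE)


def find_nss_ldap_messages(lines):
    """Return the log lines that contain an nss_ldap message."""
    hits = []
    for line in lines:
        if NSS_LDAP_PATTERN.search(line):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path="/var/log/messages"):
    """Print nss_ldap messages from a log file (hypothetical helper)."""
    # Reading /var/log/messages usually requires root privileges.
    with open(path, errors="replace") as log:
        for hit in find_nss_ldap_messages(log):
            print(hit)
```

Calling scan_log() as root prints the matching lines. Entries logged by DB2 processes such as db2sysc or db2pd around the time of the failure are the ones of interest.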
The OS is configured to perform group enumeration via LDAP, because 'endgrent' is seen calling '_nss_ldap_endgrent'. In most cases of this issue, the code in libnss_ldap.so.2 has aborted. This abort affects endgrent and db2ckpw and causes db2ckpw to fail. libnss_ldap.so.2 is the library that aborted because it was the library being accessed at the time DB2 came down or crashed.
The 'crash' of db2ckpw causes DB2 itself to come down, which is DB2 acting as designed. DB2 comes down as gracefully as it can, but crash recovery is always required for databases that were active at the time of the crash. Crash recovery occurs the next time the database is activated.
Because of the way transparent LDAP works, DB2 knows nothing about LDAP. DB2 makes OS calls, which in turn make LDAP calls. DB2 is not responsible for the crash, because the crash does not occur in DB2 code.
If your OS is configured to use LDAP for authentication, and the connection to the LDAP server fails, DB2 will fail.
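To confirm that the problem sits in the OS lookup layer rather than in DB2, group enumeration can be exercised outside of DB2. The sketch below uses Python's standard grp module, which asks the C library to enumerate the group database in the same way as any other OS caller. The function name is hypothetical. If the LDAP server is unreachable, this call may be slow or may hang, just as the DB2 processes did.

```python
import grp
import time


def time_group_enumeration():
    """Enumerate all groups through the OS and return (group_count, seconds)."""
    start = time.monotonic()
    # The C library resolves this through whatever sources the OS is
    # configured to use for groups (local files, LDAP, and so on).
    groups = grp.getgrall()
    elapsed = time.monotonic() - start
    return len(groups), elapsed


if __name__ == "__main__":
    count, seconds = time_group_enumeration()
    print(f"enumerated {count} groups in {seconds:.2f} seconds")
```

A slow or hanging run here, with no DB2 process involved, points to the OS or LDAP configuration.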
Resolving the problem
To resolve this issue please do the following:
- Please make sure DB2 is configured correctly with LDAP by reviewing the following link:
Configuring transparent LDAP for authentication and group lookup (Linux)
and make sure the following is set in your DB2 environment variables
Please note that the above configuration of transparent LDAP with DB2 is the only supported setup.
- If the crash or hang still occurs, make sure the fixes for OS defects 350 and 352 are applied. They were fixed upstream in 2007, but the fixes have not been incorporated into the Red Hat nss_ldap packages that you may be using. Please make sure you are using Red Hat 5.7.
- If the fixes for the above OS defects have been applied and the hang or crash still occurs, you or your LDAP admin may need to review /etc/nsswitch.conf and /etc/ldap.conf to make sure they are set up correctly.
It is also recommended that you check the bind policy (BIND_POLICY SOFT) configured for your nss_ldap module. In some cases this setting causes nss_ldap to return a negative result if it cannot connect to the LDAP server; otherwise it retries indefinitely. This has alleviated problems caused by LDAP time-outs or by repeated failures to contact the LDAP server. The alternatives to the "soft" bind policy are "hard_open" and "hard_init".
"Hard_open" reconnects if opening the connection to the LDAP server fails. "Hard_init" reconnects if initializing the connection fails. The difference is that initialization may not actually contact the server. The "soft" option forbids nss_ldap from retrying failed LDAP queries. If the default bind policy is used, nss_ldap retries a query several times when the LDAP server is not present. In some cases this issue was resolved by changing the BIND_POLICY from "soft" to either "hard_init" or "hard_open". It is recommended that you consult your LDAP admin before making this change to find the best option for the affected environment, as this is not an IBM DB2 setting.
- If the problem persists, then it is recommended that you contact your LDAP provider or OS support team for further assistance.
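The review of /etc/nsswitch.conf and /etc/ldap.conf described above can be started with a small read-only script. This is a sketch under stated assumptions: the function names are hypothetical, only the passwd and group databases are inspected, and the script lists every bind_policy line it finds without deciding which one nss_ldap would honor. It changes nothing on the system.

```python
import re


def nss_sources(nsswitch_text, databases=("passwd", "group")):
    """Map each requested NSS database to the sources configured for it."""
    result = {}
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or ":" not in line:
            continue
        name, _, rest = line.partition(":")
        name = name.strip()
        if name in databases:
            # Remove action items such as [NOTFOUND=return]; keep source names.
            rest = re.sub(r"\[[^\]]*\]", " ", rest)
            result[name] = rest.split()
    return result


def find_bind_policies(ldap_conf_text):
    """Return every bind_policy value found in the text, lowercased."""
    policies = []
    for raw in ldap_conf_text.splitlines():
        parts = raw.strip().split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0].lower() == "bind_policy" and len(parts) >= 2:
            policies.append(parts[1].lower())
    return policies


def report(nsswitch_path="/etc/nsswitch.conf", ldap_conf_path="/etc/ldap.conf"):
    """Print the NSS sources and bind_policy lines (hypothetical helper)."""
    with open(nsswitch_path) as f:
        for database, names in nss_sources(f.read()).items():
            print(f"{database}: {' '.join(names)}")
    with open(ldap_conf_path) as f:
        policies = find_bind_policies(f.read())
    print("bind_policy:", ", ".join(policies) if policies else "(not set explicitly)")
```

Calling report() shows whether passwd and group lookups go through ldap and which bind policy, if any, is set explicitly. Share the output with your LDAP admin before changing either file.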