Collecting data for HTTP hang or performance issues on a Lotus Domino server
To troubleshoot HTTP hang or performance issues on a Lotus Domino server, several files are necessary for investigation. Submitting data from the same problem occurrence is key to finding the root cause and a resolution. Gathering this data before contacting IBM Support helps Support understand the problem and saves time during analysis.
An HTTP hang occurs when the HTTP task appears to be inaccessible; however, Domino does not generate an NSD log, because the Domino server itself is still running. There are two types of HTTP hangs:
- Performance - The HTTP response is slow or appears to stop. HTTP is still processing requests, but very slowly.
- Semaphore deadlock - A true hang in which two processes wait on each other. Because of the deadlock, HTTP does not respond to any requests or console commands.
Collection of troubleshooting data
To troubleshoot HTTP hang or performance issues, IBM Support requires a complete set of debug data from the same occurrence; otherwise, identifying the root cause and resolution can be delayed. HTTP hangs are more difficult to troubleshoot than crashes: more investigative work is required, and several iterations of data collection may be needed to narrow down the problem.
To create this data, use these steps:
1. Enable the appropriate debug parameters in the notes.ini file, then restart the Domino server.
|Console_Log_Enabled=1|Enables console logging|
|Debug_threadid=1|Prints the process ID (PID) and thread ID in the console log|
|HTTPEnableThreadDebug=1|Logs incoming HTTP request and response headers|
|AgentThreadDebug=1|Logs the start and end of Java and LotusScript agents|
2. Wait for the next occurrence of the HTTP hang, then issue the following Domino server console commands:
show stat Domino
tell http show thread state
3. Run the NSD debugger manually back-to-back (that is, multiple times) during the problem state. To do so, refer to the instructions in "How to run a manual NSD for Notes/Domino on Windows" (#1204263).
4. For performance and hang conditions, issue the dump command, available with the NSD debugger, to dump thread stacks. Be sure to collect the dump at least three times while the issue is occurring to determine any patterns.
5. Set up and manage HTTP thread logging.
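Steps 3 and 4 above can be sketched as a small loop that collects NSD output back-to-back. The `NSD_CMD` path below is an assumption for a typical UNIX installation; on Windows you would run nsd.exe from the Domino program directory instead, per technote #1204263.

```shell
# Sketch: run NSD three times during the problem state so thread-stack
# patterns can be compared across dumps (steps 3 and 4 above).
# NSD_CMD is an assumed path; adjust it for your installation.
NSD_CMD=/opt/ibm/lotus/bin/nsd
for i in 1 2 3; do
    echo "Collecting NSD dump $i of 3"
    "$NSD_CMD"      # output lands in the IBM_TECHNICAL_SUPPORT folder
    sleep 60        # pause between dumps so movement (or its absence) shows
done
```

Comparing the three dumps shows whether the same threads stay blocked at the same addresses (a hang) or whether the stacks change between dumps (a slowdown).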
HTTP thread logs
When you enable HTTP thread logging, Domino creates a file for each active thread with a file name as follows: htthr_processid_threadid_YYYYMMDD@HHMMSS.log.
Information about each HTTP request processed is appended to this file, with approximately 10 to 15 lines per request. These files are created in the IBM_TECHNICAL_SUPPORT folder by default, but they can be redirected using LOGFILE_DIR=<path> in the notes.ini file. By default, 40 files are created.
For an HTTP hang, you supply all HTTHR*.LOG files produced. These files are required to determine which URL might be contributing to the hang condition.
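As a minimal sketch of the naming convention, the process ID, thread ID, and timestamp can be split out of an HTTHR file name as follows (the sample file name is hypothetical):

```shell
# Split an HTTP thread log file name of the form
# htthr_<processid>_<threadid>_YYYYMMDD@HHMMSS.log into its parts.
f="htthr_153C_0E2C_20060201@114125.log"   # hypothetical sample name
base="${f%.log}"                          # strip the .log extension
IFS=_ read -r prefix pid tid stamp <<EOF
$base
EOF
date="${stamp%@*}"                        # YYYYMMDD
time="${stamp#*@}"                        # HHMMSS
echo "pid=$pid tid=$tid date=$date time=$time"
# prints: pid=153C tid=0E2C date=20060201 time=114125
```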
IMPORTANT NOTE: In many cases, a hang does not occur immediately after you enable HTTP thread logging, so the HTTHR*.LOG files can grow in size. To avoid compiling a large amount of data that is not relevant to the problem, purge the thread logs at least once a day until the hang occurs. The HTTHR*.LOG files can be deleted while the server is running; new logs are created dynamically. For more information on managing these files, refer to the following documents:
"How to Manage Size of HTTP Request Log Files (REQ Files)" (#1098527)
"Overview of HTTP Request Logs for Domino Web server" (#7003598)
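The daily purge suggested above can be sketched as a one-liner suitable for a cron job. The `LOGDIR` path is an assumption; point it at your server's IBM_TECHNICAL_SUPPORT folder, or at the directory set by LOGFILE_DIR:

```shell
# Remove HTTP thread logs more than a day old; safe while the server runs,
# since Domino recreates the logs dynamically. LOGDIR is an assumed path.
LOGDIR=/local/notesdata/IBM_TECHNICAL_SUPPORT
find "$LOGDIR" -maxdepth 1 -name 'htthr_*.log' -mtime +0 -exec rm -f {} +
```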
Additional files that may be requested:
- NOTES.INI and Server document
- Screen capture of the Task Manager on Windows
- Other operating-system level diagnostics, such as perfmon log, disk swapping, memory usage, etc.
A majority of HTTP hangs are caused by a hung Web-triggered agent. Troubleshooting an HTTP hang caused by an agent has three phases.
Phase 1: Determining if the hang is due to an agent
Phase 2: Finding the originating thread/URL hang
Phase 3: Determining the cause of the agent hang
Phase 3 requires several iterations of debugging the agent in question. In most cases, Support asks that MessageBox statements be added to the agent to narrow down the point of the hang. Once the point of the hang in the agent code is determined, Lotus Support can move forward with a resolution.
Other possible causes for HTTP Performance or hangs include:
- CPU spin (due to excessive view rebuilds, corrupted views or documents)
- Excessive agent executions or agent hangs
- DNSLookup enabled in the Server document
- Semaphore time-out issues
- Network or port binding problems (Tip: May need to reinstall patches)
- Too much traffic on the server (as determined from Domino statistics)
Support needs to determine whether the server is truly hung or is encountering performance problems that make it unresponsive for a period of time. To determine this, the customer removes the existing HTTHR*.LOG files during the "hang" state. If new HTTHR*.LOG files are created over the next few minutes, the server is still actively serving browser requests: the HTTP task has not hung but is experiencing a performance slowdown, and Support troubleshoots the issue as a performance problem, not as a hang. If no new HTTHR*.LOG files are created in that time, the server is in an actual hang state, because no new Web browser requests are being serviced.
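The check described above can be sketched as follows, assuming the thread logs live under `LOGDIR` (a hypothetical path):

```shell
# Distinguish a true HTTP hang from a slowdown: clear the existing thread
# logs, wait, then see whether new ones appear. New files mean HTTP is
# still serving requests (slowdown); none means a true hang.
LOGDIR=/local/notesdata/IBM_TECHNICAL_SUPPORT   # assumed path
rm -f "$LOGDIR"/htthr_*.log
sleep 120                                       # wait a couple of minutes
count=$(ls "$LOGDIR"/htthr_*.log 2>/dev/null | wc -l)
if [ "$count" -gt 0 ]; then
    echo "New thread logs found: HTTP is slow, not hung"
else
    echo "No new thread logs: HTTP appears to be truly hung"
fi
```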
Example of a semaphore deadlock
THREAD [18898:00002-00001] WAITING FOR FRWSEM 0x0392 Database/user Global Unread List semaphore (@50E4A88C)
(R=0,W=1,WRITER=12896:01800,1STREADER=00000:00000) FOR 120000 ms
THREAD [12896:00002-01800] WAITING FOR FRWSEM 0x0244 database semaphore (@6446447C) (/data/local/notesdata/support/retain.nsf)
(R=1,W=0,WRITER=00000:00000,1STREADER=18898:00001) FOR 120000 ms
In this example, thread 18898 is waiting for a semaphore whose writer is thread 12896, while thread 12896 is waiting for a database semaphore whose first reader is thread 18898: each thread is waiting on the other, so neither can proceed.
Example of HTTP Performance – Slowdown due to Semaphore
02/01/2006 11:41:25 AM EST sq="000C1146" THREAD [153C:0FAB-18BC] WAITING FOR FRWSEM 0x030B Collection semaphore (@01C14108) (R=9,W=0,WRITER=0000:0000,1STREADER=153C:0E2C) FOR 30000 ms
02/01/2006 11:41:25 AM EST sq="000C1147" THREAD [153C:0034-168C] WAITING FOR FRWSEM 0x030B Collection semaphore (@01C14108) (R=9,W=0,WRITER=0000:0000,1STREADER=153C:0E2C) FOR 30000 ms
02/01/2006 11:41:25 AM EST sq="000C1148" THREAD [153C:0030-16A8] WAITING FOR FRWSEM 0x030B Collection semaphore (@01C14108) (R=9,W=0,WRITER=0000:0000,1STREADER=153C:0E2C) FOR 30000 ms
02/01/2006 11:41:25 AM EST sq="000C114A" THREAD [153C:001D-1488] WAITING FOR FRWSEM 0x030B Collection semaphore (@01C14108) (R=9,W=0,WRITER=0000:0000,1STREADER=153C:0E2C) FOR 30000 ms
02/01/2006 11:41:25 AM EST sq="000C114B" THREAD [153C:001C-1480] WAITING FOR FRWSEM 0x030B Collection semaphore (@01C14108) (R=9,W=0,WRITER=0000:0000,1STREADER=153C:0E2C) FOR 30000 ms
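When a console log contains many WAITING FOR FRWSEM lines like those above, a quick way to spot the most contended semaphore is to count waiters per semaphore address. This sketch assumes the log format shown above and a file named console.log:

```shell
# Count waiting threads per semaphore address (the @-prefixed value in
# parentheses) and list the most contended semaphores first.
grep 'WAITING FOR FRWSEM' console.log \
  | sed 's/.*(\(@[0-9A-Fa-f]*\)).*/\1/' \
  | sort | uniq -c | sort -rn
```

In the excerpt above, all five threads are blocked on the same Collection semaphore (@01C14108), which is the pattern this count makes visible at a glance.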
From NSD (NIF Collections)
[ 14: 16388] 2430 a27dc59d 2650 35 0x0201 00000000 NO NO NO NO 0 (UserIDFull)|vUserIDFull
CIDB = [ 17: 44036]
CollSem (FRWSEM:0x030b) state=9, waiters=24, refcnt=0, nlrdrs=9 Writer=
Waiter# 1: mode=W, SemNum = 602 RefCnt= 0 [ nHTTP:153c:0eb8]
Waiter# 2: mode=W, SemNum = 688 RefCnt= 0 [ nHTTP:153c:1670]
Waiter# 3: mode=W, SemNum = 484 RefCnt= 0 [ nHTTP:153c:1fa4]
Waiter# 4: mode=R, SemNum = 635 RefCnt= 0 [ nHTTP:153c:14e4]
Waiter# 5: mode=R, SemNum = 479 RefCnt= 0 [ nHTTP:153c:1500]
Waiter# 6: mode=R, SemNum = 612 RefCnt= 0 [ nHTTP:153c:15e0]
Submitting information to IBM Support
After you collect the appropriate diagnostic data, you can provide that information to IBM Support. This step requires that a PMR be opened, if one does not exist. Refer to "Exchanging information with IBM Lotus Technical Support" (#1099524) for more information and steps.