The job of the WebSphere plug-in, installed with the IHS server, is to balance load across servers in the WebSphere Application Server cluster. An important aspect of this work relates to failover. What should the plug-in do if a server fails to provide a timely response? A tuned plug-in configuration allows for better load distribution and faster recovery.
There is some good documentation for how the plug-in works. If you are unfamiliar or need a good reference, check out these links:
Next, I'll describe some of the most important settings for failover, including new features from recent fix packs:
MaxConnections determines the maximum number of pending connections (those waiting for the application server to respond) a JVM can have before it is considered overloaded. Once MaxConnections is reached, the plug-in stops routing new requests to that server.
Imagine a JVM is experiencing problems: If MaxConnections is not configured, the plug-in can route unlimited requests to it until there is a time-out (Connect or ServerIO) and the JVM is marked down. If the JVM recovers, it will find thousands of requests waiting for processing. All these requests are likely to overload the JVM, possibly leading to high CPU and further time-outs. The higher the number of pending requests, the longer the server will take to recover.
If the JVM doesn't recover, the plug-in will fail over the pending requests to another server. This hurts the user experience, as the shopper has to wait for the failover to happen before the request can complete on another server.
Although strongly recommended, MaxConnections can be tricky to tune because each plug-in process keeps track of its connections independently. Let's say you have 2 web servers and each one allows for 5 processes (MaxClients). If MaxConnections is set to 20, each JVM can potentially have up to 200 pending requests (2 IHS * 5 MaxClients * 20 MaxConnections). Still, even if that number is higher than you would like, it is better than allowing unlimited connections.
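To make the arithmetic above easy to repeat for your own topology, here is a minimal sketch. The function name and parameters are mine, not part of any WebSphere tooling; it only multiplies the three figures from the example.

```python
def worst_case_pending(web_servers: int, processes_per_server: int, max_connections: int) -> int:
    """Upper bound on pending requests a single JVM can receive.

    Each plug-in process counts its connections independently, so the
    per-process MaxConnections limit is multiplied by the total number
    of plug-in processes across all web servers.
    """
    return web_servers * processes_per_server * max_connections


# Figures from the example: 2 IHS servers, 5 processes each, MaxConnections = 20.
print(worst_case_pending(2, 5, 20))  # 200
```

Plugging in your real process counts is a quick way to see whether a candidate MaxConnections value still leaves the JVM exposed to more pending requests than it can absorb.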
The ServerIOTimeout fires if the application server has not responded to a request after n seconds.
If the timeout is set to a negative value, the server is marked down when the timeout fires. The server won't be used again until the RetryInterval has elapsed.
If a positive value is used for ServerIOTimeout, the individual request is failed over but the server is not marked down.
Most customers prefer to mark the servers down, but be careful. If you have two servers and a slow request hits a negative ServerIOTimeout value, the server will be marked down and you will lose 1/2 of your capacity. Remember, it is possible to have that odd slow request even when a server is responsive overall.
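The sign convention can be summarized with a small toy model. This is only an illustration of the behavior described above, not plug-in code: the class and function names are hypothetical, and I am assuming the absolute value of the setting is the wait time in seconds.

```python
from dataclasses import dataclass


@dataclass
class TimeoutPolicy:
    seconds: int     # how long the plug-in waits for a response (assumed: abs of the setting)
    mark_down: bool  # whether the server is marked down when the timeout fires


def interpret_server_io_timeout(value: int) -> TimeoutPolicy:
    """Toy reading of a non-zero ServerIOTimeout value, per the text above."""
    if value == 0:
        raise ValueError("this sketch only covers non-zero values")
    return TimeoutPolicy(seconds=abs(value), mark_down=value < 0)


def remaining_capacity(total_servers: int, servers_marked_down: int) -> float:
    """Fraction of the cluster still receiving requests."""
    return (total_servers - servers_marked_down) / total_servers


print(interpret_server_io_timeout(-60))  # TimeoutPolicy(seconds=60, mark_down=True)
print(interpret_server_io_timeout(60))   # TimeoutPolicy(seconds=60, mark_down=False)
print(remaining_capacity(2, 1))          # 0.5
```

The last line is the two-server case from the paragraph above: one slow request against a negative value is enough to halve the cluster.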
The WebSphere transaction timeout should be smaller than the ServerIOTimeout. There is no point in allowing the request to continue in the Application Server if the client is no longer waiting, or if the plug-in has already started the request on another server.
RetryInterval is the amount of time the plug-in waits before trying to use a server that was previously marked down. It matters most when ServerIOTimeout is configured to mark servers down (using a negative number), but it also applies to connect timeouts. Review the Recommended values for web server plug-in config technote for pointers on choosing a value. The default of 60 seconds is reasonable. As with ServerIOTimeout, the impact of this setting is larger with small clusters.
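As a rough illustration of the rule, a marked-down server only becomes a candidate again once the interval has elapsed. The helper below is a hypothetical sketch, not how the plug-in is implemented; times are plain seconds.

```python
def eligible_for_retry(marked_down_at: float, now: float, retry_interval: float = 60.0) -> bool:
    """True once at least retry_interval seconds have passed since the server was marked down."""
    return now - marked_down_at >= retry_interval


# Server marked down at t=100s, using the default RetryInterval of 60 seconds.
print(eligible_for_retry(100.0, 130.0))  # False: only 30 seconds have passed
print(eligible_for_retry(100.0, 160.0))  # True: the full interval has elapsed
```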
ServerIOTimeoutRetry is available since 184.108.40.206. Before this setting existed, when a request hit ServerIOTimeout, the plug-in would keep retrying it on every server in the cluster. This meant that if you had 20 JVMs and the farm was down, each request could be retried up to 20 times!
With ServerIOTimeoutRetry you can now control the number of retries. Use a small number, but before deciding on a value, see section 'Request specific time-outs' coming next.
For more info see: PM70559: LIMIT NUMBER OF RETRIES WHEN TIMEOUT OCCURS AT THE WEBSERVER PLUGIN
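To get a feel for what the retry limit buys you, here is a back-of-the-envelope sketch. It rests on a simplified model that I am assuming for illustration: without a limit, a request is attempted once per server; with a limit, it gets one initial attempt plus at most that many retries, never more than the number of servers. The function names and the 60-second timeout are hypothetical.

```python
from typing import Optional


def max_attempts(servers: int, io_timeout_retry: Optional[int] = None) -> int:
    """Worst-case number of servers a timed-out request is sent to (assumed model)."""
    if io_timeout_retry is None:
        # No limit configured: the request can be tried on every server.
        return servers
    # One initial attempt plus the allowed retries, capped by the cluster size.
    return min(servers, 1 + io_timeout_retry)


def worst_case_wait(servers: int, io_timeout_seconds: int,
                    io_timeout_retry: Optional[int] = None) -> int:
    """Seconds a client could wait if every attempt runs into the ServerIOTimeout."""
    return max_attempts(servers, io_timeout_retry) * io_timeout_seconds


# 20 JVMs, hypothetical 60-second ServerIOTimeout, whole farm unresponsive.
print(worst_case_wait(20, 60))     # 1200 seconds with no retry limit
print(worst_case_wait(20, 60, 1))  # 120 seconds with a single retry
```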
Request specific time-outs
This option is brand new: it is available in 220.127.116.11 and higher. You can now use different time-out values and retries depending on the request by defining plug-in variables in httpd.conf using SetEnv or SetEnvIf, such as websphere-serveriotimeout and websphere-serveriotimeoutretry.
For example, if there is a URL that is slower than the rest (e.g. OrderProcess due to backend calls), you can use a specific ServerIOTimeout value for this command, allowing you to use a more aggressive timeout on the rest of the site:
SetEnvIf Request_URI ^/webapp/wcs/stores/servlet/AjaxOrderProcessServiceOrderSubmit$ websphere-serveriotimeout=60
Or if you want a single retry (or no retries) for all browse (non-SSL) requests:
Alias /wcsstore "/opt/WebSphere/AppServer/profiles/demo/installedApps/WC_demo_cell/WC_demo.ear/Stores.war"
SetEnv websphere-serveriotimeoutretry 1
For more info see: PM94198: TIMEOUTS MUST BE THE SAME FOR EVERY URL ON A SERVER
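If you want to sanity-check which URLs a SetEnvIf pattern like the OrderProcess one will catch before touching httpd.conf, a quick sketch such as the following can help. It uses Python's re module as a stand-in for Apache's regex matching, which is adequate for a simple anchored pattern like this one; the default timeout of 10 seconds is a made-up value for illustration.

```python
import re

# Same pattern as the SetEnvIf example; ^ and $ anchor it to the whole URI.
ORDER_SUBMIT = re.compile(r"^/webapp/wcs/stores/servlet/AjaxOrderProcessServiceOrderSubmit$")

DEFAULT_TIMEOUT = 10   # hypothetical site-wide ServerIOTimeout, in seconds
ORDER_TIMEOUT = 60     # value set by the SetEnvIf example


def timeout_for(uri: str) -> int:
    """Return the timeout that would apply to a request path (query string excluded)."""
    return ORDER_TIMEOUT if ORDER_SUBMIT.search(uri) else DEFAULT_TIMEOUT


print(timeout_for("/webapp/wcs/stores/servlet/AjaxOrderProcessServiceOrderSubmit"))  # 60
print(timeout_for("/webapp/wcs/stores/servlet/TopCategoriesDisplay"))                # 10
```

Because the pattern is anchored at both ends, only the exact command path gets the longer timeout; everything else keeps the aggressive site-wide value.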
The Recommended values for web server plug-in config technote advises configuring a timeout for connects (ConnectTimeout). A value of 5 seconds is appropriate for most sites.