This blog promotes knowledge sharing through experience and collaboration. For more product information, visit our WebSphere Commerce CSE page.
Fail on your own terms - Plug-in failover options
The job of the WebSphere plug-in, installed with the IHS server, is to balance load across servers in the WebSphere Application Server cluster. An important aspect of this work relates to failover. What should the plug-in do if a server fails to provide a timely response? A tuned plug-in configuration allows for better load distribution and faster recovery.
Next, I'll describe some of the most important settings for failover, including new features from recent fix packs:
MaxConnections sets the maximum number of pending connections (connections still waiting for the application server to respond) that a JVM can have before it is considered overloaded. Once MaxConnections is reached, the plug-in stops routing new requests to that server.
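To make the routing rule concrete, here is a minimal Python model of "skip any JVM that has reached MaxConnections". It is a hypothetical sketch, not the plug-in's code: the `Server` class, `pick_server` function and the simple round-robin order are assumptions, and the real plug-in also applies weights and session affinity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical view of one JVM as seen by a single plug-in process."""
    name: str
    pending: int = 0  # requests sent to this JVM that have not been answered yet


def pick_server(servers: list[Server], max_connections: int, start: int = 0) -> Server | None:
    """Return the first server (round-robin from `start`) below MaxConnections.

    Returns None when every server has reached its limit.
    """
    n = len(servers)
    for i in range(n):
        candidate = servers[(start + i) % n]
        if candidate.pending < max_connections:
            return candidate
    return None
```

With `servers = [Server("jvm1", pending=20), Server("jvm2", pending=3)]` and `max_connections=20`, `pick_server` skips `jvm1` (already at its limit) and returns `jvm2`.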
Although highly recommended, tuning MaxConnections can be tricky, because each plug-in process keeps track of its connections independently. Let's say you have 2 web servers and each one runs up to 5 child processes (MaxClients / ThreadsPerChild). If MaxConnections is set to 20, each JVM can potentially have up to 200 pending requests (2 IHS * 5 processes * 20 MaxConnections). Still, even if that number is higher than you would like, it is better than allowing unlimited connections.
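The worst-case arithmetic can be written out as a small helper. This is a sketch under stated assumptions: it assumes the IHS worker MPM, where the number of child processes per web server is MaxClients / ThreadsPerChild, and the values 125 and 25 used in the usage line are hypothetical.

```python
import math


def ihs_processes(max_clients: int, threads_per_child: int) -> int:
    # Assumption: worker MPM, so child processes = MaxClients / ThreadsPerChild (rounded up).
    return math.ceil(max_clients / threads_per_child)


def worst_case_pending_per_jvm(web_servers: int, processes_per_web_server: int, max_connections: int) -> int:
    # Every plug-in process enforces MaxConnections on its own, so the limit
    # multiplies across all processes on all web servers.
    return web_servers * processes_per_web_server * max_connections
```

For example, `worst_case_pending_per_jvm(2, ihs_processes(125, 25), 20)` gives the 200 pending requests from the scenario above.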
ServerIOTimeout fires when the application server has not responded to a request within n seconds.
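The behaviour of an I/O timeout can be illustrated with a plain TCP client that waits a bounded time for a reply. This is only an analogy for what ServerIOTimeout does, not how the plug-in is implemented; the function name `send_with_io_timeout` is hypothetical.

```python
from __future__ import annotations

import socket


def send_with_io_timeout(host: str, port: int, payload: bytes, io_timeout_s: float) -> bytes | None:
    """Send a request and wait at most io_timeout_s for the server to answer.

    Returns the first bytes of the reply, or None if the timeout fired
    (the point at which a ServerIOTimeout-style limit would give up).
    """
    # create_connection applies the timeout to connect() and to later reads.
    with socket.create_connection((host, port), timeout=io_timeout_s) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None
```

A server that accepts the connection but never writes a response makes this function return `None` after `io_timeout_s` seconds, which mirrors a hung JVM.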
RetryInterval is the amount of time the plug-in waits before trying to use a server that was previously marked down. It matters most when ServerIOTimeout is configured to mark servers down (using a negative number), but it also applies after connect timeouts. Review the "Recommended values for web server plug-in config" technote for guidance on choosing a value. The default of 60 seconds is reasonable. As with ServerIOTimeout, this setting has a larger impact on small clusters.
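The interaction between a negative ServerIOTimeout and RetryInterval can be sketched as a tiny state model. Treat it as a simplified, hypothetical illustration: `ServerState` and `on_server_io_timeout` are invented names, and the model assumes that only a negative ServerIOTimeout marks the server down, as described above.

```python
from __future__ import annotations


class ServerState:
    """Hypothetical model of one JVM's up/down status in the plug-in."""

    def __init__(self, retry_interval_s: float = 60.0) -> None:
        self.retry_interval_s = retry_interval_s  # RetryInterval (default 60 s)
        self.marked_down_at: float | None = None  # time of the last mark-down

    def mark_down(self, now: float) -> None:
        self.marked_down_at = now

    def is_eligible(self, now: float) -> bool:
        # A marked-down server is skipped until RetryInterval has elapsed.
        if self.marked_down_at is None:
            return True
        return now - self.marked_down_at >= self.retry_interval_s


def on_server_io_timeout(state: ServerState, server_io_timeout: int, now: float) -> None:
    # Simplified: in this model only a negative ServerIOTimeout marks the server down.
    if server_io_timeout < 0:
        state.mark_down(now)
```

With RetryInterval at 60 seconds, a server marked down at t=100 is skipped at t=130 and becomes eligible again at t=160.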
ServerIOTimeoutRetry, available in fix packs that include APAR PM70559, limits how many times the plug-in retries a request that hit ServerIOTimeout. Before this setting existed, a request that hit ServerIOTimeout would be retried on every server in the cluster. This meant that if you had 20 JVMs and the farm was down, each request could be retried up to 20 times!
For more info see: PM70559: LIMIT NUMBER OF RETRIES WHEN TIMEOUT OCCURS AT THE WEBSERVER PLUGIN
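A rough way to reason about the effect is to compute the maximum number of servers a single timed-out request can be sent to. The semantics below (-1 means no extra limit, 0 means no retries, n means at most n retries, always capped by the cluster size) are an assumption based on the APAR description and should be checked against the documentation for your fix pack; `max_attempts` is a hypothetical helper.

```python
def max_attempts(cluster_size: int, server_io_timeout_retry: int = -1) -> int:
    """Upper bound on the servers one timed-out request is sent to (simplified model)."""
    if server_io_timeout_retry < 0:
        # No extra limit: the request can be tried on every server in the cluster.
        return cluster_size
    # First attempt plus at most `server_io_timeout_retry` retries, never more than the cluster.
    return min(1 + server_io_timeout_retry, cluster_size)
```

For a 20-JVM cluster, the unlimited case allows 20 attempts per request, while a limit of 1 caps it at 2.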
Request-specific time-outs
This option is brand new: it is available in fix packs that include APAR PM94198. You can now use different time-out and retry values depending on the request by defining any of the following variables in httpd.conf using SetEnv or SetEnvIf:
For example, if one URL is slower than the rest (e.g. OrderProcess, due to backend calls), you can give it its own ServerIOTimeout value.
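The lookup can be modelled in a few lines: rules matched against the request URI, in the spirit of `SetEnvIf Request_URI "/OrderProcess" websphere-serveriotimeout=180`, override the plug-in-wide default. This is a hypothetical sketch: the variable name `websphere-serveriotimeout`, the 60 and 180 second values, and the sample URLs are assumptions, so confirm the exact variable names in PM94198 before configuring httpd.conf.

```python
from __future__ import annotations

import re

DEFAULT_SERVER_IO_TIMEOUT = 60  # hypothetical plug-in-wide ServerIOTimeout, in seconds

# Hypothetical SetEnvIf-style rules: (URI regex, per-request variables to set).
RULES: list[tuple[re.Pattern[str], dict[str, str]]] = [
    (re.compile(r"/OrderProcess"), {"websphere-serveriotimeout": "180"}),
]


def effective_io_timeout(uri: str) -> int:
    env: dict[str, str] = {}
    for pattern, overrides in RULES:
        if pattern.search(uri):
            env.update(overrides)  # later matching rules win, like later SetEnvIf lines
    return int(env.get("websphere-serveriotimeout", DEFAULT_SERVER_IO_TIMEOUT))
```

Only requests whose URI matches a rule get the longer timeout; everything else keeps the default, so slow checkout calls no longer force a generous timeout on the whole site.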
For more info see: PM94198: TIMEOUTS MUST BE THE SAME FOR EVERY URL ON A SERVER
The "Recommended values for web server plug-in config" technote also recommends configuring a timeout for connects (ConnectTimeout). A value of 5 seconds is appropriate for most sites.
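A connect timeout bounds only the TCP handshake, so a JVM whose host is unreachable is detected quickly instead of tying up a request. The sketch below shows the idea with a plain socket and a 5-second default; `can_connect` is a hypothetical helper, not part of the plug-in.

```python
import socket


def can_connect(host: str, port: int, connect_timeout_s: float = 5.0) -> bool:
    """Return True if a TCP connection is established within connect_timeout_s.

    A refused connection and a timeout both count as failure, roughly the
    point at which a connect timeout lets the plug-in move to another server.
    """
    try:
        with socket.create_connection((host, port), timeout=connect_timeout_s):
            return True
    except OSError:  # covers socket.timeout and ConnectionRefusedError
        return False
```

A refused connection fails immediately; the timeout only matters when packets are silently dropped, which is exactly the case where an unbounded connect would hang.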