When a MongoDB server or cluster doesn't respond quickly enough to client requests, clients may time out those requests. This degrades the user experience, but from the server's perspective it can look as if the client simply went away, because the response could not be delivered.
It would be better if clients recorded each request that timed out and passed a summary of these failures to the server with their next request. The server could collect these reports and store them in a capped or TTL collection for use by a monitoring tool such as MMS.
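As a rough illustration of the server side, the sketch below models the two storage options with in-memory Python structures: a bounded buffer stands in for a capped collection, and age-based filtering stands in for a TTL collection. Everything here is hypothetical: the `TimeoutReport` fields and the `ReportStore` class with its `max_reports` and `ttl_seconds` parameters are assumptions for illustration, not part of MongoDB or any driver. A real implementation would write the reports to an actual capped or TTL collection.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass(frozen=True)
class TimeoutReport:
    """Hypothetical summary of one request that a client gave up on."""
    client_id: str
    op_id: int          # client-side identifier of the timed-out request
    timeout_ms: int     # how long the client waited before giving up
    reported_at: float  # when the server received the report, in seconds


class ReportStore:
    """In-memory stand-in for a capped or TTL collection of timeout reports."""

    def __init__(self, max_reports: int, ttl_seconds: float) -> None:
        # A bounded deque discards its oldest entry when full, much like a capped collection.
        self._reports: Deque[TimeoutReport] = deque(maxlen=max_reports)
        self._ttl = ttl_seconds

    def add(self, report: TimeoutReport) -> None:
        self._reports.append(report)

    def recent(self, now: float) -> List[TimeoutReport]:
        # Return only reports younger than the TTL. A real TTL index would
        # instead delete expired documents in the background.
        cutoff = now - self._ttl
        return [r for r in self._reports if r.reported_at >= cutoff]


if __name__ == "__main__":
    store = ReportStore(max_reports=2, ttl_seconds=60)
    store.add(TimeoutReport("app-1", 1, 30_000, reported_at=100.0))
    store.add(TimeoutReport("app-1", 2, 30_000, reported_at=150.0))
    store.add(TimeoutReport("app-2", 7, 30_000, reported_at=170.0))  # evicts op 1
    print([r.op_id for r in store.recent(now=180.0)])  # [2, 7]
    print([r.op_id for r in store.recent(now=215.0)])  # [7]
```

The proposal leaves open whether a capped collection (bounded by size) or a TTL collection (bounded by age) is the better fit; the sketch only shows how each one bounds the data a monitoring tool would read.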
For example, if clients time out requests after 30 seconds but a busy server takes 35 seconds to respond, the end user sees only a failure, while the server sees only that its response to the original query could not be delivered. If the client included a short note with its next request saying that the previous query had timed out, the server could report this to a monitoring tool, giving visibility into how clients perceive the health of the system.
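The client half of this scenario could look something like the following sketch. It assumes a hypothetical driver hook: the driver records each request it abandons, then piggybacks the pending summary on the next outgoing command and clears it, so each timeout is reported once. The `TimeoutTracker` class and the `clientTimeouts`, `opId` and `waitedMs` field names are invented for illustration; they do not exist in MongoDB or its drivers and stand in for whatever a real protocol extension would define.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TimeoutTracker:
    """Hypothetical driver-side record of requests the client gave up on."""
    client_id: str
    timeout_ms: int
    pending: List[Dict[str, Any]] = field(default_factory=list, init=False, repr=False)

    def record_timeout(self, op_id: int, elapsed_ms: int) -> None:
        # Called when the client abandons a request after waiting elapsed_ms.
        self.pending.append({"opId": op_id, "waitedMs": elapsed_ms})

    def attach_to(self, command: Dict[str, Any]) -> Dict[str, Any]:
        """Return a copy of the command, with any pending summary piggybacked on it."""
        outgoing = dict(command)
        if not self.pending:
            return outgoing
        # "clientTimeouts" is an invented field name, not part of the MongoDB protocol.
        outgoing["clientTimeouts"] = {
            "clientId": self.client_id,
            "timeoutMs": self.timeout_ms,
            "ops": list(self.pending),
        }
        self.pending.clear()  # report each timeout only once
        return outgoing


if __name__ == "__main__":
    tracker = TimeoutTracker(client_id="app-1", timeout_ms=30_000)
    # The server needed 35 s, so the client gave up at its 30 s limit.
    tracker.record_timeout(op_id=42, elapsed_ms=30_000)
    first = tracker.attach_to({"find": "orders", "filter": {"status": "open"}})
    second = tracker.attach_to({"find": "orders"})
    print(first["clientTimeouts"]["ops"])  # [{'opId': 42, 'waitedMs': 30000}]
    print("clientTimeouts" in second)      # False
```

Piggybacking on the next request keeps the client from needing a separate reporting connection, which matches the proposal's idea of collecting this data as part of normal client-server interaction.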
It is not feasible to collect this data directly from the clients, but if clients were enhanced to report it as part of their normal interaction with the server, it would be possible to monitor a system for client timeouts and to correlate them with other server-side data to see total system transaction failure rates.
Adding this to every driver would take time, but once the server can accept this data and expose it to a monitoring tool, drivers could adopt the capability one by one.
A first pass at this feature could collect data only for individual servers (mongod and mongos), without attempting to aggregate or even propagate it. If the feature proves useful, a later pass could make cluster-wide information available to monitoring tools.