-
Type:
Bug
-
Resolution: Unresolved
-
Priority:
Major - P3
-
None
-
Affects Version/s: None
-
Component/s: None
-
None
-
Networking & Observability
-
ALL
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Overview
On non-Linux platforms, markKillOnClientDisconnect() fails to detect that a client has disconnected when there is unread data in the socket receive buffer. The periodic isConnected() fallback used by the DefaultBaton sees the buffered bytes and concludes the peer is still alive.
Problem
On Linux, AsioNetworkingBaton::markKillOnClientDisconnect() registers POLLRDHUP on the session socket, which detects a peer's half-close regardless of buffered data. On non-Linux platforms, AsioNetworkingBaton is not available, so the DefaultBaton fallback is used. This fallback periodically calls session->isConnected(), which polls for POLLIN and peeks a byte.
When there is any unread data in the socket receive buffer and the client disconnects:
- POLLIN is set (unread data still in buffer)
- MSG_PEEK succeeds (reads a buffered byte)
- isConnected() returns true
- The disconnect is never detected, and markKilled(ClientDisconnect) is never called. Any operation sleeping on the {{opCtx }}continues indefinitely.
Impact
Any server-side code path that calls markKillOnClientDisconnect() and then blocks before fully consuming the client's request from the socket is affected. One concrete example is the ingress connection establishment rate limiter, which blocks in throttleIfNeeded() before the hello command is read. Dead connections remain in the rate limiter queue for the full wait duration, occupying queue slots and causing the interruptedDueToClientDisconnect metric to under-report.
The existing JS test connection_establishment_rate_limiting_client_disconnect.js already works around this by only asserting on Linux.
Root Cause
CommonAsioSession::isConnected() (asio_session_impl.cpp:438) uses POLLIN + MSG_PEEK to check connectivity. This cannot distinguish "buffered data from a live peer" from "buffered data from a dead peer" — a half-closed socket with unread data looks identical to a healthy one. Linux avoids this entirely via POLLRDHUP in the AsioNetworkingBaton, but the DefaultBaton has no equivalent mechanism.
Possible Approaches
- Extend DefaultBaton to use a platform-appropriate socket state check (e.g. POLLHUP on macOS/BSD)
- Improve isConnected() to also check POLLHUP/POLLERR when POLLIN is set
- Investigate whether a non-blocking read-for-EOF probe could work cross-platform as an alternative to MSG_PEEK
- related to
-
SERVER-123227 Special-case tests that are looking for interruptedDueToClientDisconnect on non-Linux platforms
-
- In Code Review
-