A small value for the backlog parameter to listen() can cause very high client-side latencies. This happens when a rise in application load or a drop in database performance requires the client to create a large number of connections nearly simultaneously (i.e., a "connection spike").
When a connection spike occurs, mongod may not be able to accept() connections as quickly as they are created. Once the number of connections that have been created but not yet accepted by mongod exceeds the backlog, SYN packets for further connections are ignored. The ignored clients retry their SYN 1 second later, which creates another, smaller spike. If this spike also exceeds the backlog, more SYN packets are ignored and another spike of retries follows 2 seconds later. This continues, with each client doubling its backoff time on every attempt. Because the backoff is exponential, some unlucky connections can wait a very long time (tens of seconds), and the client sees extreme operation latencies.
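Adding up the retry timeouts shows where the tens of seconds come from. This is a minimal sketch that assumes the 1 s initial timeout described above, doubling on each attempt, and 6 retries (the Linux default for net.ipv4.tcp_syn_retries). The function name is hypothetical.

```python
def syn_retry_schedule(initial_rto_s=1, syn_retries=6):
    """Seconds after the original SYN at which each retransmitted SYN is sent.

    Assumes the first retry fires initial_rto_s after the original SYN and
    that every later timeout doubles (exponential backoff).
    """
    times, t, rto = [], 0, initial_rto_s
    for _ in range(syn_retries):
        t += rto          # wait out the current timeout, then resend the SYN
        times.append(t)
        rto *= 2          # back off: the next timeout is twice as long
    return times


print(syn_retry_schedule())  # [1, 3, 7, 15, 31, 63]
```

A connection whose SYN is only queued on the sixth retry has already waited 63 s. This matches the ~64 s slowest-query durations reported below.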
This can be seen in the following two runs of the same client, which starts 5000 threads and so creates a spike of 5000 connections. The first run uses a backlog setting of 10000. The second uses the default backlog of 128.
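The 5000-thread client used for these runs is not included here. The sketch below is a rough, hypothetical stand-in. `connection_spike` and its parameters are illustrative names, the listener is a local socket rather than mongod, and only the connect() time is measured. That time need not equal the operation latency the client sees against mongod.

```python
import socket
import threading
import time


def connection_spike(n_clients, backlog, accept_delay_s=0.0, host="127.0.0.1"):
    """Release n_clients connect() calls at once against a local listener.

    Hypothetical reproducer, not the client used for the runs described here.
    A single server thread accepts connections one at a time, sleeping
    accept_delay_s after each accept() to mimic a server that cannot keep up.
    Returns the sorted connect() latencies in seconds (inf for failures).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))      # port 0: let the OS pick a free port
    server.listen(backlog)      # on Linux, silently capped at net.core.somaxconn
    server.settimeout(0.2)      # lets the accept loop notice when to stop
    port = server.getsockname()[1]
    accepted, clients, latencies = [], [], []
    lock = threading.Lock()
    go, done = threading.Event(), threading.Event()

    def accept_loop():
        while not done.is_set():
            try:
                conn, _ = server.accept()
            except socket.timeout:
                continue
            accepted.append(conn)
            time.sleep(accept_delay_s)

    def client():
        go.wait()               # all clients start at (nearly) the same moment
        t0 = time.monotonic()
        try:
            sock = socket.create_connection((host, port))
        except OSError:
            sock, elapsed = None, float("inf")
        else:
            elapsed = time.monotonic() - t0
        with lock:
            latencies.append(elapsed)
            if sock is not None:
                clients.append(sock)

    acceptor = threading.Thread(target=accept_loop)
    acceptor.start()
    threads = [threading.Thread(target=client) for _ in range(n_clients)]
    for t in threads:
        t.start()
    go.set()
    for t in threads:
        t.join()
    done.set()
    acceptor.join()
    for s in accepted + clients:
        s.close()
    server.close()
    return sorted(latencies)
```

For example, `connection_spike(5000, backlog=128, accept_delay_s=0.001)` may show latencies clustering around 1 s, 3 s, 7 s, and so on if SYNs are dropped. A much larger backlog should avoid this only if net.core.somaxconn is raised as well. The exact result depends on the kernel, its settings, and the machine.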
With the default backlog setting (right), the network stats report a large number of "listen drops". These drops cause the series of connection spikes of decreasing size seen here. They also cause extreme client-side latencies of ~64 s ("slowest query durations").
With the larger backlog setting (left) there is no exponential backoff. The kernel queues the connections for mongod to accept at whatever rate it can, which eliminates the extreme latency outliers.
MongoDB 3.6 introduces a parameter, --listenBacklog, for setting the backlog. However, there are two problems with this:
- the default is SOMAXCONN (which is 128)
- increasing --listenBacklog alone is not sufficient, because the value is silently capped at the kernel parameter net.core.somaxconn, which also defaults to 128; the user must raise that as well.
Rather than requiring the user to change two parameters, it would be more straightforward for mongod to pass a large value to listen() by default. The user could then control the backlog simply by changing net.core.somaxconn.
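The proposal can be sketched in Python. mongod itself is C++, and this is not its code. The names `read_somaxconn`, `effective_backlog` and `open_listener`, and the requested value of 65535, are hypothetical. The capping rule is the Linux behavior described above, and reading /proc/sys/net/core/somaxconn only works on Linux.

```python
import socket
from pathlib import Path

SOMAXCONN_PATH = Path("/proc/sys/net/core/somaxconn")  # Linux only


def read_somaxconn(path=SOMAXCONN_PATH):
    """Return net.core.somaxconn, or None if it cannot be read (e.g. not Linux)."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: listen() silently caps it at somaxconn."""
    if somaxconn is None:
        return requested  # limit unknown; report the requested value unchanged
    return min(requested, somaxconn)


def open_listener(port, requested_backlog=65535, host="0.0.0.0"):
    """Listen with a large backlog so that net.core.somaxconn alone sets the limit.

    requested_backlog=65535 is a hypothetical "large" default, not mongod's.
    Returns the listening socket and the backlog the kernel is expected to use.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(requested_backlog)
    return sock, effective_backlog(requested_backlog, read_somaxconn())
```

With this approach, raising net.core.somaxconn (e.g. `sysctl -w net.core.somaxconn=10000`) is the only change needed to enlarge the queue. The cap is applied when listen() is called, so the new value affects only listeners created after the change.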