[SERVER-34684] --maxConns parameter limits both client and replica connections Created: 26/Apr/18  Updated: 06/Dec/22  Resolved: 09/Nov/21

Status: Closed
Project: Core Server
Component/s: Networking, Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Sergey Zagursky Assignee: Backlog - Service Architecture
Resolution: Won't Do Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-34884 Replication executor creates connecti... Closed
Assigned Teams:
Service Arch
Participants:

 Description   

When `mongod` hits its connection limit, it stops accepting connections both from clients and from the other members of its replica set.
When something particularly bad happens on a MongoDB replica set and `mongod` exhausts its connection limit, the only way to restore replica set service is to restart all `mongod` processes.
I consider `--maxConns` in its current state harmful for replica sets. I propose two ways to fix the issue:
1. Reserve some quota for replicas, or by some other means exclude connections from replicas from the connection limit.
2. Forbid the use of `--maxConns` for replica sets.
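
A minimal sketch of what option 1 could look like, to make the proposal concrete. This is not how `mongod` is implemented; the class `ConnectionLimiter`, the `reserved_for_replicas` setting, and the `is_replica` flag are all hypothetical names introduced for illustration. The idea is that client connections are capped below the overall limit, so some slots always remain available to other replica set members.

```python
from dataclasses import dataclass


@dataclass
class ConnectionLimiter:
    """Hypothetical admission check; NOT mongod's actual implementation."""

    max_conns: int              # overall cap, playing the role of --maxConns
    reserved_for_replicas: int  # assumed setting: slots clients may never occupy
    client_conns: int = 0
    replica_conns: int = 0

    def try_accept(self, is_replica: bool) -> bool:
        """Return True and count the connection if it may be accepted."""
        total = self.client_conns + self.replica_conns
        if total >= self.max_conns:
            # The overall limit still applies to everyone.
            return False
        if is_replica:
            self.replica_conns += 1
            return True
        # Clients are capped at max_conns minus the reserved quota.
        if self.client_conns >= self.max_conns - self.reserved_for_replicas:
            return False
        self.client_conns += 1
        return True


limiter = ConnectionLimiter(max_conns=5, reserved_for_replicas=2)
client_results = [limiter.try_accept(is_replica=False) for _ in range(4)]
replica_results = [limiter.try_accept(is_replica=True) for _ in range(3)]
```

With these example numbers, clients can take at most three of the five slots, so two replica connections are still accepted after clients are being refused.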



 Comments   
Comment by Sergey Zagursky [ 10/Nov/21 ]

@Lauren Lewis, your link leads to SERVER-31541 "Measure CPU utilization for specific thread / connection". Is this the way to address the issue? It is closed as "Won't Do", by the way. I suspect you've provided the wrong link.

Comment by Lauren Lewis (Inactive) [ 09/Nov/21 ]

We, the Server Team, plan to address this through a related initiative. We are closing this ticket and will investigate further through that initiative.
https://jira.mongodb.org/secure/AddComment!default.jspa?id=444741

Comment by Peter Ivanov [ 18/Dec/19 ]

Are there any current development plans relevant to this issue?

Comment by Mira Carey [ 08/May/18 ]

There are a few issues at play with limiting the number of connections to a mongod in a replica set:

  1. Heartbeats between nodes.  These you want to avoid privileging, because you probably don't want a node as primary that can't accept new writes.
  2. Reserving a way to connect to run administrative commands.  These seem fine to reserve.
  3. Reserving a way to connect for secondaries that are replicating from the primary.  This seems important for ensuring that when an election does happen, as many non-majority writes as possible are preserved.

This is a reasonable ask, and we'll look into providing a way to ensure that cases 2 and 3 work if --maxConns is reached.

Comment by Sergey Zagursky [ 27/Apr/18 ]

It looks like Fortnite suffered from this issue too: https://www.epicgames.com/fortnite/en-US/news/postmortem-of-service-outage-4-12

Generated at Thu Feb 08 04:37:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.