[SERVER-85590] Mongos startup time increases linearly with the number of databases in the cluster Created: 23/Jan/24 Updated: 05/Feb/24 Resolved: 05/Feb/24 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Admin, Sharding |
| Affects Version/s: | 5.0 Required, 4.4 Required, 6.0 Required |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Anthony Brodard | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | mongos | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: | MongoDB CE 4.4.27 / 5.0.24, Ubuntu 22.04 |
| Operating System: | ALL |
| Steps To Reproduce: |
|
| Participants: |
| Description |
|
Hello,

On startup, mongos runs one query per database against the `config.collections` collection. On clusters with a large number of databases, this has a significant impact on process startup time. Moreover, since mongos talks to the config replica set with `readConcern: majority`, the startup time also varies with the network latency between the config members and the mongos. Here is the kind of request mongos sends to the config replica set on startup:

```
{"t":{"$date":"2024-01-22T13:14:58.987+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 939 -- target:[cfg-1.internal:37001] db:config expDate:2024-01-22T13:15:28.987+00:00 cmd:{ find: \"collections\", filter: { _id: /^db_20240119160851\-800\./ }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":{"$date":"2024-01-22T13:14:58.989+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 940 -- target:[cfg-1.internal:37001] db:config expDate:2024-01-22T13:15:28.989+00:00 cmd:{ find: \"collections\", filter: { _id: /^db_20240119160851\-801\./ }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":{"$date":"2024-01-22T13:14:58.991+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 941 -- target:[cfg-2.internal:37001] db:config expDate:2024-01-22T13:15:28.991+00:00 cmd:{ find: \"collections\", filter: { _id: /^db_20240119160851\-802\./ }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
{"t":{"$date":"2024-01-22T13:14:59.004+00:00"},"s":"D4", "c":"ASIO", "id":22596, "ctx":"mongosMain","msg":"startCommand","attr":{"request":"RemoteCommand 942 -- target:[cfg-3.internal:37001] db:config expDate:2024-01-22T13:15:29.004+00:00 cmd:{ find: \"collections\", filter: { _id: /^db_20240119160851\-803\./ }, readConcern: { level: \"majority\", afterOpTime: { ts: Timestamp(1705929298, 1), t: 2 }}, maxTimeMS: 30000 }"}}
```
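For illustration, one of these lookups can be replayed by hand to time a single majority read against `config.collections`; a minimal mongosh sketch, reusing the database name from the log sample above (an approximation of the logged command, not the exact internal code path):

```
// Replay one startup-style lookup against config.collections and time it.
// The _id prefix is copied from the log sample above; substitute one of
// your own database names.
const t0 = Date.now();
db.getSiblingDB("config").getCollection("collections")
  .find({ _id: /^db_20240119160851-800\./ })
  .readConcern("majority")
  .toArray();
print(`one config.collections lookup: ${Date.now() - t0} ms`);
```

Multiplying that per-query latency by the number of databases gives a rough lower bound on the expected startup time.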
In our case, we are operating a cluster with more than 450k databases, and the config servers are distributed across 3 sites, at roughly 1 ms, 3 ms, and 10 ms round-trip latency from the mongos. When the mongos starts, since the queries are distributed equally across all the config members (about 150k per site), it takes approximately 30 minutes to start, which is roughly confirmed by doing the maths: `150k * 1ms + 150k * 3ms + 150k * 10ms = 150s + 450s + 1500s = 2100s ≈ 35 minutes`. We tried setting a lower `--localThreshold`, but it apparently only applies to communication between the mongos and the data shards. We were able to reproduce this behaviour by creating 25k databases on a testing environment, with a script like the sketch below:
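A minimal repro sketch (mongosh, connected to the mongos); the `db_test_` prefix is our illustrative choice, and any insert routed through the mongos implicitly creates the database:

```
// Create 25k databases by inserting one document into each; after this,
// restarting the mongos forces it to load 25k entries from the config
// servers. Sequential inserts, so this takes a while.
for (let i = 0; i < 25000; i++) {
  db.getSiblingDB(`db_test_${i}`).getCollection("coll").insertOne({ x: 1 });
}
```

The startup duration can then be read off the mongos log as the gap between the "Build Info" (id 23403) and "Waiting for connections" (id 23016) messages, as in the excerpts below.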
We were able to reproduce it on 5.0.24, but not on 6.0.13; excerpts from test runs on 4.4.22, 5.0.24, and 6.0.13 follow (a sketch for extracting the startup duration from such logs is shown after the excerpts):
``` {"t":\{"$date":"2024-01-23T14:18:32.675+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"4.4.22","gitVersion":"fc832685b99221cffb1f5bb5a4ff5ad3e1c416b2","openSSLVersion": ,"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":57000,"ssl":"off"}} ```
``` {"t":\{"$date":"2024-01-23T15:01:52.077+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"5.0.24","gitVersion":"f034f0c51b3dffef4b8c9452d77ede9888f28f66","openSSLVersion": ,"s":"I", "c":"NETWORK", "id":23016, "ctx":"listener","msg":"Waiting for connections","attr":{"port":57000,"ssl":"off"}} ```
``` {"t":\{"$date":"2024-01-23T15:53:10.296+00:00"},"s":"I", "c":"CONTROL", "id":23403, "ctx":"mongosMain","msg":"Build Info","attr":{"buildInfo":{"version":"6.0.13","gitVersion":"3b13907f9bdf6bd3264d67140d6c215d51bbd20c","openSSLVersion": ``` Also, is it possible to backport the behaviour on 6.0.13 to upcoming 4.4 and 5.0 versions? Thank you! |
| Comments |
| Comment by Eric Sedor [ 05/Feb/24 ] |
|
Hi anthony@sendinblue.com, thanks for your patience. Specific improvements are backported where reasonable, but it's pretty unlikely that we'll backport to earlier versions the breadth of recent improvements for handling large numbers of namespaces. Hopefully moving to MongoDB 6.0 is an option for you!

Sincerely,
Eric |