Details
Bug
Resolution: Duplicate
Major - P3
None
4.0.1
None
Sharding
ALL
Description
In a sharded cluster, a long-running query can cause a shard to refresh the routing table history multiple times. If the sharded cluster is very large, this routing table history can consume a large amount of memory and eventually lead to an out-of-memory (OOM) condition.
Here is a snapshot of call stacks showing 6.5 GB being used solely to update the routing table history.
Here is the balancer information:
balancer:
    Currently enabled: yes
    Currently running: yes
    Collections with active migrations:
        buildlogs.logs started at Tue Jan 08 2019 21:57:37 GMT+0000 (UTC)
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
        2990 : Success
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_24
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_21
        2 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-rs0
        1 : Failed with error 'aborted', from logkeeperdb-shard_9 to logkeeperdb-shard_18
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_21
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_17
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_22
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_12
        1 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-shard_4
        1 : Failed with error 'aborted', from logkeeperdb-shard_22 to logkeeperdb-shard_13
        1 : Failed with error 'aborted', from logkeeperdb-shard_13 to logkeeperdb-shard_8
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_20

mongos> db.chunks.find({ns: "buildlogs.logs"}).count()
1303476
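To get a rough feel for why the chunk count matters here, the following is a back-of-envelope sketch of how memory grows when several full copies of the routing table are alive at once. The function names and the per-chunk size of 500 bytes are hypothetical values chosen for illustration; they are not measured from the server, and the 6.5 GB figure is read as GiB.

```javascript
// Back-of-envelope sketch, not server code: all names and the
// per-chunk size below are illustrative assumptions.

// Bytes held if `retainedCopies` full routing tables are alive at once.
function estimateRoutingTableBytes(chunkCount, bytesPerChunk, retainedCopies) {
  return chunkCount * bytesPerChunk * retainedCopies;
}

// Number of full routing table copies that fit in `totalBytes` (rounded down).
function copiesThatFit(totalBytes, chunkCount, bytesPerChunk) {
  return Math.floor(totalBytes / (chunkCount * bytesPerChunk));
}

const chunkCount = 1303476;            // from the count() output above
const assumedBytesPerChunk = 500;      // hypothetical, not measured
const observedBytes = 6.5 * 1024 ** 3; // 6.5 GB from the description, read as GiB

// Size of a single copy under the assumed per-chunk cost.
console.log(estimateRoutingTableBytes(chunkCount, assumedBytesPerChunk, 1));
// How many such copies would account for the observed usage.
console.log(copiesThatFit(observedBytes, chunkCount, assumedBytesPerChunk));
```

Under these assumptions, only a handful of retained copies is enough to reach multiple gigabytes, since the total scales linearly with both the chunk count and the number of copies kept alive.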
Issue Links
- duplicates SERVER-36443 Long-running queries should not cause a build-up of unused ChunkManager objects (Closed)