Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.1
Component/s: Sharding
Labels: None
Assigned Teams: Sharding
Operating System: ALL
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

In a sharded cluster, a long-running query can cause a shard to refresh its routing table history multiple times. If the cluster is very large (for example, a collection with more than a million chunks), this routing table history can take up a large amount of memory and eventually lead to an out-of-memory (OOM) failure.

The attached screenshot shows a snapshot of call stacks in which 6.5 GB is used solely to update the routing table history.

Here is the balancer information:

balancer:
        Currently enabled:  yes
        Currently running:  yes
        Collections with active migrations: 
                buildlogs.logs started at Tue Jan 08 2019 21:57:37 GMT+0000 (UTC)
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours: 
                2990 : Success
                1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_24
                1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_21
                2 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-rs0
                1 : Failed with error 'aborted', from logkeeperdb-shard_9 to logkeeperdb-shard_18
                1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_21
                1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_17
                1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_22
                1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_12
                1 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-shard_4
                1 : Failed with error 'aborted', from logkeeperdb-shard_22 to logkeeperdb-shard_13
                1 : Failed with error 'aborted', from logkeeperdb-shard_13 to logkeeperdb-shard_8
                1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_20
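The migration results above can be tallied to estimate how often the chunk metadata for buildlogs.logs changed during the 24-hour window. Each successful migration changes chunk metadata that shards cache, so it is a plausible (not confirmed) trigger for the routing table refreshes described above. The sketch below is a minimal JavaScript parser for these lines; `tallyMigrations` is a hypothetical helper, and the interval calculation assumes the migrations were spread evenly over the day.

```javascript
// Migration results copied verbatim from the balancer output in this ticket.
const migrationResults = `
        2990 : Success
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_24
        1 : Failed with error 'aborted', from logkeeperdb-shard_26 to logkeeperdb-shard_21
        2 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-rs0
        1 : Failed with error 'aborted', from logkeeperdb-shard_9 to logkeeperdb-shard_18
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_21
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_17
        1 : Failed with error 'aborted', from logkeeperdb-shard_15 to logkeeperdb-shard_22
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_12
        1 : Failed with error 'aborted', from logkeeperdb-shard_14 to logkeeperdb-shard_4
        1 : Failed with error 'aborted', from logkeeperdb-shard_22 to logkeeperdb-shard_13
        1 : Failed with error 'aborted', from logkeeperdb-shard_13 to logkeeperdb-shard_8
        1 : Failed with error 'aborted', from logkeeperdb-shard_17 to logkeeperdb-shard_20
`;

// Hypothetical helper: sums the leading counts of "N : Success" and
// "N : Failed ..." lines; any other line is ignored.
function tallyMigrations(text) {
  let success = 0;
  let failed = 0;
  for (const line of text.split("\n")) {
    const match = line.trim().match(/^(\d+) : (Success|Failed)/);
    if (!match) continue;
    const count = Number(match[1]);
    if (match[2] === "Success") success += count;
    else failed += count;
  }
  return { success, failed };
}

const { success, failed } = tallyMigrations(migrationResults);

// Average gap between successful migrations, assuming they were spread
// evenly over the 24-hour window.
const avgIntervalSec = (24 * 60 * 60) / success;

console.log(success, failed, avgIntervalSec.toFixed(1)); // 2990 13 28.9
```

Under the even-spread assumption, the chunk metadata changed roughly every 29 seconds, so a query running for five minutes would overlap about ten successful migrations.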

mongos> db.chunks.find({ns: "buildlogs.logs"}).count()
1303476
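To illustrate how a long-running query could multiply the memory held by the routing table history, here is a minimal, simplified model in JavaScript. It assumes that every refresh builds a new full copy of the routing table and that each in-flight query keeps a reference to the version that was current when it started; both assumptions are consistent with the title of the duplicate SERVER-36443 but are not confirmed by this ticket. The class `RoutingTableCache` and its methods are hypothetical and do not correspond to server code, and memory is counted in chunk entries rather than bytes.

```javascript
// Simplified, hypothetical model of routing table retention on one shard.
// Assumptions (not confirmed by this ticket):
//   * every refresh builds a new routing table holding one entry per chunk;
//   * each in-flight query references the version current when it started;
//   * a version is freed only when it is not current and nothing references it.
class RoutingTableCache {
  constructor(numChunks) {
    this.numChunks = numChunks;
    this.current = 0;
    // version number -> { chunkEntries, refs }
    this.versions = new Map([[0, { chunkEntries: numChunks, refs: 0 }]]);
  }

  // A query starts and pins the current routing table version.
  startQuery() {
    this.versions.get(this.current).refs += 1;
    return this.current;
  }

  // The query finishes and releases its reference.
  finishQuery(version) {
    this.versions.get(version).refs -= 1;
    this.collect();
  }

  // A refresh (for example after a chunk migration) installs a new full copy.
  refresh() {
    this.current += 1;
    this.versions.set(this.current, { chunkEntries: this.numChunks, refs: 0 });
    this.collect();
  }

  // Drop every non-current version that no query references any more.
  collect() {
    for (const [version, table] of this.versions) {
      if (version !== this.current && table.refs === 0) {
        this.versions.delete(version);
      }
    }
  }

  // Total chunk entries held in memory across all retained versions.
  retainedChunkEntries() {
    let total = 0;
    for (const table of this.versions.values()) total += table.chunkEntries;
    return total;
  }
}

// Chunk count reported for buildlogs.logs in this ticket.
const cache = new RoutingTableCache(1303476);

// Five long-running queries start, each followed by a refresh before any finishes.
const pinned = [];
for (let i = 0; i < 5; i++) {
  pinned.push(cache.startQuery());
  cache.refresh();
}
console.log(cache.versions.size, cache.retainedChunkEntries()); // 6 7820856

// Once the queries finish, only the current version remains.
for (const version of pinned) cache.finishQuery(version);
console.log(cache.versions.size, cache.retainedChunkEntries()); // 1 1303476
```

In this model, every additional pinned version adds another 1,303,476 chunk entries, which is why the growth only becomes a problem on collections with very many chunks.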

Attachment: Screen Shot 2019-01-08 at 5.10.45 PM.png (411 kB, Jan 10 2019 04:52:59 PM UTC)

Duplicates: SERVER-36443 Long-running queries should not cause a build-up of unused ChunkManager objects (Closed)

Assignee: [DO NOT USE] Backlog - Sharding Team
Reporter: Danny Hatcher (Inactive)
Participants: [DO NOT USE] Backlog - Sharding Team, Danny Hatcher
Votes: 0
Watchers: 12

Created: Jan 10 2019 04:56:47 PM UTC
Updated: Dec 06 2022 03:08:37 AM UTC
Resolved: Jan 15 2019 05:17:06 PM UTC
