[SERVER-70968] mongodb 4.4 balance not completed more than 48hours for one collection Created: 31/Oct/22  Updated: 01/Nov/22  Resolved: 31/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: jing xu Assignee: Yuan Fang
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

1、balance status
active mongoses:
"4.4.12" : 6
autosplit:
Currently enabled: yes
balancer:
Currently enabled: no
Currently running: yes
Balancer active window is set between 00:00 and 06:00 server local time
Collections with active migrations:
call.CallManHourReport started at Sat Oct 29 2022 16:05:15 GMT+0800 (CST)
Failed balancer rounds in last 5 attempts: 0
Migration Results for the last 24 hours:
No recent migrations

2、locks & migrations
mongos> db.locks.find({_id:"call.CallManHourReport"})

{ "_id" : "call.CallManHourReport", "state" : 2, "process" : "ConfigServer", "ts" : ObjectId("62c3a7acc5b092293290dbb3"), "when" : ISODate("2022-10-29T08:05:15.379Z"), "who" : "ConfigServer:Balancer", "why" : "Migrating chunk(s) in collection call.CallManHourReport" }

mongos> db.migrations.find();
{ "_id" : ObjectId("635cdebbc5b09229324072cb"), "ns" : "call.CallManHourReport", "min" : { "code" :

{ "$minKey" : 1 }

, "reportDate" :

{ "$minKey" : 1 }

}, "max" :

{ "code" : "-1", "reportDate" : ISODate("2022-06-28T08:00:00Z") }

, "fromShard" : "shard3", "toShard" : "shard2", "chunkVersion" : [ Timestamp(155, 0), ObjectId("62a97d167e80b2794633c81e") ], "waitForDelete" : false, "forceJumbo" : "doNotForceJumbo" }
mongos>

3、shard3 log
{"t":

{"$date":"2022-10-29T16:05:15.393+08:00"}

,"s":"I", "c":"MIGRATE", "id":22016, "ctx":"MoveChunk","msg":"Starting chunk migration donation","attr":{"requestParameters":"ns: call.CallManHourReport, [

{ code: MinKey, reportDate: MinKey }

,

{ code: \"-1\", reportDate: new Date(1656403200000) }

), fromShard: shard3, toShard: shard2","collectionEpoch":

{"$oid":"62a97d167e80b2794633c81e"}

}}

{"t":

{"$date":"2022-10-29T16:05:15.397+08:00"}

,"s":"I", "c":"SHARDING", "id":22080, "ctx":"MoveChunk","msg":"About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"srvdb307.yto.cloud:21003-2022-10-29T16:05:15.397+08:00-635cdebb9d1133e1bb27584a","server":"srvdb307.yto.cloud:21003","shard":"shard3","clientAddr":"","time":

{"$date":"2022-10-29T08:05:15.397Z"}

,"what":"moveChunk.start","ns":"call.CallManHourReport","details":{"min":{"code":

{"$minKey":1}

,"reportDate":{"$minKey":1}},"max":{"code":"-1","reportDate":{"$date":"2022-06-28T08:00:00.000Z"}},"from":"shard3","to":"shard2"}}}}

{"t":

{"$date":"2022-10-29T16:05:21.896+08:00"}

,"s":"I", "c":"SHARDING", "id":21993, "ctx":"MoveChunk","msg":"moveChunk data transfer progress","attr":{"response":{"waited":true,"active":true,"sessionId":"shard3_shard2_635cdebb9d1133e1bb2758a4","ns":"call.CallManHourReport","from":"shard3/mongo31.prd.db:21003,mongo33.prd.db:21003,mongo90.prd.db:21003","fromShardId":"shard3","min":{"code":

{"$minKey":1}

,"reportDate":{"$minKey":1}},"max":{"code":"-1","reportDate":{"$date":"2022-06-28T08:00:00.000Z"}},"shardKeyPattern":

{"code":1.0,"reportDate":1.0}

,"supportsCriticalSectionDuringCatchUp":true,"state":"ready","counts":

{"cloned":0,"clonedBytes":0,"catchup":0,"steady":0}

,"ok":1.0,"$gleStats":{"lastOpTime":{"ts":{"$timestamp":{"t":1667030715,"i":5211}},"t":10},"electionId":{"$oid":"7fffffff000000000000000a"}},"lastCommittedOpTime":{"$timestamp":{"t":1667030721,"i":6665}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1667030721,"i":4717}},"t":10}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1667030721,"i":6954}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"operationTime":{"$timestamp":

{"t":1667030721,"i":6855}

}},"memoryUsedBytes":0,"docsRemainingToClone":0}}

{"t":

{"$date":"2022-10-29T21:56:14.717+08:00"}

,"s":"I", "c":"SHARDING", "id":21993, "ctx":"MoveChunk","msg":"moveChunk data transfer progress","attr":{"response":{"waited":true,"active":true,"sessionId":"shard3_shard2_635cdebb9d1133e1bb2758a4","ns":"call.CallManHourReport","from":"shard3/mongo31:21003,mongo33:21003,mongo90:21003","fromShardId":"shard3","min":{"code":

{"$minKey":1}

,"reportDate":{"$minKey":1}},"max":{"code":"-1","reportDate":{"$date":"2022-06-28T08:00:00.000Z"}},"shardKeyPattern":

{"code":1.0,"reportDate":1.0}

,"supportsCriticalSectionDuringCatchUp":true,"state":"ready","counts":

{"cloned":0,"clonedBytes":0,"catchup":0,"steady":0}

,"ok":1.0,"$gleStats":{"lastOpTime":{"ts":{"$timestamp":{"t":1667030715,"i":5211}},"t":10},"electionId":{"$oid":"7fffffff000000000000000a"}},"lastCommittedOpTime":{"$timestamp":{"t":1667051774,"i":1151}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1667051774,"i":1105}},"t":10}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1667051774,"i":1165}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"operationTime":{"$timestamp":

{"t":1667051774,"i":1163}

}},"memoryUsedBytes":0,"docsRemainingToClone":0}}

4、shard2 log
"t":

{"$date":"2022-10-29T16:05:15.890+08:00"}

,"s":"I", "c":"MIGRATE", "id":22000, "ctx":"migrateThread","msg":"Starting receiving end of chunk migration","attr":{"chunkMin":{"code":

{"$minKey":1}

,"reportDate":{"$minKey":1}},"chunkMax":{"code":"-1","reportDate":{"$date":"2022-06-28T08:00:00.000Z"}},"namespace":"call.CallManHourReport","fromShard":"shard3","epoch":

{"$oid":"62a97d167e80b2794633c81e"}

,"sessionId":"shard3_shard2_635cdebb9d1133e1bb2758a4","migrationId":{"uuid":{"$uuid":"b9839ef9-7e7b-4567-9c63-d88db8847b68"}}}}

{"t":

{"$date":"2022-10-29T16:05:16.664+08:00"}

,"s":"I", "c":"MIGRATE", "id":22001, "ctx":"migrateThread","msg":"Migration paused because the requested range overlaps with a range already scheduled for deletion","attr":{"namespace":"call.CallManHourReport","range":"[

{ code: MinKey, reportDate: MinKey }

,

{ code: \"-1\", reportDate: new Date(1656403200000) }

)","migrationId":{"uuid":{"$uuid":"b9839ef9-7e7b-4567-9c63-d88db8847b68"}}}}
{"t":

{"$date":"2022-10-29T16:05:16.664+08:00"}

,"s":"I", "c":"MIGRATE", "id":21919, "ctx":"migrateThread","msg":"Waiting for deletion of orphans","attr":{"namespace":"call.CallManHourReport","orphanRange":{"min":{"code":

{"$minKey":1}

,"reportDate":{"$minKey":1}},"max":{"code":"-1","reportDate":

{"$date":"2022-06-28T08:00:00.000Z"}

}}}}

{"t":

{"$date":"2022-10-29T16:05:16.886+08:00"}

,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn1452484","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"_recvChunkStatus":"call.CallManHourReport","waitForSteadyOrDone":true,"sessionId":"shard3_shard2_635cdebb9d1133e1bb2758a4","$clusterTime":{"clusterTime":{"$timestamp":{"t":1667030715,"i":5217}},"signature":{"hash":{"$binary":{"base64":"SxYdDiYZbqc2ggOGyZw1Z99ZbBY=","subType":"0"}},"keyId":7124137948378824973}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1667030715,"i":3523}},"t":10}},"$db":"admin"},"numYields":0,"reslen":806,"locks":{},"protocol":"op_msg","durationMillis":1000}}

{"t":

{"$date":"2022-10-29T16:05:17.888+08:00"}

,"s":"I", "c":"COMMAND", "id":51803, "ctx":"conn1452484","msg":"Slow query","attr":{"type":"command","ns":"admin.$cmd","command":{"_recvChunkStatus":"call.CallManHourReport","waitForSteadyOrDone":true,"sessionId":"shard3_shard2_635cdebb9d1133e1bb2758a4","$clusterTime":{"clusterTime":{"$timestamp":{"t":1667030716,"i":4993}},"signature":{"hash":{"$binary":{"base64":"VPtuOvnu6X7WSDrWmYIqsa1sMoU=","subType":"0"}},"keyId":7124137948378824973}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1667030715,"i":3523}},"t":10}},"$db":"admin"},"numYields":0,"reslen":806,"locks":{},"protocol":"op_msg","durationMillis":1001}}



 Comments   
Comment by jing xu [ 01/Nov/22 ]

call.CallManHourReport started at Sat Oct 29 2022 16:05:15 GMT+0800 (CST)

Comment by Yuan Fang [ 31/Oct/22 ]

Hi 601290552@qq.com,

Thank you for your report. From the balancer status that you provided, it seems the balancer hasn't been enabled.  

balancer:
Currently enabled: no

Could you check if the balancer is enabled and see if enabling it would solve the issue?

For this issue, we'd like to encourage you to start by asking our community for help by posting on the MongoDB Developer Community Forums.

If the discussion there leads you to suspect a bug in the MongoDB server, then we'd want to investigate it as a possible bug here in the SERVER project.

Regards,
Yuan

Generated at Thu Feb 08 06:17:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.