[SERVER-62695] 【move chunk error】migrate failed: DuplicateKey: operation was interrupted Created: 18/Jan/22  Updated: 27/Oct/23  Resolved: 18/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: li jiaman Assignee: Unassigned
Resolution: Community Answered Votes: 0
Labels: ChunkMigrationRefactor, balancer
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongodb shard集群chunk迁移失败-20220118.txt    
Operating System: ALL
Participants:

 Description   

(1) We found that chunks are unevenly distributed across the shards:

mongos> show dbs;
admin              0.000GB
pro_db1            8858.442GB
config             0.194GB
test               0.000GB
mongos> use pro_db1
switched to db pro_db1
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("6018fee5be7133283e1eb522")
  }
  shards:
        {  "_id" : "audit-platform_ZzcNqamf_shard_1",  "host" : "audit-platform_ZzcNqamf_shard_1/10.131.212.121:20000,10.131.26.63:20000,10.131.26.65:20000",  "state" : 1 }
        {  "_id" : "audit-platform_oDOgDaDZ",  "host" : "audit-platform_oDOgDaDZ/10.131.172.191:20000,10.131.172.192:20000,10.131.172.193:20000",  "state" : 1 }
        {  "_id" : "audit-platform_qeNUgkwj",  "host" : "audit-platform_qeNUgkwj/10.129.89.117:20000,10.129.89.118:20000,10.131.95.42:20000",  "state" : 1 }
  active mongoses:
        "3.6.10" : 3
  autosplit:
        Currently enabled: yes
  balancer:
        Currently enabled:  yes
        Currently running:  no
        Failed balancer rounds in last 5 attempts:  0
        Migration Results for the last 24 hours:
                7356 : Failed with error 'aborted', from audit-platform_ZzcNqamf_shard_1 to audit-platform_oDOgDaDZ
  databases:
        {  "_id" : "pro_db1",  "primary" : "audit-platform_ZzcNqamf_shard_1",  "partitioned" : true }
                pro_db1.product_2014_20
                        shard key: { "biz_id" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                audit-platform_ZzcNqamf_shard_1 19576
                                audit-platform_oDOgDaDZ 19576
                                audit-platform_qeNUgkwj 19574
                        too many chunks to print, use verbose if you want to force print
                pro_db1.product_2041
                        shard key: { "biz_id" : "hashed" }
                        unique: false
                        balancing: true
                        chunks:
                                audit-platform_ZzcNqamf_shard_1 235108
                                audit-platform_oDOgDaDZ 19017
                                audit-platform_qeNUgkwj 19018
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "config",  "primary" : "config",  "partitioned" : true }
                config.system.sessions
                        shard key: { "_id" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                audit-platform_ZzcNqamf_shard_1 1
                        { "_id" : { "$minKey" : 1 } } -->> { "_id" : { "$maxKey" : 1 } } on : audit-platform_ZzcNqamf_shard_1 Timestamp(1, 0)
        {  "_id" : "test",  "primary" : "audit-platform_ZzcNqamf_shard_1",  "partitioned" : false }
mongos> 
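
To put a number on the imbalance for pro_db1.product_2041, the per-shard chunk counts can be compared directly. The following is a minimal Python sketch, not part of the original report: the counts are copied by hand from the sh.status() output, and the helper name chunk_imbalance is made up for illustration.

```python
# Chunk counts for pro_db1.product_2041, copied from the sh.status() output.
chunks = {
    "audit-platform_ZzcNqamf_shard_1": 235108,
    "audit-platform_oDOgDaDZ": 19017,
    "audit-platform_qeNUgkwj": 19018,
}


def chunk_imbalance(counts):
    """Return (largest shard, smallest shard, difference in chunk count)."""
    largest = max(counts, key=counts.get)
    smallest = min(counts, key=counts.get)
    return largest, smallest, counts[largest] - counts[smallest]


largest, smallest, diff = chunk_imbalance(chunks)
total = sum(chunks.values())
share = chunks[largest] / total
print(f"{largest} holds {chunks[largest]} of {total} chunks ({share:.1%})")
print(f"difference to {smallest}: {diff} chunks")
```

With these numbers, one shard holds well over 80% of the collection's chunks, while pro_db1.product_2014_20 is spread almost evenly.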

(2) On the primary node of shard "audit-platform_ZzcNqamf_shard_1", the log shows:

2022-01-17T17:28:38.518+0800 I SHARDING [conn82] about to log metadata event into changelog: { _id: "10-131-26-65.mongodb-fatpod-audit-platform.cp01-2022-01-17T17:28:38.518+0800-61e536c69f56bc1b2025e6ef", server: "10-131-26-65.mongodb-fatpod-audit-platform.cp01", clientAddr: "10.130.145.75:50376", time: new Date(1642411718518), what: "moveChunk.from", ns: "pro_db1.product_2041", details: { min: { biz_id: -6656423545813652642 }, max: { biz_id: -6656354113587583176 }, step 1 of 6: 0, step 2 of 6: 170, step 3 of 6: 204, to: "audit-platform_oDOgDaDZ", from: "audit-platform_ZzcNqamf_shard_1", note: "aborted" } }
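
The changelog event lists one entry per moveChunk step that was recorded, so the highest step number present gives a rough idea of how far the migration got before the "aborted" note. A small Python sketch that extracts these fields with a regular expression follows; the excerpt is a shortened copy of the line above, and parse_steps is a hypothetical helper, not a MongoDB tool.

```python
import re

# Shortened copy of the "details" part of the moveChunk.from event.
details = (
    'details: { min: { biz_id: -6656423545813652642 }, '
    'max: { biz_id: -6656354113587583176 }, '
    'step 1 of 6: 0, step 2 of 6: 170, step 3 of 6: 204, '
    'to: "audit-platform_oDOgDaDZ", '
    'from: "audit-platform_ZzcNqamf_shard_1", note: "aborted" }'
)


def parse_steps(text):
    """Return {step number: logged value} for every 'step N of M: V' entry."""
    pattern = re.compile(r"step (\d+) of (\d+): (\d+)")
    return {int(n): int(value) for n, _total, value in pattern.findall(text)}


steps = parse_steps(details)
note = re.search(r'note: "([^"]*)"', details)
last_step = max(steps)
print(f"last step recorded: {last_step} of 6, note: {note.group(1)}")
```

Here only steps 1 to 3 of 6 appear in the event, which suggests the donor shard stopped partway through the migration.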

(3) Checking the config server (configdb) log, we found this error:

2022-01-18T10:10:30.240+0800 I SHARDING [Balancer] Balancer move pro_db1.product_2041: [{ biz_id: -6656423545813652642 }, { biz_id: -6656354113587583176 }), from audit-platform_ZzcNqamf_shard_1, to audit-platform_oDOgDaDZ failed :: caused by :: OperationFailed: Data transfer error: migrate failed: DuplicateKey: operation was interrupted

How should I solve this problem?



 Comments   
Comment by Dmitry Agranat [ 18/Jan/22 ]

Hi 961892352@qq.com,

MongoDB 3.6 reached EOL in April 2021. As such, we'd like to encourage you to start by asking our community for help by posting on the MongoDB Developer Community Forums.

If the discussion there leads you to suspect a bug in the supported MongoDB server, then we'd want to investigate it as a possible bug here in the SERVER project.

Regards,
Dima

Comment by li jiaman [ 18/Jan/22 ]

mongodb shard集群chunk迁移失败-20220118.txt (pretty-printed version of the output above)

Generated at Thu Feb 08 05:55:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.