In examining this failure from Gregory's patch build, I've now realized it is possible for the moveChunk command to fail as a result of a duplicate key error on the recipient when document shard key values are being updated. This is because receiving a chunk only waits for documents with shard key values in the incoming range to have been cleaned up by the range deleter. If the shard key value for a document was updated after its chunk had been moved off the shard, then it is possible for the range deleter to not yet have cleaned up the old copy of that document when trying to move a range containing the new shard key value back to the original shard, and cloning the updated document then collides with the stale copy on the _id index.
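To make the sequence concrete, here is a minimal shell-level sketch of how I believe the collision arises. This is hypothetical and not the workload code itself; the namespace, shard names, and values mirror the log excerpt below, and it assumes the earlier migrations off shard-rs1 ran with _waitForDelete: false, leaving the pre-update document behind for the range deleter.

```javascript
// Hypothetical repro sketch (run against a mongos). Assumes fsmcoll0 is
// sharded on {skey: 1} with chunk boundaries at 400, 500, and 600, and that
// {_id: 450, skey: 450, counter: 0} initially lives on shard-rs1.

// 1. Move [400, 500) and [500, 600) off shard-rs1 without waiting for the
//    range deleter, so the old copy of {_id: 450} lingers on shard-rs1.
assert.commandWorked(db.adminCommand({
    moveChunk: "test0_fsmdb0.fsmcoll0",
    find: {skey: 450},
    to: "shard-rs0",
    _waitForDelete: false
}));
assert.commandWorked(db.adminCommand({
    moveChunk: "test0_fsmdb0.fsmcoll0",
    find: {skey: 550},
    to: "shard-rs0",
    _waitForDelete: false
}));

// 2. A retryable write updates the shard key, moving the document out of the
//    [400, 500) chunk and into the [500, 600) chunk (both now on shard-rs0).
const session = db.getMongo().startSession({retryWrites: true});
session.getDatabase("test0_fsmdb0").fsmcoll0.update(
    {_id: 450, skey: 450}, {_id: 450, skey: 576, counter: 1});

// 3. Move [500, 600) back to shard-rs1. The recipient only waits for range
//    deletions overlapping [500, 600), not [400, 500), so cloning
//    {_id: 450, skey: 576} can hit the stale {_id: 450, skey: 450} document
//    and fail with E11000 on the _id index, aborting the migration.
const res = db.adminCommand({
    moveChunk: "test0_fsmdb0.fsmcoll0",
    find: {skey: 550},
    to: "shard-rs1",
    _waitForDelete: true
});
// Expected: res.ok === 0 with "Data transfer error: migrate failed: ...".
```

The relevant excerpts from the failing run: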
[ShardedClusterFixture:job0:shard1:primary] | 2020-04-24T19:58:07.502+00:00 I SHARDING 22080 [MoveChunk] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20004-2020-04-24T19:58:07.502+00:00-5ea344cffb7035f4f1ebd055","server":"ip-10-122-65-30:20004","shard":"shard-rs1","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.502Z"},"what":"moveChunk.from","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":400.0},"max":{"skey":500.0},"step 1 of 6":0,"step 2 of 6":3,"step 3 of 6":12,"step 4 of 6":1998,"step 5 of 6":13,"step 6 of 6":24,"to":"shard-rs0","from":"shard-rs1","note":"success"}}}
[ShardedClusterFixture:job0:shard1:primary] | 2020-04-24T19:58:07.612+00:00 I SHARDING 22080 [MoveChunk] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20004-2020-04-24T19:58:07.612+00:00-5ea344cffb7035f4f1ebd0e2","server":"ip-10-122-65-30:20004","shard":"shard-rs1","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.612Z"},"what":"moveChunk.from","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":500.0},"max":{"skey":600.0},"step 1 of 6":0,"step 2 of 6":3,"step 3 of 6":13,"step 4 of 6":6,"step 5 of 6":14,"step 6 of 6":49,"to":"shard-rs0","from":"shard-rs1","note":"success"}}}
...
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.624+0000 [jsTest] Running update as a retryable write. The document will move across chunks. Original document values -- id: 450, shardKey: 450, counter: 0
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.624+0000 [jsTest] Intended new document values -- shardKey: 576, counter: 1
...
[ShardedClusterFixture:job0:shard0:primary] | 2020-04-24T19:58:07.624+00:00 D4 TXN 23984 [conn111] "New transaction started","attr":{"txnNumber":13,"lsid":{"uuid":{"$uuid":"59e19a65-4d33-418c-9328-fecd9f77846d"}}}
[ShardedClusterFixture:job0:mongos0] | 2020-04-24T19:58:07.624+00:00 D3 EXECUTOR 22607 [conn40] "Scheduling remote command request","attr":{"request":"RemoteCommand 1452 -- target:[localhost:20001] db:test0_fsmdb0 cmd:{ update: \"fsmcoll0\", bypassDocumentValidation: false, ordered: true, stmtIds: [ 0 ], updates: [ { q: { _id: 450.0, skey: 450.0 }, u: { _id: 450.0, skey: 576.0, counter: 1.0 }, multi: false, upsert: false } ], runtimeConstants: { localNow: new Date(1587758287624), clusterTime: Timestamp(1587758287, 168) }, shardVersion: [ Timestamp(41, 0), ObjectId('5ea344b2fb7035f4f1eba427') ], writeConcern: {}, lsid: { id: UUID(\"59e19a65-4d33-418c-9328-fecd9f77846d\"), uid: BinData(0, E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855) }, txnNumber: 13 }"}
[ShardedClusterFixture:job0:mongos0] | 2020-04-24T19:58:07.624+00:00 D2 ASIO 22597 [conn40] "Request finished with response","attr":{"requestId":1452,"isOK":true,"response":"{ n: 1, nModified: 1, opTime: { ts: Timestamp(1587758287, 169), t: 1 }, electionId: ObjectId('7fffffff0000000000000001'), ok: 1.0, readOnly: false, $gleStats: { lastOpTime: { ts: Timestamp(1587758287, 169), t: 1 }, electionId: ObjectId('7fffffff0000000000000001') }, lastCommittedOpTime: Timestamp(1587758287, 165), $configServerState: { opTime: { ts: Timestamp(1587758287, 168), t: 1 } }, $clusterTime: { clusterTime: Timestamp(1587758287, 169), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } }, operationTime: Timestamp(1587758287, 169) }"}
...
[ShardedClusterFixture:job0:shard0:primary] | 2020-04-24T19:58:07.697+00:00 I SHARDING 22080 [MoveChunk] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20001-2020-04-24T19:58:07.697+00:00-5ea344cfbe81891a8941f55e","server":"ip-10-122-65-30:20001","shard":"shard-rs0","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.697Z"},"what":"moveChunk.start","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":500.0},"max":{"skey":600.0},"from":"shard-rs0","to":"shard-rs1"}}}
[ShardedClusterFixture:job0:shard1:primary] | 2020-04-24T19:58:07.717+00:00 I MIGRATE 21999 [chunkInserter] "Batch insertion failed","attr":{"error":"DuplicateKey{ keyPattern: { _id: 1 }, keyValue: { _id: 450.0 } }: Insert of { _id: 450.0, skey: 576.0, counter: 1.0 } failed. :: caused by :: E11000 duplicate key error collection: test0_fsmdb0.fsmcoll0 index: _id_ dup key: { _id: 450.0 }"}
[ShardedClusterFixture:job0:shard1:primary] | 2020-04-24T19:58:07.717+00:00 I SHARDING 22080 [migrateThread] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20004-2020-04-24T19:58:07.717+00:00-5ea344cffb7035f4f1ebd0ff","server":"ip-10-122-65-30:20004","shard":"shard-rs1","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.717Z"},"what":"moveChunk.to","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":500.0},"max":{"skey":600.0},"step 1 of 6":4,"step 2 of 6":0,"to":"shard-rs1","from":"shard-rs0","note":"aborted"}}}
[ShardedClusterFixture:job0:shard0:primary] | 2020-04-24T19:58:07.719+00:00 I SHARDING 21993 [MoveChunk] "moveChunk data transfer progress","attr":{"response":{"waited":true,"active":false,"ns":"test0_fsmdb0.fsmcoll0","from":"shard-rs0/localhost:20001,localhost:20002,localhost:20003","fromShardId":"shard-rs0","min":{"skey":500.0},"max":{"skey":600.0},"shardKeyPattern":{"skey":1.0},"state":"fail","errmsg":"migrate failed: Location51008: operation was interrupted","counts":{"cloned":0,"clonedBytes":0,"catchup":0,"steady":0},"ok":1.0,"$gleStats":{"lastOpTime":{"ts":{"$timestamp":{"t":1587758285,"i":350}},"t":1},"electionId":{"$oid":"7fffffff0000000000000001"}},"lastCommittedOpTime":{"$timestamp":{"t":1587758287,"i":180}},"$configServerState":{"opTime":{"ts":{"$timestamp":{"t":1587758287,"i":187}},"t":1}},"$clusterTime":{"clusterTime":{"$timestamp":{"t":1587758287,"i":187}},"signature":{"hash":{"$binary":{"base64":"AAAAAAAAAAAAAAAAAAAAAAAAAAA=","subType":"0"}},"keyId":0}},"operationTime":{"$timestamp":{"t":1587758287,"i":180}}},"memoryUsedBytes":0,"docsRemainingToClone":0}
[ShardedClusterFixture:job0:shard0:primary] | 2020-04-24T19:58:07.719+00:00 I SHARDING 22080 [MoveChunk] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20001-2020-04-24T19:58:07.719+00:00-5ea344cfbe81891a8941f57c","server":"ip-10-122-65-30:20001","shard":"shard-rs0","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.719Z"},"what":"moveChunk.error","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":500.0},"max":{"skey":600.0},"from":"shard-rs0","to":"shard-rs1"}}}
[ShardedClusterFixture:job0:shard0:primary] | 2020-04-24T19:58:07.737+00:00 I SHARDING 22080 [MoveChunk] "About to log metadata event","attr":{"namespace":"changelog","event":{"_id":"ip-10-122-65-30:20001-2020-04-24T19:58:07.737+00:00-5ea344cfbe81891a8941f592","server":"ip-10-122-65-30:20001","shard":"shard-rs0","clientAddr":"","time":{"$date":"2020-04-24T19:58:07.737Z"},"what":"moveChunk.from","ns":"test0_fsmdb0.fsmcoll0","details":{"min":{"skey":500.0},"max":{"skey":600.0},"step 1 of 6":0,"step 2 of 6":2,"step 3 of 6":13,"to":"shard-rs1","from":"shard-rs0","note":"aborted"}}}
...
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.764+0000 "errmsg" : "Data transfer error: migrate failed: Location51008: operation was interrupted",
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.764+0000 "codeName" : "OperationFailed",
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.765+0000 "skey" : 500
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.765+0000 "skey" : 600
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.765+0000 "to" : "shard-rs1",
[fsm_workload_test:random_moveChunk_update_shard_key] 2020-04-24T19:58:07.765+0000 "_waitForDelete" : true
https://logkeeper.mongodb.org/lobster/build/8b5175147eca9145a219027f85c4d202/test/5ea344b2c2ab685b225d24b0#bookmarks=0%2C48508%2C48728%2C48793%2C48794%2C48797%2C48798%2C48804%2C49282%2C49335%2C49340%2C49345%2C49346%2C49388%2C49437%2C49439%2C49452%2C49455%2C49458%2C49459%2C50062&f~=000~59e19a65-4d33-418c-9328-fecd9f77846d&f~=100~%5C%5BMoveChunk%5C%5D.%2AAbout%20to%20log%20metadata%20event&l=1
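For completeness, a hedged way to confirm the stale copy on the recipient at the time of the failure would be to query the shard-rs1 primary directly, bypassing mongos. The localhost:20004 address below is inferred from the fixture layout in the logs and is an assumption.

```javascript
// Hypothetical orphan check against the shard-rs1 primary (not mongos).
const shard1Primary = new Mongo("localhost:20004");
const docs = shard1Primary.getDB("test0_fsmdb0").fsmcoll0.find({_id: 450}).toArray();
printjson(docs);
// If the range deleter has not yet cleaned up [400, 500) on this shard, this
// is expected to show the stale pre-update document
// {_id: 450, skey: 450, counter: 0}, which is what the incoming clone of
// {_id: 450, skey: 576, counter: 1} collides with on the _id index.
```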