[SERVER-12587] Better error responses for errors in the destination shard Created: 03/Feb/14  Updated: 06/Dec/22  Resolved: 20/Dec/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.5.5
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Randolph Tan Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Sharding
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

For example, the moveChunk response when _recvChunkCommit fails currently looks like this:

{
	"cause" : {
		"cause" : {
			"active" : true,
			"ns" : "test.foo",
			"from" : "localhost:30000",
			"min" : {
				"_id" : { "$minKey" : 1 }
			},
			"max" : {
				"_id" : { "$maxKey" : 1 }
			},
			"shardKeyPattern" : {
				"_id" : 1
			},
			"state" : "fail",
			"errmsg" : "",
			"counts" : {
				"cloned" : NumberLong(41847),
				"clonedBytes" : NumberLong(420520503),
				"catchup" : NumberLong(0),
				"steady" : NumberLong(0)
			},
			"ok" : 0
		},
		"ok" : 0,
		"errmsg" : "_recvChunkCommit failed!"
	},
	"ok" : 0,
	"errmsg" : "move failed"
}

Note that cause.cause.errmsg is empty. In this particular example, _recvChunkCommit timed out, and the cause.cause field is populated from the _recvChunkCommit response.
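
To make the problem concrete, the nesting can be walked programmatically. The following is a minimal Python sketch, not server code: flatten_errmsgs is a hypothetical helper, and the response is a trimmed copy of the example above. It collects the errmsg at each level of the cause chain and shows that the innermost one, which should carry the real reason, is empty.

```python
def flatten_errmsgs(response):
    """Return the errmsg at each level of the nested "cause" chain, outermost first."""
    messages = []
    level = response
    while isinstance(level, dict):
        messages.append(level.get("errmsg"))
        level = level.get("cause")  # None once the innermost level is reached
    return messages


# Trimmed copy of the moveChunk response shown above (illustration only).
response = {
    "cause": {
        "cause": {
            "active": True,
            "ns": "test.foo",
            "state": "fail",
            "errmsg": "",
            "ok": 0,
        },
        "ok": 0,
        "errmsg": "_recvChunkCommit failed!",
    },
    "ok": 0,
    "errmsg": "move failed",
}

messages = flatten_errmsgs(response)
print(messages)  # ['move failed', '_recvChunkCommit failed!', '']
```

The two outer messages only say which step failed; the last entry is the one a user would need, and it is blank.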

The same applies to any error on the migrate thread that aborts the migration: the "to" shard would usually realize something went wrong through _recvChunkStatus, but that response does not say why. Currently the only way to figure out what went wrong is to check config.changelog or the logs.
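
Until the response carries the reason, the config.changelog lookup can be scripted. This is a rough sketch, assuming PyMongo is installed and a mongos is reachable at the given URI. changelog_filter and recent_migration_events are hypothetical helper names, and the filter relies only on the ns, what, and time fields of changelog entries.

```python
def changelog_filter(ns):
    """Build a filter matching moveChunk-related changelog entries for one namespace."""
    return {"ns": ns, "what": {"$regex": "^moveChunk"}}


def recent_migration_events(uri, ns, limit=5):
    """Return the newest moveChunk changelog entries for ns (needs a live cluster)."""
    # Imported lazily so changelog_filter stays usable without PyMongo installed.
    from pymongo import MongoClient

    client = MongoClient(uri)
    try:
        cursor = (
            client["config"]["changelog"]
            .find(changelog_filter(ns))
            .sort("time", -1)  # newest entries first
            .limit(limit)
        )
        return list(cursor)
    finally:
        client.close()
```

For the example above this might be called as recent_migration_events("mongodb://localhost:27017", "test.foo"), where the URI is a placeholder for the actual mongos address.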



 Comments   
Comment by Gregory McKeon (Inactive) [ 20/Dec/18 ]

We've improved this error message significantly in 3.6.

Generated at Thu Feb 08 03:28:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.