[SERVER-21156] Catalog manager operations should retry talking to the config server on notMaster or network errors Created: 27/Oct/15  Updated: 11/Nov/15  Resolved: 11/Nov/15

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Sharding C (11/20/15)
Participants:

 Description   

Uncovered through randomized step-down testing of the CSRS (config servers as a replica set) primary. The catalog manager operations do not retry when the connection is aborted because of a step down, and instead fail with a cryptic error message:

[js_test:add_invalid_shard] 2015-10-27T09:35:53.312-0400 s20014| 2015-10-27T09:35:53.313-0400 I SHARDING [conn1] going to add shard: { _id: "dummyRS", host: "testReplSet/kaloianmdesktop:20015,kaloianmdesktop:20016" }
[js_test:add_invalid_shard] 2015-10-27T09:35:53.312-0400 s20014| 2015-10-27T09:35:53.313-0400 I SHARDING [conn1] error adding shard: { _id: "dummyRS", host: "testReplSet/kaloianmdesktop:20015,kaloianmdesktop:20016" } err: An established connection was aborted by the software in your host machine.
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400 s20014| 2015-10-27T09:35:53.313-0400 I COMMAND  [conn1] addShard request '{ addshard: "testReplSet/kaloianmdesktop:20015,kaloianmdesktop:20016", name: "dummyRS" }' failed: An established connection was aborted by the software in your host machine.
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400 assert: command failed: {
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400        "ok" : 0,
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400        "errmsg" : "An established connection was aborted by the software in your host machine.",
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400        "code" : 6
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400 } : undefined
[js_test:add_invalid_shard] 2015-10-27T09:35:53.313-0400 _getErrorWithCode@src/mongo/shell/utils.js:23:13
[js_test:add_invalid_shard] 2015-10-27T09:35:53.315-0400 doassert@src/mongo/shell/assert.js:13:14
[js_test:add_invalid_shard] 2015-10-27T09:35:53.315-0400 assert.commandWorked@src/mongo/shell/assert.js:259:5
[js_test:add_invalid_shard] 2015-10-27T09:35:53.315-0400 @jstests\sharding\add_invalid_shard.js:27:1
[js_test:add_invalid_shard] 2015-10-27T09:35:53.315-0400 @jstests\sharding\add_invalid_shard.js:4:2
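The behaviour the title asks for can be illustrated with a small retry wrapper. This is a minimal sketch, not the actual catalog manager code. It assumes each config server command is a callable returning a reply document like the one in the assertion above. The retryable code set is illustrative: 6 (HostUnreachable) is the code in the failed `addShard` reply, and 10107 (NotMaster) is assumed to be what a stepped-down primary returns. `run_with_retry` and its parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Dict

Response = Dict[str, Any]

# Illustrative retryable codes (assumption): 6 = HostUnreachable (seen in the
# failed addShard reply above), 10107 = NotMaster (after a primary step down).
RETRYABLE_CODES = {6, 10107}


def run_with_retry(op: Callable[[], Response],
                   max_attempts: int = 3,
                   backoff_s: float = 0.1,
                   sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call op() until it succeeds, fails with a non-retryable code,
    or max_attempts calls have been made. Returns the last reply seen."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    response: Response = {}
    for attempt in range(1, max_attempts + 1):
        # op() is re-invoked on every attempt, so it can re-target the
        # current primary (hypothetical: discovery is left to the caller).
        response = op()
        if response.get("ok") == 1:
            return response
        if response.get("code") not in RETRYABLE_CODES:
            return response  # a genuine command error: surface it unchanged
        if attempt < max_attempts:
            # Linear backoff to give the replica set time to elect a primary.
            sleep(backoff_s * attempt)
    return response


# Usage: the first reply mimics the aborted connection, the second succeeds.
replies = iter([
    {"ok": 0, "code": 6,
     "errmsg": "An established connection was aborted by the software in your host machine."},
    {"ok": 1},
])
print(run_with_retry(lambda: next(replies), sleep=lambda s: None))  # {'ok': 1}
```

Passing `sleep` as a parameter keeps the backoff injectable, so tests can skip the real delay. Non-retryable errors return immediately, which keeps genuine failures (such as an invalid shard) visible to the caller instead of masking them behind retries.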



 Comments   
Comment by Kaloian Manassiev [ 11/Nov/15 ]

Closing this ticket, since it has been split into separate tickets, one for each operation that needs to be retried:

Generated at Thu Feb 08 03:56:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.