[SERVER-39862] SetShardVersion can return unknown error code while refreshing from config server Created: 27/Feb/19  Updated: 29/Oct/23  Resolved: 11/Sep/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Bug
Priority: Major - P3
Reporter: Randolph Tan
Assignee: Kevin Pulo
Resolution: Fixed
Votes: 0
Labels: sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Minor Change
Operating System: ALL
Sprint: Sharding 2019-08-26, Sharding 2019-09-23
Participants:
Linked BF Score: 24

 Description   

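setShardVersion can return UnknownError instead of the underlying error code while refreshing from the config server, and this can cause the mongos/config server not to retry. Example error from a build failure: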

2019-02-26T23:03:43.155+0000 E QUERY    [js] Error: drop failed: {
	"ok" : 0,
	"errmsg" : "could not refresh metadata for test.jstests_multikey_geonear with requested shard version 0|0||000000000000000000000000, stored shard version is 1|3||5c75c5c93c4a30e466e75ced :: caused by :: InterruptedAtShutdown: interrupted at shutdown",
	"code" : 8,
	"codeName" : "UnknownError",
 [conn39] end connection 127.0.0.1:40655 (0 connections now open)
	"operationTime" : Timestamp(1551222223, 5),
	"$clusterTime" : {
		"clusterTime" : Timestamp(1551222223, 5),
		"signature" : {
			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
			"keyId" : NumberLong(0)
		}
	}
}

One way to fix this is to change every path that does "return false" so that it instead asserts with the underlying error code.
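The difference between the two patterns can be sketched as follows. This is a minimal, self-contained illustration and not the actual server code: Status, Reply, CodedError, and the dispatch functions are hypothetical stand-ins, and the numeric value used for InterruptedAtShutdown is included only for illustration. It assumes that a handler which returns false with only an error message ends up reported with the generic UnknownError code (8), as in the output above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration; these are NOT the real MongoDB types.
constexpr int kUnknownError = 8;               // code seen in the failure above
constexpr int kInterruptedAtShutdown = 11600;  // illustrative value for the underlying error

struct Status {
    int code;            // 0 means OK
    std::string reason;
    bool isOK() const { return code == 0; }
};

struct Reply {
    bool ok;
    int code;
    std::string errmsg;
};

// Exception that carries an error code along with the message.
class CodedError : public std::runtime_error {
public:
    CodedError(int code, const std::string& what)
        : std::runtime_error(what), code_(code) {}
    int code() const { return code_; }

private:
    int code_;
};

// "return false" pattern: only the message survives, the code is dropped.
bool runReturningFalse(const Status& refresh, std::string& errmsg) {
    if (!refresh.isOK()) {
        errmsg = "could not refresh metadata :: caused by :: " + refresh.reason;
        return false;
    }
    return true;
}

// "assert with code" pattern: the failure carries the original code.
void runAsserting(const Status& refresh) {
    if (!refresh.isOK()) {
        throw CodedError(refresh.code,
                         "could not refresh metadata :: caused by :: " + refresh.reason);
    }
}

// Assumed dispatcher behaviour: a bare "false" has no code, so it falls back to UnknownError.
Reply dispatchReturningFalse(const Status& refresh) {
    std::string errmsg;
    if (runReturningFalse(refresh, errmsg)) {
        return {true, 0, ""};
    }
    return {false, kUnknownError, errmsg};
}

// With an assertion, the dispatcher can copy the code into the reply.
Reply dispatchAsserting(const Status& refresh) {
    try {
        runAsserting(refresh);
        return {true, 0, ""};
    } catch (const CodedError& e) {
        return {false, e.code(), e.what()};
    }
}
```

Given a refresh status of {kInterruptedAtShutdown, "interrupted at shutdown"}, dispatchReturningFalse yields a reply with code kUnknownError, while dispatchAsserting yields a reply that preserves kInterruptedAtShutdown, which is what a caller would need in order to classify the failure.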



 Comments   
Comment by Githook User [ 11/Sep/19 ]

Author: Kevin Pulo (devkev) <kevin.pulo@mongodb.com>

Message: SERVER-39862 make setShardVersion pass along error code
Branch: master
https://github.com/mongodb/mongo/commit/307adf151421e9e0570a11367c1b5e997be68638

Comment by Randolph Tan [ 15/Aug/19 ]

kaloian.manassiev In this case the drop command sends the SSV here and returns an error for the drop command if the SSV fails.

Comment by Kaloian Manassiev [ 27/Feb/19 ]

I think this is happening because the setShardVersion command doesn't include the code from the returned status here.

I didn't even know we were retrying on setShardVersion though, because that command is invoked from the version manager using the legacy connection code path. Or is the problem that whatever command is running upstream from this is not retrying when SSV fails - because only Map/Reduce should be doing that?
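To illustrate why a missing code matters for any caller that does retry, here is a rough sketch of a retry loop keyed on the error code. It is not taken from the server: CommandResult, isRetriable, and runWithRetry are hypothetical names, and treating InterruptedAtShutdown as the only retriable code is an assumption made purely for the example.

```cpp
#include <functional>

// Hypothetical values and types for illustration; not the real MongoDB definitions.
constexpr int kUnknownError = 8;
constexpr int kInterruptedAtShutdown = 11600;

struct CommandResult {
    bool ok;
    int code;  // 0 when ok
};

// Assumption for this sketch: only a shutdown interruption is worth retrying.
bool isRetriable(int code) {
    return code == kInterruptedAtShutdown;
}

// Runs `attempt` until it succeeds, fails with a non-retriable code,
// or `maxAttempts` is reached. Reports how many attempts were made.
CommandResult runWithRetry(const std::function<CommandResult()>& attempt,
                           int maxAttempts,
                           int* attemptsMade) {
    CommandResult last{false, kUnknownError};
    *attemptsMade = 0;
    while (*attemptsMade < maxAttempts) {
        ++*attemptsMade;
        last = attempt();
        if (last.ok || !isRetriable(last.code)) {
            break;  // success, or an error the caller cannot classify as retriable
        }
    }
    return last;
}
```

With this loop, a first failure reported as kInterruptedAtShutdown leads to a second attempt, whereas the same failure reported as kUnknownError stops after one attempt and is returned to the user, which matches the symptom in the description.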
