[SERVER-48504] Combination of StaleShardVersion and ShardNotFound exception make the ChunkManagerTargeter crash Created: 29/May/20  Updated: 29/Oct/23  Resolved: 30/Jul/20

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 4.4.0-rc7
Fix Version/s: 4.7.0, 4.4.2, 4.2.18

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: non-blocking, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Steps To Reproduce:

The bug was triggered by the jstests/concurrency/fsm_workloads_add_remove_shards/clusterwide_ops_with_add_drop_shards.js test, which manually drops the `config.shards` collection of the mongos and thereby causes the ShardNotFound error.

Sprint: Sharding 2020-07-13, Sharding 2020-07-27, Sharding 2020-08-10
Participants:
Linked BF Score: 20

 Description   

If a StaleShardVersion error is encountered while running BatchWriteExec::executeBatch, we call ChunkManagerTargeter::noteStaleShardResponse, which populates the _remoteShardVersions map with the stale error info.
Next, a refresh of the targeter is attempted, which triggers a refresh of the routingInfo. This refresh can throw a ShardNotFound exception, which is caught. At this point we loop again in executeBatch and call ChunkManagerTargeter::noteCouldNotTarget. Because the _remoteShardVersions map is still not empty, we hit the dassert in noteCouldNotTarget.
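The failing sequence can be modeled in a few lines of C++. This is a hypothetical, heavily simplified sketch: `ModelTargeter`, `ShardNotFound`, `replayBuggySequence` and the `shardRegistryHasShard` flag are illustrative names, not MongoDB's real classes or signatures. The model assumes the stale-version info is only cleared once a refresh succeeds. The real dassert aborts the process in debug builds; here it is replaced by a thrown `std::logic_error` so the outcome can be observed.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model; names only mirror ChunkManagerTargeter/BatchWriteExec.
struct ShardNotFound : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class ModelTargeter {
public:
    // Records the version a shard reported in a StaleShardVersion error.
    void noteStaleShardResponse(const std::string& shardId, int remoteVersion) {
        _remoteShardVersions[shardId] = remoteVersion;
    }

    // Refreshes routing info. Assumption: the stale info is cleared only
    // after a successful refresh, so a ShardNotFound leaves it in place.
    void refreshIfNeeded(bool shardRegistryHasShard) {
        if (!shardRegistryHasShard) {
            throw ShardNotFound("shard not found during routing refresh");
        }
        _remoteShardVersions.clear();
    }

    // Stand-in for the dassert: no stale shard info may be pending here.
    void noteCouldNotTarget() {
        if (!_remoteShardVersions.empty()) {
            throw std::logic_error("dassert: _remoteShardVersions must be empty");
        }
        _needsTargetingRefresh = true;
    }

private:
    std::map<std::string, int> _remoteShardVersions;
    bool _needsTargetingRefresh = false;
};

// Replays the sequence from the description; returns true if the check fires.
bool replayBuggySequence(bool shardRegistryHasShard) {
    ModelTargeter targeter;
    // Round 1: a shard answers with StaleShardVersion.
    targeter.noteStaleShardResponse("shard0", 2);
    try {
        targeter.refreshIfNeeded(shardRegistryHasShard);
    } catch (const ShardNotFound&) {
        // The executor catches the error and loops again.
    }
    // Round 2: targeting fails, so the executor notes it.
    try {
        targeter.noteCouldNotTarget();
    } catch (const std::logic_error&) {
        return true;
    }
    return false;
}
```

In this model `replayBuggySequence(false)` returns true because the ShardNotFound leaves `shard0` in the map, while `replayBuggySequence(true)` returns false because the successful refresh empties the map before `noteCouldNotTarget` runs.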

The failure only occurs on the Enterprise RHEL 6.2 DEBUG Code Coverage and {UB}SAN Enterprise Ubuntu 18.04 DEBUG build variants, on both 4.4 and master.

 Comments   
Comment by Githook User [ 18/Nov/21 ]

Author: Tommaso Tocci (toto-dev) <tommaso.tocci@mongodb.com>

Message: SERVER-48504 Combination of StaleShardVersion and ShardNotFound exception make the ChunkManagerTargeter crash
Branch: v4.2
https://github.com/mongodb/mongo/commit/4105bd383b018a7b3e1835767cde75584e5e0bae

Comment by Githook User [ 09/Sep/20 ]

Author: Tommaso Tocci (toto-dev) <tommaso.tocci@mongodb.com>

Message: SERVER-48504 Combination of StaleShardVersion and ShardNotFound exception make the ChunkManagerTargeter crash

(cherry picked from commit 3925fa9d2b8c00eca3c63acd442ed4ee0eae2b07)
Branch: v4.4
https://github.com/mongodb/mongo/commit/bb917c502bd71702a71057f9bf7237207cbac4ff

Comment by Tommaso Tocci [ 09/Sep/20 ]

EVG: https://evergreen.mongodb.com/version/5f588fee56234342068eeb20

Comment by Githook User [ 30/Jul/20 ]

Author: Tommaso Tocci (toto-dev) <tommaso.tocci@mongodb.com>

Message: SERVER-48504 Combination of StaleShardVersion and ShardNotFound exception make the ChunkManagerTargeter crash
Branch: master
https://github.com/mongodb/mongo/commit/3925fa9d2b8c00eca3c63acd442ed4ee0eae2b07

Generated at Thu Feb 08 05:17:18 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.