[SERVER-56950] Avoid shardRegistry reload infinite loop when overlapping with setFCV Created: 14/May/21  Updated: 29/Oct/23  Resolved: 26/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 5.0.0, 4.9.0
Fix Version/s: 5.0.0-rc1, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Pierlauro Sciarelli Assignee: Simon Gratzer (Inactive)
Resolution: Fixed Votes: 0
Labels: post-rc0, sharding-wfbf-day
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-57017 Enable sharded DDL plus FCV FSM in st... Closed
Problem/Incident
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.0
Steps To Reproduce:

1. ShardRegistry::_periodicReload causes a reload to occur. ShardRegistry::_getDataAsync advances the ReadThroughCache's timeInStore to some t1 with a non-zero topologyTime. ReadThroughCache::acquireAsync creates an inProgressLookup with t1 and adds a promise for it to inProgressLookup._outstanding.

2. ShardRegistry::_lookup starts running, and meanwhile the test runs setFCV from 4.9 to 4.4.

3. ShardRegistryData::createFromCatalogClient returns. useActualTopologyTime() is false, so it returns the cached data's topology time (i.e. Timestamp(0,0), since this is the first reload) as result.t.

4. Inside ReadThroughCache::_doLookupWhileNotValid, inProgressLookup.getPromisesLessThanTime returns nothing because the first promise in _outstanding is the promise for t1, which has a non-zero topologyTime (i.e. t1 > result.t), so the for loop breaks early.

5. promisesToSet is empty, so mustDoAnotherLoop is true. The _inProgressLookup for t1 remains in the cache and another round of lookup starts; again no promises can be fulfilled, for the reason given in step 4.

6. Future reloads join this infinitely looping inProgressLookup. (This is why the hang analyzer output shows multiple mongo::ShardRegistry::_periodicReload threads.)
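
The loop described in steps 4 and 5 can be sketched with a small standalone model. This is only an illustration, not the server's ReadThroughCache code: InProgressLookupModel, takeWaitersUpTo, runLookupRounds and the maxRounds cap are hypothetical names introduced here, and topology times are reduced to plain integers. It assumes a waiter is satisfied when the lookup result time is at least the time the waiter asked for.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <vector>

// Simplified stand-in for a topology time; a larger value means newer.
using Time = std::uint64_t;

// Hypothetical model of an in-progress lookup. Waiters are kept sorted by the
// time they asked for, oldest first.
struct InProgressLookupModel {
    std::deque<Time> outstanding;

    // Remove and return every waiter whose requested time is <= resultTime,
    // stopping at the first waiter that asked for something newer.
    std::vector<Time> takeWaitersUpTo(Time resultTime) {
        std::vector<Time> satisfied;
        while (!outstanding.empty() && outstanding.front() <= resultTime) {
            satisfied.push_back(outstanding.front());
            outstanding.pop_front();
        }
        return satisfied;
    }
};

// Repeat the lookup until no waiters remain or maxRounds is reached.
// The cap exists only so this model terminates; the scenario in the steps
// above has no such cap. Returns the number of rounds executed.
int runLookupRounds(InProgressLookupModel& lookup,
                    const std::function<Time()>& lookupFn,
                    int maxRounds) {
    assert(maxRounds > 0);
    int rounds = 0;
    while (rounds < maxRounds && !lookup.outstanding.empty()) {
        ++rounds;
        const Time resultTime = lookupFn();
        // If nothing was satisfied, the waiter list is unchanged and the
        // next round starts from exactly the same state.
        lookup.takeWaitersUpTo(resultTime);
    }
    return rounds;
}
```

With one waiter at time 5 and a lookup function that always reports time 0 (the stale cached topology time from step 3), every round leaves the waiter in place and the model only stops at the cap. If the lookup reports time 5, the waiter is satisfied in the first round.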

Sprint: Sharding EMEA 2021-05-31
Participants:
Linked BF Score: 170

 Description   

When setFCV(v4.4) overlaps with a ShardRegistry reload - right after the useActualTopologyTime check - the ShardRegistry can fall into an infinite loop of lookups because the topology time is not gossiped after the setFCV succeeds.

The purpose of this ticket is to prevent this overlap from resulting in a livelock.



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 01/Jun/21 ]

Author:

{'name': 'Simon Gratzer', 'email': 'simon.gratzer@mongodb.com'}

Message: SERVER-56950 Avoid shardRegistry reload infinite loop when overlapping with setFCV (BACKPORT-9211)
Branch: v5.0
https://github.com/mongodb/mongo/commit/037968db481282012b18b1873c7f72f14b9da48d

Comment by Githook User [ 26/May/21 ]

Author:

{'name': 'Simon Gratzer', 'email': 'simon.gratzer@mongodb.com'}

Message: SERVER-56950 Avoid shardRegistry reload infinite loop when overlapping with setFCV

This reverts commit c6ebe28e7ed60bdb8675204144bbb765891a4ca2.
Branch: master
https://github.com/mongodb/mongo/commit/56143e2127e63e5f949986f4e845dd0ff18a6b92

Comment by Githook User [ 25/May/21 ]

Author:

{'name': 'Max Hirschhorn', 'email': 'max.hirschhorn@mongodb.com', 'username': 'visemet'}

Message: Revert "SERVER-56950 Avoid shardRegistry reload infinite loop when overlapping with setFCV"

This reverts commit 5ffdb69a0d691549c0d6cd780c2d8be238e588a6.
Branch: master
https://github.com/mongodb/mongo/commit/c6ebe28e7ed60bdb8675204144bbb765891a4ca2

Comment by Githook User [ 21/May/21 ]

Author:

{'name': 'Simon Gratzer', 'email': 'simon.gratzer@mongodb.com'}

Message: SERVER-56950 Avoid shardRegistry reload infinite loop when overlapping with setFCV
Branch: master
https://github.com/mongodb/mongo/commit/5ffdb69a0d691549c0d6cd780c2d8be238e588a6

Comment by Simon Gratzer (Inactive) [ 21/May/21 ]

https://mongodbcr.appspot.com/778750001/

Generated at Thu Feb 08 05:40:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.