[SERVER-47639] Fix race with async getHosts request and concurrent topology change Created: 17/Apr/20  Updated: 29/Oct/23  Resolved: 23/Jun/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Lamont Nelson Assignee: Cheahuychou Mao
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-48925 Exclude servers with unknown server d... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Steps To Reproduce:

Given an initial topology T0, a getHosts request R, and a topology change event T that can satisfy R:

0. initial state: T0 cannot satisfy R
1. The RSM receives R concurrent with T.
1a. RSM checks topology and sees T0: https://github.com/mongodb/mongo/blob/23e6f954d7ef5ab73f5540b46c6b3794b7ecfbdc/src/mongo/client/streamable_replica_set_monitor.cpp#L252
2. R is scheduled to be fulfilled async: https://github.com/mongodb/mongo/blob/23e6f954d7ef5ab73f5540b46c6b3794b7ecfbdc/src/mongo/client/streamable_replica_set_monitor.cpp#L257
3. T occurs concurrently with step 2 prior to enqueuing R
4. R is enqueued: https://github.com/mongodb/mongo/blob/23e6f954d7ef5ab73f5540b46c6b3794b7ecfbdc/src/mongo/client/streamable_replica_set_monitor.cpp#L279
5. If no further topology changes occur, the query will time out with FailedToSatisfyReadPreference.
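
The ordering in steps 1a through 4 can be replayed deterministically in a single thread. The following is a minimal sketch, not the actual StreamableReplicaSetMonitor code: `Topology`, `RacyMonitor`, `hasPrimary`, and `replayRace` are hypothetical names, and "T can satisfy R" is modeled as the topology gaining a primary.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a topology view; not the real server type.
struct Topology {
    bool hasPrimary;
};

// Hypothetical monitor that checks the topology and enqueues in two separate steps.
struct RacyMonitor {
    Topology topology{false};                    // T0: cannot satisfy R
    std::vector<std::function<void()>> pending;  // outstanding getHosts requests

    // Step 1a: look at the current topology view.
    bool canSatisfy() const { return topology.hasPrimary; }

    // Step 3: apply a topology change; only requests already enqueued are fulfilled.
    void onTopologyChange(Topology t) {
        topology = t;
        if (!canSatisfy()) return;
        for (auto& fulfill : pending) fulfill();
        pending.clear();
    }

    // Step 4: enqueue without looking at the topology again.
    void enqueue(std::function<void()> fulfill) { pending.push_back(std::move(fulfill)); }
};

// Replays steps 1a, 3, and 4 in the problematic order.
// Returns whether R was fulfilled.
bool replayRace() {
    RacyMonitor rsm;
    bool fulfilled = false;
    bool satisfiedNow = rsm.canSatisfy();   // 1a: sees T0
    rsm.onTopologyChange(Topology{true});   // 3: T is applied before R is enqueued
    if (!satisfiedNow) {
        rsm.enqueue([&] { fulfilled = true; });  // 4: R is enqueued too late
    }
    return fulfilled;
}
```

In this sketch `replayRace()` returns false: R stays in the queue even though the current topology could satisfy it, which corresponds to step 5.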

Sprint: Service arch 2020-05-04, Service arch 2020-05-18, Sharding 2020-06-29
Participants:
Linked BF Score: 16

 Description   

See the steps in "Steps To Reproduce" above.

A fix for this is to invoke server selection at the time of step 4 with a fresh view of the topology while synchronizing with the TopologyManager.

The topology views (and the corresponding topology change events) are serialized, so at step 4 we would see either T0 or T, and any further topology change would be blocked while R is being enqueued.

In the first case (we see T0), server selection returns no result, the request is enqueued, and the query is satisfied after T is applied to the topology manager.

In the second case (we see T), the query is satisfied without enqueuing the request.
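
The two cases can be illustrated with a small sketch in which the topology check and the enqueue happen under the same lock that serializes topology changes. This is an assumption-laden illustration, not the committed fix: `Monitor`, `Topology`, `hasPrimary`, and the two helper functions are hypothetical names, and a plain `std::mutex` stands in for whatever synchronization the TopologyManager actually provides.

```cpp
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a topology view; not the real server type.
struct Topology {
    bool hasPrimary;
};

class Monitor {
public:
    // Applies a topology change; changes are serialized by _mutex.
    void onTopologyChange(Topology t) {
        std::vector<std::function<void()>> ready;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _topology = t;
            if (_topology.hasPrimary) ready.swap(_pending);
        }
        for (auto& fulfill : ready) fulfill();  // run callbacks outside the lock
    }

    // Re-checks the topology and enqueues under the same lock, so no
    // topology change can slip in between the check and the enqueue.
    void getHosts(std::function<void()> fulfill) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (!_topology.hasPrimary) {
                _pending.push_back(std::move(fulfill));  // first case: saw T0
                return;
            }
        }
        fulfill();  // second case: saw T, no enqueue needed
    }

private:
    std::mutex _mutex;
    Topology _topology{false};  // T0: cannot satisfy R
    std::vector<std::function<void()>> _pending;
};

// First case: R sees T0 and is enqueued, then T is applied.
bool fulfilledWhenRequestComesFirst() {
    Monitor m;
    bool done = false;
    m.getHosts([&] { done = true; });
    m.onTopologyChange(Topology{true});
    return done;
}

// Second case: T is applied first, so R is satisfied immediately.
bool fulfilledWhenChangeComesFirst() {
    Monitor m;
    bool done = false;
    m.onTopologyChange(Topology{true});
    m.getHosts([&] { done = true; });
    return done;
}
```

Because both paths take the same lock, every interleaving reduces to one of these two orderings, and R is fulfilled in both.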



 Comments   
Comment by Githook User [ 04/Aug/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-47639 Fix race with async getHosts request and concurrent topology change

(cherry picked from commit 36bf915c32d551ad557ec7a1fa41890037e9f54f)
Branch: v4.4
https://github.com/mongodb/mongo/commit/9ffd3632f5fed15e0920fe2fbbff2142676d7537

Comment by Githook User [ 23/Jun/20 ]

Author:

{'name': 'Cheahuychou Mao', 'email': 'cheahuychou.mao@mongodb.com', 'username': 'cheahuychou'}

Message: SERVER-47639 Fix race with async getHosts request and concurrent topology change
Branch: master
https://github.com/mongodb/mongo/commit/36bf915c32d551ad557ec7a1fa41890037e9f54f

Generated at Thu Feb 08 05:14:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.