[CDRIVER-2160] Single-threaded server selection may block for minHeartbeatFrequencyMS unnecessarily Created: 15/May/17  Updated: 28/Oct/23  Resolved: 14/Feb/18

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.5.5
Fix Version/s: 1.10.0

Type: Bug Priority: Major - P3
Reporter: Jeremy Mikola Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

When mongoc_topology_select_server_id() decides to scan the topology in single-threaded mode, it calculates "how soon are we allowed to scan?" by adding minHeartbeatFrequencyMS to the last scan time. This calculation occurs irrespective of whether any nodes will actually be scanned. For example, previous connection failures may result in all nodes being excluded from this scan due to cooldownMS. Ideally, the loop should check if there will be nodes to scan before a possible sleep for minHeartbeatFrequencyMS or call to _mongoc_topology_do_blocking_scan().
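
The proposed check could be sketched as follows. This is a minimal illustration, not libmongoc's actual code: every `sketch_`-prefixed type, function, and constant is a hypothetical stand-in, and the 5000 ms and 500 ms values are the cooldownMS and minHeartbeatFrequencyMS figures quoted in the fix's commit message. The idea is to ask "is any node outside its cooldown window?" before deciding whether to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the values named in this ticket. */
#define SKETCH_COOLDOWN_MS 5000      /* cooldownMS */
#define SKETCH_MIN_HEARTBEAT_MS 500  /* minHeartbeatFrequencyMS */

/* Hypothetical per-node state; not libmongoc's real scanner node type. */
typedef struct {
   bool has_failed;        /* true once a check of this node has failed */
   int64_t last_failed_ms; /* time of the most recent failed check */
} sketch_node_t;

typedef enum {
   SKETCH_FAIL_NOW,       /* nothing to scan: fail selection immediately */
   SKETCH_SCAN_NOW,       /* minHeartbeatFrequencyMS has already elapsed */
   SKETCH_SLEEP_THEN_SCAN /* wait out minHeartbeatFrequencyMS, then scan */
} sketch_action_t;

/* A node is skipped by the scan while it is still cooling down. */
static bool
sketch_node_in_cooldown (const sketch_node_t *node, int64_t now_ms)
{
   return node->has_failed &&
          now_ms - node->last_failed_ms < SKETCH_COOLDOWN_MS;
}

/* True if a scan started at now_ms would check at least one node. */
static bool
sketch_any_node_scannable (const sketch_node_t *nodes,
                           size_t n,
                           int64_t now_ms)
{
   size_t i;

   assert (nodes != NULL || n == 0);
   for (i = 0; i < n; i++) {
      if (!sketch_node_in_cooldown (&nodes[i], now_ms)) {
         return true;
      }
   }
   return false;
}

/* Decide what the selection loop should do before any sleep happens. */
static sketch_action_t
sketch_next_action (const sketch_node_t *nodes,
                    size_t n,
                    int64_t last_scan_ms,
                    int64_t now_ms)
{
   if (!sketch_any_node_scannable (nodes, n, now_ms)) {
      return SKETCH_FAIL_NOW;
   }
   if (now_ms - last_scan_ms >= SKETCH_MIN_HEARTBEAT_MS) {
      return SKETCH_SCAN_NOW;
   }
   return SKETCH_SLEEP_THEN_SCAN;
}
```

With a single node that failed 100 ms ago, `sketch_next_action()` returns `SKETCH_FAIL_NOW` rather than sleeping, which is the behavior this ticket asks for.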

Consider the case in mongodb/mongo-php-driver#592. A PHP worker with a newly created libmongoc client attempts to connect to a closed port on an accessible host, which fails immediately and leaves the node in the "Unknown" state. Subsequent requests to that worker re-use the persisted libmongoc client and block until minHeartbeatFrequencyMS has elapsed since the previous request's failure. In a high-concurrency environment, this means that each request blocks for approximately minHeartbeatFrequencyMS instead of failing early, regardless of any attempt to tune serverSelectionTimeoutMS (note: connection/socket timeouts aren't relevant here).

The spec's pseudo-code for Scanning Order does not address minHeartbeatFrequencyMS; however, it does refer to aborting early if there would be no nodes to scan, as is the case here.


On a related note, there is logic to abort early with a server selection failure if the calculated time would exceed our server selection timeout; however, this logic does not apply in serverSelectionTryOnce mode, so it would not affect the scenario described in the linked PHP ticket.
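
The early-abort check described above might look roughly like this. It is a sketch under assumptions: the `tmo_` names are hypothetical, the 500 ms value is the minHeartbeatFrequencyMS figure from the fix's commit message, and the comparison is inferred from this description rather than copied from libmongoc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: minHeartbeatFrequencyMS as quoted in this ticket. */
#define TMO_MIN_HEARTBEAT_MS 500

/*
 * Returns true if selection should fail immediately instead of sleeping,
 * because the earliest permitted scan falls after the selection deadline.
 * In serverSelectionTryOnce mode the check is skipped entirely.
 */
static bool
tmo_should_abort_before_sleep (int64_t start_ms,
                               int64_t timeout_ms,
                               int64_t last_scan_ms,
                               bool try_once)
{
   int64_t expire_at_ms;
   int64_t scan_ready_ms;

   assert (timeout_ms >= 0);

   if (try_once) {
      return false; /* the deadline check does not apply here */
   }

   expire_at_ms = start_ms + timeout_ms;                /* selection deadline */
   scan_ready_ms = last_scan_ms + TMO_MIN_HEARTBEAT_MS; /* earliest next scan */

   return scan_ready_ms > expire_at_ms;
}
```

Because the function returns false whenever `try_once` is set, a try-once client never takes this exit and still sleeps for minHeartbeatFrequencyMS, which is why this logic does not help in the PHP scenario.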



 Comments   
Comment by Githook User [ 14/Feb/18 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-2160 don't sleep if scanner isn't ready

If we've seen network errors trying to check all servers less than 5
seconds ago, then don't sleep minHeartbeatFrequencyMS (500ms) before
giving up on server selection; give up right away.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/277496fd3eed0f74c2aab4c4fea2a930309f2c34
