[CDRIVER-1219] Bugs in single-threaded selection timeout
Created: 02/May/16  Updated: 22/Jul/16  Resolved: 22/Jul/16

Status: Closed
Project: C Driver
Component/s: libmongoc
Affects Version/s: 1.3.0
Fix Version/s: 1.4.0

Type: Bug
Priority: Major - P3
Reporter: A. Jesse Jiryu Davis
Assignee: A. Jesse Jiryu Davis
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified


 Description   

Two bugs with mongoc_topology_select in 1.3.x when serverSelectionTryOnce is turned off.

Bug #1: instead of sleeping half a second between server checks, mongoc_topology_select sleeps half a second longer before each successive server check, until the next sleep would exceed the server selection timeout.

Consider the 1.3.5 code in mongoc_topology_select. "try_once" is false, connectTimeoutMS is 500, socketTimeoutMS is 60,000:

/* call the start time "0" */
loop_start = loop_end = 0
expire_at = 30 seconds
next_update = something negative
topology->stale = true
 
first loop iteration:
scan_ready = something negative
sleep_usec = something negative
spend, let's say, 5 ms in _mongoc_topology_do_blocking_scan
topology->last_scan = now = 5ms
 
second loop iteration:
scan_ready = 505ms
sleep_usec = 505ms
sleep for 505ms
spend 5ms in _mongoc_topology_do_blocking_scan
topology->last_scan = now = 510ms
 
third loop iteration:
scan_ready = 1010ms
sleep_usec = 1010ms
sleep for 1010ms
spend 5ms in _mongoc_topology_do_blocking_scan
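
The growth in sleep time traced above can be reproduced on a virtual clock. The sketch below is not the libmongoc source: simulate_selection, MIN_HEARTBEAT_MS, and SCAN_COST_MS are hypothetical names, and it assumes the growth comes from measuring the sleep against the loop's start time instead of the current time, which is only one reading of the trace. Because it charges the 5 ms scan cost on every pass, its millisecond values can differ slightly from the hand-written trace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MIN_HEARTBEAT_MS 500 /* interval between server checks, per the ticket */
#define SCAN_COST_MS 5       /* assumed duration of one blocking scan */

/* Hypothetical simulation, not libmongoc code. Runs `iterations` scans on a
 * virtual clock that starts at 0, records the sleep before each scan in
 * `sleeps`, and returns the final clock value in milliseconds.
 * If `buggy` is nonzero, the sleep is measured from the loop start (time 0);
 * otherwise it is measured from the current time. */
static int64_t
simulate_selection (int buggy, int iterations, int64_t *sleeps)
{
   const int64_t loop_start = 0;
   int64_t now = 0;
   /* "something negative": the first scan is allowed immediately */
   int64_t last_scan = -MIN_HEARTBEAT_MS;
   int i;

   assert (sleeps != NULL);

   for (i = 0; i < iterations; i++) {
      int64_t scan_ready = last_scan + MIN_HEARTBEAT_MS;
      int64_t sleep_ms = scan_ready - (buggy ? loop_start : now);

      if (sleep_ms < 0) {
         sleep_ms = 0;
      }

      sleeps[i] = sleep_ms;
      now += sleep_ms;     /* sleep until the scan is allowed */
      now += SCAN_COST_MS; /* time spent in the blocking scan */
      last_scan = now;
   }

   return now;
}
```

With buggy set, each recorded sleep is longer than the one before; with buggy clear, every sleep after the first scan is 500 ms.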

Bug #2: when scan_ready has advanced enough to exceed expire_at, the error message should be the standard "serverselectiontimeoutms timed out", not "minheartbeatfrequencyms not reached yet".
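
The intended behavior for Bug #2 can be sketched as a small decision function. This is an illustration only: multi_attempt_failure_message is a hypothetical helper, not a libmongoc function, and it covers only the case described here, where serverSelectionTryOnce is off. The message string is the one quoted in this ticket.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of libmongoc. Assumes serverSelectionTryOnce
 * is off. Returns the message selection should fail with when the next
 * permitted scan would start after the selection deadline, or NULL when
 * there is still time for another scan. */
static const char *
multi_attempt_failure_message (int64_t scan_ready_ms, int64_t expire_at_ms)
{
   if (scan_ready_ms > expire_at_ms) {
      /* the deadline passes before another scan is allowed */
      return "serverselectiontimeoutms timed out";
   }

   return NULL;
}
```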



 Comments   
Comment by A. Jesse Jiryu Davis [ 22/Jul/16 ]

Recent improvements seem to have fixed this bug; I've reenabled the tests on Windows and run them a dozen times without failure.

Comment by Githook User [ 22/Jul/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1219 reenable timeout tests on windows
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/8364c3009721523243c8cf3d894272bb97186d58

Comment by Githook User [ 14/May/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1219 disable timeout tests on Windows
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/b19cb5f6e000f1c9039a88a4007c7410c249fbc7

Comment by A. Jesse Jiryu Davis [ 14/May/16 ]

The new tests for server selection with a down secondary time out on Windows. Is this a mock server bug, or a bug in our timeout implementation on Windows?

Comment by Githook User [ 04/May/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1219 single-thread timeout bugs

If serverSelectionTryOnce is false, selection should time out with
"serverselectiontimeoutms timed out", not "minheartbeatfrequencyms not
reached yet".

Fix timing logic to rescan every 500ms while selection fails.
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/28ff626ff360b81e503b155510412cedea40152c

Comment by Githook User [ 04/May/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1219 test selection with a down secondary
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/1808e2f4d07148921a230181c04e533cd448f242

Comment by Githook User [ 04/May/16 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: CDRIVER-1219 test selection with a down secondary
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/ba27f22122b368ead126ca0cca3d5a8e0e1f9fe8

Comment by A. Jesse Jiryu Davis [ 03/May/16 ]

A new test of the C Driver only (not involving PHP) does not reproduce the bug. Instead it works as expected: single-threaded selection, whether tryOnce is on or off, takes connectTimeoutMS if a secondary is down, then selects the primary.

For the next 5 seconds the secondary is in cooldown, so single-threaded selection is instant. After that, the topology scanner tries the secondary again, waiting until connectTimeoutMS expires. It seems the driver has correctly implemented the spec.

I now suspect the PHP driver's stream initiator prevents the topology scanner from properly implementing the spec: instead of applying connectTimeoutMS during its initial scan, it blocks for the whole socketTimeoutMS before giving up on the secondary.
