[CDRIVER-3636] maxWireVersion check for MaxStalenessSeconds does not exclude PossiblePrimary Created: 27/Apr/20  Updated: 28/Oct/23  Resolved: 13/Sep/21

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: 1.16.2
Fix Version/s: 1.19.1

Type: Bug Priority: Major - P3
Reporter: Anton Neznaienko Assignee: Kevin Albertson
Resolution: Fixed Votes: 1
Labels: post-5.0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

php-cli 7.4.5

mongodb

MongoDB support => enabled
MongoDB extension version => 1.7.4
MongoDB extension stability => stable
libbson bundled version => 1.16.2
libmongoc bundled version => 1.16.2
libmongoc SSL => enabled
libmongoc SSL library => Secure Transport
libmongoc crypto => enabled
libmongoc crypto library => Common Crypto
libmongoc crypto system profile => disabled
libmongoc SASL => enabled
libmongoc ICU => disabled
libmongoc compression => enabled
libmongoc compression snappy => enabled
libmongoc compression zlib => enabled
libmongocrypt bundled version => 1.0.3
libmongocrypt crypto => enabled
libmongocrypt crypto library => Common Crypto

Directive => Local Value => Master Value
mongodb.debug => no value => no value


Case:

 Description   

Summary
The Max Staleness specification requires drivers check wire version for support of maxStalenessSeconds:

Clients are required to throw an error if maxStalenessSeconds is set, and any server in the topology has maxWireVersion less than 5.

Servers of type PossiblePrimary must not be considered in this check, since they have no known maxWireVersion.

libmongoc does not exclude PossiblePrimary in mongoc_topology_description_lowest_max_wire_version. This may result in incorrect errors when MaxStalenessSeconds is specified in the read preference. If there is a PossiblePrimary server in the topology description, server selection will fail until the server is rediscovered.

Original report

Recently we've added maxStalenessSeconds option to PHP Driver configuration and noticed occasional erroneous log messages.

Not all servers support maxStalenessSeconds message is displayed in server logs.
All replica set members are 4.2.5 and that should not happen.
Before Not all servers support maxStalenessSeconds message we usually get a few messages like No suitable servers found (`serverSelectionTryOnce` set): [connection refused calling ismaster on 'xxx'] and No servers yet eligible for rescan.
We suppose that previous messages are caused by our 100ms timeout, so they are expected to happen from time to time under load.

But Not all servers support maxStalenessSeconds should not happen IMHO.
Digging down to code, I've prepared a reproduction script (it's in PHP, but the problem lies in libmongoc, as I understand).

<?php
 
use MongoDB\Driver\Query;
use MongoDB\Driver\ReadPreference;
 
ini_set('mongodb.debug', '/var/tmp');
 
$manager = new MongoDB\Driver\Manager('mongodb://localhost:27100/?replicaSet=replset&compressors=snappy&tls=false&socketTimeoutMS=2&connectTimeoutMS=2&serverSelectionTimeoutMS=2&heartbeatFrequencyMS=500');
$readPreference = new ReadPreference(ReadPreference::RP_SECONDARY, null, ['maxStalenessSeconds' => 90]);
$query = new Query(['k' => 'v']);
 
while (true) {
    try {
        $doc = $manager->executeQuery('delay.coll', $query, ['readPreference' => $readPreference]);
    } catch (\Exception $e) {
        if ($e->getMessage() === 'Not all servers support maxStalenessSeconds') {
            echo "got it!\n";
            sleep(86400);
        }
    }
}

Running a few instances of this script under debugger I was able to stop at bp set at https://github.com/mongodb/mongo-c-driver/blob/1.16.2/src/libmongoc/src/mongoc/mongoc-topology.c#L655

Here is a lldb dump out of there:

n734:mongodb-driver $ lldb php
(lldb) target create "php"
Current executable set to 'php' (x86_64).
(lldb) b mongoc-topology.c:655
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) r delay.php
Process 94260 launched: '/usr/local/bin/php' (x86_64)
1 location added to breakpoint 1
 
mongodb.so was compiled with optimization - stepping may behave oddly; variables may not be available.
Process 94260 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000105a5b7c5 mongodb.so`mongoc_topology_compatible(td=0x0000000105032000, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:655:10 [opt]
   652           mongoc_topology_description_lowest_max_wire_version (td);
   653  
   654        if (max_wire_version < WIRE_VERSION_MAX_STALENESS) {
-> 655           bson_set_error (error,
   656                           MONGOC_ERROR_COMMAND,
   657                           MONGOC_ERROR_PROTOCOL_BAD_WIRE_VERSION,
   658                           "Not all servers support maxStalenessSeconds");
Target 0: (php) stopped.
(lldb) bt all
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
  * frame #0: 0x0000000105a5b7c5 mongodb.so`mongoc_topology_compatible(td=0x0000000105032000, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:655:10 [opt]
    frame #1: 0x0000000105a5ba52 mongodb.so`mongoc_topology_select_server_id(topology=0x0000000105032000, optype=MONGOC_SS_READ, read_prefs=0x0000000103cd95d0, error=0x00007ffeefbfe390) at mongoc-topology.c:859:15 [opt]
    frame #2: 0x0000000105a5b826 mongodb.so`mongoc_topology_select(topology=0x0000000105032000, optype=<unavailable>, read_prefs=<unavailable>, error=0x00007ffeefbfe390) at mongoc-topology.c:737:7 [opt]
    frame #3: 0x0000000105a22549 mongodb.so`mongoc_client_select_server(client=0x0000000103cdae00, for_writes=false, prefs=0x0000000103cd95d0, error=0x00007ffeefbfe390) at mongoc-client.c:2758:9 [opt]
    frame #4: 0x0000000105a92c9f mongodb.so`php_phongo_manager_select_server(for_writes=false, zreadPreference=0x000000010585fa40, zsession=<unavailable>, client=0x0000000103cdae00, server_id=0x00007ffeefbfe60c) at Manager.c:295:20 [opt]
    frame #5: 0x0000000105a925c6 mongodb.so`zim_Manager_executeQuery(execute_data=<unavailable>, return_value=0x0000000105817170) at Manager.c:573:7 [opt]
    frame #6: 0x0000000103dd51b1 xdebug.so`xdebug_execute_internal + 465
    frame #7: 0x00000001003d748a php`ZEND_DO_FCALL_SPEC_RETVAL_USED_HANDLER + 380
    frame #8: 0x00000001003bbafb php`execute_ex + 35
    frame #9: 0x0000000103dd4f54 xdebug.so`xdebug_execute_ex + 1044
    frame #10: 0x00000001003bbcb3 php`zend_execute + 352
    frame #11: 0x000000010037eb66 php`zend_execute_scripts + 277
    frame #12: 0x00000001003272e5 php`php_execute_script + 482
    frame #13: 0x000000010040543e php`do_cli + 3833
    frame #14: 0x00000001004043a3 php`main + 1211
    frame #15: 0x00007fff68bfbcc9 libdyld.dylib`start + 1
(lldb) fr v
(const mongoc_topology_description_t *) td = 0x0000000105032000
(const mongoc_read_prefs_t *) read_prefs = <variable not available>
 
(bson_error_t *) error = 0x00007ffeefbfe390
(int64_t) max_staleness_seconds = 90
(int32_t) max_wire_version = 0

As you see max_wire_version is zero. I think that despite mongoc_topology_description_lowest_max_wire_version doc says that owning toplogy mutex should be held, neither mongoc_topology_compatible nor https://github.com/mongodb/mongo-c-driver/blob/1.16.2/src/libmongoc/src/mongoc/mongoc-topology.c#L859 holds mutex.



 Comments   
Comment by Githook User [ 13/Sep/21 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: CDRIVER-3636 exclude PossiblePrimary from MSS check (#860)

Do not check PossiblePrimary servers in the maxWireVersion check
for topology support of maxStalenessSeconds
Branch: r1.19
https://github.com/mongodb/mongo-c-driver/commit/7e3420006a4112e5c56720aa9f518a0d0ea57f98

Comment by Githook User [ 13/Sep/21 ]

Author:

{'name': 'Kevin Albertson', 'email': 'kevin.albertson@mongodb.com', 'username': 'kevinAlbs'}

Message: CDRIVER-3636 exclude PossiblePrimary from MSS check (#860)

Do not check PossiblePrimary servers in the maxWireVersion check
for topology support of maxStalenessSeconds
Branch: master
https://github.com/mongodb/mongo-c-driver/commit/0024b5dfc52b53725a2cfff5faff592dc3a6e611

Comment by Anton Neznaienko [ 08/Sep/21 ]

Thanks Kevin,

Will try as soon as it's released!

Comment by Kevin Albertson [ 04/Sep/21 ]

Hi neanton@gmail.com, thank you again for the detailed repro! Modeling that example, I was able to reproduce in the C driver. This appears to be a bug in the maxWireVersion check for MaxStalenessSeconds. A fix should be included in the next patch release, 1.19.1.

Comment by Kevin Albertson [ 24/Feb/21 ]

Hi neanton@gmail.com! Thank you for the detailed report and repro, and for verifying this with the latest PHP extension. We will look into this soon.

Comment by Anton Neznaienko [ 24/Feb/21 ]

This issue is still present with the latest mongodb php extension:

 

mongodb
 
MongoDB support => enabled
MongoDB extension version => 1.9.0
MongoDB extension stability => stable
libbson bundled version => 1.17.2
libmongoc bundled version => 1.17.2
libmongoc SSL => enabled
libmongoc SSL library => OpenSSL
libmongoc crypto => enabled
libmongoc crypto library => libcrypto
libmongoc crypto system profile => disabled
libmongoc SASL => disabled
libmongoc ICU => disabled
libmongoc compression => enabled
libmongoc compression snappy => enabled
libmongoc compression zlib => enabled
libmongoc compression zstd => disabled
libmongocrypt bundled version => 1.0.4
libmongocrypt crypto => enabled
libmongocrypt crypto library => libcrypto
 
Directive => Local Value => Master Value
mongodb.debug => no value => no value

Generated at Wed Feb 07 21:18:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.