HA monitoring does not cover new replica set members


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 2.2.27
    • Affects Version/s: 2.2.25, 2.2.26
    • Component/s: None

      It looks like the HA monitoring in mongodb-core is only performed on the servers that are in the replica set when the driver first connects. Servers that get added to the replica set later are not monitored. I think the monitoring should cover all available servers.

      This caught us by surprise recently when we replaced some members of a replica set, and connections to the new servers kept timing out because there was no monitoring traffic to keep them active.

      The issue can be reproduced with the simple client shown after these steps:

      1. Run the client with the URL of a replica set.
      2. After the client has connected, add another secondary to the replica set.
      3. Check the logged serverHeartbeatStarted events to see whether the new secondary is being monitored.

      #!/usr/bin/env node
      'use strict';
      
      const MongoClient = require('mongodb').MongoClient;
      
      const SDAM_EVENTS = [
        'serverOpening',
        'serverClosed',
        'serverDescriptionChanged',
        'topologyOpening',
        'topologyClosed',
        'topologyDescriptionChanged',
        'serverHeartbeatStarted',
        'serverHeartbeatSucceeded',
        'serverHeartbeatFailed'
      ];
      
      
      const url = process.argv[2];
      if (!url) {
        console.error('Usage: tiny_mongo_client.js mongodb://url');
        process.exit(1);
      }
      
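      // Connect, then log every SDAM event emitted by the topology.
      (new MongoClient()).connect(url).then((db) => {
        console.info(`Connected to ${url}`);
        SDAM_EVENTS.forEach((name) => {
          db.topology.on(name, (event) => { console.warn(`${new Date()} ${name}`, event); });
        });
      }).catch((err) => {
        // Report connection failures instead of leaving the rejection unhandled.
        console.error(`Failed to connect to ${url}:`, err);
        process.exit(1);
      });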
      

      With driver versions 2.2.25 and 2.2.26, no serverHeartbeatStarted messages are logged for the new secondary. Instead, the connection times out and is reopened every 30 seconds. I have not tested older versions to compare the behavior.
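      To make the missing heartbeats easier to spot than by reading the raw log, the events could be fed into a small tracker like the sketch below. This is a hypothetical helper, not part of the driver. It assumes that serverOpening and serverClosed events carry the server's "host:port" string in an `address` field and that serverHeartbeatStarted carries it in `connectionId`; verify those field names against the events your driver version actually logs.

```javascript
'use strict';

// Hypothetical helper (not a driver API): tracks which opened servers
// have had at least one heartbeat started.
function createHeartbeatTracker() {
  const opened = new Set();     // addresses seen in serverOpening
  const monitored = new Set();  // addresses seen in serverHeartbeatStarted

  return {
    // Assumption: event.address is the "host:port" of the server.
    onServerOpening(event) {
      opened.add(event.address);
    },
    onServerClosed(event) {
      opened.delete(event.address);
      monitored.delete(event.address);
    },
    // Assumption: event.connectionId is the "host:port" of the server.
    onHeartbeatStarted(event) {
      monitored.add(event.connectionId);
    },
    // Opened servers that have never had a heartbeat started, sorted.
    unmonitored() {
      return Array.from(opened).filter((addr) => !monitored.has(addr)).sort();
    }
  };
}

// Possible wiring inside the reproduction client's connect callback:
//   const tracker = createHeartbeatTracker();
//   db.topology.on('serverOpening', (e) => tracker.onServerOpening(e));
//   db.topology.on('serverClosed', (e) => tracker.onServerClosed(e));
//   db.topology.on('serverHeartbeatStarted', (e) => tracker.onHeartbeatStarted(e));
//   setInterval(() => console.warn('Unmonitored:', tracker.unmonitored()), 30000);
```

      If the behavior described above is present, the newly added secondary would be expected to stay in the `unmonitored()` list after it has been opened.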

      I've also seen odd behavior: if you add and remove a secondary and then add a different secondary, the driver may wait 10 seconds after the timeout before reopening the connection. That should be less of a concern once the connection no longer times out.

            Assignee: Christian Amor Kvalheim
            Reporter: Greg Singer